Designing a Scalable Gamification Engine: Part 1
This is the first in a series of entries detailing my journey to create a robust, scalable, and performant platform that enables gamification for other applications.
A project that I work on has a few proposed features that we have been tossing around for a while. Of those requested features, the most relatable is a badging system. The idea is that some users should be able to define custom badges that other users can earn in response to engaging with our application in certain ways. It is a great way to promote specific usage patterns in your system.
The two other requested features are:
- A mechanism that gives some users access to new pages on the website after they read specific articles (configurable by the content owner).
- A mechanism to let users assign “to do” lists to other users. Once a list is complete, the assigning user should be informed.
Without diving any deeper into my project’s specific needs, these three proposed features all involve some form of custom goal definition and goal progress tracking. The similarities between the features encourage me to write a common base solution rather than reinvent the wheel for each feature.
My intent is to use these three features as driving use cases to flesh out a full-blown gamification engine that could enable similar features in other applications. I’ll consider it extra credit to open source the project if it proves valuable to others.
To write a system that enables the features above, we start with the high-level requirements that all three features share. We’ll say that we need a system that will:
- Allow client applications to define goals for certain entities that are completed when those entities fulfill a goal’s criteria.
- Allow clients to feed usage information into our system so that it tracks entity progress towards completing those goal criteria.
- Allow clients to query for an entity’s progress towards its goals.
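To make these three requirements concrete, here is a minimal in-memory sketch of the operations a client would need. Everything in it is an assumption for illustration: the `GoalService` class, its method names, and the idea of treating a criterion as an arbitrary function over the recorded usage are placeholders, not the design this series settles on.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical type: a criterion is any function that inspects the usage
# records seen so far for an entity and reports whether the goal is met.
Criterion = Callable[[list], bool]


@dataclass
class GoalService:
    """In-memory stand-in for the three client-facing operations."""

    # entity_id -> {goal_id: criterion}
    goals: dict = field(default_factory=dict)
    # entity_id -> list of usage records reported by clients
    usage: dict = field(default_factory=dict)

    def define_goal(self, entity_id: str, goal_id: str, criterion: Criterion) -> None:
        """Requirement 1: define a goal for an entity."""
        self.goals.setdefault(entity_id, {})[goal_id] = criterion

    def record_usage(self, entity_id: str, record: dict) -> None:
        """Requirement 2: feed usage information into the system."""
        self.usage.setdefault(entity_id, []).append(record)

    def get_progress(self, entity_id: str) -> dict:
        """Requirement 3: report, per goal, whether the entity has completed it."""
        seen = self.usage.get(entity_id, [])
        entity_goals = self.goals.get(entity_id, {})
        return {goal_id: criterion(seen) for goal_id, criterion in entity_goals.items()}


service = GoalService()
service.define_goal(
    "user-1",
    "read-intro",
    lambda seen: any(r.get("article") == "intro" for r in seen),
)
print(service.get_progress("user-1"))  # {'read-intro': False}
service.record_usage("user-1", {"article": "intro"})
print(service.get_progress("user-1"))  # {'read-intro': True}
```

Re-evaluating every criterion on each query is fine for a sketch but would not survive the load described next; it only pins down the shape of the three operations.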
Our design also has two salient boundaries:
- We are only creating the backend infrastructure for the platform. The client applications need to provide their own user interfaces to create goals or view progress towards achieving a goal.
- The system must perform well under heavy load. Client applications register millions of events per day that could contribute towards a goal.
Here is our view of the world at this point given our basic system requirements and boundaries.
Some thoughts come to mind immediately:
- The ‘define goal’ and ‘see goal progress’ components of our system will see far less traffic than the ‘notify of usage relevant to goal’ component.
- The ‘define goal’ and ‘see goal progress’ components are effectively just CRUD operations. In contrast, the ‘notify of usage relevant to goal’ component has no need for Read/Update/Delete operations.
We separate these components into different systems accordingly.
Our two components (CRUD and event-processing) will interact with each other, and may have common data stores, but they can now live in separate stacks and scale independently.
The CRUD components could be fulfilled with a simple REST API and a persistent data store. The usage will be low (measured in requests per minute, not requests per second), so we do not need to over-optimize at this point. For simplicity we’ll assume that a standard Node.js application will do the trick.
The relationship that the data store will have with our ‘event-processing’ component is not clear enough to me to have a strong recommendation on technology yet. For simplicity we’ll assume that MongoDB is satisfactory until we flesh out that relationship a little more.
Note that I used the word ‘event’ in naming my second component. It is my opinion that we will want to adopt an event-oriented architecture for this part of the solution. My reasoning is:
- Hundreds of relevant user interactions occur every second. Requiring client applications to issue a REST request for each one would create unnecessary overhead and direct server-to-server coupling.
- Our system needs to process all usage even if a component goes down for a short period of time (e.g. during a reboot). If an app server reboots in the middle of a REST request, that usage data would be lost forever unless we made the client responsible for retry logic. Not ideal.
- There is more reusability in having clients report events generically to a centralized message broker instead of having them perform direct REST calls.
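The decoupling argument above can be illustrated with a toy example. This is only a sketch under loud assumptions: Python's in-process `queue.Queue` stands in for a real message broker (which would be durable and networked), and the `publish` and `drain` helpers are hypothetical names, not part of any broker's API.

```python
import queue

# Stand-in for a message broker. A real deployment would use a durable,
# networked broker rather than an in-process queue.
broker = queue.Queue()


def publish(event_type: str, entity_id: str) -> None:
    """Client side: hand the event to the broker and return immediately."""
    broker.put({"type": event_type, "entity_id": entity_id})


def drain(handler) -> int:
    """Consumer side: process every event currently waiting in the broker."""
    processed = 0
    while True:
        try:
            event = broker.get_nowait()
        except queue.Empty:
            return processed
        handler(event)
        processed += 1


# The consumer is "down": clients keep publishing and the events simply wait.
for _ in range(3):
    publish("login", "user-1")

# The consumer comes back and catches up on the backlog.
seen = []
print(drain(seen.append))  # 3
```

The client never learns whether the consumer was running, which is the point: the broker absorbs the outage instead of the client's retry logic.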
Additionally, the business needs that we established can be expressed in terms of events. For example, a goal’s criteria can be expressed in terms of system events that must occur for the goal to be completed (e.g. if a goal requires users to log in 5 times, it would be logical to express the goal in terms of ‘login’ events). This makes an event-driven design a logical choice.
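As a rough illustration of the "log in 5 times" example, a goal's criteria could be written as required counts per event type. The dictionary shape and the `progress` and `is_complete` helpers below are assumptions made for this sketch; the actual schema is one of the open questions for the next articles.

```python
from collections import Counter

# Hypothetical goal shape: required number of occurrences per event type.
goal_criteria = {"login": 5}


def progress(criteria: dict, events: list) -> dict:
    """Count qualifying events per type, capped at the required amount."""
    counts = Counter(e["type"] for e in events)
    return {etype: min(counts[etype], needed) for etype, needed in criteria.items()}


def is_complete(criteria: dict, events: list) -> bool:
    """A goal is complete once every event type has reached its required count."""
    return progress(criteria, events) == criteria


events = [{"type": "login"}] * 4 + [{"type": "page_view"}]
print(progress(goal_criteria, events))     # {'login': 4}
print(is_complete(goal_criteria, events))  # False

events.append({"type": "login"})
print(is_complete(goal_criteria, events))  # True
```

Events that no goal cares about (the `page_view` here) are ignored, so clients can report usage generically without knowing which goals exist.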
Here is a revision of what our system looks like now.
We’ll stop here for now. As it stands, we have the basic infrastructure taking shape. Some concerns that we’ll want to think through in the next articles:
- What schema will allow us to conceptually associate goals to usage events?
- How do we process received events quickly and reliably while tracking progress towards goals?