Building a Scalable System
This blog post is just a write-up of a drunken discussion I had. There are three pillars we identified —
1. Running more than one replica should not be trouble
Running n copies of the same application behind a load balancer should not be a problem.
What it means is —
- The application should be stateless — consecutive requests to different replicas should be served equally well.
- No sessions on disk or the file system (Alternative — use Redis; see the sketch after this list)
- No logs on disk (Alternatives — open-source Graylog; paid Logentries or Scalyr)
- No cron as part of the application process (avoid node-cron or the @Scheduled annotation) (Alternative — trigger scheduled tasks from a separate environment that is independent of the application)
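As a concrete illustration of the session bullet, here is a minimal sketch of externalizing sessions to Redis with express-session and connect-redis (the import shape varies across connect-redis versions; the Redis URL and secret are placeholders):

```typescript
import express from "express";
import session from "express-session";
import RedisStore from "connect-redis";
import { createClient } from "redis";

// Session data lives in Redis, not on the replica's disk, so any replica
// behind the load balancer can serve any request.
const redisClient = createClient({ url: "redis://localhost:6379" });
redisClient.connect().catch(console.error);

const app = express();
app.use(
  session({
    store: new RedisStore({ client: redisClient }),
    secret: "change-me", // placeholder; load from config in practice
    resave: false,
    saveUninitialized: false,
  })
);

app.listen(3000);
```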
2. An API or process lifecycle should be minimal
Creating a new replica when load increases, and removing an existing one during cool-down, should not be a concern. Further, the termination event for a process should be handled gracefully (it should stop receiving new requests after a SIGTERM event, but existing ones should be allowed to finish). If a process or API request takes more than 200 ms, it should be broken into smaller pieces (this threshold is a personal opinion and could vary based on the use case; just define a threshold you're comfortable with and believe you can hold to). A minimal sketch of the graceful-shutdown part follows.
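Here is a minimal sketch of graceful SIGTERM handling in Node.js using only the core http module (the port and the drain timeout are arbitrary choices):

```typescript
import http from "http";

const server = http.createServer((req, res) => {
  res.end("ok"); // stand-in for real request handling
});

server.listen(3000);

process.on("SIGTERM", () => {
  // Stop accepting new connections; in-flight requests are allowed to
  // finish, and the callback fires once everything has drained.
  server.close(() => process.exit(0));
  // Safety net: force-exit if connections refuse to drain in time.
  setTimeout(() => process.exit(1), 10_000).unref();
});
```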
If you follow this, you save yourself the trouble of maintaining consistency across processes. Since each process is not one big task but just a small chunk, any problem or issue can be identified quickly.
3. Job distribution in System Design
Design the system in such a way that a spike doesn't turn everything into a gutter (admittedly a very vague statement). Of course this is what everyone wants to achieve; the question is how to go about it.
Let me try to put it up with an example. If, under normal traffic, my application takes 150 ms to respond, then if the traffic increases up to 5x, the response may go up to 2 sec. Since the traffic is really unexpected, we can live with a 2-sec response time. As part of the application process, it does multiple other things. Let's say an order is placed: a transaction record is created, balances are adjusted, multiple communications (email, SMS, push notification) are triggered, an analytics layer comes into the picture capturing the updated data, etc. The API processing (for which the end user is waiting on the response) should be very minimal. In the last example, only the transaction should be saved in the database (and the balance updated — depending on priority); all other operations can be executed at later stages with a reference to this recorded row (e.g. sending communications, pushing data to analytics storage). Start maintaining a queue for each piece of processing that can be delayed, as in the sketch below.
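A sketch of what that minimal synchronous path could look like. BullMQ is just one possible queue library here; the queue names, the Order/Txn shapes, and saveTransaction are all hypothetical stand-ins:

```typescript
import { Queue } from "bullmq";

interface Order { userId: string; amount: number; }
interface Txn { id: string; }

// One queue per deferrable piece of work.
const emailQueue = new Queue("email");
const smsQueue = new Queue("sms");
const analyticsQueue = new Queue("analytics");

// Hypothetical stand-in for the real INSERT of the transaction row.
async function saveTransaction(order: Order): Promise<Txn> {
  return { id: "txn_123" };
}

async function placeOrder(order: Order): Promise<Txn> {
  // The only work the end user actually waits for.
  const txn = await saveTransaction(order);

  // Everything else is deferred, carrying just a reference to the saved row.
  await Promise.all([
    emailQueue.add("order-confirmation", { txnId: txn.id }),
    smsQueue.add("order-confirmation", { txnId: txn.id }),
    analyticsQueue.add("order-placed", { txnId: txn.id }),
  ]);

  return txn; // respond immediately; consumers do the heavy lifting later
}
```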
Now it may be that you have to send notifications in multiple places: for transactional data you may want to send an SMS immediately, or you may want to inform a vendor as early as possible. You can increase the consumers for those specific processes. In the end, since you maintain a queue for each process, an increase in load just increases the unprocessed messages in a queue, letting you keep using the system without spiking CPU or memory. Increasing the consumers for a specific process ensures it keeps pace with the traffic.
Points #2 and #3 complement each other. For job distribution, define each job such that its processing time is minimal, and if a job takes noticeable time to process, break it down further.
Processes 1, 2, and 3 can have different numbers of consumers depending on the processing time of each. If process 3 happens to communicate with a third-party API that is slow in itself, then process 3 should have more consumers, as in the sketch below.
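A sketch of scaling one slow consumer independently, again assuming BullMQ; sendSms is a hypothetical wrapper around the slow third-party API:

```typescript
import { Worker } from "bullmq";

// Hypothetical wrapper around a slow third-party SMS provider.
async function sendSms(txnId: string): Promise<void> {
  // ... call the provider here ...
}

// Each worker process you start against the "sms" queue adds throughput for
// that one job type only; other queues keep their own consumer counts.
const smsWorker = new Worker(
  "sms",
  async (job) => {
    await sendSms(job.data.txnId);
  },
  { concurrency: 10 } // up to 10 jobs in parallel within this worker
);
```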
PS — Any feedback is welcome.