A web technology startup recently contacted us to address performance issues they were encountering on their web application. Their product is a document collaboration SaaS for enterprises. The client had launched the business and begun promoting it about six months before contacting us. Traffic was initially low but built steadily; then, as every entrepreneur dreams, it surged, and keeping the service up to the mark for their customers became more and more difficult. In a nutshell, the problem was to scale the app and make it highly available and fault tolerant. An intrinsic but harder sub-problem was keeping the servers in rapid sync. They had already tried the most popular synchronization solutions but ran into performance issues, and commercial offerings were out of their budget. So we were tasked with coming up with high availability, load balancing and synchronization solutions that are cost effective and do not devour hardware resources.
We asked our client for a walkthrough of the product. Our team was particularly interested in the overall architecture and in how each component plays its part, hunting for the seams along which the service could be made distributed. Luckily the app was built with an SOA approach, which gave us better options. First, we proposed separating the whole thing into two major parts, the "website" and the "webapp", and hosting them separately. Load profiling of the webapp then led to a further separation of roles from an application-usage point of view: the file-editing function was identified as the biggest consumer of CPU cycles, memory and bandwidth.

Next the file repository was expanded and made redundant, which introduced the long-haunting problem of fast file synchronization between the file servers. We evaluated several open source solutions, including DRBD, rsync and unison, but discarded them based on a variety of tests. We then settled on an open source cluster synchronization tool that performs lightning-fast synchronization across the cluster, offers a lot of configuration options and supports an unlimited number of cluster nodes. The tool stores state about the files it keeps in sync with other nodes for fast file comparison, and has an efficient algorithm for detecting which version of a file changed last.

We then needed a way to trigger a sync as soon as any single file was created, modified or deleted. For this, a small but robust C-based application was written that receives file events from the OS kernel and triggers synchronization between the cluster nodes.

The entire cluster was load balanced via the DNS round-robin technique. Files are served from a different node each time for each user, yet remain in sync when another user accesses the same file concurrently, which was the most heavily used scenario for our client's customers. After deployment we conducted thorough testing of the system.
One of these tests was to generate thousands of small files on one node with a bash script within a few seconds and note how quickly they became available on the other nodes.
The results were simply amazing. All the files our script generated were available on every other node almost immediately, in under 1 second, while consuming only 0.08% CPU on each node during synchronization. Our client now sells their service with high confidence and retains existing customers better, with fewer complaint tickets. They tell us their efficient, fast, load-balanced and robust file collaboration service is better positioned in the market today and gives them an edge over competitors.