The story behind WSO2 SP Minimum High Availability

Umesha Guruge
3 min readAug 19, 2019

WSO2 SP is a lightweight stream processing platform that allows collecting, processing and analyzing events in other words data in real-time. Suppose now you have configured a WSO2 Stream processor to cater your data processing and analytics needs which is working fine. And suddenly your server goes down unexpectedly. Now, what??? The data coming to your server is going to be lost forever.

This is when WSO2 SP high availability comes to play to save your lives. In this mechanism, you have at minimum two nodes running parallel to deal with unexpected SP instance failures.

Now let’s see how this Minimum high availability work. In this set up we have two SP instances running parallelly where one instance is called Active node, and you guessed it right, the other one is called Passive node.

Like the words depict the Active node is actively participating in event receiving, processing, and publishing. The Passive node is just receiving events and doesn’t engage in processing or publishing, but patiently waiting to take over and save the day when the active node goes down. Now we have two questions

  1. How does the Passive node know that the Active node is gone down?
  2. How does the Passive node receive the events or data that has been coming active node?

So to answer this we have dive a little deeper. Now, these two clustered instances are sharing a database. Let’s call it WSO2_CLUSTER_DB. They are periodically sending a heartbeat to this DB and tell that they are alive. So the passive node is checking this DB to see whether the active node is still alive and well.

Once the Active node’s heartbeat doesn’t come within the given time the Passive will decide okay now Active has gone down I have to take over and it will elect itself as the coordinator of the cluster and become the Active node.

Regarding how the passive nodes receive the events, the source that sends data doesn’t know anything about what is happening, how many nodes there are, which node is working and which isn’t. Only the active node will open the source ports so that clients will always send events to the active node. The active node will internally send events to the passive node. This will prevent any loss of events and also wouldn’t affect the passive node’s transformation to the active. This is done using an event buffer queue which is consumed by the passive node. Just like the active node, the passive node will receive the incoming events and if and only if active goes down and passive becomes active, it will start and processing and publishing the received data.

Another important part is communication between the two nodes. The two nodes directly communicate only when sharing events. All other communications happen through the shared DB. Both nodes are reading from and writing to the WSO2_CLUSTER_DB.

So when the transformation is completed, the earlier passive, now active node will start to resume the processing. And when the earlier active node who went down leaving us hanging will come back and take his place as the passive node and wait for the currently active node to go down and resume the above process all over again, making sure the system performance is not affected even if one SP instance goes down.

--

--