
High availability

Consider how to replicate session and topic information between Diffusion™ servers to increase availability and reliability.

Diffusion uses a datagrid to share session and topic information between Diffusion servers and provide high availability for clients connecting to load-balanced servers.

Figure 1. Information sharing using a datagrid: Diffusion servers share information by reflecting it into a datagrid.

Diffusion uses Hazelcast™ as its datagrid. Hazelcast is a third-party product that is included in the Diffusion server installation and runs within the Diffusion server process.

The datagrid is responsible for forming clusters and exchanging replicated data. Clusters operate on a peer-to-peer basis; by default, there is no hierarchy of servers within a cluster.

Servers reflect session and topic information into the datagrid. If a server becomes unavailable, another server can access the session and topic information that is stored in the datagrid and take over the responsibilities of the first server.

See Configuring the Diffusion server to use replication and Configuring the Hazelcast datagrid for more details.
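
To make the mechanism concrete, the following is a minimal sketch written directly against the Hazelcast 3.x Java API. It is not Diffusion's internal code: the map name, keys, and values are invented for illustration. It demonstrates only the datagrid property that replication depends on, namely that state written by one node remains available to the surviving nodes.

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IMap;

    public class DatagridSketch {
        public static void main(String[] args) {
            // Two Hazelcast nodes in one JVM, standing in for two servers.
            HazelcastInstance serverA = Hazelcast.newHazelcastInstance();
            HazelcastInstance serverB = Hazelcast.newHazelcastInstance();

            // "serverA" reflects some state into a distributed map.
            // The map name and entry are hypothetical.
            IMap<String, String> sessions = serverA.getMap("sessions");
            sessions.put("session-1234", "subscribed to foo/bar");

            // After serverA is lost, "serverB" can still read the same
            // entry, so it could take over serverA's responsibilities.
            serverA.shutdown();
            IMap<String, String> view = serverB.getMap("sessions");
            System.out.println(view.get("session-1234"));

            serverB.shutdown();
        }
    }

With Hazelcast's default backup count of one, the entry survives the loss of the node that wrote it, which is the property that the failover described above relies on.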

Considerations

Consider the following factors when using replication with Hazelcast:
  • By default, Hazelcast uses multicast to discover other nodes to replicate data to. This is not secure for production use. In production, configure your Hazelcast nodes to replicate data only with explicitly defined nodes, as shown in the sketch after this list. For more information, see Configuring the Hazelcast datagrid.
  • When Diffusion servers are merged into a cluster, the servers can have inconsistent replicated data. Unresolved inconsistencies can cause unpredictable behavior, due to issues such as conflicts between updaters. If the inconsistencies cannot be resolved, the inconsistent Diffusion server or servers are shut down and must be restarted.

    Diffusion servers in a cluster can become inconsistent in a number of circumstances; for example, if a network partitions and then heals.

    The quorum setting can help prevent inconsistencies due to network partitions. It enables you to set a minimum cluster size; if the cluster falls below that size, all of its servers shut down.

    You should choose a quorum value so that after a network partition, the smaller cluster will shut down instead of attempting to heal. The servers from the smaller cluster can then be restarted and join the cluster cleanly, avoiding inconsistencies.

    If you want to use the quorum feature, use an odd number of servers and set the value to just over half the cluster size. For example, if you have 9 servers in a cluster, set the quorum value to 5.

    Note that servers shut down by the quorum feature will not restart automatically.
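
If you disable multicast as recommended in the first consideration above, the Hazelcast side of the change amounts to turning off multicast discovery and listing the cluster members explicitly. The following is a minimal sketch, assuming the Hazelcast 3.x Java configuration API; the member addresses are placeholders, and in a Diffusion deployment the equivalent settings are normally expressed in the Hazelcast configuration described in Configuring the Hazelcast datagrid.

    import com.hazelcast.config.Config;
    import com.hazelcast.config.JoinConfig;
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;

    public class ExplicitMembersSketch {
        public static void main(String[] args) {
            Config config = new Config();
            JoinConfig join = config.getNetworkConfig().getJoin();

            // Disable multicast discovery so arbitrary nodes cannot join.
            join.getMulticastConfig().setEnabled(false);

            // Replicate only with explicitly defined members.
            // The addresses below are placeholders.
            join.getTcpIpConfig()
                .setEnabled(true)
                .addMember("10.0.0.1:5701")
                .addMember("10.0.0.2:5701")
                .addMember("10.0.0.3:5701");

            HazelcastInstance node = Hazelcast.newHazelcastInstance(config);
        }
    }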