
Server clusters for high availability

Consider how to replicate session, topic and configuration information between a cluster of Diffusion™ servers to increase availability and reliability.

Diffusion uses a datagrid to share session and topic information between Diffusion servers within a cluster, providing high availability for clients connecting to load-balanced servers.

Figure 1. Information sharing using a datagrid: Diffusion servers share information by reflecting it into a datagrid.

Diffusion uses Hazelcast™ as its datagrid. Hazelcast is a third-party product that is included in the Diffusion server installation and runs within the Diffusion server process.

The datagrid is responsible for the formation of clusters and the exchange of replicated data. These clusters operate on a peer-to-peer basis and by default there is no hierarchy of servers within the cluster.

Servers reflect session and topic information into the datagrid. If a server becomes unavailable, another server can access the session and topic information that is stored in the datagrid and take over the responsibilities of the first server.
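The failover principle can be illustrated with a toy, single-process model. This is a sketch under stated assumptions, not the Diffusion or Hazelcast API: a `ConcurrentHashMap` named `GRID` stands in for the datagrid, and the `SessionState` record and the `reflect` and `takeOver` methods are hypothetical names. It shows only that state one server writes into shared storage remains readable after that server fails, so a surviving server can claim it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model only: GRID stands in for the datagrid. Not the Diffusion or Hazelcast API.
public class FailoverSketch {

    /** Hypothetical replicated session record: session ID, owning server, properties. */
    record SessionState(String sessionId, String ownerServer, Map<String, String> properties) {}

    /** Stand-in for the datagrid: shared by every "server" in this process. */
    static final Map<String, SessionState> GRID = new ConcurrentHashMap<>();

    /** A server reflects its local session state into the shared grid. */
    static void reflect(String server, String sessionId, Map<String, String> props) {
        GRID.put(sessionId, new SessionState(sessionId, server, Map.copyOf(props)));
    }

    /** A surviving server takes ownership of every session the failed server owned. */
    static int takeOver(String failedServer, String survivor) {
        int moved = 0;
        for (SessionState s : GRID.values()) {
            if (s.ownerServer().equals(failedServer)) {
                // Replace the owner; the replicated properties are kept unchanged.
                GRID.put(s.sessionId(), new SessionState(s.sessionId(), survivor, s.properties()));
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        reflect("server-a", "s1", Map.of("user", "alice"));
        reflect("server-b", "s2", Map.of("user", "bob"));
        int moved = takeOver("server-a", "server-b"); // server-a has failed
        System.out.println(moved + " session(s) moved; s1 is now owned by "
                + GRID.get("s1").ownerServer());
    }
}
```

In a real cluster the datagrid also handles distribution, backups and membership changes, none of which this model attempts.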

As well as session and topic information, servers can use configuration replication to replicate configuration items such as security stores, topic views and metric collectors.

Configuration replication is active whenever session or topic replication is enabled. It can also be enabled on its own.
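The activation rule for configuration replication reduces to a single boolean expression. The sketch below uses hypothetical flag names (`sessionReplication`, `topicReplication`, `configurationReplication`); they are illustrative only and are not the element or attribute names used in Replication.xml.

```java
// Illustrative only: flag names are hypothetical, not Replication.xml settings.
public class ReplicationFlags {

    /** Hypothetical replication switches for one server. */
    record Settings(boolean sessionReplication, boolean topicReplication, boolean configurationReplication) {}

    /** Active if session or topic replication is on, or if configuration replication was enabled on its own. */
    static boolean configurationReplicationActive(Settings s) {
        return s.sessionReplication() || s.topicReplication() || s.configurationReplication();
    }

    public static void main(String[] args) {
        System.out.println(configurationReplicationActive(new Settings(false, true, false)));  // true
        System.out.println(configurationReplicationActive(new Settings(false, false, true)));  // true
        System.out.println(configurationReplicationActive(new Settings(false, false, false))); // false
    }
}
```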

Many Diffusion features are cluster-aware, meaning that requests or messages are routed within a cluster to the correct server. The following features are cluster-aware:
  • control authentication handler requests
  • missing topic notifications
  • request-response messaging

Some client control operations are also cluster-aware. A command that targets a single session is routed to the server in the cluster that hosts that session. A command that targets a session filter is applied to all matching sessions across the cluster.

These client control operations are cluster-aware:
  • changeRoles
  • close
  • setConflated
  • setSessionProperties
  • getSessionProperties
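The two routing rules for cluster-aware client control operations can be sketched as lookups over a cluster-wide view of sessions. This is a hypothetical model, not how Diffusion implements routing internally: the `Session` record and the `routeToOwner` and `fanOut` methods are illustrative names, and a Java `Predicate` stands in for a Diffusion session filter expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model of cluster-aware routing; not Diffusion's internal design.
public class ClusterRoutingSketch {

    /** One session as seen cluster-wide: its ID, the server hosting it, and its properties. */
    record Session(String id, String server, Map<String, String> properties) {}

    /** Single-session command: route it to the server that hosts the session. */
    static String routeToOwner(List<Session> cluster, String sessionId) {
        return cluster.stream()
                .filter(s -> s.id().equals(sessionId))
                .map(Session::server)
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("unknown session " + sessionId));
    }

    /** Session-filter command: apply it to every matching session, whichever server hosts it. */
    static List<String> fanOut(List<Session> cluster, Predicate<Session> filter) {
        List<String> targets = new ArrayList<>();
        for (Session s : cluster) {
            if (filter.test(s)) {
                targets.add(s.server() + "/" + s.id());
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        List<Session> cluster = List.of(
                new Session("s1", "server-a", Map.of("Department", "Sales")),
                new Session("s2", "server-b", Map.of("Department", "Sales")),
                new Session("s3", "server-b", Map.of("Department", "IT")));

        System.out.println(routeToOwner(cluster, "s3")); // server-b
        System.out.println(fanOut(cluster, s -> "Sales".equals(s.properties().get("Department"))));
        // [server-a/s1, server-b/s2]
    }
}
```

The point of the model is the asymmetry: a session ID resolves to exactly one hosting server, while a filter can match sessions on several servers at once.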

See Configuring the Diffusion server to use replication and Replication.xml for more details.

Considerations

Consider the following factors when using replication with Hazelcast:
  • By default, Hazelcast uses multicast to discover the other nodes it replicates data to. Multicast discovery is not secure for production use. In production, configure your Hazelcast nodes to replicate data only with explicitly defined nodes. For more information, see Configuring the Hazelcast datagrid.
  • When Diffusion servers are merged into a cluster, the servers can have inconsistent replicated data. Unresolved inconsistencies can cause unpredictable behavior, for example conflicts between updaters. An inconsistency that cannot be resolved is known as "split-brain". The inconsistent Diffusion server or servers are shut down and must be restarted.

    Diffusion servers in a cluster can become inconsistent in a number of circumstances; for example, if a network partitions and then heals.

    The quorum setting can help prevent inconsistencies due to network partitions. It enables you to set a minimum size for a cluster, below which the servers in a cluster will all shut down.

    You should choose a quorum value so that after a network partition, the smaller cluster will shut down instead of attempting to heal. The servers from the smaller cluster can then be restarted and join the cluster cleanly, avoiding inconsistencies.

    If you want to use the quorum feature, use an odd number of servers and set the value to just over half the cluster size. For example, if you have 5 servers in a cluster, set the quorum value to 3.

    Note that servers shut down by the quorum feature will not restart automatically.

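  • An ideally sized cluster contains at least 3 nodes, and no more than 5 without consultation. Design your cluster to contain an odd number of servers. If an odd-sized cluster splits in two, one part always holds a strict majority, so a cluster of this size can always recover from a "split-brain". An even-sized cluster can split into two equal halves, neither of which reaches a majority quorum.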
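The quorum arithmetic and the odd-size recommendation above can be checked with a short model. This sketch assumes the cluster splits into exactly two partitions and that a partition keeps running only if its size is at least the quorum; `quorumFor`, `survives` and `exactlyOneSurvivesEverySplit` are hypothetical helpers, not Diffusion or Hazelcast configuration.

```java
// Arithmetic model of the quorum rule; helper names are hypothetical.
public class QuorumSketch {

    /** "Just over half" the cluster size: the smallest strict majority (3 for 5 servers). */
    static int quorumFor(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    /** A partition keeps running only if it still has at least `quorum` servers. */
    static boolean survives(int partitionSize, int quorum) {
        return partitionSize >= quorum;
    }

    /** True if, for every two-way split, exactly one side stays at or above quorum. */
    static boolean exactlyOneSurvivesEverySplit(int clusterSize) {
        int q = quorumFor(clusterSize);
        for (int left = 1; left < clusterSize; left++) {
            int right = clusterSize - left;
            if (survives(left, q) == survives(right, q)) {
                return false; // both sides run, or both shut down
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 6; n++) {
            System.out.printf("servers=%d quorum=%d one side always survives a split: %b%n",
                    n, quorumFor(n), exactlyOneSurvivesEverySplit(n));
        }
        // servers=3 quorum=2 one side always survives a split: true
        // servers=4 quorum=3 one side always survives a split: false
        // servers=5 quorum=3 one side always survives a split: true
        // servers=6 quorum=4 one side always survives a split: false
    }
}
```

With an even size such as 4, a 2|2 split leaves both halves below a quorum of 3, so every server shuts down; with an odd size, the larger side always meets quorum while the smaller side shuts down and can later rejoin cleanly.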