Consider how to replicate session, topic and configuration information between a cluster of Diffusion™ servers to increase availability and
Diffusion uses a datagrid to share session and topic information
between Diffusion servers within a cluster, providing high availability
for clients connecting to load-balanced servers.
Diffusion uses Hazelcast™ as its
datagrid. Hazelcast is a third-party product that is
included in the Diffusion server installation and runs within the Diffusion server process.
The datagrid is responsible for the formation of clusters and the exchange of
replicated data. These clusters operate on a peer-to-peer basis and, by default, there
is no hierarchy of servers within the cluster.
Servers reflect session and topic information into the datagrid. If a server becomes
unavailable, another server can access the session and topic information that is
stored in the datagrid and take over the responsibilities of the first server.
As well as session and topic information, servers can use configuration replication
to replicate configuration items such as security stores, topic views and metric collectors.
Configuration replication is active if session or topic replication is enabled, or it can be enabled separately.
Many Diffusion features are cluster-aware, meaning that
requests or messages can be routed within a cluster to the correct server. These features are cluster-aware:
control authentication handler requests
missing topic notifications
Some client control operations are cluster-aware. A command is routed to the server
in the cluster that hosts the specified session. When a request is sent to a session filter,
the command is applied to all matching sessions across the cluster.
These client control operations are cluster-aware:
Consider the following factors when using replication with Hazelcast:
By default Hazelcast uses multicast to
discover other nodes to replicate data to. This is not secure for production
use. In production, configure your Hazelcast
nodes to replicate data only with explicitly defined nodes. For more
information, see Configuring the Hazelcast datagrid.
When Diffusion servers are
merged into a cluster, the servers can have inconsistent replicated data.
Unresolved inconsistencies can cause unpredictable behavior, due to issues such
as conflicts between updaters. If the
inconsistencies cannot be resolved, this is known as "split-brain". The inconsistent Diffusion server or servers are shut down and must
Diffusion servers in a cluster can
become inconsistent in a number of circumstances; for example, if the
network partitions and then heals.
The quorum setting can help prevent inconsistencies due to network partitions.
It enables you to set
a minimum size for a cluster, below which the servers in a cluster will
all shut down.
You should choose a quorum value so that after a network partition, the smaller
cluster will shut down instead of attempting to heal. The servers
from the smaller cluster can then be restarted and join the cluster cleanly,
If you want to use the quorum feature, use an odd number of servers
and set the value to the smallest whole number greater than half the cluster size. For example,
if you have 5 servers in a cluster, set the quorum value to 3.
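The quorum arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function names recommended_quorum and surviving_partitions are hypothetical and are not part of Diffusion or Hazelcast. It assumes that a partition keeps running when its size is at least the quorum value and shuts down otherwise.

```python
from typing import List


def recommended_quorum(cluster_size: int) -> int:
    """Return the smallest whole number greater than half the cluster size."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def surviving_partitions(partition_sizes: List[int], quorum: int) -> List[int]:
    """Return the sizes of the partitions that meet the quorum.

    Assumption: a partition smaller than the quorum shuts down.
    """
    return [size for size in partition_sizes if size >= quorum]


quorum = recommended_quorum(5)
print(quorum)                                # 3
print(surviving_partitions([3, 2], quorum))  # [3]
```

With 5 servers and a quorum of 3, a partition into groups of 3 and 2 leaves only the larger group running, which matches the behavior the quorum setting is intended to produce.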
Note that servers shut down by the quorum feature will not restart
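An ideally sized cluster contains at least 3 nodes, and no more than 5
without consultation. Design your cluster to contain an odd number
of servers: a two-way network partition cannot divide an odd-sized cluster into
equal halves, so one side always retains a majority and the cluster can recover from a "split-brain".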
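To see why an odd cluster size helps, the following Python sketch enumerates every two-way partition of a cluster and reports which side still meets a majority quorum. It is an illustration under stated assumptions, not Diffusion behavior taken from the product: the function name two_way_split_outcomes is hypothetical, the quorum is assumed to be the smallest whole number greater than half the cluster size, and a side below the quorum is assumed to shut down.

```python
def two_way_split_outcomes(cluster_size):
    """List (left, right, survivors) for each two-way split of the cluster.

    Assumption: quorum is a strict majority of the original cluster size,
    and a side whose size is below the quorum shuts down.
    """
    quorum = cluster_size // 2 + 1
    outcomes = []
    # Only iterate up to half the cluster so each split is listed once.
    for left in range(1, cluster_size // 2 + 1):
        right = cluster_size - left
        survivors = [size for size in (left, right) if size >= quorum]
        outcomes.append((left, right, survivors))
    return outcomes


for size in (4, 5):
    print(size, two_way_split_outcomes(size))
# 4 [(1, 3, [3]), (2, 2, [])]
# 5 [(1, 4, [4]), (2, 3, [3])]
```

In the 4-server case, an even 2/2 split leaves no side at quorum, so every server shuts down. In the 5-server case, every two-way split leaves exactly one side running.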