Diffusion™ stores and distributes data through topics.
At the heart of the Diffusion model lies the concept of a topic. This
page covers the various aspects of topic
management that make Diffusion unique, including persistent
subscriptions using topic selectors; topic query capabilities;
automatic topic removal; how Diffusion achieves high network
efficiency using delta streaming, conflation, and compression; and how
to secure topic data.
In Diffusion, data is stored and distributed through topics. Each
topic has a topic type and a current data value which is maintained in
memory on the server. A topic's type determines the data values that
can be stored and published through the topic.
Granted sufficient security privileges, a client session can subscribe
to a topic to receive notifications when the topic value changes and
can also update the value. When a topic is updated, all its
subscribers are notified of the new value. Diffusion takes care of
efficiently broadcasting value changes, even if there are many
thousands of subscribers.
Topics are identified by topic paths. A topic path is a string of
parts separated by the / character, for example,
weather/capitals/paris. Together, the set of topic paths forms the
topic tree.
The topic tree allows topics to be addressed in groups
using special expressions called topic selectors. For example, the
topic selector ?weather/capitals/
can be used to subscribe to all
topics below the topic path weather/capitals.
See the full
syntax of topic selector expressions.
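As a rough illustration of how a selector picks out part of the topic
tree, here is a minimal sketch in Python. It is a toy model, not the
Diffusion client API: the matches function is hypothetical, it handles
only the two selector shapes that appear on this page, and it treats
"below" as any depth, which may differ in detail from the full
selector syntax.

```python
def matches(selector: str, topic_path: str) -> bool:
    """Toy matcher for the two selector shapes used on this page."""
    if selector.startswith(">"):
        # '>path': select exactly one topic path.
        return topic_path == selector[1:]
    if selector.startswith("?") and selector.endswith("/"):
        # '?prefix/': select topics below the prefix (simplified: any depth).
        return topic_path.startswith(selector[1:])
    raise ValueError(f"unsupported selector in this sketch: {selector!r}")


topic_tree = [
    "weather/capitals/paris",
    "weather/capitals/athens",
    "weather/regions/alps",
]

selected = [path for path in topic_tree if matches("?weather/capitals/", path)]
print(selected)
```

Only the two paths under weather/capitals are selected; the topic
under weather/regions is left out.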
Topics are lightweight and cheap to create and destroy. There are
commercial Diffusion applications that use millions of topics hosted
in a single server and create tens of thousands of topics when a new
tranche of data items becomes available. The low cost per topic allows
for topic trees with a fine-grained mapping to logical data models,
with each topic representing a discrete data item that can be updated.
Topic types, values, and updates
There are nine topic types that can be grouped into four categories:
primitive; composite; multi-valued; and reference.
The four primitive topic types — string,
int64, double, and
binary — are used for topics with simple, atomic values. String
topics store text, int64 and double topics store numbers, and binary
topics can store arbitrary data such as a PNG image.
There are two composite topic types: JSON and
recordV2. A JSON
topic has a JSON value, a format that is familiar to developers. A
recordV2 topic holds records, each an
array of fields, constrained by an optional schema. RecordV2 topics
exist as an upgrade path for applications that were previously using
the removed record topic type – new applications should use JSON topics.
Many applications can get by using only the primitive and JSON topic
types. Multi-valued and reference topic types are more specialized and
less commonly used.
There is a single multi-valued topic type. The
time series topic
stores a history of events. Events are created by a special type of
update. Each event has a timestamp, records the security principal
that created it, and has a value. The values for a time series topic
are all of the same type, which can be string, int64, double,
binary, JSON, or recordV2 – that is, the same data types used for
primitive and composite topics. Ranges of events can be queried by
date range or event offset.
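The shape of a time series topic can be pictured with a small Python
model. This is only a sketch under stated assumptions: the Event and
TimeSeries classes and their field names are hypothetical, and the
timestamp-based query assumes that ranges can be selected using the
timestamp each event carries.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class Event:
    offset: int        # position of the event within the series
    timestamp: float   # creation time, in seconds since the epoch
    author: str        # security principal that created the event
    value: Any         # all values in one series share a single type


class TimeSeries:
    """In-memory history of events, appended in arrival order."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, timestamp: float, author: str, value: Any) -> Event:
        event = Event(len(self._events), timestamp, author, value)
        self._events.append(event)
        return event

    def by_offset(self, start: int, count: int) -> List[Event]:
        # Range query by event offset.
        return self._events[start:start + count]

    def by_time(self, start: float, end: float) -> List[Event]:
        # Range query by timestamp, inclusive at both ends.
        return [e for e in self._events if start <= e.timestamp <= end]


series = TimeSeries()
series.append(100.0, "alice", 20.5)
series.append(160.0, "bob", 21.0)
series.append(220.0, "alice", 21.4)

print([e.value for e in series.by_offset(1, 2)])
print([e.author for e in series.by_time(90.0, 170.0)])
```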
There are two reference topic types: slave and routing. These
are quite different from other topic types. Rather than storing a
data value, they re-present the values of other source topics at
their topic path. A source topic can be any primitive, composite, or
multi-valued topic. A slave topic has a fixed source topic. A
routing topic calls out to an application-provided routing handler
to determine the source topic for each subscribing session.
When a topic is created, it has no value. A client session can update
the topic by providing a value. Primitive and composite topics store
the latest received value. Time series topics store a configurable
history of values. Each reference topic re-presents the value or
values of its source topic.
Sometimes there is no need to store the current value of a topic.
Perhaps the value has a limited lifetime and is only of transient
worth. A topic can be configured not to retain its last value to
reduce the server memory footprint. However, this disables the delta
streaming optimization (see below), so is not often done.
For a given topic, the order of value updates is preserved from the
source session to the subscriber sessions. If a session updates a
string topic with the value A1 followed by A2,
the server will
notify subscribing sessions of the updates in that order. No
guarantees are made about the order of updates across topics. For
example, if a session updates topic A with the values A1 and A2, and
topic B with the value B1, one subscriber might receive A1, B1, A2,
and another might receive B1 first.
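The ordering guarantee can be stated as a simple check: a delivery is
acceptable if, topic by topic, the values arrive in the order they
were sent. The Python sketch below expresses that rule. The function
names are hypothetical, and the check assumes every update is
delivered (it ignores features such as conflation that can drop
out-of-date updates).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Update = Tuple[str, str]  # (topic path, value)


def per_topic_order(updates: List[Update]) -> Dict[str, List[str]]:
    """Group values by topic, keeping the order in which they appear."""
    order: Dict[str, List[str]] = defaultdict(list)
    for topic, value in updates:
        order[topic].append(value)
    return dict(order)


def is_valid_delivery(sent: List[Update], received: List[Update]) -> bool:
    """True if each topic's values arrive in the order they were sent."""
    return per_topic_order(sent) == per_topic_order(received)


sent = [("A", "A1"), ("A", "A2"), ("B", "B1")]

# Interleaving across topics may differ between subscribers.
print(is_valid_delivery(sent, [("A", "A1"), ("B", "B1"), ("A", "A2")]))  # True
print(is_valid_delivery(sent, [("B", "B1"), ("A", "A1"), ("A", "A2")]))  # True

# Reordering within a single topic is never allowed.
print(is_valid_delivery(sent, [("A", "A2"), ("A", "A1"), ("B", "B1")]))  # False
```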
Adding and removing topics
Sessions with appropriate security privileges can add and remove
topics. The topic path and topic specification are supplied when
adding a topic. The topic specification consists of the topic type and
a set of topic properties that allow the behavior of the topic to be
adapted to application needs. Some topic properties are specific to
the topic type. For example, the TIME_SERIES_EVENT_VALUE_TYPE
property configures the data type of the values for a time series
topic, and the SCHEMA property configures an optional schema for a recordV2 topic.
All topic types support the optional REMOVAL property, which
configures an automatic removal policy. Each policy provides a set of
conditions under which the server will remove a topic. You can
configure a topic to be removed at a future time, if it has stopped
receiving updates, if it has no subscribers, or when the server
has no client sessions matching specific criteria. The criteria are expressed
in terms of session property values.
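A topic specification, as described above, is just a topic type plus
a set of properties. The following Python sketch models that pairing
and the grouping of the nine topic types into four categories. The
TopicSpecification class here is a hypothetical stand-in rather than
the class from any Diffusion client library, and the property value
shown is a placeholder, not a documented format.

```python
from dataclasses import dataclass, field
from typing import Dict

# The nine topic types and their categories, as listed on this page.
TOPIC_CATEGORIES = {
    "string": "primitive",
    "int64": "primitive",
    "double": "primitive",
    "binary": "primitive",
    "JSON": "composite",
    "recordV2": "composite",
    "time series": "multi-valued",
    "slave": "reference",
    "routing": "reference",
}


@dataclass(frozen=True)
class TopicSpecification:
    topic_type: str
    properties: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject anything that is not one of the nine known types.
        if self.topic_type not in TOPIC_CATEGORIES:
            raise ValueError(f"unknown topic type: {self.topic_type!r}")

    @property
    def category(self) -> str:
        return TOPIC_CATEGORIES[self.topic_type]


# Placeholder property value; the real value syntax is not shown here.
spec = TopicSpecification(
    "time series", {"TIME_SERIES_EVENT_VALUE_TYPE": "double"}
)
print(spec.category)
```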
Subscribing to topics
The server maintains a real-time data model, presented through topics.
Each client selectively subscribes to a subset of the data model,
according to the needs of the application and data security
restrictions applied at the server. Topics provide a fine-grained
mapping of the logical data model, so in a typical application each
client has a unique partial view of the data model. The client library
retains the values of each subscribed topic. The server sends updates
to keep the client's view synchronized.
Client sessions subscribe to topic paths using topic selectors. The
server persists the set of topic selectors for each session, and
dynamically joins selectors against its topics to resolve
subscriptions. The dynamic join between topic selectors and topics is
unique to Diffusion
and is a powerful way to link client applications
with a changing data model. The set of topic selectors defines the
view of the data model the client is interested in. The server keeps
each session up-to-date with the available data that matches the
provided topic selectors. Let's look at how this works.
When a session subscribes with a topic selector, the server will
resolve subscriptions to all topics with paths matching the topic
selector. The session will be notified of the resolved subscriptions
and the current value of each topic that has one. The subscription
notification includes the topic specification. The server will further
notify the session of topic value changes as they occur. A session can
subscribe to a topic path for which there is no topic. If a topic is
created for the path at a later time, the server will resolve the
subscription and notify the session. For example, if a topic
weather/capitals/paris is added, subscriptions will be resolved for
all sessions that have previously subscribed using the topic selector
?weather/capitals/. The server will notify the subscribing sessions
of the new subscriptions.
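The dynamic join described above can be modelled in a few lines of
Python. This is a toy sketch, not the server's implementation: the
Server class, its method names, and the simplified matches function
are all hypothetical, and only subscribe requests are modelled.

```python
def matches(selector: str, path: str) -> bool:
    # Simplified: '>path' is exact, '?prefix/' selects paths below prefix.
    if selector.startswith(">"):
        return path == selector[1:]
    return selector.startswith("?") and path.startswith(selector[1:])


class Server:
    def __init__(self) -> None:
        self.topics = {}          # topic path -> current value
        self.selectors = {}       # session id -> persisted selectors
        self.subscriptions = {}   # session id -> resolved topic paths
        self.notifications = []   # (session id, event, topic path)

    def subscribe(self, session: str, selector: str) -> None:
        # Persist the selector, then join it against existing topics.
        self.selectors.setdefault(session, []).append(selector)
        for path in self.topics:
            self._resolve(session, path)

    def add_topic(self, path: str, value=None) -> None:
        # Join the new topic against every session's persisted selectors.
        self.topics[path] = value
        for session in self.selectors:
            self._resolve(session, path)

    def remove_topic(self, path: str) -> None:
        del self.topics[path]
        for session, paths in self.subscriptions.items():
            if path in paths:
                paths.remove(path)
                self.notifications.append((session, "unsubscribed", path))

    def _resolve(self, session: str, path: str) -> None:
        resolved = self.subscriptions.setdefault(session, set())
        selected = any(matches(s, path) for s in self.selectors[session])
        if selected and path not in resolved:
            resolved.add(path)
            self.notifications.append((session, "subscribed", path))


server = Server()
server.subscribe("session-1", "?weather/capitals/")  # no topics exist yet
server.add_topic("weather/capitals/paris", "sunny")  # resolved on creation
server.remove_topic("weather/capitals/paris")
print(server.notifications)
```

The session subscribes before any topic exists, yet it is notified as
soon as a matching topic is added, and again when it is removed.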
If the topic is removed, any resolved subscriptions will be removed,
and the previously subscribed sessions will be notified of the unsubscription.
A session can unsubscribe from paths using a topic selector.
Subscriptions will be removed for any topics matching the selector to
which the session was previously subscribed, and the server will
notify the session of the unsubscription. Like subscribe requests,
unsubscribe requests are persisted by the server. The session's
selector set is the accumulation of the subscribe and unsubscribe
events, in the order received. For example, if a session subscribes to
?weather/capitals/ and then unsubscribes from >weather/capitals/athens,
the selector set will match all topics below weather/capitals except
for weather/capitals/athens. On the other hand, if a session first
unsubscribes from >weather/capitals/athens and then subscribes to
?weather/capitals/, the subscription will mask the more specific,
earlier unsubscription and the selector set will match all topics below weather/capitals.
The dynamic joins extend to slave and routing reference topics.
Subscriptions to reference topics are only resolved if the referenced
source topic also exists. Subscriptions to reference topics will be
removed if either the reference topic or the source topic is removed.
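The way subscribe and unsubscribe requests accumulate into a selector
set can be sketched as "the last matching request wins". The Python
below is an illustrative model of the behavior described in this
section, not the server's actual data structure; the is_selected and
matches functions are hypothetical and the matcher is simplified.

```python
from typing import List, Tuple


def matches(selector: str, path: str) -> bool:
    # Simplified: '>path' is exact, '?prefix/' selects paths below prefix.
    if selector.startswith(">"):
        return path == selector[1:]
    return selector.startswith("?") and path.startswith(selector[1:])


def is_selected(selector_set: List[Tuple[str, str]], path: str) -> bool:
    """Replay requests in order; the last one matching the path decides."""
    result = False
    for action, selector in selector_set:
        if matches(selector, path):
            result = action == "subscribe"
    return result


athens = "weather/capitals/athens"
paris = "weather/capitals/paris"

# Subscribe broadly, then carve out one topic.
carve_out = [
    ("subscribe", "?weather/capitals/"),
    ("unsubscribe", ">weather/capitals/athens"),
]
print(is_selected(carve_out, paris), is_selected(carve_out, athens))

# Reversed order: the later, broader subscription masks the unsubscription.
masked = [
    ("unsubscribe", ">weather/capitals/athens"),
    ("subscribe", "?weather/capitals/"),
]
print(is_selected(masked, paris), is_selected(masked, athens))
```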
Fetching topic data
A session can fetch the topic specifications and current values of a
set of topics. This is a one-off operation that captures a snapshot of
the data – the session is not notified of later value updates – but is
useful for applications needing to present a static view of the data.
The set of topics to fetch is specified with a topic selector and can
be further constrained to allow the topic tree to be explored.
How Diffusion makes efficient use of the network
Many aspects of Diffusion across different architectural layers
combine to allow very efficient delivery of application data over the
network. The performance translates directly into tangible financial
savings for Diffusion users and their customers – more application
data can be streamed using less network bandwidth. In addition,
applications can provide richer and more data-intensive views.
Diffusion uses a proprietary binary network protocol, designed with
close attention to minimizing transport framing costs. For each
session, the server balances the batching of updates into network
operations against their timely delivery.
The fine-grained mapping of topics to the logical data model allows an
application client to select only the data items that it needs. The
server maintains the topic selectors for each session, so can
immediately subscribe them to new data items without additional
interactions. In contrast, publish-and-subscribe messaging systems
often require applications to publish the availability of a new data
item on one channel, and for interested clients to respond to this
event by individually subscribing, which is expensive to process and
introduces unnecessary delays.
Through the subscription-based approach, each client session is
synchronized with the topics it is subscribed to. Consequently, the
server only needs to inform each client of a topic's path and
specification when the subscription is resolved. Even better, it
allows changes to a topic's value to be sent as an optimal delta.
A delta stream encodes a change to the value by sending only
the differences between the old value and the new value. Updates to
values frequently only affect part of the value. Consider a JSON value
– typically the structure of the value including object keys, white
space, and delimiters is unchanged between successive updates. Delta
streaming is performed automatically and is transparent to the
application. The server calculates the differences between the
previous value and the new value and sends this to the client. The
client applies the differences to its copy of the previous value to
calculate the new value. Delta streams are also used when a client
session uses an update stream to send a sequence of updates to a
topic. Again, Diffusion automatically and transparently calculates and
sends differences between successive values. The synchronized,
stateful communication used by Diffusion is much more network
efficient than the stateless communication used by messaging-based or
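To make the idea of a delta concrete, here is a deliberately simple
Python sketch. It is not the algorithm Diffusion uses, which is not
described on this page. The sketch assumes a naive scheme that skips
the common prefix and suffix of two byte strings and sends only the
changed middle; make_delta and apply_delta are hypothetical names.

```python
from typing import Tuple

Delta = Tuple[int, int, bytes]  # (prefix length, suffix length, new middle)


def make_delta(old: bytes, new: bytes) -> Delta:
    """Describe `new` as the parts of `old` to keep plus a replaced middle."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # The suffix must not overlap the prefix already counted.
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return prefix, suffix, new[prefix:len(new) - suffix]


def apply_delta(old: bytes, delta: Delta) -> bytes:
    """Rebuild the new value from the receiver's copy of the old value."""
    prefix, suffix, middle = delta
    return old[:prefix] + middle + old[len(old) - suffix:]


old = b'{"city":"paris","temp":21}'
new = b'{"city":"paris","temp":23}'

delta = make_delta(old, new)
print(len(new), "bytes in the full value,", len(delta[2]), "changed byte(s)")
assert apply_delta(old, delta) == new
```

Because the receiver already holds the previous value, only the
changed portion needs to travel over the network. This works only
while sender and receiver stay synchronized on the previous value,
which is the stateful property the text describes.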
Topic value updates sent from the server to sessions are compressed
by the server and decompressed by the clients. The server compresses each update
once and re-uses the result for all of the subscribers. Compression
is complementary to delta streaming and provides additional
Diffusion's conflation feature
improves the efficiency, reliability,
and timeliness of topic updates sent to slow or temporarily
disconnected sessions. The server has a queue of updates for each
session. Updates can back up on a queue if the session is temporarily
disconnected, there is a network bottleneck, or the client application
is performing slowly. Conflation addresses the backlog by selectively
removing out-of-date topic updates. This reduces server memory
footprint and the amount of network data required to bring a session
back up to date. A conflation policy can be tuned for each topic using
the CONFLATION topic property.
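One simple way to picture conflation is a per-session queue that keeps
only the latest pending value for each topic. The Python sketch below
assumes that single policy. The policies available through the
CONFLATION property are not listed on this page, and the SessionQueue
class is a hypothetical illustration rather than the server's queue.

```python
from collections import OrderedDict
from typing import Any, List, Tuple


class SessionQueue:
    """Pending updates for one session, conflated per topic."""

    def __init__(self) -> None:
        # topic path -> latest value not yet delivered
        self._pending: "OrderedDict[str, Any]" = OrderedDict()

    def enqueue(self, path: str, value: Any) -> None:
        # A newer value replaces any out-of-date value for the same topic.
        # The topic keeps its original position in the queue.
        self._pending[path] = value

    def drain(self) -> List[Tuple[str, Any]]:
        # Deliver everything that is pending and empty the queue.
        updates = list(self._pending.items())
        self._pending.clear()
        return updates


queue = SessionQueue()
queue.enqueue("weather/capitals/paris", 21)
queue.enqueue("weather/capitals/athens", 30)
queue.enqueue("weather/capitals/paris", 23)   # supersedes the value 21

delivered = queue.drain()
print(delivered)
```

Three updates were queued while the session was slow, but only two
need to be sent, and the session still ends up with the latest value
of each topic.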
Controlling access to topic data
Using Diffusion's role-based authorization system, sessions
can be granted or denied the rights to add and remove a topic, to
subscribe using a topic selector, to view a topic value, or to update
a topic value.
Each session has a set of roles obtained through the authentication
process or set by control sessions. Each role grants a session various
security permissions. Access to topics is controlled via the topic
permissions SELECT_TOPIC, READ_TOPIC, UPDATE_TOPIC, and
MODIFY_TOPIC. Time series topics can be further controlled by the
topic permissions QUERY_OBSOLETE_TIME_SERIES_EVENTS,
EDIT_TIME_SERIES_EVENTS, and EDIT_OWN_TIME_SERIES_EVENTS, which
grant sessions additional control over the history of time series topics.
Topic permissions are assigned to roles for a particular branch of the
topic tree. An assignment applies to all topics with paths belonging
to the branch unless overridden by a more specific assignment.
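The rule that a more specific assignment overrides a broader one can
be sketched as a longest-matching-branch lookup. The Python below is
an illustrative model only: the permissions_for function, the
dictionary layout, and the role name TRADER are hypothetical, and how
permissions combine across a session's multiple roles is not modelled.

```python
from typing import Dict, Set

# role -> branch of the topic tree -> permissions assigned at that branch
Assignments = Dict[str, Dict[str, Set[str]]]


def permissions_for(assignments: Assignments, role: str, path: str) -> Set[str]:
    """Return the permissions from the most specific branch covering path."""
    best_branch = None
    best_permissions: Set[str] = set()
    for branch, permissions in assignments.get(role, {}).items():
        # The empty branch stands for the root of the topic tree.
        covers = branch == "" or path == branch or path.startswith(branch + "/")
        if covers and (best_branch is None or len(branch) > len(best_branch)):
            best_branch, best_permissions = branch, permissions
    return set(best_permissions)


assignments = {
    "TRADER": {
        "": {"READ_TOPIC"},
        "weather/capitals": {"READ_TOPIC", "UPDATE_TOPIC"},
        "weather/capitals/athens": set(),   # more specific: nothing granted
    }
}

print(sorted(permissions_for(assignments, "TRADER", "weather/capitals/paris")))
print(sorted(permissions_for(assignments, "TRADER", "weather/capitals/athens")))
print(sorted(permissions_for(assignments, "TRADER", "sport/results")))
```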
The MODIFY_TOPIC permission is required to add or remove a topic.
The UPDATE_TOPIC permission is required to update a topic value.
The READ_TOPIC permission is required to subscribe to or fetch a
topic. If a session does not have READ_TOPIC permission for a topic,
the topic will be excluded from the results of subscription or fetch
operations for the session. READ_TOPIC permissions are one factor in
the server's dynamic join of topic selectors to available topics. If a
session's roles change – for example, perhaps a control session
applies the *change roles* operation to the session – the server will
reevaluate its topic selectors. The session will be subscribed to
matching topics for which it now has permission and unsubscribed from
the topics for which it no longer has permission.
The SELECT_TOPIC permission is required to use a topic selector, so
controls the parts of the topic tree from which a session can
subscribe or fetch. Given the READ_TOPIC permission controls access
to topic paths, why is this useful? The answer is that some
applications delegate subscription to a control session. A session
that has READ_TOPIC permission but not SELECT_TOPIC permission for
a particular topic path cannot subscribe directly to topics belonging
to the path. However, the session can be independently subscribed by a
control session that has the MODIFY_SESSION global permission in
addition to the appropriate SELECT_TOPIC permission.
Sometimes a topic is used to publish information to a single user, for
a user to broadcast information, or to share data between a user's
multiple sessions. In these cases, it can be unwieldy to set up lots
of specialized topic permissions for the different security principals
representing the users. An alternative is to create the topic as owned
by a particular principal, using the OWNER topic property. A topic
with the OWNER property grants full access to sessions
authenticated with the named principal. Other sessions continue to be
constrained by the configured topic permissions.
Premium features: persistence, replication, and fan-out
Three topic-related features are included in the separately licensed
Scale and Availability pack.
Topic persistence logs a server's topic data to disk. Topic
persistence allows a server to be stopped and restarted without
needing to start a separate client to re-create topics and their
values. It can provide faster time-to-recovery and is very useful
during development when servers are frequently restarted or test
data needs to be shared between developers and environments.
Topic replication mirrors the topic tree across a cluster of peer
servers. This improves system availability – the topic data can
survive the loss of individual servers – and provides a consistent
view of the data to each client session regardless of the server that
hosts the session.
Fan-out is designed for replication of topic data between different
geographies. Fan-out links can be configured to mirror selected parts
of the topic tree from a primary server or cluster of primary servers
to one or more secondary servers. The secondary servers present a
read-only view of the topic data; updates can only be made through the
primary server. Some Diffusion systems use fan-out within a data
center, to separate a primary data tier of servers from a secondary
tier of servers that host customer sessions. This design allows the
secondary tier to be scaled independently to support millions of