Jira Software Support

Overview

The cache replication in Jira Data Center 7.9 and later is asynchronous, which means that cache modifications aren’t replicated to other nodes immediately after they occur, but are instead added to local queues. They are then replicated in the background based on their order in the queue.

Before we can queue cache modifications, we need to create local queues on each of your nodes. There are separate queues created for each node, so that modifications are properly grouped and ordered.

Queues are created automatically. Whenever you add a node to the cluster, the remaining nodes detect it and create a new queue on their file system. Nodes retrieve information about the whole cluster from the database by using the following query:

select node_id, node_state from clusternode

node_id	node_state
node1	ACTIVE
node2	ACTIVE
node3	OFFLINE

When a node is unable to deliver cache replication messages to another node the local queues will grow. This health check detects when the number of undelivered cache replication messages exceeds some limit:

WARN when the size of the queue >= 1000
ERROR when the size of the queue >= 100_000

Understanding the Results

Icon	Result	What this means
	The health check passed successfully.	The health check did not discover any problems with cache replication.
	Number of undelivered cache replications from node: <`node-name>` to node: <`node-name>` reached <`number-of-undelivered-cache-replications>`	The number of undelivered cache replications between two nodes has exceeded the warning threshold: 1000.
	Number of undelivered cache replications from node: <`node-name>` to node: <`node-name>` reached <`number-of-undelivered-cache-replications>`	The number of undelivered cache replications between two nodes has exceeded the error threshold: 100_000.

Troubleshooting

Stale node

If one of the nodes is a stale node but is still defined in the database in clusternode table and in state ACTIVE, please remove this node from the database or change the state to OFFLINE (see JRASERVER-42916 - Getting issue details... STATUS ). This change will be automatically picked up by all running nodes. When a node is removed from the cluster all nodes should stop delivering cache replication messages to this node. Queue files for this node are left and may be not empty. Those files can be safely manually deleted by the admin.

Connection issues

Node has a problem connecting to <node-name>. Please check the network connectivity.

Cache replication messages are serialized before being persisted in the queue file and deserialized when the sending threads read the message from the queue. Standard Java Serialization is used, which is the same mechanism which is used "later" when sending this message over RMI.

With asynchronous caches we have also introduced caching of CachePeer - the object representing the remote cache. On a given node instance a single CachePeer is created representing a remote cache (i.e. a cache on a different node). It is re-used until we have any exception on any CachePeer operation or on any node configuration change.

Replication through RMI in EhCache has two phases:

First phase it to lookup CachePeer in remote RMI registry
Second phase is to do replication through CachePeer

In the first phase, the connection is done using the node attributes (IP, cacheListenerPort), as defined in clusternode table.

The connection details done in second phase are defined in the stub CachePeer set by the remote node. Note that this is a different port and may be a different IP/hostname then in the first phase.

Other

Following configuration options may be useful:

cluster.properties file:
- ehcache.listener.socketTimeoutMillis, default: 5sec
- ehcache.object.port
Java system properties:
- java.rmi.server.hostname, eg -Djava.rmi.server.hostname=127.0.0.1

Links

Jira Data Center cache replication

Monitoring the cache replication

Jira Software Support

Get started

Knowledge base

Products

Jira Software

Jira Service Management

Jira Work Management

Confluence

Bitbucket

Resources

Documentation

Community

System Status

Suggestions and bugs

Marketplace

Billing and licensing

HealthCheck: Asynchronous Cache Replication Queues

Still need help?

Overview

Understanding the Results

Troubleshooting

Stale node

Connection issues

Other

Links

Page

Viewport

Confluence

HealthCheck: Asynchronous Cache Replication Queues

Related content

Still need help?

Overview

Understanding the Results

Troubleshooting

Stale node

Connection issues

Other

Links

Related content