Introduction

Confluence 2.3 introduced a mechanism to ensure database consistency when multiple cluster nodes run against the same database. This is called the cluster safety mechanism, and it is designed to ensure that your wiki cannot become inconsistent because updates made by one user are not visible to another. A failure of this mechanism is a fatal error in Confluence and is called a cluster panic.

Because the cluster safety mechanism helps prevent data inconsistency whenever any two copies of Confluence are running against the same database, it is enabled in all instances of Confluence, not just clusters.

How cluster safety works

A scheduled task, ClusterSafetyJob, runs every 30 seconds in Confluence. In a cluster, this job is run only on one of the nodes. The scheduled task operates on a safety number – a randomly generated number that is stored both in the database and in the distributed cache used across a cluster. It does the following:

Generate a new random number.
If a safety number already exists in both the database and the cache, compare the two.
If the numbers differ, publish a ClusterPanicEvent. Currently in Confluence, this causes the following to happen:
- disable all access to the application
- disable all scheduled tasks
- update the database safety number to a new value, which will cause all nodes accessing the database to fail.
If the numbers are the same or aren't set yet, update the safety numbers:
- set the safety number in the database to the new random number
- set the safety number in the cache to the new random number.
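
The steps above can be sketched roughly as follows. This is a minimal illustration, not Confluence's actual implementation: the function name run_safety_check, the ClusterPanic exception (standing in for ClusterPanicEvent), and the use of plain dictionaries for the database and the cache are all assumptions made for the example.

```python
import random


class ClusterPanic(Exception):
    """Hypothetical stand-in for Confluence's ClusterPanicEvent."""


def run_safety_check(database, cache, rng=random):
    """Run one pass of a simplified cluster safety check.

    `database` and `cache` are plain dicts standing in for the
    CLUSTERSAFETY table and the distributed cache (an assumption
    of this sketch). Returns the new safety number on success.
    """
    # Generate a new random number.
    new_number = rng.randrange(1, 2**31)

    db_number = database.get("safety_number")
    cache_number = cache.get("safety_number")

    # Compare only when a safety number exists in both places.
    both_set = db_number is not None and cache_number is not None
    if both_set and db_number != cache_number:
        # Panic: write a new value to the database so that other
        # instances sharing this database also see a mismatch.
        database["safety_number"] = new_number
        raise ClusterPanic(
            f"database has {db_number}, cache has {cache_number}"
        )

    # Numbers match or are not set yet: store the new number in both.
    database["safety_number"] = new_number
    cache["safety_number"] = new_number
    return new_number


if __name__ == "__main__":
    db, cache = {}, {}
    run_safety_check(db, cache)      # first run: nothing to compare yet
    db["safety_number"] += 1         # simulate a second instance writing
    try:
        run_safety_check(db, cache)
    except ClusterPanic as panic:
        print("cluster panic:", panic)
```

In this sketch a second, unclustered instance writing to the same database is simulated by changing the stored number directly. The next check then sees a mismatch between database and cache, which is the situation the mechanism exists to detect.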

How to fix it

Technical details

The cluster safety number in the database is stored in the CLUSTERSAFETY table. This table has just one row: the current safety number.
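
As a rough illustration of the single-row layout, the sketch below builds a stand-in CLUSTERSAFETY table in an in-memory SQLite database and reads the number back. The column names clustersafetyid and safetynumber, the sample value, and the helper read_safety_number are assumptions for the example; check the schema of your own Confluence database before querying it.

```python
import sqlite3


def read_safety_number(conn):
    """Return the current safety number, or None if the table is empty.

    Assumes a hypothetical column named `safetynumber`.
    """
    rows = conn.execute("SELECT safetynumber FROM CLUSTERSAFETY").fetchall()
    if len(rows) > 1:
        # The table is expected to hold exactly one row.
        raise RuntimeError(f"expected one row, found {len(rows)}")
    return rows[0][0] if rows else None


# Stand-in table in an in-memory database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CLUSTERSAFETY ("
    "clustersafetyid INTEGER PRIMARY KEY, safetynumber INTEGER)"
)
conn.execute(
    "INSERT INTO CLUSTERSAFETY (clustersafetyid, safetynumber) "
    "VALUES (1, 123456)"
)

print(read_safety_number(conn))
```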