A Confluence node cannot start after restarting with error "Caused by: com.hazelcast.config.ConfigurationException"

Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product; however, they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

A Confluence node cannot start after restarting with error "Caused by: com.hazelcast.config.ConfigurationException"

Environment

Confluence Data Center

Diagnosis

The atlassian-confluence.log will show the above error, together with a 'Cannot add a dynamic configuration' message for a 'MapConfig', similar to the following:

Caused by: com.hazelcast.config.ConfigurationException: Cannot add a dynamic configuration 'MapConfig{name='atlassian-cache.Cache.com.atlassian.confluence.user.ConfluenceUserPropertySetFactory.propertysets', inMemoryFormat=BINARY', backupCount=0, asyncBackupCount=0, timeToLiveSeconds=3600, maxIdleSeconds=3600, evictionPolicy='LFU', mapEvictionPolicy='null', evictionPercentage=25, minEvictionCheckMillis=100, maxSizeConfig=MaxSizeConfig{maxSizePolicy='PER_NODE', size=40000}, readBackupData=false, hotRestart=HotRestartConfig{enabled=false, fsync=false}, nearCacheConfig=NearCacheConfig{name=default, inMemoryFormat=OBJECT, invalidateOnChange=true, timeToLiveSeconds=3600, maxIdleSeconds=3600, maxSize=40000, evictionPolicy='LFU', evictionConfig=EvictionConfig{size=40000, maxSizePolicy=ENTRY_COUNT, evictionPolicy=LFU, comparatorClassName=null, comparator=null}, cacheLocalEntries=true, localUpdatePolicy=INVALIDATE, preloaderConfig=NearCachePreloaderConfig{enabled=false, directory=, storeInitialDelaySeconds=600, storeIntervalSeconds=600}}, mapStoreConfig=MapStoreConfig{enabled=false, className='null', factoryClassName='null', writeDelaySeconds=0, writeBatchSize=1, implementation=null, factoryImplementation=null, properties={}, initialLoadMode=LAZY, writeCoalescing=true}, mergePolicyConfig=MergePolicyConfig{policy='com.atlassian.confluence.cluster.hazelcast.AlwaysNullMapMergePolicy', batchSize=100}, wanReplicationRef=null, entryListenerConfigs=[], mapIndexConfigs=[], mapAttributeConfigs=[], quorumName=null, queryCacheConfigs=[], cacheDeserializedValues=INDEX_ONLY}' as there is already a conflicting configuration
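A quick way to confirm this diagnosis is to search the log for that message. A minimal sketch, assuming the default log location under the Confluence home directory; adjust the path to your environment:

```shell
# Count occurrences of the Hazelcast dynamic-configuration error in the
# Confluence application log. The path below is an assumption: substitute
# your own <confluence-home>/logs directory.
LOG=/var/atlassian/application-data/confluence/logs/atlassian-confluence.log
grep -c "Cannot add a dynamic configuration" "$LOG" 2>/dev/null \
    || echo "log not found at $LOG"
```

A non-zero count on a node that fails to start points at this article's scenario.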


Cause

MapConfig is part of Hazelcast, the library Confluence uses for clustering, and this error indicates a cluster cache issue. There are two known causes of this problem:

Cause #1: CONFSERVER-60142 - Changing distributed cache settings prevents Confluence cluster node restart due to a Hazelcast exception

or

Cause #2: More than one node in the cluster was started and tried to join the cluster at the same time. This simultaneous join can corrupt the cluster cache and prevent a node from starting.


Solution

If you've recently changed distributed cache sizes (Confluence Administration >> General Configuration >> Cache Management >> Show advanced view), cause #1 is the likely culprit; reverting the cache settings to their previous values should allow the nodes to restart.
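If the failing node prevents you from reaching the Cache Management screen, the cache size overrides are also persisted in the cluster's shared home. The file name below comes from Confluence Data Center's cache management feature, but treat the exact path as an assumption and verify it against the documentation for your version:

```shell
# Inspect (and back up) the persisted cache size overrides in the shared
# home. The shared-home path is an assumption; substitute your own.
SHARED_HOME=/var/atlassian/application-data/confluence/shared-home
OVERRIDES="$SHARED_HOME/config/cache-settings-overrides.properties"
if [ -f "$OVERRIDES" ]; then
    cp "$OVERRIDES" "$OVERRIDES.bak"   # keep a copy before editing anything
    cat "$OVERRIDES"                   # review which cache sizes were changed
else
    echo "No cache size overrides found at $OVERRIDES"
fi
```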

However, if multiple nodes were recently started at the same time (cause #2), stopping all nodes in the cluster and then restarting them one at a time should fix the problem.


Regardless of the cause, a full cluster shutdown followed by a restart will fix the problem, because the in-memory cache is fully destroyed once the last node leaves the cluster.

  1. Stop Confluence on all nodes to bring the whole cluster down.
  2. Confirm that Confluence has fully stopped on each node by checking that the Java process for Confluence has exited, e.g. 'ps -ef | grep -i confluence'
  3. Once you've confirmed that Confluence has fully stopped on all nodes, restart only one node.
    This can be one of the nodes that's currently considered 'good', or it can be the problematic node itself. Either way, this will reset the whole cluster cache.
  4. Check that the first node is fully up.
    You can do this by directing a browser directly to the node started in step 3 and verifying that it is responsive.
  5. Once the first node is fully up, start the remaining nodes one by one. Each node startup typically takes a few minutes, so confirm that each node is fully up before starting the next.
    You can confirm that a node has finished starting up either by directing a browser to the node that has just been started and checking that it's responsive, or via the UI under Confluence administration >> General Configuration >> Clustering.
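The readiness check in steps 4 and 5 can also be scripted instead of checking each node in a browser by hand. A minimal sketch, assuming each node answers HTTP on the default port 8090 and that curl is available; the function name and timeout are illustrative, not part of Confluence:

```shell
# Poll a node's base URL until it responds, so the next node can be started.
# Confluence exposes a /status endpoint, but any page that returns 200 once
# the node is up would do. Host, port, and timeout here are assumptions.
wait_for_confluence() {
    base_url="$1"          # e.g. http://node2:8090
    tries=0
    until curl -sf -o /dev/null "$base_url/status"; do
        tries=$((tries + 1))
        if [ "$tries" -ge 40 ]; then   # give up after ~10 minutes
            return 1
        fi
        sleep 15
    done
    return 0
}
```

Usage: after starting a node, run wait_for_confluence "http://node2:8090" and only start the next node once it returns successfully.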

Last modified on Nov 24, 2020
