Jira node fails to start due to cluster lock in the Active Objects
Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.
Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Problem
This issue only affects Jira Data Center. After a restart, the node becomes stuck during application startup because another node is holding a lock in the Active Objects.
Symptoms
- Node startup does not log severe errors.
- The node is not added as ACTIVE in the cluster.
- The node never finishes starting its plugins.
The following stack trace can be found in the thread dumps captured on the restarted node for the localhost-startStop thread:
"localhost-startStop-1" #24 daemon prio=5 os_prio=0 cpu=56960.05ms elapsed=795.63s tid=0x00007fe820001800 nid=0x1909 waiting on condition [0x00007fe83a68b000] java.lang.Thread.State: WAITING (parking) at jdk.internal.misc.Unsafe.park(java.base@11.0.13/Native Method) - parking to wait for <0x000000073ba00030> (a java.util.concurrent.CompletableFuture$Signaller) at java.util.concurrent.locks.LockSupport.park(java.base@11.0.13/Unknown Source) at java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.13/Unknown Source) at java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.13/Unknown Source) at java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.13/Unknown Source) at java.util.concurrent.CompletableFuture.get(java.base@11.0.13/Unknown Source) at io.atlassian.util.concurrent.Promises$OfStage.claim(Promises.java:280) at com.atlassian.activeobjects.osgi.TenantAwareActiveObjects.flushAll(TenantAwareActiveObjects.java:247) at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.13/Native Method) at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.13/Unknown Source) at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.13/Unknown Source) at java.lang.reflect.Method.invoke(java.base@11.0.13/Unknown Source) at org.joor.Reflect.on(Reflect.java:673) at org.joor.Reflect.call(Reflect.java:379) at org.joor.Reflect.call(Reflect.java:332) at com.atlassian.pocketknife.internal.querydsl.schema.DatabaseSchemaCreationImpl.invokeAo(DatabaseSchemaCreationImpl.java:86) at com.atlassian.pocketknife.internal.querydsl.schema.DatabaseSchemaCreationImpl$$Lambda$2360/0x0000000803c4e440.apply(Unknown Source) at io.atlassian.fugue.Effect.accept(Effect.java:43) at io.atlassian.fugue.Option$Some.forEach(Option.java:468) at io.atlassian.fugue.Option$Some.foreach(Option.java:464) at com.atlassian.pocketknife.internal.querydsl.schema.DatabaseSchemaCreationImpl.lambda$primeImpl$0(DatabaseSchemaCreationImpl.java:66) at com.atlassian.pocketknife.internal.querydsl.schema.DatabaseSchemaCreationImpl$$Lambda$1532/0x0000000802169040.apply(Unknown Source) at com.atlassian.pocketknife.internal.querydsl.util.MemoizingResettingReference.lambda$get$0(MemoizingResettingReference.java:59) at com.atlassian.pocketknife.internal.querydsl.util.MemoizingResettingReference$$Lambda$2359/0x0000000803c4e040.get(Unknown Source) at com.atlassian.pocketknife.internal.querydsl.util.MemoizingResettingReference$SmarterMemoizingSupplier.get(MemoizingResettingReference.java:150) - locked <0x000000073ba002b0> (a com.atlassian.pocketknife.internal.querydsl.util.MemoizingResettingReference$SmarterMemoizingSupplier) at com.atlassian.pocketknife.internal.querydsl.util.MemoizingResettingReference.safelyGetT(MemoizingResettingReference.java:71) at com.atlassian.pocketknife.internal.querydsl.util.MemoizingResettingReference.get(MemoizingResettingReference.java:63) at com.atlassian.pocketknife.internal.querydsl.schema.DatabaseSchemaCreationImpl.prime(DatabaseSchemaCreationImpl.java:60) at com.atlassian.pocketknife.internal.querydsl.DatabaseAccessorImpl.execute(DatabaseAccessorImpl.java:62) at com.atlassian.pocketknife.internal.querydsl.DatabaseAccessorImpl.runInTransaction(DatabaseAccessorImpl.java:43) at com.atlassian.ratelimiting.db.internal.dao.QDSLSystemRateLimitingSettingsDao.initializeDbIfNeeded(QDSLSystemRateLimitingSettingsDao.java:39) at 
com.atlassian.ratelimiting.internal.configuration.DefaultSystemPropertiesService.initializeData(DefaultSystemPropertiesService.java:64) at com.atlassian.ratelimiting.internal.settings.RateLimitModificationSettingsService.onPluginEnabled(RateLimitModificationSettingsService.java:96) at jdk.internal.reflect.GeneratedMethodAccessor331.invoke(Unknown Source) at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.13/Unknown Source) at java.lang.reflect.Method.invoke(java.base@11.0.13/Unknown Source) at com.atlassian.event.internal.SingleParameterMethodListenerInvoker.invoke(SingleParameterMethodListenerInvoker.java:42) at com.atlassian.event.internal.ComparableListenerInvoker.invoke(ComparableListenerInvoker.java:48) at com.atlassian.event.internal.AsynchronousAbleEventDispatcher.lambda$null$0(AsynchronousAbleEventDispatcher.java:37) at com.atlassian.event.internal.AsynchronousAbleEventDispatcher$$Lambda$707/0x0000000800a60c40.run(Unknown Source) at com.atlassian.event.internal.AsynchronousAbleEventDispatcher$$Lambda$180/0x0000000800376440.execute(Unknown Source) at com.atlassian.event.internal.AsynchronousAbleEventDispatcher.dispatch(AsynchronousAbleEventDispatcher.java:85) at com.atlassian.event.internal.EventPublisherImpl.publish(EventPublisherImpl.java:114) at com.atlassian.event.internal.LockFreeEventPublisher.publish(LockFreeEventPublisher.java:40) at com.atlassian.plugin.event.impl.DefaultPluginEventManager.broadcast(DefaultPluginEventManager.java:90) at com.atlassian.plugin.manager.DefaultPluginManager.broadcastIgnoreError(DefaultPluginManager.java:1972) at com.atlassian.plugin.manager.DefaultPluginManager.lambda$broadcastPluginEnabled$41(DefaultPluginManager.java:1782) at com.atlassian.plugin.manager.DefaultPluginManager$$Lambda$2192/0x0000000803982040.run(Unknown Source) at com.atlassian.plugin.manager.PluginTransactionContext.wrap(PluginTransactionContext.java:63) at com.atlassian.plugin.manager.DefaultPluginManager.broadcastPluginEnabled(DefaultPluginManager.java:1781) at com.atlassian.plugin.manager.DefaultPluginManager.lambda$enableDependentPlugins$24(DefaultPluginManager.java:1260) at com.atlassian.plugin.manager.DefaultPluginManager$$Lambda$1018/0x0000000800dd2440.run(Unknown Source) at com.atlassian.plugin.manager.PluginTransactionContext.wrap(PluginTransactionContext.java:63) at com.atlassian.plugin.manager.DefaultPluginManager.enableDependentPlugins(DefaultPluginManager.java:1229) at com.atlassian.plugin.manager.DefaultPluginManager.lambda$addPlugins$22(DefaultPluginManager.java:1214) at com.atlassian.plugin.manager.DefaultPluginManager$$Lambda$1014/0x0000000800dd3440.run(Unknown Source) at com.atlassian.plugin.manager.PluginTransactionContext.wrap(PluginTransactionContext.java:63) at com.atlassian.plugin.manager.DefaultPluginManager.addPlugins(DefaultPluginManager.java:1114)
The following stack trace can be found in the thread dumps captured on the restarted node for the active-objects-init thread:
"active-objects-init-JiraTenantImpl{id='system'}-0" #278 prio=5 os_prio=0 cpu=3762.36ms elapsed=722.21s tid=0x00007fe8172d9000 nid=0x1ade waiting on condition [0x00007fe55e4d9000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(java.base@11.0.13/Native Method)
at com.atlassian.beehive.db.DatabaseClusterLock.sleep(DatabaseClusterLock.java:621)
at com.atlassian.beehive.db.DatabaseClusterLock.tryLockWaitWithTimeout(DatabaseClusterLock.java:472)
at com.atlassian.beehive.db.DatabaseClusterLock.tryLock(DatabaseClusterLock.java:453)
at com.atlassian.activeobjects.internal.AbstractActiveObjectsFactory.create(AbstractActiveObjectsFactory.java:57)
at com.atlassian.activeobjects.internal.DelegatingActiveObjectsFactory.create(DelegatingActiveObjectsFactory.java:32)
at com.atlassian.activeobjects.osgi.TenantAwareActiveObjects$1$1$1.call(TenantAwareActiveObjects.java:91)
at com.atlassian.activeobjects.osgi.TenantAwareActiveObjects$1$1$1.call(TenantAwareActiveObjects.java:86)
at com.atlassian.sal.core.executor.ThreadLocalDelegateCallable.call(ThreadLocalDelegateCallable.java:38)
at java.util.concurrent.FutureTask.run(java.base@11.0.13/Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.13/Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.13/Unknown Source)
at java.lang.Thread.run(java.base@11.0.13/Unknown Source)
Cause
One of the nodes within the cluster is holding a cluster-wide lock. Reviewing that particular node may reveal problems that led to application unresponsiveness, or it may have had database connection issues.
As a result, it is unable to release the cluster-wide lock.
Resolution
Run the following SQL query to determine which node is holding the cluster-wide lock.
SELECT * FROM clusterlockstatus WHERE locked_by_node IS NOT NULL;
It should return a result similar to the following. In this example, node2 is the node holding the cluster-wide lock.

ID      | LOCK_NAME                                                             | LOCKED_BY_NODE | UPDATE_TIME
1747827 | ao-plugin.upgrade.com.atlassian.jira.migration.jira-migration-plugin | node2          | 1660313500762

Using a Unix timestamp converter to convert UPDATE_TIME, you can find when the lock was acquired and then check the log files for errors or database connection problems around that time that might have caused the lock to remain.
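If you prefer to convert the timestamp in the database itself, here is a minimal sketch, assuming PostgreSQL (UPDATE_TIME stores epoch milliseconds; other databases offer equivalent conversion functions):

SELECT lock_name,
       locked_by_node,
       to_timestamp(update_time / 1000) AS locked_since
FROM clusterlockstatus
WHERE locked_by_node IS NOT NULL;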
Workaround 1 (preferred method)
- Restart the node holding the lock so that the cluster-wide lock can be released.
- Once the cluster-wide lock is released and the application is back in action, start the node that had previously failed to start.
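To confirm the restart actually released the lock, you can re-run the query from the Resolution section, or check the specific row. A minimal check, reusing the example ID from the output above (your ID will differ):

SELECT locked_by_node FROM clusterlockstatus WHERE id = 1747827;

Once released, locked_by_node is expected to be NULL (the row itself typically remains in the table).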
Workaround 2
Warning: directly updating the database is not supported and can result in integrity issues. Use this method only if restarting the node holding the lock is not an option.
1. Remove the lock, replacing the ID with the one returned by the query above (a verification query follows this list):
delete from clusterlockstatus where id = 1747827;
2. Once the cluster-wide lock is released and the application is back in action, start the node that had previously failed to start.
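Before restarting the failed node, you can verify that the lock row is gone. A quick check, again using the example ID from above (substitute your own):

SELECT * FROM clusterlockstatus WHERE id = 1747827;

It should return no rows.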
Relevant improvements and bugs
- JRASERVER-74298 - Jira Data Center Functionalities Loss Due to Cluster Wide Lock