Confluence throws 500 error due to lock timeout in VCacheLock

Still need help?

The Atlassian Community is here for you.

Ask the community


Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.

Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Problem

Sometimes Confluence may respond with 500x error pages, have a general performance issue, or even a full outage, due to the lock timeout in VCacheLock which is used by I18NBean. 

Either of the following appears in the atlassian-confluence.log:

WARN [http-nio-8090-exec-143] [internal.core.service.VCacheLock] lockWithTimeout Timed out waiting for lock on cache: com.atlassian.confluence.util.i18n.I18NBeanFactory.by.locale

or:

ERROR [http-nio-8090-exec-1] [xhtml.view.macro.ViewMacroMarshaller] handleMacroExecutionException Error rendering macro: content-column
 -- url: /display/SPACEKEY/Page+Name | page: 123456789 | traceId: 7a94e4c4b2be88a0 | userName: user.name@example.com | referer: https://confluence.example.com/display/SPACEKEY/Page+Name | action: viewpage
java.lang.StackOverflowError
        at java.security.AccessController.doPrivileged(Native Method)
        at java.io.PrintWriter.<init>(PrintWriter.java:116)
        at java.io.PrintWriter.<init>(PrintWriter.java:100)
        at com.atlassian.confluence.util.PatternLayoutWithStackTrace.appendStackTrace(PatternLayoutWithStackTrace.java:43)
        at com.atlassian.confluence.util.PatternLayoutWithStackTrace.format(PatternLayoutWithStackTrace.java:30)
        at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:310)
        at org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:276)
        at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
        at com.atlassian.confluence.logging.ConfluenceHomeLogAppender.append(ConfluenceHomeLogAppender.java:147)
        at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
        at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
        at org.apache.log4j.Category.callAppenders(Category.java:206)
        at org.apache.log4j.Category.forcedLog(Category.java:391)
        at org.apache.log4j.Category.error(Category.java:322)
        at com.comalatech.confluence.workflow.DefaultWorkflowAccessor.getWorkflows(DefaultWorkflowAccessor.java:225)
        at com.comalatech.confluence.workflow.DefaultWorkflowAccessor.getWorkflows(DefaultWorkflowAccessor.java:303)
        at com.comalatech.confluence.workflow.DefaultWorkflowAccessor.getPageWorkflows(DefaultWorkflowAccessor.java:290)
        at com.comalatech.confluence.ipc.api.DefaultStateService.getCurrentState(DefaultStateService.java:118)
        at com.comalatech.confluence.ipc.api.DefaultStateService.getCurrentState(DefaultStateService.java:53)
        at com.brikit.comalaworkflowsservice.Comala.getPublishedVersion(Comala.java:74)
        at com.brikit.core.confluence.Confluence.getPublishedComalaWorkflowsVersion(Confluence.java:3137)
        at com.brikit.core.confluence.Confluence.getVersionForUserRequest(Confluence.java:3906)
        at com.brikit.core.confluence.Confluence.getBodyAsString(Confluence.java:1717)
        at com.brikit.core.confluence.MacroParser.extractMacros(MacroParser.java:160)
        at com.brikit.core.confluence.MacroParser.extractMacros(MacroParser.java:151)
        at com.brikit.core.confluence.MacroParser.extractMacros(MacroParser.java:144)
        at com.brikit.themepress.model.PageWrapper.contentMacros(PageWrapper.java:623)
        at com.brikit.themepress.model.PageWrapper.layers(PageWrapper.java:1348)
        at com.brikit.themepress.model.PageWrapper.getLayerMacros(PageWrapper.java:995)
        at com.brikit.themepress.model.PageWrapper.getLayerMacros(PageWrapper.java:998)
        at com.brikit.themepress.model.PageWrapper.getLayerMacros(PageWrapper.java:998)
        at com.brikit.themepress.model.PageWrapper.getLayerMacros(PageWrapper.java:998)
        at com.brikit.themepress.model.PageWrapper.getLayerMacros(PageWrapper.java:998)
        at com.brikit.themepress.model.PageWrapper.getLayerMacros(PageWrapper.java:998)
        at com.brikit.themepress.model.PageWrapper.getLayerMacros(PageWrapper.java:998)
        .... Infinite recursion here


Symptoms

The system becomes unresponsive. Thread dumps show HTTP threads hung waiting for the lock. Thread dumps do not show what threads owns the lock. After a heap dump investigation, it will be clear that lock owner is either dead or spending time in an unrelated piece of code.

The lock has been 'leaked' somehow. Threads are blocking on it, but it is never going to be unlocked. Once the system enters this state, it can only be healed by a restart.

Diagnosis

Environment

Diagnostic Steps

  • Capture Thread Dumps while the issue is occurring

  • First, check the status of all thread states across the thread dumps, using the following: 

    $ grep -A1 '-exec-' confluence_threads.* | grep State | sort | uniq -c


    • If the result looks like this, with all threads in a WAITING (parking) state, you may have this issue and should follow the second grep below: 

      $ grep -A1 '-exec-' confluence_threads.* | grep State | sort | uniq -c
       300 confluence_threads.1535641226.txt-   java.lang.Thread.State: WAITING (parking)
       300 confluence_threads.1535641241.txt-   java.lang.Thread.State: WAITING (parking)
       300 confluence_threads.1535641255.txt-   java.lang.Thread.State: WAITING (parking)
       300 confluence_threads.1535641270.txt-   java.lang.Thread.State: WAITING (parking)
       300 confluence_threads.1535641285.txt-   java.lang.Thread.State: WAITING (parking)
       300 confluence_threads.1535641301.txt-   java.lang.Thread.State: WAITING (parking)
  • Second, grep each thread dump using the following method to look for the lock: 

    $ cat thread_dump.txt  | egrep "(MemoryStore.put|MemoryStore.get)" -B 10


    • If the results match this example, you have likely run into this issue: 

      $ cat confluence_threads.1535641285.txt  | egrep "(MemoryStore.put|MemoryStore.get)" -B 1
         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x00007f1e048cd408> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
      	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
      	at net.sf.ehcache.store.chm.SelectableConcurrentHashMap$Segment.put(SelectableConcurrentHashMap.java:641)
      	at net.sf.ehcache.store.chm.SelectableConcurrentHashMap.put(SelectableConcurrentHashMap.java:383)
      	at net.sf.ehcache.store.MemoryStore.put(MemoryStore.java:283)
      --
         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x00007f1e048cd408> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
      	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
      	at net.sf.ehcache.store.chm.SelectableConcurrentHashMap$Segment.get(SelectableConcurrentHashMap.java:721)
      	at net.sf.ehcache.store.chm.SelectableConcurrentHashMap.get(SelectableConcurrentHashMap.java:324)
      	at net.sf.ehcache.store.MemoryStore.get(MemoryStore.java:344)
  • For full confirmation you have run into this issue, contact Atlassian Support

Cause

This happens when thread-X and thread-Y both lock the cache, then other threads try to lock the cache also and must for 30 seconds, then an exception occurs.

Multiple third party vendors have been identified with plugins that can run into this issue. Below are some known vendors which have fixes to close this issue:

The Brikit Themepress plugin 1.x through 2.2.x uses code which causes this performance issue to occur.

CelesteCS Math Plugin for Confluence prior to version 21.1.0 can run into this issue.

Workaround

Capture thread dumps and restart Confluence to recover the system. 

Resolution

Brikit's bugs linked above note the issue should be resolved in Themepress 2.3.x.

Upgrading Themepress to the fixed version should prevent the issue from occurring again.



DescriptionSometimes Confluence starts responding 500x error pages due to the lock timeout in VCacheLock which is used by I18NBean. It happens when threads X and Y lock this cache, and then other threads try to lock it also, wait for 30 seconds and then throw an exception.
ProductConfluence Server, Confluence Data Center

Last modified on Feb 25, 2019

Was this helpful?

Yes
No
Provide feedback about this article
Powered by Confluence and Scroll Viewport.