Build plans stay queued for an extended duration, reporting "Updating source code to latest..." in the Build activity dashboard
Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.
Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
Build plans remain queued for an extended period while the Build activity dashboard reports "Updating source code to latest...". Several capable agents are available, yet the builds stay in the queue for an excessive amount of time.
Diagnosis
When several build plans are queued and appear stuck at "Updating source code to latest...", it does not necessarily mean they are waiting for Bamboo to update the caches of their repositories before dispatching the build. Another article outlines the potential causes and fixes for that type of problem here:
The issue described in this article is slightly different: it affects builds after change detection has completed and the respective caches have been updated, even though Bamboo still reports "Updating source code to latest..." in the Build activity dashboard. Both scenarios described below share one factor that is key to diagnosing this issue:
Build plans are in fact being dispatched and built by agents while Bamboo reports "Updating source code to latest...". The first diagnostic step is therefore to review your agent logs and check whether the agents are building the plans that Bamboo lists in the queue with the status "Updating source code to latest...".
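As a minimal sketch of that check, the Python snippet below collects build result keys from agent log lines and reports which of the keys shown as queued the agent is actually working on. It assumes result keys follow the usual PROJECT-PLAN-JOB-BUILDNUMBER shape (for example PROJ-PLAN-JOB1-42); the sample log lines are made up and are not real Bamboo log output, so adjust the regular expression to what your agent logs contain.

```python
import re
from typing import Iterable, Set

# Assumed result-key shape: PROJECT-PLAN-JOB-BUILDNUMBER, e.g. PROJ-PLAN-JOB1-42.
RESULT_KEY = re.compile(r"\b[A-Z][A-Z0-9]*-[A-Z][A-Z0-9]*-[A-Z][A-Z0-9]*-\d+\b")


def keys_seen_in_agent_log(lines: Iterable[str]) -> Set[str]:
    """Collect every result key that appears anywhere in the agent log lines."""
    seen: Set[str] = set()
    for line in lines:
        seen.update(RESULT_KEY.findall(line))
    return seen


def queued_but_building(queued: Iterable[str], agent_log: Iterable[str]) -> Set[str]:
    """Return keys the dashboard shows as queued that the agent log mentions."""
    return set(queued) & keys_seen_in_agent_log(agent_log)


if __name__ == "__main__":
    # Hypothetical data: keys copied from the dashboard and made-up log lines.
    queued = ["PROJ-PLAN-JOB1-42", "PROJ-OTHER-JOB1-7"]
    log = [
        "made-up agent log line mentioning PROJ-PLAN-JOB1-42",
        "another made-up line, no result key here",
    ]
    print(sorted(queued_but_building(queued, log)))
```

Any key printed here is one that the dashboard still lists as queued while the agent is already handling it, which matches the symptom described above.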
To locate the issue when these symptoms appear, it helps to understand the basics of the Bamboo build plan workflow:
- A build plan is triggered.
- The plan's status changes to Queued.
- Change detection runs on the server side: Bamboo contacts the repository to determine whether there are any changes needed for the build.
- The plan is then added to the build queue. At this point it appears on the Build activity dashboard.
- The server assigns an agent to the build and sends an event to that agent.
- The agent receives the event and starts building.
- The agent finishes building and sends the results back to the server.
The problem described in this article occurs when Bamboo processes the events/messages sent from the agent. If builds are reaching the queue, being picked up by available agents, and being built while Bamboo still reports "Updating source code to latest...", then Bamboo is struggling to update the status of your builds in the database.
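The workflow above can be sketched as an ordered sequence of stages. This is an illustration rather than Bamboo's internal model, and the stage names are hypothetical. The helper flags the mismatch described in the paragraph above: the agent has already started building, while the status recorded on the server has not advanced past the queue.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Build workflow steps from this article, in order (names are illustrative)."""
    TRIGGERED = 1
    QUEUED = 2
    CHANGE_DETECTION = 3   # server side: checks the repository for changes
    IN_BUILD_QUEUE = 4     # build now shows on the Build activity dashboard
    AGENT_ASSIGNED = 5     # server picks an agent and sends it an event
    BUILDING = 6           # agent received the event and is building
    RESULTS_SENT = 7       # agent sent the results back to the server


def next_stage(stage: Stage) -> Stage:
    """Return the following step; RESULTS_SENT is terminal and returns itself."""
    return Stage(min(stage + 1, Stage.RESULTS_SENT))


def status_lags_agent(recorded: Stage, agent_reported: Stage) -> bool:
    """True when the agent has started building but the server's recorded
    status is still before BUILDING: the symptom this article describes."""
    return agent_reported >= Stage.BUILDING and recorded < Stage.BUILDING


if __name__ == "__main__":
    # Dashboard still shows the build as queued while the agent is already building.
    print(status_lags_agent(Stage.IN_BUILD_QUEUE, Stage.BUILDING))
```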
Diagnosis 1
Thread dumps taken while several build plans are queued and appear stuck at "Updating source code to latest..." contain RUNNABLE threads with the following frames:
...
at com.atlassian.bamboo.user.rename.UserRenameHelper.updateUserInTable(UserRenameHelper.java:38)
at com.atlassian.bamboo.user.rename.UserRenameHelper.renameUserInBuildResultSummary(UserRenameHelper.java:82)
at com.atlassian.bamboo.user.rename.UserRenameServiceImpl.doRenameUser(UserRenameServiceImpl.java:179)
...
This indicates that a user renaming process is running.
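When a dump contains hundreds of threads, a small script can pick out the relevant ones. The sketch below assumes the standard jstack text layout (a quoted thread-name line, a "java.lang.Thread.State:" line, then "at ..." frames); dumps exported from other tools may use a different layout and would need a different parser. It lists RUNNABLE threads that have a given marker, here UserRenameHelper from the frames above, anywhere in their stack.

```python
import re
import sys
from dataclasses import dataclass, field
from typing import List

# jstack prints each thread as a quoted name line, a state line, then "at ..." frames.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
STATE_LINE = re.compile(r"java\.lang\.Thread\.State:\s*(?P<state>\w+)")


@dataclass
class ThreadInfo:
    name: str
    state: str = ""
    frames: List[str] = field(default_factory=list)


def parse_thread_dump(text: str) -> List[ThreadInfo]:
    """Split a jstack-style dump into threads with their state and stack frames."""
    threads: List[ThreadInfo] = []
    for raw in text.splitlines():
        line = raw.strip()
        header = THREAD_HEADER.match(line)
        if header:
            threads.append(ThreadInfo(name=header.group("name")))
            continue
        if not threads:
            continue  # ignore any preamble before the first thread
        state = STATE_LINE.search(line)
        if state:
            threads[-1].state = state.group("state")
        elif line.startswith("at "):
            threads[-1].frames.append(line[3:])
    return threads


def runnable_threads_with(text: str, marker: str) -> List[str]:
    """Names of RUNNABLE threads that have `marker` in any stack frame."""
    return [t.name for t in parse_thread_dump(text)
            if t.state == "RUNNABLE" and any(marker in f for f in t.frames)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python rename_threads.py <thread-dump-file>
    with open(sys.argv[1], encoding="utf-8") as fh:
        for name in runnable_threads_with(fh.read(), "UserRenameHelper"):
            print(name)
```

Running it against each dump in a series shows whether the rename threads stay RUNNABLE for as long as the builds appear stuck.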
Diagnosis 2
Important Bamboo threads such as IndexerService and BuildTailMessageProcessingThread can be seen in thread dumps spending extended periods in filesystem operations. Example:
8-BuildTailMessageProcessingThread-expensive:pool-16-thread-102
State
Runnable
Java Stack
at java.io.RandomAccessFile.open0(Native Method)
at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:193)
at org.apache.lucene.store.Directory.copy(Directory.java:185)
at org.apache.lucene.store.TrackingDirectoryWrapper.copy(TrackingDirectoryWrapper.java:50)
at org.apache.lucene.index.IndexWriter.createCompoundFile(IndexWriter.java:4582)
at org.apache.lucene.index.DocumentsWriterPerThread.sealFlushedSegment(DocumentsWriterPerThread.java:535)
at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:502)
at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:506)
at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:616)
at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2815)
- locked [0x00000003ce9fd4e8] (a java.lang.Object)
- locked [0x00000003cf2b1460] (a java.lang.Object)
at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2970)
- locked [0x00000003cf2b1460] (a java.lang.Object)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2940)
at com.atlassian.bonnie.LuceneConnection.commitAndRefreshSearcher(LuceneConnection.java:566)
at com.atlassian.bonnie.LuceneConnection.withWriter(LuceneConnection.java:506)
at com.atlassian.bamboo.index.IndexerServiceImpl$8.run(IndexerServiceImpl.java:314)
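A single thread dump is only a snapshot. As a minimal sketch of confirming that these threads spend extended periods in filesystem operations, the function below compares several dumps taken a few seconds apart. Each dump is assumed to be reduced beforehand to a mapping of thread name to top stack frame, and treating java.io. and org.apache.lucene.store. frames as "filesystem" work is an assumption based on the stack above. It reports the threads whose top frame is a filesystem frame in every dump.

```python
from typing import Dict, List, Sequence

# Top-frame prefixes treated as filesystem work (assumption, based on the stack above).
FILESYSTEM_PREFIXES = ("java.io.", "org.apache.lucene.store.")


def stuck_in_filesystem(dumps: Sequence[Dict[str, str]]) -> List[str]:
    """Threads present in every dump whose top frame is always a filesystem frame.

    Each dump is a mapping of thread name -> top stack frame.
    """
    if not dumps:
        return []
    common = set(dumps[0])
    for dump in dumps[1:]:
        common &= set(dump)  # keep only threads seen in every snapshot
    return sorted(
        name for name in common
        if all(dump[name].startswith(FILESYSTEM_PREFIXES) for dump in dumps)
    )


if __name__ == "__main__":
    # Hypothetical snapshots of the thread shown above.
    thread = "8-BuildTailMessageProcessingThread-expensive:pool-16-thread-102"
    first = {thread: "java.io.RandomAccessFile.open0(Native Method)"}
    second = {thread: "org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:193)"}
    print(stuck_in_filesystem([first, second]))
```

A thread that shows up in every snapshot is a strong hint that something, such as an on-access scanner, is slowing down its file I/O.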
A quick look at current process utilization on the Bamboo server shows anti-virus software consuming a large share of resources. The following example is output from top, captured while McAfee On-Access Scanner was running and the problem was occurring:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1019 bamboo 20 0 170052 3648 1628 R 63.0 0.0 0:00.36 top
4028 root 20 0 1133188 27388 13512 S 50.0 0.0 79:33.16 oacore
6404 root 20 0 1756680 420504 8660 S 50.0 0.6 461:45.27 OASManager
6402 root 20 0 1756680 420504 8660 R 47.8 0.6 461:23.14 OASManager
6408 root 20 0 1756680 420504 8660 S 47.8 0.6 461:33.43 OASManager
6406 root 20 0 1756680 420504 8660 S 45.7 0.6 460:56.47 OASManager
6410 root 20 0 1756680 420504 8660 S 43.5 0.6 461:46.93 OASManager
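To put a number on the scanner's share, the sketch below sums the %CPU column per COMMAND from top output like the sample above. You could capture your own with `top -b -n 1`. With top's default Irix mode, 100% corresponds to one CPU core, so the five OASManager rows alone add up to roughly 2.3 cores. The parser assumes the header row and column names shown above and skips any summary lines printed before that header.

```python
from collections import defaultdict
from typing import Dict

# The sample from this article; replace with your own `top -b -n 1` output.
SAMPLE = """\
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1019 bamboo 20 0 170052 3648 1628 R 63.0 0.0 0:00.36 top
4028 root 20 0 1133188 27388 13512 S 50.0 0.0 79:33.16 oacore
6404 root 20 0 1756680 420504 8660 S 50.0 0.6 461:45.27 OASManager
6402 root 20 0 1756680 420504 8660 R 47.8 0.6 461:23.14 OASManager
6408 root 20 0 1756680 420504 8660 S 47.8 0.6 461:33.43 OASManager
6406 root 20 0 1756680 420504 8660 S 45.7 0.6 460:56.47 OASManager
6410 root 20 0 1756680 420504 8660 S 43.5 0.6 461:46.93 OASManager
"""


def cpu_by_command(top_output: str) -> Dict[str, float]:
    """Sum the %CPU column per COMMAND from top's task list."""
    lines = top_output.splitlines()
    # Locate the task header; real `top -b` output has summary lines above it.
    start = next((i for i, l in enumerate(lines)
                  if "%CPU" in l.split() and "COMMAND" in l.split()), None)
    if start is None:
        raise ValueError("no top header row with %CPU and COMMAND found")
    header = lines[start].split()
    cpu_col, cmd_col = header.index("%CPU"), header.index("COMMAND")
    totals: Dict[str, float] = defaultdict(float)
    for line in lines[start + 1:]:
        fields = line.split()
        if len(fields) <= cmd_col:
            continue  # skip blank or truncated lines
        totals[" ".join(fields[cmd_col:])] += float(fields[cpu_col])
    return dict(totals)


if __name__ == "__main__":
    ranked = sorted(cpu_by_command(SAMPLE).items(), key=lambda kv: kv[1], reverse=True)
    for command, cpu in ranked:
        print(f"{command:12} {cpu:6.1f}")
```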
Cause
Cause 1
This is a known bug: BAM-20993. The user renaming process can be extensive and time-consuming, depending on the number of database records that need to be updated. While it runs, Bamboo may be unable to keep up with reading and writing the status and results of all builds.
Cause 2
This is caused by the anti-virus software, which is likely intercepting or blocking read, open, and write operations in Lucene indexing (for build results and status) and/or ActiveMQ threads. Communication and data transfer between the Bamboo server and agents go through Apache ActiveMQ (AMQ). In Bamboo, AMQ is configured as a persistent queue, meaning that messages are written to disk in the <Bamboo server home directory>/jms-store directory before they reach the database.
If you are using McAfee On-Access Scanner, the cause might be the one described in (McAfee) Slow performance with Java-based applications.
Solution
Solution 1
There is no immediate solution to this issue. If the user renaming process is running, you must wait until it finishes. Until BAM-20993 is fixed, avoid renaming a large batch of users at once.
Solution 2
There are a few options to consider for anti-virus software:
- Stop the anti-virus software.
- Configure the anti-virus software to not scan the Bamboo server home directory and Bamboo server installation directory.
- If you are using McAfee anti-virus, ensure that the Bamboo Java process is specified as a low-risk process in McAfee's On-Access Scanner settings, as recommended in the following knowledge base article: (McAfee) Slow performance with Java-based applications.
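If you configure exclusions, it can help to confirm that they actually cover the Bamboo directories, including the jms-store directory where AMQ persists messages. The sketch below is a plain path comparison and does not query any anti-virus product. It assumes POSIX paths and that the scanner treats an excluded directory as covering everything beneath it. The directory locations and the exclusion list are hypothetical examples; for Windows paths, PureWindowsPath would be used instead.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def uncovered_paths(required: Iterable[str], excluded: Iterable[str]) -> List[str]:
    """Return required paths that are neither equal to nor under an excluded directory."""
    excl = [PurePosixPath(e) for e in excluded]
    missing = []
    for r in required:
        p = PurePosixPath(r)
        if not any(p == e or e in p.parents for e in excl):
            missing.append(r)
    return missing


if __name__ == "__main__":
    # Hypothetical locations; substitute your own Bamboo home and install directories.
    bamboo_home = "/var/atlassian/application-data/bamboo"
    bamboo_install = "/opt/atlassian/bamboo"
    required = [bamboo_home, bamboo_install, bamboo_home + "/jms-store"]
    av_exclusions = ["/opt/atlassian"]  # what the scanner currently excludes (example)
    print(uncovered_paths(required, av_exclusions))
```

In this example the installation directory is covered, but the home directory and its jms-store subdirectory are still scanned.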