Some Bitbucket nodes are taking longer to start while others start almost instantaneously

Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product; however, they have not been tested. Support for Server* products ended on February 15th, 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

Bitbucket was stuck on the start-up page and the logs did not progress past a certain point.


Environment

Bitbucket DC 7.14.1


Diagnosis

The first node managed to start after ~40 minutes. The second node showed the same behavior for ~50 minutes, while the third node started almost instantly, without the delay the first two nodes experienced.


Logs Node 1:

Checking the logs from the first node, the application took ~40 minutes to resume logging after the Started SSH server successfully entry:

2021-05-21 13:06:03,680 INFO  [spring-startup]  c.a.b.internal.ssh.server.SshServer Starting SSH server on port 7999...
2021-05-21 13:06:03,804 INFO  [spring-startup]  c.a.b.internal.ssh.server.SshServer Started SSH server successfully.
2021-05-21 13:45:23,431 INFO  [spring-startup]  c.a.b.i.s.c.j.c.HealthCheckRunner New health check registered: SearchIndexCheck
2021-05-21 13:45:23,432 INFO  [spring-startup]  c.a.b.i.s.c.c.DefaultClusterJobManager Registering job for ElasticsearchSynchronizeJob

The access logs also started to be written at 13:45 (the time at which the node effectively finished starting).
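
To see exactly where start-up stalled, it can help to look for the largest silent gap between consecutive entries in atlassian-bitbucket.log. Below is a minimal Python sketch of that check; the log path is a hypothetical example based on a default Bitbucket home layout, and the timestamp format is taken from the entries above, so both may need adjusting for your environment.

from datetime import datetime

LOG_FILE = "/var/atlassian/application-data/bitbucket/log/atlassian-bitbucket.log"  # assumed default path
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"  # matches e.g. "2021-05-21 13:06:03,680"

def parse_ts(line):
    # Return the leading timestamp of a log line, or None for continuation lines.
    try:
        return datetime.strptime(line[:23], TS_FORMAT)
    except ValueError:
        return None

largest_gap = None
previous = None
with open(LOG_FILE) as log:
    for line in log:
        current = parse_ts(line)
        if current is None:
            continue  # stack traces and wrapped lines carry no timestamp
        if previous is not None:
            gap = current - previous
            if largest_gap is None or gap > largest_gap[0]:
                largest_gap = (gap, previous, current)
        previous = current

if largest_gap:
    gap, start, end = largest_gap
    print(f"Largest gap: {gap} (from {start} to {end})")

Running this against the log of each node should make the ~40 minute gap after the Started SSH server successfully entry stand out immediately.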


Logs Node 2:

A gap of ~46 minutes, similar to the one seen on the first node:

2021-05-21 12:19:08,709 INFO  [spring-startup]  c.a.b.internal.ssh.server.SshServer Started SSH server successfully.
2021-05-21 13:05:11,047 INFO  [hz.hazelcast.event-3]  c.a.s.i.c.HazelcastClusterService Node '/10.10.10.15:5701' was ADDED to the cluster. Updated cluster: 
  [/10.10.10.16:5701 master this uuid='e7287c8c-ce03-4383-848e-3a76d34d9781' vm-id='s357a960-ce4f-4321-bafd-42c4e535d172'], 
  [/10.10.10.15:5701 uuid='5366655d-099d-48b1-b13f-e5bce56a70cd' vm-id='b4ec4tcl-1c30-4faf-9f42-5b4eefe6fab9']

followed by:

2021-05-21 13:10:52,982 ERROR [active-objects-init-compatibility-tenant-0]  net.java.ao.sql Exception executing SQL update <CREATE INDEX "index_ao_c77861_aud96775159" ON "AO_C77861_AUDIT_ENTITY"("RESOURCE_ID_5","RESOURCE_TYPE_5","ENTITY_TIMESTAMP")>
org.postgresql.util.PSQLException: ERROR: could not extend file "base/11000/9006030": No space left on device
  Hint: Check free disk space.
  at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2553)
...


It's important to collect the logs from all nodes, since not all of them may point to a lack of disk space.
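
Once the logs are collected, a quick way to see which nodes actually recorded the out-of-space error is to scan each file for the PostgreSQL message. A minimal Python sketch, assuming hypothetical paths for the per-node copies of atlassian-bitbucket.log:

NODE_LOGS = {
    "node1": "./node1/atlassian-bitbucket.log",  # hypothetical collected paths
    "node2": "./node2/atlassian-bitbucket.log",
    "node3": "./node3/atlassian-bitbucket.log",
}

for node, path in NODE_LOGS.items():
    hits = []
    with open(path, errors="replace") as log:
        for lineno, line in enumerate(log, start=1):
            if "No space left on device" in line:
                hits.append(lineno)
    if hits:
        print(f"{node}: {len(hits)} match(es) at lines {hits}")
    else:
        print(f"{node}: no matches")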

Cause

The root cause of the error is that the disk space allocated for the database, or the disk quota for the user owning the database, has been exceeded. As a result, any additional data (such as rescoping for a pull request update) cannot be written to the database tables. The exhausted database disk space is the reason behind the slow start.
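
To confirm how much space the Bitbucket database is currently consuming, its size can be queried directly from PostgreSQL. A minimal Python sketch using psycopg2; the connection parameters are placeholders and must be replaced with the values for your environment:

import psycopg2

conn = psycopg2.connect(
    host="db.example.com",   # assumed database host
    dbname="bitbucket",      # assumed database name
    user="bitbucketuser",    # assumed database user
    password="secret",       # assumed credentials
)
with conn, conn.cursor() as cur:
    # pg_database_size() reports the on-disk size of the database in bytes.
    cur.execute("SELECT pg_size_pretty(pg_database_size(current_database()))")
    print("Bitbucket database size:", cur.fetchone()[0])
conn.close()

Comparing this figure against the space allocated to the database volume (or the owner's disk quota) shows how close the database is to running out of room.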


Solution

The database disk space needs to be increased or adjusted accordingly. Check the disk space, or the disk quota for the user associated with the Bitbucket database, at the time the nodes were started.
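
On the database host, the free space on the volume holding the PostgreSQL data directory can be checked with a short script. A minimal Python sketch; the data directory path is an assumption based on a common Linux default and should be replaced with the actual location:

import shutil

DATA_DIR = "/var/lib/postgresql/data"  # assumed PostgreSQL data directory

usage = shutil.disk_usage(DATA_DIR)
gib = 1024 ** 3
print(f"Total: {usage.total / gib:.1f} GiB")
print(f"Used:  {usage.used / gib:.1f} GiB")
print(f"Free:  {usage.free / gib:.1f} GiB")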
