One or more Bitbucket Data Center nodes do not start successfully
Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.
Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
After one of the Bitbucket Data Center nodes is restarted, the application does not come up on that node.
Environment
- Bitbucket Data Center
- Multiple cluster nodes are in use
Diagnosis
No messages are being written to the application log file on the missing node.
The logs from a node that is up and running show the following:
2021-04-15 15:02:52,826 WARN [hz.hazelcast.cached.thread-15] c.h.n.t.TcpIpConnectionErrorHandler [10.150.90.150]:5701 [stash-cluster] [3.11.1] Removing connection to endpoint [10.150.90.149]:5701 Cause => java.net.SocketException {Connection refused to address /10.150.90.149:5701}, Error-Count: 5
2021-04-15 15:02:52,837 WARN [hz.hazelcast.cached.thread-15] c.h.i.cluster.impl.MembershipManager [10.150.90.150]:5701 [stash-cluster] [3.11.1] Member [10.150.90.149]:5701 - 25831d29-2a10-42bd-94d8-3f778di8d645 is suspected to be dead for reason: No connection
2021-04-15 15:02:52,925 INFO [hz.hazelcast.event-3] c.a.s.i.c.HazelcastClusterService Node '/10.150.90.149:5701 (stash1)' was REMOVED from the cluster. Updated cluster:
[/10.150.90.150:5701 master this name='stash2' uuid='208d5fac-5ff4-452b-84ce-1f39c32e5b3l' vm-id='3e936bd8-dc81-4218-82c2-2061fe50b41f']
As the last message above shows, the problematic node 10.150.90.149 was removed from the cluster. The important part of the first message is "Connection refused to address /10.150.90.149:5701}, Error-Count: 5". Hazelcast performs a heartbeat check 5 times, and if there is no response, it ejects the unresponsive node from the cluster.
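When the log file is large, it can help to pull out only the removal events. The following is a minimal sketch, assuming the log lines have the same layout as the excerpt above; the function name removed_nodes and the regular expression are illustrative and not part of Bitbucket.

```python
import re
import sys

# Matches the HazelcastClusterService line that reports a node leaving the cluster.
# The pattern is an assumption based on the log excerpt shown in this article.
REMOVED = re.compile(
    r"Node '/(?P<addr>[\d.]+:\d+) \((?P<name>[^)]*)\)' was REMOVED from the cluster"
)


def removed_nodes(lines):
    """Return (timestamp, address, node name) for each removal event found."""
    events = []
    for line in lines:
        match = REMOVED.search(line)
        if match:
            timestamp = line[:23]  # leading "YYYY-MM-DD HH:MM:SS,mmm" field
            events.append((timestamp, match.group("addr"), match.group("name")))
    return events


if __name__ == "__main__":
    # Usage: python removed_nodes.py <path-to-application-log>
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as log:
            for event in removed_nodes(log):
                print(*event)
```

Run it against the application log of a node that is still up; each printed line gives the time, address, and name of a node that was removed from the cluster.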
Cause
Usually, this message is caused by one of the following:
- Network-related issues.
- Automation tools are not properly configured to run Bitbucket.
Solution
Resolution 1
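Check the network connectivity between the nodes, specifically on port 5701. Use ping to confirm that the hosts can reach each other, and telnet (for example, telnet 10.150.90.149 5701) to confirm that the port accepts connections. Note that ping only tests host reachability, not whether port 5701 is open.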
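If telnet is not installed on the nodes, the same port check can be scripted. The following is a minimal sketch using only the Python standard library; the function name can_connect and the 3-second timeout are illustrative choices, and port 5701 is taken from the log excerpt above.

```python
import socket
import sys


def can_connect(host, port=5701, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Usage: python check_port.py 10.150.90.149 10.150.90.150
    for host in sys.argv[1:]:
        status = "reachable" if can_connect(host) else "NOT reachable"
        print(f"{host}:5701 {status}")
```

Run the check from each node against every other node, since a firewall rule may block traffic in one direction only.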
Resolution 2
Check the configuration of the automation tools used to start Bitbucket.