Collaborative Editing fails on clustered Confluence Data Center - can't establish a persistent WebSocket connection
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
After setting up a clustered Confluence Data Center to run behind an Amazon Web Services Elastic Load Balancer, Collaborative Editing isn't working.
The error message displayed is "The editor didn't load this time", and you have already reviewed the article Confluence throws "The editor didn't load this time" error when trying to edit a page.
Environment
Clustered Confluence Data Center using Self-Managed Synchrony
Load Balancer
Diagnosis
- When capturing a HAR file while editing a page with Collaborative Editing enabled, you can see the WebSocket connection upgrade (HTTP 101 - Switching Protocols). The WebSocket connection is persistent and should stay open for as long as the page is being edited. In this case, however, the HAR file shows multiple WebSocket upgrade requests, each taking a long time to complete.
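As a rough aid for reviewing the HAR file, the sketch below lists every HTTP 101 response whose URL contains `/synchrony`, together with its duration. It relies only on the standard HAR fields (`log.entries`, `request.url`, `response.status`, `time`, `startedDateTime`); the function name and the `/synchrony` path filter are assumptions made for illustration, not part of any Atlassian tooling.

```python
import json


def websocket_upgrades(har, path_fragment="/synchrony"):
    """Return (startedDateTime, duration_ms) for each HTTP 101 entry
    whose request URL contains path_fragment."""
    results = []
    for entry in har.get("log", {}).get("entries", []):
        url = entry.get("request", {}).get("url", "")
        status = entry.get("response", {}).get("status")
        if status == 101 and path_fragment in url:
            started = entry.get("startedDateTime", "")
            duration_ms = float(entry.get("time", 0))
            results.append((started, duration_ms))
    return results


def load_har(path):
    """Read a HAR file (JSON) from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `websocket_upgrades(load_har("editing.har"))` (the file name is hypothetical) returns one tuple per upgrade. A healthy session would normally show a single long-lived entry, whereas several entries in a row suggest the connection keeps being re-established.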
In the Synchrony logs of Node 1, a ConnectTimeoutException occurs when this node tries to communicate with Node 2 on port 25500:
atlassian-synchrony.log for Node 1
2020-11-17 15:41:07,626 WARN [async-dispatch-12] [synchrony.event-bus] error creating topic connection {:throwable #error {
2020-11-17 15:41:07,626 DEBUG [2571:StdOutHandler [/data/atlassian/confluence/jre/bin/java]] :cause "connection timed out: /<NODE2_IP_ADDRESS>:25500"
2020-11-17 15:41:07,626 DEBUG [2571:StdOutHandler [/data/atlassian/confluence/jre/bin/java]] :via
2020-11-17 15:41:07,626 DEBUG [2571:StdOutHandler [/data/atlassian/confluence/jre/bin/java]] [{:type io.netty.channel.ConnectTimeoutException
2020-11-17 15:41:07,626 DEBUG [2571:StdOutHandler [/data/atlassian/confluence/jre/bin/java]] :message "connection timed out: /<NODE2_IP_ADDRESS>:25500"
On Node 2, we can see that Node 1 was reported dead, and the Cluster Membership is composed of Node 2 only:
atlassian-synchrony.log for Node 2
2020-11-17 15:41:34,724 WARN [hz._hzInstance_1_cluster-name-Synchrony.cached.thread-3] [internal.cluster.impl.MembershipManager] [<NODE2_IP_ADDRESS>]:5701 [cluster-name-Synchrony] [3.11.4] Member [<NODE1_IP_ADDRESS>]:5701 - 610548f6-f067-4a9d-9a73-7e5c5a9b8491 is suspected to be dead for reason: No connection
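To look for the same two symptoms across the logs of each node, a small scan along these lines may help. It is a minimal sketch: the regular expressions are derived only from the two log excerpts above, and the function name is hypothetical, so other Synchrony or Hazelcast versions may word these messages differently.

```python
import re

# Patterns modelled on the log excerpts above (assumed, not exhaustive).
TIMEOUT_RE = re.compile(r'connection timed out: /([^\s:"]+):(\d+)')
DEAD_RE = re.compile(
    r'Member \[([^\]]+)\]:(\d+) .*is suspected to be dead for reason: (.+)'
)


def scan_synchrony_log(lines):
    """Return (timeouts, suspected_dead) found in an iterable of log lines.

    timeouts is a set of (host, port) targets that timed out.
    suspected_dead is a list of (host, port, reason) tuples.
    """
    timeouts = set()
    suspected_dead = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts.add((match.group(1), int(match.group(2))))
        match = DEAD_RE.search(line)
        if match:
            suspected_dead.append(
                (match.group(1), int(match.group(2)), match.group(3).strip())
            )
    return timeouts, suspected_dead
```

Passing an open file works directly, for example `scan_synchrony_log(open("atlassian-synchrony.log", encoding="utf-8"))`. Timeouts on port 25500 combined with members suspected dead on port 5701 match the pattern described in this article.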
When testing the WebSocket upgrade with curl against the Confluence base URL, the server answers with HTTP 101 but no persistent WebSocket connection is established. After about one minute, curl reports an empty reply from the server:
WebSocket upgrade test
$ curl -i -N -H "Connection: Upgrade" -H "Upgrade: websocket" -H "Sec-WebSocket-Key: 33xyqDvzAXTYgsjjbaYD5A==" --header "Sec-WebSocket-Version: 13" https://<CONFLUENCE_BASE_URL>/synchrony/v1/bayeux-sync1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
HTTP/1.1 101 Switching Protocols
Date: Thu, 19 Nov 2020 02:05:57 GMT
Connection: upgrade
P3P: CP="This is not a P3P policy! See http://www.atlassian.com/company/privacy for more info."
upgrade: websocket
sec-websocket-accept: RV68WHXjNM0H+IFE5/W1ioLmMHI=
  0     0    0     0    0     0      0      0 --:--:--  0:01:00 --:--:--     0
curl: (52) Empty reply from server
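When reading a response like the one above, it can help to confirm that the 101 reply is a genuine WebSocket handshake, so that attention can move to what happens after the upgrade. The sketch below computes the `Sec-WebSocket-Accept` value that RFC 6455 defines for a given `Sec-WebSocket-Key`; the function names are illustrative, and the result for the key used in this article has not been checked here.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(sec_websocket_key):
    """Compute the Sec-WebSocket-Accept value for a Sec-WebSocket-Key."""
    material = (sec_websocket_key.strip() + WEBSOCKET_GUID).encode("ascii")
    digest = hashlib.sha1(material).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_is_valid(sec_websocket_key, sec_websocket_accept):
    """True if the server's accept header matches the key that was sent."""
    return expected_accept(sec_websocket_key) == sec_websocket_accept.strip()
```

If `handshake_is_valid` returns `True` for the key sent and the `sec-websocket-accept` header received, the upgrade itself completed, which points to the connection being dropped afterwards rather than being rejected.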
Cause
Inter-node communication in Synchrony is failing on the ports shown in the log files (5701 and 25500).
Solution
- Review the configuration of all ports used by Synchrony on every node of the Confluence Data Center cluster. Ensure that:
- All nodes are listening on ports 5701 (Hazelcast for Synchrony) and 25500 (Cluster base port for Synchrony).
- Each node can establish a connection to every other node on these ports.
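The connectivity check above can be sketched with a plain TCP connection attempt, run from each node against the others. This is a minimal example, assuming the default ports named in this article. The helper names and the host list are placeholders, and a successful TCP connect only shows that the port is reachable, not that Synchrony is healthy.

```python
import socket

# Ports named in this article; adjust if your cluster uses other values.
SYNCHRONY_PORTS = {
    5701: "Hazelcast for Synchrony",
    25500: "Cluster base port for Synchrony",
}


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_nodes(hosts, ports=SYNCHRONY_PORTS, timeout=3.0):
    """Map each (host, port) pair to whether it accepted a connection."""
    return {
        (host, port): can_connect(host, port, timeout)
        for host in hosts
        for port in ports
    }
```

Running `check_nodes(["<NODE2_IP_ADDRESS>"])` on Node 1, and the reverse on Node 2, should return `True` for both ports. A `False` result usually points to a firewall or security group rule, or to a process that is not listening on that port.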