Collaborative Editing fails on clustered Confluence Data Center - can't establish a persistent WebSocket connection


Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

After setting up a clustered Confluence Data Center to run behind an Amazon Web Services (AWS) Elastic Load Balancer, Collaborative Editing doesn't work.

The error message displayed is "The editor didn't load this time", and you have already reviewed the document "Confluence throws 'The editor didn't load this time' error when trying to edit a page".

Environment

Clustered Confluence Data Center using Self-Managed Synchrony

Load Balancer

Diagnosis

  1. When capturing a HAR file while editing a document with Collaborative Editing enabled, it's possible to see the WebSocket connection upgrade (HTTP 101 - Switching Protocols). The WebSocket connection is persistent and should stay active for as long as the document is being edited. In this case, however, the HAR file shows multiple requests for the WebSocket upgrade, each taking a long time to complete.

  2. In Synchrony logs of Node 1, we can see a ConnectTimeoutException happening when this node tries to communicate with Node 2 using Port 25500:

    atlassian-synchrony.log for Node 1
    2020-11-17 15:41:07,626 WARN [async-dispatch-12] [synchrony.event-bus] error creating topic connection {:throwable #error {
    2020-11-17 15:41:07,626 DEBUG [2571:StdOutHandler [/data/atlassian/confluence/jre/bin/java]] 	 :cause "connection timed out: /<NODE2_IP_ADDRESS>:25500"
    2020-11-17 15:41:07,626 DEBUG [2571:StdOutHandler [/data/atlassian/confluence/jre/bin/java]] 	 :via
    2020-11-17 15:41:07,626 DEBUG [2571:StdOutHandler [/data/atlassian/confluence/jre/bin/java]] 	 [{:type io.netty.channel.ConnectTimeoutException
    2020-11-17 15:41:07,626 DEBUG [2571:StdOutHandler [/data/atlassian/confluence/jre/bin/java]] 	   :message "connection timed out: /<NODE2_IP_ADDRESS>:25500"
  3. On Node 2, we can see that Node 1 was reported dead, and the Cluster Membership is composed of Node 2 only:

    atlassian-synchrony.log for Node 2
    2020-11-17 15:41:34,724 WARN [hz._hzInstance_1_cluster-name-Synchrony.cached.thread-3] [internal.cluster.impl.MembershipManager] [<NODE2_IP_ADDRESS>]:5701 [cluster-name-Synchrony] [3.11.4] Member [<NODE1_IP_ADDRESS>]:5701 - 610548f6-f067-4a9d-9a73-7e5c5a9b8491 is suspected to be dead for reason: No connection
  4. When testing the WebSocket upgrade with curl against the Confluence Base URL, the server answers with HTTP 101 (Switching Protocols), but the connection does not persist: after about one minute, curl reports an empty reply from the server:

    WebSocket upgrade test
    $ curl -i -N -H "Connection: Upgrade" -H "Upgrade: websocket" -H "Sec-WebSocket-Key: 33xyqDvzAXTYgsjjbaYD5A==" --header "Sec-WebSocket-Version: 13" https://<CONFLUENCE_BASE_URL>/synchrony/v1/bayeux-sync1
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
      0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0HTTP/1.1 101 Switching Protocols
    Date: Thu, 19 Nov 2020 02:05:57 GMT
    Connection: upgrade
    P3P: CP="This is not a P3P policy! See http://www.atlassian.com/company/privacy for more info."
    upgrade: websocket
    sec-websocket-accept: RV68WHXjNM0H+IFE5/W1ioLmMHI=
    
      0     0    0     0    0     0      0      0 --:--:--  0:01:00 --:--:--     0
    curl: (52) Empty reply from server
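
The log checks in steps 2 and 3 can be repeated on each node with a small search for the two failure signatures shown above. This is a minimal sketch, not an official Atlassian tool: the function name scan_synchrony_log is hypothetical, and the path to atlassian-synchrony.log depends on your installation, so it is passed in as an argument.

```shell
# Hypothetical helper: print every line (with its line number) of a Synchrony
# log that matches one of the two failure signatures quoted in this article.
scan_synchrony_log() {
  local log_file="$1"
  # ConnectTimeoutException  -> a node could not reach a peer (step 2)
  # suspected to be dead     -> Hazelcast dropped a cluster member (step 3)
  grep -nE 'ConnectTimeoutException|suspected to be dead' "$log_file"
}
```

For example, run scan_synchrony_log /path/to/atlassian-synchrony.log on every node (the path is a placeholder). If nothing matches, grep returns a non-zero exit status, which only means that neither signature is present in that file.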

Cause

Synchrony inter-node communication isn't working over the ports shown in the log files (5701 and 25500).

Solution

  • Review the configuration of all ports used by Synchrony on every node of the Confluence Data Center cluster. Ensure that:
    • All nodes are listening on ports 5701 (Hazelcast for Synchrony) and 25500 (Cluster base port for Synchrony).
    • Each node can establish a connection to every other node on these ports.
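
One way to test the connectivity between nodes is sketched below. It is an illustration under stated assumptions rather than a supported procedure: the function names check_port and check_synchrony_ports are hypothetical, the script assumes bash and the coreutils timeout command are available on the node, and it relies on bash's /dev/tcp pseudo-device to open a plain TCP connection.

```shell
# Hypothetical helper: check_port HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection to HOST:PORT can be opened, non-zero otherwise.
check_port() {
  local host="$1" port="$2" wait="${3:-5}"
  # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT;
  # timeout bounds the wait so a silently dropped packet cannot hang the check.
  timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Hypothetical helper: test both Synchrony ports named in this article
# against every peer node given as an argument.
check_synchrony_ports() {
  local failed=0 host port
  for host in "$@"; do
    for port in 5701 25500; do
      if check_port "$host" "$port"; then
        echo "OK   $host:$port"
      else
        echo "FAIL $host:$port"
        failed=1
      fi
    done
  done
  return "$failed"
}
```

For example, run check_synchrony_ports <NODE2_IP_ADDRESS> on Node 1, then repeat from Node 2 against <NODE1_IP_ADDRESS>. A FAIL line points to a firewall, security group, or listener problem on that port. To confirm that a node is listening locally, a command such as ss -ltn (if installed) lists the listening TCP ports.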


Last modified on Dec 1, 2020
