Symptoms
Below is a list of potential problems with a Confluence cluster, and their likely solutions. The solutions are described in the sections below.
Confluence cluster debugging tools
There is an umbrella issue open for all cluster debugging tools. It includes the tools listed below.
Multicast
The multicast address and port used by Confluence can be found on the Cluster Administration page, or in confluence.cfg.xml in the Confluence home directory.
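As a rough illustration, the cluster-related entries can be pulled out of confluence.cfg.xml with a few lines of Python. This is only a sketch, not an Atlassian tool: it assumes the file is made of `<property name="...">value</property>` elements like the ones quoted on this page, and the sample document, the wrapper element names and the helper names `cluster_properties` and `is_multicast` are made up for the example.

```python
import ipaddress
import xml.etree.ElementTree as ET

# Made-up sample in the shape of the <property> entries quoted on this page;
# the wrapper elements are an assumption and a real file holds many more settings.
SAMPLE_CFG = """<confluence-configuration>
  <properties>
    <property name="confluence.cluster.address">224.207.38.193</property>
    <property name="confluence.cluster.interface">eth1</property>
    <property name="hibernate.dialect">example</property>
  </properties>
</confluence-configuration>"""


def cluster_properties(cfg_xml):
    """Return every property whose name starts with 'confluence.cluster.'."""
    root = ET.fromstring(cfg_xml)
    found = {}
    for prop in root.iter("property"):
        name = prop.get("name", "")
        if name.startswith("confluence.cluster."):
            found[name] = (prop.text or "").strip()
    return found


def is_multicast(address):
    """True if the address is a multicast address (224.0.0.0/4 for IPv4)."""
    return ipaddress.ip_address(address).is_multicast


if __name__ == "__main__":
    props = cluster_properties(SAMPLE_CFG)
    for name, value in sorted(props.items()):
        print(name, "=", value)
    print("multicast:", is_multicast(props["confluence.cluster.address"]))
```

To use it against a real installation, read the file from the Confluence home directory and pass its text to `cluster_properties` instead of `SAMPLE_CFG`.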
Confluence applies a hashing algorithm to the cluster name entered during setup and stores the resulting multicast address in the configuration file. After the initial setup, Confluence uses the stored address rather than the name, which is why the address can be changed later without changing the name. Additional nodes that use the multicast address specified in the configuration file are then able to join the cluster. Each node has the multicast address configured in its confluence.cfg.xml file:
<property name="confluence.cluster.address">xxx.xx.xxx.xxx</property>
A warning message is displayed when a user changes the address from the one that Confluence generated by hashing the name. The only way to remove the message is to return the address to the one that matches the cluster name. The purpose of the warning is to remind the user that the address is no longer the hashed version of the name, so a node cannot join the cluster by using the name alone; the correct address must be provided as well.
Mapping interface to IP address
To check that an interface name is mapped to the IP address you expect, use the following tool, which lists each interface name with its addresses:
C:\>java -jar list-interfaces.jar
interfaces.size() = 4
networkInterface[0] = name:lo (MS TCP Loopback interface) index: 1 addresses: /127.0.0.1;
networkInterface[1] = name:eth0 (VMware Virtual Ethernet Adapter for VMnet8) index: 2 addresses: /192.168.133.1;
networkInterface[2] = name:eth1 (VMware Virtual Ethernet Adapter for VMnet1) index: 3 addresses: /192.168.68.1;
networkInterface[3] = name:eth2 (Broadcom NetXtreme 57xx Gigabit Controller - Packet Scheduler Miniport) index: 4 addresses: /192.168.0.101;
Debugging tools
Listed below are some debugging tools that help determine the status of the multicast traffic:
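Independently of those tools, a basic check of whether multicast datagrams travel between two nodes can be sketched with Python's standard socket module. This is a minimal sketch under stated assumptions, not one of the Atlassian or Tangosol tools: the script name `mcast_check.py` and the function names are hypothetical, and the group address and port must be replaced with the values your cluster actually uses.

```python
import socket
import struct
import sys


def membership_request(group, interface_ip="0.0.0.0"):
    """Build the 8-byte ip_mreq value (group + local interface) used to join a group."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface_ip))


def listen(group, port, interface_ip="0.0.0.0", timeout=10.0):
    """Join the group and return (data, sender) for the first datagram, or None on timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group, interface_ip))
        sock.settimeout(timeout)
        try:
            return sock.recvfrom(1024)
        except socket.timeout:
            return None
    finally:
        sock.close()


def send(group, port, message=b"multicast-check", ttl=1):
    """Send one datagram to the group with the given multicast TTL."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("b", ttl))
        sock.sendto(message, (group, port))
    finally:
        sock.close()


if __name__ == "__main__":
    # Usage: python mcast_check.py listen|send GROUP PORT
    mode, group, port = sys.argv[1], sys.argv[2], int(sys.argv[3])
    if mode == "listen":
        print(listen(group, port))
    else:
        send(group, port)
```

Run it with `listen` on one node, then with `send` on the other while the listener is still waiting. If the listener prints `None`, the datagram did not arrive within the timeout, which points at routing, firewall or TTL settings rather than at Confluence itself.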
Add multicast route
Multicast networking requirements vary across operating systems. Some operating systems require little configuration, while others require the multicast address to be explicitly added to a network interface before Confluence can use it. If the Multicast Test tool shows that multicast traffic can't be sent or received correctly, adding a route for multicast traffic on the correct interface will often fix the problem. The example below is for an Ubuntu Linux system:
route add -net 224.0.0.0 netmask 240.0.0.0 dev eth0
To support multiple applications using multicast on different interfaces, you may need to specify a route specific to the Confluence multicast address.
Check firewall
Ensure your firewall allows UDP traffic on the multicast address and port used by Confluence.
Prefer IPv4
The fix is to add -Djava.net.preferIPv4Stack=true to JAVA_OPTS. This tells the JVM to use IPv4 sockets only, instead of the IPv6 sockets it prefers by default when the operating system supports IPv6.
Change multicast interface
Confluence might have selected the incorrect interface for multicast traffic, which means it cannot connect to other nodes in the cluster. To override the interface used for multicast traffic after initial setup, edit confluence.cfg.xml in the Confluence home directory and add a property (or change the existing one) to select your desired network interface. For example, to tell Confluence to use eth1:
<property name="confluence.cluster.interface">eth1</property>
Increase multicast TTL
The multicast time-to-live (TTL) specifies how many hops a multicast packet is allowed to travel before it is discarded by a router. Set it according to the number of routers between your clustered nodes: 0 if both are on the same machine, 1 if on two different machines linked by a switch or cable, 2 if on two different machines with one intermediate router, and so on.
Create a file in the Confluence home directory called tangosol-coherence-override.xml. Add the following to it, setting the TTL value appropriately (1 is the default):
<?xml version='1.0'?>
<coherence>
  <cluster-config>
    <multicast-listener>
      <time-to-live system-property='tangosol.coherence.ttl'>1</time-to-live>
    </multicast-listener>
  </cluster-config>
</coherence>
Alternatively, start Confluence with the system property -Dtangosol.coherence.ttl=1. Again, 1 is the default value; change it to suit your network topology.
Check intermediate routers
Advanced switches and routers can understand multicast traffic and route it appropriately. Unfortunately, this functionality sometimes doesn't work correctly with the multicast management information (IGMP) published by the operating system running Confluence.
If multicast traffic is problematic, try disabling advanced multicast features on switches and routers in between the clustered nodes. These features can prevent multicast traffic being transmitted by certain operating systems. For best results, use the simplest network topology possible for the cluster traffic between the nodes. For two nodes, that means a single network cable. For larger numbers, try using a single high-quality switch.
Advanced Tangosol configuration
If the solution to your problem involves changes to the Tangosol configuration, these changes should not be made to the Confluence configuration in confluence/WEB-INF/classes/. Instead, to ensure your configuration survives upgrades, make your changes via:
Examples of making these changes are shown in the Increase multicast TTL section.
Didn't find a solution?
Contact Atlassian support
We have dedicated staff on hand to support your installation of Confluence. Please follow the instructions for raising a support request and mention that you're having trouble setting up your Confluence cluster.

Comments (28)
Apr 17, 2007
Klaus Rothert says:
To modify the multicast TTL in version 2.4.4, neither the tangosol-coherence-override.xml file nor the -Dtangosol.coherence.ttl=1 system property seems to work.
Modifying <property name="confluence.cluster.ttl">5</property> in /confluence.cfg.xml works.
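For anyone who wants to script that change, here is a minimal Python sketch. It assumes confluence.cfg.xml is built from `<property name="...">` elements grouped under a `<properties>` element (an assumption; check your own file), and `ttl_for` and `set_property` are hypothetical helper names. The TTL rule is the one given on this page: 0 on a single machine, otherwise the number of routers in between plus one.

```python
import xml.etree.ElementTree as ET


def ttl_for(routers_between, same_machine=False):
    """TTL rule from this page: 0 on one machine, otherwise routers in between + 1."""
    if routers_between < 0:
        raise ValueError("routers_between must not be negative")
    return 0 if same_machine else routers_between + 1


def set_property(cfg_xml, name, value):
    """Return the configuration text with the named property set, adding it if missing."""
    root = ET.fromstring(cfg_xml)
    for prop in root.iter("property"):
        if prop.get("name") == name:
            prop.text = str(value)
            break
    else:
        # Assumed layout: properties live under <properties>; fall back to the root.
        parent = root.find("properties")
        if parent is None:
            parent = root
        ET.SubElement(parent, "property", {"name": name}).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = ('<confluence-configuration><properties>'
              '<property name="confluence.cluster.ttl">1</property>'
              '</properties></confluence-configuration>')
    # Two machines with one intermediate router between them.
    print(set_property(sample, "confluence.cluster.ttl", ttl_for(1)))
```

Note that ElementTree drops comments and the XML declaration when it rewrites a document, so back up the real file before replacing it, and restart Confluence afterwards.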
Jul 02, 2007
Alessandro Rocca says:
Is there a full reference to confluence.cfg.xml? Or is the code the only 'source' of info?
Thx.
Jul 18, 2007
Gary S. Weaver says:
We actually didn't find the multicast address in confluence.cfg.xml but we did find it in (tomcat)/logs/atlassian-confluence.log like the following:
where ip and port are the multicast address and port that the sys admin will need to open up for UDP if not already open.
Update: the reason we didn't find the multicast address in confluence.cfg.xml was that we thought we were using a clustering license when we weren't. Reinstalling with clustering license (or I assume upgrading to clustering license) worked for us.
Jul 02, 2007
Alessandro Rocca says:
Due to network configuration and administration, I was hoping to be able to set the multicast IP and port and the unicast port by modifying the Confluence configuration file. That's why I asked for a file reference or some other kind of indication.
Oct 11, 2007
Gary S. Weaver says:
Actually I wasn't replying to your question, only posting a comment regarding the following statement on this page, "The multicast address and port used by Confluence can be found on the Cluster Administration page, or in confluence.cfg.xml in the Confluence home directory." We found the multicast address it was trying to use was not in confluence.cfg.xml but instead was found in atlassian-confluence.log.
Although I don't have a reference to confluence.cfg.xml, I did find info on tangosol-coherence-override.xml that lets you specify the multicast ip and port and unicast port, and lots of other stuff:
The problem that we are currently having in setting up clustering is the following error:
I had assumed this was due to UDP multicasting being blocked by the firewall, which was the case, so I gave the address and port to our sysadmin to open up that address. They opened it up, and I used the following script with a tangosol 3.2 jar I got from the tangosol site (but I assume I could have just as easily used the one in WEB-INF/lib packaged with confluence) to test out UDP multicasting (I found that info on the tangosol site and modified the script to assume java is on the path and tangosol.jar is in same dir):
#!/bin/sh
/srv/java/bin/java -server -showversion -cp "./tangosol.jar" com.tangosol.net.MulticastTest $*
Then call it like this on both of the clustered servers so I could see that both servers were communicating with each other via the same multicast address and port (first I ran it on just one server, then ran it at the same time on the other server to see the difference). The address and port are from atlassian-confluence.log:
Will post more info when I get it figured out... let me know if you have any ideas.
Update: As mentioned above, we were using a non-clustering license and UDP multicasting had not been enabled via the firewall. The original XML posted in this comment got messed up because of a Confluence bug related to posting XML in comments.
Sep 13, 2007
Daniel Petzen says:
I had some initial problems with both connectivity and the randomly selected IP address. I tried to modify the tangosol-coherence-override.xml as suggested, but that did not make any difference. I had to modify confluence.cfg.xml for these changes to take effect:
Confluence complains about the IP not matching the cluster name on the Cluster Configuration page, but this doesn't seem to have a functional impact. The decision to generate a random multicast IP based on the cluster name seems very odd to me, especially in a controlled corporate network environment.
Jan 29, 2008
Christopher Owen says:
"The decision to generate a random multicast IP based on the cluster name seems very odd to me..."
Confluence Clustered was never intended to be clustered over anything other than a private subnet between the nodes; there should be no network restrictions used or required over such a subnet.
If we were to start supporting more complicated network setups we would of course make the configuration more flexible. At the moment it takes a fairly technical and unnecessary piece of configuration detail out of the equation.
Feb 10, 2008
Gary S. Weaver says:
Thought I'd post this info I learned through setup.
If you are getting errors w/cluster panic or the confluence server not showing the dashboard, first make sure that you installed a cluster license. If you didn't specify a cluster name during setup, it wasn't a cluster license (this bit us because we had a non-clustering and clustering license and used the wrong one).
Also, you can get a similar error if UDP/multicasting is not enabled. After you (re)enable UDP/multicasting (via firewall rules, etc.) the server(s) that were having trouble will continue to have trouble until you do this:
1) shutdown webapp servers (tomcat, for example)
2) on server that is still working, copy its tomcat and confluence.home dir to the other server(s) that are not working
3) start first (working) server back up all the way
4) start other (previously non-working) servers back up all the way
Basically you are re-setting up like you did in the cluster install directions.
Good luck!
Feb 10, 2008
Gary S. Weaver says:
Another issue we just ran into:
For the first few server restarts of Tomcat with Confluence 2.5.4 where I brought the 1st server down then the second, then started the first completely, then the second completely- all was ok.
However, when I just restarted apache on the second server and then tomcat on the second server and then apache on the first server and then tomcat on the first server, both confluence servers have gone into cluster panic. I don't know how to recover from this since neither server is working.
Am currently thinking this is the result of us not having session affinity enabled in the load balancer which I see is required according to Clustering in Confluence and Confluence Cluster Installation.
server 01 says the following when I hit it (via load balanced URL so I just got lucky to hit it):
You cannot access Confluence at present. Look at the table below to identify the reasons
*Time* *Level* *Type* *Description* *Exception*
2007-07-18 13:23:34 (EventLevel: fatal) (EventType: cluster) Database is being updated by an instance which is not part of the current cluster. You should check network connections between cluster nodes, especially multicast traffic.
server 02 says the following when I hit it (via load balanced URL so I just got lucky to hit it):
You cannot access Confluence at present. Look at the table below to identify the reasons
*Time* *Level* *Type* *Description* *Exception*
2007-07-18 13:23:00 (EventLevel: fatal) (EventType: cluster) Database is being updated by an instance which is not part of the current cluster. You should check network connections between cluster nodes, especially multicast traffic.
2007-07-18 13:23:34 (EventLevel: fatal) (EventType: cluster) Database is being updated by an instance which is not part of the current cluster. You should check network connections between cluster nodes, especially multicast traffic.
Jul 20, 2007
Alessandro Rocca says:
The info you posted helped me set the IP address and port number correctly.
Regarding your last post, I had exactly the same problem. It happened every time the DB server was shut down during the night to do a backup (it's a test environment, never happened in production). The application servers were not restarted and lost the connection with the DB server. When the DB server came up again, Confluence (2.5.2) started complaining.
We do use server affinity, so I can't say if that is really your issue. I think there might be something else out of place.
I don't know if this helps you (I don't think so), but still it could be useful for someone!
BTW: Apache httpd 2.0, app server WLS8.1, Oracle 9.2
Sep 21, 2007
Gary S. Weaver says:
Ok, so far here is a list of things that seem to cause cluster panics in Confluence 2.5.4-2.5.7: (Edited 9/21/2007)
(unsure) Session affinity not being enabled on the load balancer that is serving the clustered servers. This is not supposed to be a cause according to Atlassian support, but it seemed to have happened to us. Have not had resources to replicate the issue though. I think this is a non-issue.
There is an improvement requested of Confluence so that it can automatically recover from cluster panics. Please go to CONF-9297, vote on it, and watch it if you're interested.
Sep 04, 2007
Gary S. Weaver says:
So far we have done two database rollbacks with massive and it appears to work great through that.
Today we did a database rollback while confluence was up and in cluster panic in both nodes, then stopped tomcat/confluence on both nodes, started the first node completely, then the second node completely and all looks good (is back up to its previous state at 10am yesterday). It's good to know that it works.
Update: In addition, in most cases a restart of one or both nodes will fix a cluster panic.
Oct 17, 2007
Gary S. Weaver says:
Please add "Only one instance can run on same node at a time" and "Partial/full database rollback while Confluence is running" to problems that can cause Cluster Panic on a single node (with only one node running). Have heard from another university that they've also gotten Cluster Panic with only one node running when there was no database rollback and with only one instance running. Not sure why though. Just contact Atlassian support about it if you have any trouble.
Other notes:
Update: Atlassian support said this might require a change to the join-timeout-milliseconds setting of Coherence to 30000 or 120000. However this just made things worse for us. On the second restart of the primary node with join-timeout-milliseconds of 30000 we got 'INFO: Server startup in 381584 ms' and then instead of Cluster Panic we got a weird looking System error with no stacktrace under it (in Confluence 2.5.7). Still working with Atlassian to figure this out...
grep -B 5 -A 5 ticks /var/log/dmesg
Dec 06, 2007
Gary S. Weaver says:
Atlassian determined the problem we had for months with clustering where the nodes would fail after the following scenario (the reason we make anonymous comments to each node is because it is hard to replicate the issue otherwise in our load-balanced environment):
The issue is that the version of Tangosol Coherence (tangosol-3.2-b365-atlassian.jar) that Confluence currently uses does NOT work properly with Cisco switches in their default configuration. In this case, we were using the Cisco Catalyst 3550. Specifically James Fleming of Confluence said in CSP-13323, "Cisco switches, in particular, have known issues with multicast transmission. A comprehensive description is available on the Tangosol page on the subject ( http://wiki.tangosol.com/display/COH33UG/Deployment+Considerations+-+Cisco+Switches ), with links to further information on available solutions. Hopefully it's just a buffer issue, with its relatively simple fix."
The solution that we used was just to use a crossover cable between the two nodes. James also referred to the text in the page above:
From the cluster troubleshooting guide:
I'm really glad that it finally works!
Dec 06, 2007
Gary S. Weaver says:
Note the following items if you do have a second network adapter on each node and can use a crossover cable, or second non-Cisco switch, or second correctly configured Cisco switch for the secondary network adapter to connect 2 (or more) clustered nodes:
Dec 14, 2007
Matt Shepherd says:
Gary,
Thanks so much for posting these two comments.
I've been running in circles trying to solve these issues.
Jan 23, 2008
Anonymous says:
How do you find the right interface name to use in Windows? I tried "Local Area Connection" but that didn't work. I also didn't find it in the log files. In atlassian-confluence.log with 2.6.2 default settings a lot of DEBUG level messages are printed, and I saw
DEBUG [main] [confluence.cluster.tangosol.TangosolClusterManager] mergeConfig Merged configuration: <cluster-config>
which shows the merged configuration sent to tangosol. I tried setting confluence.cluster.ttl to 3, and that got sent, but I didn't see any entry for sending the interface name.
<multicast-listener>
<!--
Note: For production use, this value should be set to the lowest integer
value that works. On a single server cluster, it should work at "0"; on
a simple switched backbone, it should work at "1"; on an advanced backbone
with intelligent switching, it may require a value of "2" or more. Setting
the value too high can utilize unnecessary bandwidth on other LAN segments
and can even cause the OS or network devices to disable multicast traffic.
Note: For production use, the recommended value is 30000.
-->
<address>224.207.38.193</address>
<port>32365</port>
<time-to-live>3</time-to-live>
<packet-buffer>
<maximum-packets>64</maximum-packets>
</packet-buffer>
<priority>8</priority>
<join-timeout-milliseconds>30000</join-timeout-milliseconds>
<multicast-threshold-percent>25</multicast-threshold-percent>
</multicast-listener>
Jan 23, 2008
Anonymous says:
Ivan sent me a nice app that answered the question.....
eth0 was what I needed to know, and it worked fine in the config.
Dec 19, 2007
Fabien Bergeret says:
I'm currently working on integrating Confluence in my client's infrastructure. He uses clustered Websphere, but doesn't allow multicast.
Is it possible to use tangosol's unicast mechanism (<well-known-addresses> in the tangosol-coherence-override.xml file)? Note that a static description of the cluster's members is fine for my customer.
Dec 27, 2007
Paul Curren says:
Yes, that should be possible Fabien.
Other customers have had success using the well-known-addresses configuration.
Remember that the Confluence clustering mechanism is completely independent of Websphere's.
Jun 11
Bharathi Dubey says:
Can one of you post a sample tangosol-coherence-override.xml with the <well-known-addresses> in it? We are experiencing issues with multicast traffic used by Tangosol.