Inconsistency in group membership and user status on one or more nodes in Jira Data Center.
Platform Notice: Data Center Only - This article only applies to Atlassian products on the data center platform.
This KB applies if you are running Jira Data Center on a version lower than 8.10. From 8.10 onward, removal of stale nodes is handled automatically, as per suggestion JRASERVER-42916.
In Jira Data Center, some users may randomly lose group membership or be marked as Inactive in User Management, which in turn causes login failures for those users on the impacted node or nodes. Symptoms include:
- User group membership shown in the UI on one or more nodes differs from the data in the database.
- User status (active or inactive) shown in the UI differs from the database.
Check the user and group membership in question in the database to confirm the data inconsistency.
Check the details of an affected user in the user table. From this query, make a note of the "active" column: 1 stands for active, 0 stands for inactive.
select * from cwd_user where lower_user_name = '<lower_user_name>';
Check the group memberships of an affected user in the membership table. Make a note of the "parent_name" column to see the groups associated with the user.
select * from cwd_membership where lower_child_name = '<lower_user_name>';
If the results from the database differ from what you see in Jira's user interface, you may be affected by this issue.
Check the user details in User Management on all the nodes. If the mismatch appears on all the nodes, apply the fix explained in the KB article LDAP users and groups display unexpectedly in Jira server.
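To compare the user's status across nodes, you can also query each node directly over Jira's REST API, bypassing the load balancer. The sketch below is a hedged example: the node hostnames, port, credentials, and username are placeholders, not values from this article.

```shell
# Query each node directly and extract the user's "active" flag from
# the REST response. Node names, port, credentials, and the username
# are placeholders - substitute your own.
check_node() {
  curl -s -u "admin:admin" \
    "http://$1:8080/rest/api/2/user?username=jsmith" |
    grep -Eo '"active":(true|false)'
}
for node in jira-node1 jira-node2; do
  printf '%s: %s\n' "$node" "$(check_node "$node")"
done
```

If one node reports a different "active" value than the others, you have confirmed the issue is node-local rather than database-wide.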
If the issue is noticed only on a particular node or nodes, we need to identify what is causing this behaviour. The steps below help identify whether changes to the cache are being sent from another environment.
Enable the logging below on all the nodes to capture more detail; a restart of all nodes is needed for it to take effect. Then wait for the issue to reappear.
File : <JIRA_INSTALL>/atlassian-jira/WEB-INF/classes/log4j.properties.
log4j.logger.net.sf.ehcache.distribution.RMICachePeer = DEBUG, filelog
log4j.additivity.net.sf.ehcache.distribution.RMICachePeer = false
log4j.logger.com.atlassian.cache.event.com.atlassian.jira.issue = DEBUG, filelog
log4j.additivity.com.atlassian.cache.event.com.atlassian.jira = false
log4j.logger.com.atlassian.cache.event.com.atlassian.jira.config = DEBUG, filelog
log4j.additivity.com.atlassian.cache.event.com.atlassian.jira.config = false
When the issue reappears, search atlassian-jira.log* for the affected group/user; the RMI cache events will be visible there.
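A minimal sketch of that log search is below. It writes a demo log line (with placeholder group name and IP) so it is self-contained; on a real node, run the grep against the rolled-over atlassian-jira.log* files in the Jira log directory instead.

```shell
# Demo log line standing in for a real atlassian-jira.log entry
# (the group name "xx_xx" and IP are placeholders).
printf '%s\n' '2020-05-07 12:18:24,960 RMI TCP Connection(12898)-xx.xx.xx.xx DEBUG [n.s.ehcache.distribution.RMICachePeer] remote remove received for key: MembershipKey[directoryId=10100,name=xx_xx,type=GROUP_USER]' > sample.log

# On a real node, replace sample.log with atlassian-jira.log* and
# xx_xx with the affected group or user name.
grep -h 'remote remove received' sample.log | grep -c 'name=xx_xx'   # prints 1
```

A non-zero count means another host pushed a cache removal for that group or user to this node.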
In the log example below, the node with IP "xx.xx.xx.xx" sends a request to remove group "xx_xx"; this is the actual root cause of the problem.
2020-05-07 12:18:24,960 RMI TCP Connection(12898)-xx.xx.xx.xx DEBUG [n.s.ehcache.distribution.RMICachePeer] RMICachePeer for cache com.atlassian.jira.crowd.embedded.ofbiz.OfBizInternalMembershipDao.childrenCache: remote remove received for key: MembershipKey[directoryId=10100,name=xx_xx,type=GROUP_USER]
Next, identify the IP to determine whether the request is coming from outside the Data Center cluster. You can use the nslookup or hostname command to resolve the IP.
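The sending IP can be pulled out of the "RMI TCP Connection(...)-<ip>" portion of the log line and then resolved. The sketch below uses a made-up sample IP in place of the masked one from the example above.

```shell
# Extract the sending IP from the "RMI TCP Connection(...)-<ip>" part
# of the log line (the IP below is a made-up sample).
line='2020-05-07 12:18:24,960 RMI TCP Connection(12898)-10.20.30.40 DEBUG [n.s.ehcache.distribution.RMICachePeer] remote remove received'
ip=$(printf '%s\n' "$line" | sed -En 's/.*Connection\([0-9]+\)-([0-9.]+) .*/\1/p')
echo "$ip"            # prints 10.20.30.40
# nslookup "$ip"      # resolve to a hostname (requires DNS access)
```

If the resolved host is not one of your cluster nodes, the update is coming from outside the cluster.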
The impacted node receives the group or user cache update from a node that is not part of the cluster. This happens when a backup is restored to a lower environment while the impacted node was part of the source cluster, so the restored environment still sends cache updates to it.
The resolution must be applied on the environment that is sending the cache updates to the affected node.
Remove the node entry from the cluster table on the environment identified during diagnosis. Refer to the article Remove abandoned or offline nodes in Jira Data Center for the steps to remove the node from the cluster.