Soft lockup messages from Linux kernel on Hipchat Server
Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.
Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Admin sees the following in
The soft lockups can cause the Hipchat Server to freeze or stop responding, which subsequently causes other issues with normal operation.
The hypervisor is not keeping up with the CPU demand of the Hipchat Server virtual machine (VM). Typically, resources at the hypervisor level are insufficient, which degrades the Hipchat Server appliance and its performance, or there is not enough underlying compute capacity to keep the Hipchat VM running.
Here's a related VMware article: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009996
When running a Linux kernel in a symmetric multiprocessing (SMP) enabled virtual machine, messages similar to "BUG: soft lockup detected on CPU#1!" are written to the message log file. The exact format of these messages varies from kernel to kernel, and they might be accompanied by a kernel stack backtrace.
When running in a virtual machine, this might instead indicate high levels of overcommitment (especially memory overcommitment) or other virtualization overheads.
The soft lockup messages indicate that the vCPUs are waiting some amount of time before the hypervisor is able to provide the resources necessary for a particular process inside the VM to continue functioning.
We recommend evaluating the virtual machine and its host to determine whether the VM needs more resources, whether other VMs need to be moved off the host, whether to switch to vSphere ESXi, and so on.
If the virtual machine has already been allocated the proper amount of resources for the number of users per our System Requirements, another possible workaround is to increase the watchdog_thresh value on the server by running the following command as root:
echo 60 > /proc/sys/kernel/watchdog_thresh
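A value written under /proc/sys applies only to the running kernel and is lost on reboot. The following is a minimal sketch of one way to keep the setting, assuming the server reads configuration files from /etc/sysctl.d at boot. The helper name persist_watchdog_thresh and the file name 60-watchdog.conf are hypothetical and chosen only for illustration. Whether the Hipchat Server appliance preserves such a file across upgrades is not confirmed here.

```shell
#!/bin/sh
# Sketch: write a sysctl entry so the threshold is reapplied at boot.
# Both the function name and the target file name are illustrative assumptions.
persist_watchdog_thresh() {
    value="$1"
    conf_file="$2"
    # Reject anything that is not a whole number before touching the file.
    case "$value" in
        ''|*[!0-9]*)
            echo "value must be a whole number" >&2
            return 1
            ;;
    esac
    # sysctl configuration files use "key = value" lines.
    printf 'kernel.watchdog_thresh = %s\n' "$value" > "$conf_file"
}

# Example usage (run as root; the path is an assumption):
# sysctl kernel.watchdog_thresh                              # show the current value
# persist_watchdog_thresh 60 /etc/sysctl.d/60-watchdog.conf  # write the entry
# sysctl -p /etc/sysctl.d/60-watchdog.conf                   # load it without rebooting
```

Running sysctl kernel.watchdog_thresh before and after the change is a quick way to confirm that the new value is active.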
Monitor the system after applying this setting to confirm that it stabilizes and that the number of soft lockup messages decreases.
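One simple way to track the trend is to count matching lines in the kernel log before and after the change. The sketch below assumes the messages contain the phrase "soft lockup", as in the example quoted earlier. The helper name count_soft_lockups is hypothetical, and the log path in the usage comment is an assumption that depends on the distribution.

```shell
#!/bin/sh
# Sketch: count soft lockup messages in a log file.
# The function name is illustrative; pass the path to your kernel log.
count_soft_lockups() {
    # -c prints the number of matching lines; -i ignores case because the
    # exact message format varies from kernel to kernel.
    # grep exits non-zero when nothing matches, so "|| true" keeps the
    # function usable in scripts that stop on errors.
    grep -ci 'soft lockup' "$1" || true
}

# Example usage (log path is an assumption):
# count_soft_lockups /var/log/kern.log
```

Comparing the count over equal time windows, for example one day before and one day after the change, gives a rough indication of whether the workaround helped.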