Soft lockup messages from Linux kernel on Hipchat Server

This article only applies to Atlassian's server products. Learn more about the differences between cloud and server.

Problem

The administrator sees soft lockup messages from the Linux kernel in /var/log/hipchat/kern.log.

The soft lockups can cause Hipchat Server to freeze or stop responding, which in turn causes other problems with normal operation.
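To gauge how often the messages occur, the log can be searched for the phrase "soft lockup". The following is a minimal sketch, not an official Hipchat tool: the function name count_soft_lockups is hypothetical, and it assumes the log is readable by the current user and that the kernel's message contains the words "soft lockup".

```shell
# Hypothetical helper (not part of Hipchat Server): count log lines that
# mention a soft lockup. Matching is case-insensitive because the exact
# message format varies from kernel to kernel.
count_soft_lockups() {
    log_file="${1:-/var/log/hipchat/kern.log}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    # grep -c exits with status 1 when the count is 0, so mask that status.
    grep -ci 'soft lockup' "$log_file" || true
}

# Example: report the count for the default log, if it is readable.
if [ -r /var/log/hipchat/kern.log ]; then
    count_soft_lockups /var/log/hipchat/kern.log
fi
```

Recording this number before and after any change gives a simple baseline for comparison.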

Cause

The hypervisor is not keeping up with the CPU demand of the Hipchat Server virtual machine (VM). Typically, resources at the hypervisor level are insufficient, which degrades the performance of the Hipchat Server appliance, or there is not enough underlying compute capacity to keep the Hipchat VM running.

Here's a related VMware article: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009996

When running a Linux kernel in a symmetric multiprocessing (SMP) enabled virtual machine, messages similar to BUG: soft lockup detected on CPU#1! are written to the message log file. The exact format of these messages vary from kernel to kernel, and might be accompanied by a kernel stack backtrace.
...
When running in a virtual machine, this might instead indicate high levels of overcommitment (especially memory overcommitment) or other virtualization overheads.

The soft lockup messages indicate that the vCPUs are waiting some amount of time before the hypervisor is able to provide the resources necessary for a particular process inside the VM to continue functioning.

Resolution

We recommend evaluating the virtual machine and its host to determine whether the VM needs more resources, whether other VMs need to be moved off the host, whether to switch to vSphere ESXi, and so on.

If the Virtual Machine has been allocated the proper amount of resources for the number of users per our System Requirements, then another possible workaround involves increasing the watchdog_thresh value on the server by running the following commands:

sudo dont-blame-hipchat
echo 60 > /proc/sys/kernel/watchdog_thresh

After applying the setting above, monitor the system to confirm that it stabilizes and that the number of soft lockup messages decreases.
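To confirm that the new threshold took effect, the value can be read back from the same file. The sketch below is illustrative only: the function names read_watchdog_thresh and soft_lockup_seconds are hypothetical, and the second one assumes a kernel whose documentation defines the soft lockup threshold as twice watchdog_thresh.

```shell
# Hypothetical helper: print the watchdog threshold (in seconds) from a file,
# defaulting to the kernel's runtime setting.
read_watchdog_thresh() {
    thresh_file="${1:-/proc/sys/kernel/watchdog_thresh}"
    cat "$thresh_file"
}

# Hypothetical helper: seconds a CPU must be stuck before a soft lockup is
# reported, assuming the kernel uses 2 * watchdog_thresh.
soft_lockup_seconds() {
    echo $(( $1 * 2 ))
}

# Example: show the current setting, if the file exists on this system.
if [ -r /proc/sys/kernel/watchdog_thresh ]; then
    current=$(read_watchdog_thresh)
    echo "watchdog_thresh=${current}s, soft lockup reported after $(soft_lockup_seconds "$current")s"
fi
```

A value written to /proc/sys with echo is a runtime setting and is normally lost on reboot, so it is worth re-checking after the server restarts.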

 

Last modified on Nov 2, 2018
