Hipchat Server CPU Spike caused by _ohai_btf.py processes after 1.4.1 upgrade
Platform notice: Server and Data Center only. This article only applies to Atlassian products on the server and data center platforms.
This is for an outdated version of Hipchat Server
This article applies to a version of Hipchat Server which will be deprecated soon. After that period the version will no longer be supported.
You should upgrade to a more recent version of Hipchat Server as soon as you can to take advantage of new features, and security and bug fixes.
Problem
CPU usage on the machine hosting Hipchat Server stays constantly high after the upgrade to version 1.4.1, causing the following problems:
- Hipchat Server is unresponsive / slow
- Unable to log in to the web interface via https://<fqdn>
- Unable to SSH to the Hipchat Server
This CPU utilisation can be observed in runtime.log, specifically in the output of + /opt/atlassian/hipchat/sbin/_stats.py --show:
runtime.log.1:CPU: 98.0% of 2 cores
runtime.log.1-Clock: 2016-06-14 08:25:23 UTC
..
runtime.log.1:CPU: 99.1% of 2 cores
runtime.log.1-Clock: 2016-06-14 09:40:36 UTC
..
runtime.log.1:CPU: 100.0% of 2 cores
runtime.log.1-Clock: 2016-06-14 11:13:30 UTC
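As a rough aid for spotting readings like the ones above, the following sketch prints each CPU sample at or above a threshold together with its timestamp. It assumes every "CPU:" line is directly followed by its "Clock:" line, as in the excerpt. The function name high_cpu_readings and the 90% default are illustrative only, and you must supply the path to your own log file.

```shell
# Hypothetical helper: list CPU samples at or above a threshold from a
# runtime.log-style file (raw log lines or grep output both work).
# Assumes each "CPU:" line is directly followed by its "Clock:" line.
high_cpu_readings() {
    log_file="$1"
    threshold="${2:-90}"   # percent; 90 is an arbitrary default
    awk -v t="$threshold" '
        /CPU: / {
            pct = $2
            sub(/%/, "", pct)           # "98.0%" -> "98.0"
            hit = (pct + 0 >= t + 0)    # force a numeric comparison
            if (hit) cpu = $0
            next
        }
        /Clock: / && hit {
            print cpu " | " $0
            hit = 0
        }
    ' "$log_file"
}
```

For example, high_cpu_readings runtime.log.1 95 would list only the samples at 95% or higher.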
Diagnosis
Environment
- Hipchat Server 1.4.1
Diagnostic Steps
The following appears in runtime.log under the + ps auxwww section. As the output shows, _ohai_btf.py processes flood the process list of the Hipchat Server:
root 21022 0.0 0.0 11112 192 ? S May20 0:00 /bin/bash /opt/atlassian/hipchat/sbin/_ohai_btf.py
root 21025 0.0 0.0 11112 188 ? S May20 0:00 /bin/bash /opt/atlassian/hipchat/sbin/_ohai_btf.py
root 21026 0.0 0.0 11112 184 ? S May20 0:00 /bin/bash /opt/atlassian/hipchat/sbin/_ohai_btf.py
root 21028 0.0 0.0 11112 192 ? S May20 0:00 /bin/bash /opt/atlassian/hipchat/sbin/_ohai_btf.py
root 21030 0.0 0.0 11112 184 ? S May20 0:00 /bin/bash /opt/atlassian/hipchat/sbin/_ohai_btf.py
root 21031 0.0 0.0 11112 184 ? S May20 0:00 /bin/bash /opt/atlassian/hipchat/sbin/_ohai_btf.py
root 21032 0.0 0.0 11112 188 ? S May20 0:00 /bin/bash /opt/atlassian/hipchat/sbin/_ohai_btf.py
root 21033 0.0 0.0 11112 192 ? S May20 0:00 /bin/bash /opt/atlassian/hipchat/sbin/_ohai_btf.py
root 21035 0.0 0.0 11112 184 ? S May20 0:00 /bin/bash /opt/atlassian/hipchat/sbin/_ohai_btf.py
root 21036 0.0 0.0 11112 184 ? S May20 0:00 /bin/bash /opt/atlassian/hipchat/sbin/_ohai_btf.py
root 21037 0.0 0.0 11112 188 ? S May20 0:00 /bin/bash /opt/atlassian/hipchat/sbin/_ohai_btf.py
root 21038 0.0 0.0 11112 184 ? S May20 0:00 /bin/bash /opt/atlassian/hipchat/sbin/_ohai_btf.py
root 21040 0.0 0.0 11112 180 ? S May20 0:00 /bin/bash /opt/atlassian/hipchat/sbin/_ohai_btf.py
root 21041 0.0 0.0 11112 188 ? S May20 0:00 /bin/bash /opt/atlassian/hipchat/sbin/_ohai_btf.py
root 21042 0.0 0.0 11112 192 ? S May20 0:00 /bin/bash /opt/atlassian/hipchat/sbin/_ohai_btf.py
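To gauge how many of these processes exist without scrolling through the listing, a small counter such as the sketch below may help. The function name count_ohai is hypothetical; it simply counts the input lines that mention _ohai_btf.py, so it can be fed either live ps output or a saved copy of the ps section from the log.

```shell
# Hypothetical helper: count lines on stdin that mention _ohai_btf.py.
# The escaped dot keeps the pattern from matching grep's own command line
# when the input is live "ps" output.
count_ohai() {
    grep -c '_ohai_btf\.py' || true   # grep exits 1 on a zero count; 0 is still printed
}

# Usage (live process list):
#   ps auxwww | count_ohai
```

A count in the hundreds or thousands would be consistent with the symptom described in this article.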
Since the _ohai_btf.py process is related to Hipchat's Phone-Home, we tried to switch it off using Disabling Phone-Home Signal but that did not help.
Terminating the processes with kill -9 was also unsuccessful, as _ohai_btf.py processes continue to fill the runtime.log entries:
sudo dont-blame-hipchat
ps aux | grep ohai
kill -9 <ohai_pid>
Cause
While the specific cause of the issue is unknown, the cron configuration or the ohai script may have become corrupted, given that the process list contains thousands of lines of running _ohai_btf.py processes.
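If you want to check the cron side of this hypothesis yourself, a search along the lines of the sketch below lists every cron entry that mentions a given pattern. This is only an illustration: the function name find_cron_refs is hypothetical, and the paths in the usage comment are the standard Debian/Ubuntu cron locations, which is an assumption. This article does not state where Hipchat Server registers the job.

```shell
# Hypothetical helper: print file:line:text for every match of a pattern
# under the given files or directories, skipping paths that do not exist.
find_cron_refs() {
    pattern="$1"
    shift
    for path in "$@"; do
        if [ -e "$path" ]; then
            # -r recurse, -n line numbers, -H always show the file name
            grep -rnH -- "$pattern" "$path" 2>/dev/null || true
        fi
    done
}

# Usage (assumed standard cron locations; run as root to read them all):
#   find_cron_refs ohai /etc/crontab /etc/cron.d /var/spool/cron/crontabs
```

Duplicate or malformed entries in the output would be worth noting before applying the workaround.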
Workaround
Run the following commands in your Hipchat Server terminal / SSH console.
Obtain root access to your Hipchat Server:
sudo dont-blame-hipchat
Navigate to the startup_scripts directory:
cd ~/startup_scripts/
Download the shell script remove-ohai-fix.sh to the directory:
wget https://s3.amazonaws.com/hipchat-server-stable/utils/remove-ohai-fix.sh
Grant execute permission on the shell script:
chmod +x remove-ohai-fix.sh
Execute the shell script by running the command:
./remove-ohai-fix.sh