Troubleshooting JIRA Services


This KB article applies only to JIRA 6.2.7 and earlier.

Symptoms

JIRA services, such as the mail and backup services, are no longer executing or are taking a very long time to execute.

Overview

JIRA services use the Quartz scheduler (QuartzScheduler), which is similar to cron or the Windows Task Scheduler, to run scheduled tasks inside JIRA. The scheduled jobs are displayed on the Scheduler Details page and configured in atlassian-jira/WEB-INF/classes/scheduler-config.xml, for example:

<scheduler>

    <jobs>
        <job name="ServicesJob" class="com.atlassian.jira.service.ServiceRunner" />
        <job name="RefreshActiveUserCount" class="com.atlassian.jira.user.job.RefreshActiveUserCountJob" />
    </jobs>

    <triggers>
        <!-- trigger type may be 'simple' (default) or 'cron' -->
        <trigger name="ServicesTrigger" job="ServicesJob">
            <startDelay>1m</startDelay> <!-- start delay is a DateUtils duration! -->
            <period>1m</period> <!-- amount of time between repeats -->
        </trigger>
        <trigger name="RefreshActiveUserCountTrigger" job="RefreshActiveUserCount" type="cron">
            <expression>0 0 0/2 * * ?</expression><!-- run every 2 hours -->
        </trigger>
    </triggers>

</scheduler>

Each trigger is bound to a job and fires based on either a period (the default 'simple' type) or a cron expression; when a trigger fires, Quartz executes the job's class. In the example above, ServicesTrigger fires ServicesJob every minute after an initial one-minute delay, and ServicesJob then runs the services listed in JIRA Services according to each service's delay and other settings. The Quartz scheduler uses a fixed number of worker threads (the count is version-specific and set in atlassian-jira/WEB-INF/classes/quartz.properties), named for example QuartzWorker-0 or QuartzScheduler_Worker-0 depending on the JIRA version. These threads are responsible for executing both ServicesJob and RefreshActiveUserCount.
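To check how each trigger maps to a job class and schedule, the following is a minimal Python sketch that reads a scheduler-config.xml document with the standard library's ElementTree. It assumes the file has the same shape as the example above; the function name describe_triggers is hypothetical and not part of JIRA.

```python
import xml.etree.ElementTree as ET


def describe_triggers(xml_text):
    """Map each trigger in a scheduler-config.xml document to its job class and schedule.

    Hypothetical helper; assumes the <jobs>/<triggers> layout shown in the example.
    """
    root = ET.fromstring(xml_text)  # XML comments are discarded by the default parser
    # Job name -> implementing class, taken from the <jobs> section.
    job_classes = {job.get("name"): job.get("class") for job in root.iter("job")}
    rows = []
    for trigger in root.iter("trigger"):
        # Per the comment in the file, the trigger type defaults to 'simple'.
        kind = trigger.get("type", "simple")
        if kind == "cron":
            schedule = "cron " + trigger.findtext("expression", "").strip()
        else:
            schedule = "every {} after a {} start delay".format(
                trigger.findtext("period", "?").strip(),
                trigger.findtext("startDelay", "0").strip(),
            )
        job = trigger.get("job")
        rows.append((trigger.get("name"), job, job_classes.get(job), schedule))
    return rows
```

For example, pass it the contents of atlassian-jira/WEB-INF/classes/scheduler-config.xml; for the configuration above it reports ServicesTrigger as running ServicesJob (com.atlassian.jira.service.ServiceRunner) every 1m after a 1m start delay.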

Further information on the Quartz Scheduler can be found in the Quartz documentation.

Cause

If all the QuartzWorker threads are held up by something (for example, a method stuck in an endless call, a bug, or a slow-running process), the services job can stop or be delayed, which in turn can stop or slow down the flushing of the mail queue. Also, if something such as an OutOfMemoryError crashes the QuartzScheduler (see, for example, (Archived) Mail queue stops processing after OutOfMemoryError in Jira server), the services stop executing completely and no warning is shown to the user. This bug is tracked in JRA-24856.
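The thread starvation described above can be illustrated with a small simulation. This is a sketch using Python's ThreadPoolExecutor, not Quartz itself: the pool size stands in for the worker thread count set in quartz.properties (assumed here to be 2), stuck_job stands in for a job caught in an endless call, and the function name services_job_starves is hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def services_job_starves(worker_count=2, wait_seconds=0.5):
    """Return (ran_while_blocked, ran_after_release) for a simulated ServicesJob.

    Hypothetical simulation: worker_count plays the role of the Quartz thread count.
    """
    release = threading.Event()
    ran = threading.Event()

    def stuck_job():
        # Holds its worker thread until released, like a job in an endless call.
        release.wait()

    def services_job():
        ran.set()

    pool = ThreadPoolExecutor(max_workers=worker_count)
    try:
        for _ in range(worker_count):
            pool.submit(stuck_job)      # occupy every worker thread
        pool.submit(services_job)       # queued behind the stuck jobs
        ran_while_blocked = ran.wait(wait_seconds)
        release.set()                   # let the stuck jobs finish
        ran_after_release = ran.wait(5)
    finally:
        release.set()                   # never leave workers blocked on error
        pool.shutdown(wait=True)
    return ran_while_blocked, ran_after_release
```

Calling services_job_starves() returns (False, True): the queued services job does not run while every worker is held, and runs as soon as the workers are released.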

Diagnosis

Additional logging and thread dumps are key to understanding the state of the QuartzWorker threads and what JIRA is doing that could be slowing down the execution of the services. To collect them, go through the following:

  1. Stop the JIRA application.
  2. Enable GC logging as per our Troubleshoot Jira Server performance with GC logs KB.
  3. Start the JIRA application.
  4. Enable additional logging in Administration > System > Troubleshooting and Support > Logging and Profiling by setting the following loggers to DEBUG:
    • com.atlassian.jira.service
    • com.atlassian.jira.service.services.DebugService
  5. Let JIRA run for some time (say 5-10 minutes) to give the QuartzWorker threads a chance to run the services. The instance needs to reach the state where the services are not doing what they are supposed to do, otherwise the following steps will not give usable information.
  6. Generate 3-6 thread dumps, roughly 10 seconds apart, as per Generating a Thread Dump.
  7. Review the thread dumps with a tool such as TDA, checking the state of the QuartzWorker-0 and QuartzWorker-1 threads (for example, they may be blocked on a lock). If these threads are missing, the instance is likely running into the bug detailed in JRA-24856.
  8. Analyse the GC logs as per the Troubleshoot Jira Server performance with GC logs KB to see whether memory is an issue.
  9. Review the additional logging enabled in step 4 for any errors or exceptions that may indicate what is going on with the instance.

Resolution

The resolution depends on the state of the QuartzWorker threads and what the logs show. If the threads do not exist, upgrading to the version in which JRA-24856 is fixed, or applying the workaround described in that issue, should resolve the problem.

Last modified on Apr 6, 2016
