Confluence notification mail is slow to arrive or heavily delayed
Symptoms
Notification emails for all events are slow to reach users, sometimes taking hours to reach the intended recipient. No warnings or errors appear in the mail queue; emails leave the queue, but slowly.
Diagnosis
There are several possible causes. Work through the steps below to determine which one applies:
Mail Queue Service Delay
The Mail Queue service is scheduled to run once per minute by default. An administrator may have changed this interval, which directly affects how long mail takes to reach the end user. To verify, navigate to General Configuration > Scheduled Jobs and check whether the mail queue is set to a different value. If it is, revert it to 1 minute and see whether this helps. If the value is already at the default, continue with the diagnosis below.
Mail Queue Error Message Stuck
Sometimes a message in the queue fails and cannot be flushed, so Confluence attempts to send it again and again, which delays the other mail items. This can be identified from the logs or by watching the mail queue. Navigate to the mail queue, perform a manual flush, and check whether any mail remains in the queue. If you identify the mail that is causing the delay, you can contact https://support.atlassian.com for further assistance.
Services and Third Party Plugins Services
By default, Confluence services run on either 2 or 4 QuartzWorker threads, depending on the Confluence version. If a thread is blocked or slow to finish, the next service may not run even though it is scheduled. To diagnose, follow the steps below:
- Navigate to Administration > Troubleshooting and Support > Logging and Profiling.
- Look for the Default Loggers section, click Configured logging level for another package, and add the following packages at DEBUG level:
com.atlassian.confluence.service.services.mail
com.atlassian.mail.queue
(this will not appear when you check in the step below, but it will still be triggered)
Note: if Confluence is restarted, these extra logging levels are disabled and must be re-enabled after each startup.
- Verify that the added packages appear in the list below (hint: use the browser's search function; they should appear as DEBUG once enabled).
- Keep the logging enabled for 48 hours, or for any period long enough to capture peak-hour activity.
- In the generated logs ($Confluence_HOME/log directory), filter for the string "Attempting to run mail queue service" (hint: on Linux, use grep to write the results to a file; on Windows, use findstr). The output will look similar to the following example:
2013-06-24 07:26:40,565 QuartzWorker-1 DEBUG ServiceRunner Mail Queue Service [service.services.mail.MailQueueService] Attempting to run mail queue service
2013-06-24 07:28:40,570 QuartzWorker-1 DEBUG ServiceRunner Mail Queue Service [service.services.mail.MailQueueService] Attempting to run mail queue service
2013-06-24 07:30:41,587 QuartzWorker-0 DEBUG ServiceRunner Mail Queue Service [service.services.mail.MailQueueService] Attempting to run mail queue service
2013-06-24 07:32:40,570 QuartzWorker-0 DEBUG ServiceRunner Mail Queue Service [service.services.mail.MailQueueService] Attempting to run mail queue service
2013-06-24 07:59:45,790 QuartzWorker-0 DEBUG ServiceRunner Mail Queue Service [service.services.mail.MailQueueService] Attempting to run mail queue service
- Compare the execution times. In the example above, the service runs once every 2 minutes, and at one point the gap stretches to almost 30 minutes. If your results show a consistent 1-minute interval, the delay is most likely caused by other factors, such as the network, mail server, or reverse DNS issues described in the following sections.
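Comparing timestamps by eye becomes tedious over 48 hours of logs. The following is a minimal Python sketch of the comparison, assuming the log lines start with a timestamp in the format shown in the example above. The function names, the 5-minute threshold, and the sample list are illustrative choices, not part of Confluence.

```python
from datetime import datetime, timedelta

MARKER = "Attempting to run mail queue service"

def run_times(lines):
    """Return the timestamps of all lines that contain MARKER."""
    times = []
    for line in lines:
        if MARKER not in line:
            continue
        try:
            # Assumes the first 23 characters look like "2013-06-24 07:26:40,565".
            times.append(datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f"))
        except ValueError:
            continue  # skip lines that do not start with a timestamp
    return times

def late_runs(times, threshold=timedelta(minutes=5)):
    """Return (previous, current, gap) for consecutive runs further apart than threshold."""
    return [(a, b, b - a) for a, b in zip(times, times[1:]) if b - a > threshold]

# Sample built from the timestamps in the example above
# (the worker thread name is simplified to QuartzWorker-0).
SUFFIX = (" QuartzWorker-0 DEBUG ServiceRunner Mail Queue Service "
          "[service.services.mail.MailQueueService] " + MARKER)
SAMPLE_LOG = [stamp + SUFFIX for stamp in (
    "2013-06-24 07:26:40,565",
    "2013-06-24 07:28:40,570",
    "2013-06-24 07:30:41,587",
    "2013-06-24 07:32:40,570",
    "2013-06-24 07:59:45,790",
)]

if __name__ == "__main__":
    for previous, current, gap in late_runs(run_times(SAMPLE_LOG)):
        print(f"{previous} -> {current}: gap of {gap}")
```

To analyse a real log, pass an open file object to run_times instead of SAMPLE_LOG, and set the threshold slightly above the interval configured for the mail queue service.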
Network and Mail Server
- Verify that you can reach the mail server with ping, and check whether the latency is high.
- High latency delays the responses that JavaMail, which Confluence uses to send mail, receives from the mail server.
- The mail server may also be overloaded and therefore unable to complete SMTP requests on time.
- Consult your network administrators and mail server administrators if you encounter either problem.
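Ping measures only network round-trip time. A rough way to see how quickly the mail server itself responds is to time the TCP connection and the SMTP greeting from the Confluence host. The sketch below uses only the Python standard library; the function name and the default port 25 are assumptions, and it does not reproduce what JavaMail does internally.

```python
import socket
import time

def smtp_greeting_latency(host, port=25, timeout=10.0):
    """Time the TCP connect and the wait for the SMTP greeting banner.

    Returns (connect_seconds, greeting_seconds, banner_text).
    Raises OSError if the server cannot be reached within the timeout.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connected = time.monotonic()
        banner = sock.recv(1024)  # an SMTP server normally greets with a "220 ..." line
        greeted = time.monotonic()
    text = banner.decode("ascii", "replace").strip()
    return connected - start, greeted - connected, text

# Example call (replace the placeholder with your mail server's hostname):
# connect_s, greeting_s, banner = smtp_greeting_latency("<MAIL_SERVER_HOSTNAME>")
```

A connect time or greeting time of several seconds, measured repeatedly during peak hours, is worth sharing with the network or mail server administrators.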
Reverse DNS
Every time Confluence attempts to send mail, it performs a reverse DNS lookup for the Confluence server's hostname. If the DNS server isn't reachable, Confluence has to wait for a timeout, which can be long (20-40 seconds).
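To get a feel for how long such a lookup takes, the forward and reverse lookups can be timed from the Confluence host. This is a minimal Python sketch that goes through the operating system's resolver, so it only approximates what the Java runtime does. The function names and the dictionary keys are illustrative, and the resolver arguments exist only so the functions can be exercised without a network.

```python
import socket
import time

def timed(fn, *args):
    """Call fn(*args); return (result or the OSError raised, elapsed seconds)."""
    start = time.monotonic()
    try:
        result = fn(*args)
    except OSError as exc:  # socket.gaierror and socket.herror are OSError subclasses
        result = exc
    return result, time.monotonic() - start

def check_reverse_dns(hostname,
                      forward=socket.gethostbyname,
                      reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, then the IP back to a name, timing both steps."""
    report = {"hostname": hostname, "ip": None, "reverse_name": None, "error": None}
    ip, report["forward_seconds"] = timed(forward, hostname)
    if isinstance(ip, Exception):
        report["error"] = f"forward lookup failed: {ip}"
        return report
    report["ip"] = ip
    answer, report["reverse_seconds"] = timed(reverse, ip)
    if isinstance(answer, Exception):
        report["error"] = f"reverse lookup failed: {answer}"
    else:
        report["reverse_name"] = answer[0]  # gethostbyaddr returns (name, aliases, addresses)
    return report

# Example call on the Confluence server itself:
# print(check_reverse_dns(socket.gethostname()))
```

A reverse_seconds value in the tens of seconds, or a reverse lookup error, would be consistent with the timeout described above.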
Resolution
Depending on the cause of the mail delay, follow the matching resolution below:
Mail Queue Service Delay
Reduce the mail queue service interval to a shorter value that fits your needs.
Mail Queue Error Message Stuck
- Check whether the stuck message contains any information that points towards the root cause (for example, it is always stuck on one particular recipient, on notifications for one particular issue, or on one particular type of event).
- Provide your observations and the data collected to https://support.atlassian.com for further root cause analysis. Please generate a full support zip (Administration > System > Atlassian Support Tool > Support Zip) to help with the diagnosis.
Services and Third Party Plugins Services
- Check the running services. If more than 5 services run at the same interval as the mail queue (default 1 minute), the mail queue service may be delayed because it has to wait for another service to complete. Stagger the services so that they run at different times, for example by setting the mail handler service to run once every 5 minutes.
- To determine which service is causing the problem, temporarily increase the interval of the other services (a larger interval makes the pattern clearer in the logs) and check whether the mail queue then runs on time. Readjust the intervals once you have identified the root cause of the delay.
- If a service provided by a third-party plugin takes a long time to execute and complete, report this to the plugin vendor.
Network and Mail Server
- Consult the network administrator or mail server administrator to troubleshoot further.
Reverse DNS
- Verify the hosts file is correct.
- Verify the hostname of the server is correct.
- Ping the hostname:
ping <HOSTNAME>
- Perform an nslookup on the IP address that is returned:
nslookup <IP_ADDRESS_FROM_PING>
- (Linux only) Perform a dig reverse lookup:
dig -x <IP_ADDRESS_FROM_PING>
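For the hosts file check, it can help to list which addresses the file assigns to the server's hostname. The sketch below parses hosts-file text in the usual "address name [alias ...]" layout with "#" comments. The function name and the default path /etc/hosts are assumptions; on Windows the file lives elsewhere.

```python
import socket

def parse_hosts(text):
    """Map each name found in hosts-file text to the list of addresses assigned to it."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping.setdefault(name.lower(), []).append(address)
    return mapping

def addresses_for_this_host(path="/etc/hosts"):
    """Return the hosts-file addresses listed for this machine's hostname (may be empty)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        mapping = parse_hosts(handle.read())
    return mapping.get(socket.gethostname().lower(), [])
```

An empty result means the hostname is not listed in the hosts file and is resolved through DNS only. More than one address, or an address that differs from the one returned by ping, deserves a closer look.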