Event system
Atlassian apps have an internal event system that allows core functionality and apps to respond to events such as user actions. When a user performs an action, an event is raised. The event system then dispatches the event to all registered event listeners.
There are two types of event dispatches:
- Synchronous: the event is dispatched to event listeners immediately and the request is blocked until all event listeners have processed the event.
- Asynchronous: the event is put on the event queue and is dispatched to event listeners in the background. The request that raised the event does not wait for the event listeners to complete their processing of the event.
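To make the difference between the two modes concrete, here is a minimal Java sketch. `SimpleEventDispatcher`, `publishSync` and `publishAsync` are hypothetical names chosen for illustration and are not the API of the Atlassian event system. The synchronous path runs every listener on the caller's thread before returning. The asynchronous path hands the event to an executor and returns immediately.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

/** Hypothetical minimal dispatcher illustrating synchronous vs asynchronous dispatch. */
final class SimpleEventDispatcher {
    private final List<Consumer<Object>> listeners = new CopyOnWriteArrayList<>();
    private final ExecutorService asyncExecutor;

    SimpleEventDispatcher(ExecutorService asyncExecutor) {
        this.asyncExecutor = asyncExecutor;
    }

    void register(Consumer<Object> listener) {
        listeners.add(listener);
    }

    /** Synchronous: the caller's thread runs every listener and is blocked until all are done. */
    void publishSync(Object event) {
        for (Consumer<Object> listener : listeners) {
            listener.accept(event);
        }
    }

    /** Asynchronous: the event is queued on the executor and the caller returns immediately. */
    void publishAsync(Object event) {
        asyncExecutor.execute(() -> {
            for (Consumer<Object> listener : listeners) {
                listener.accept(event);
            }
        });
    }
}
```

Because `publishAsync` returns before any listener runs, the caller cannot observe when the listeners finish or whether they failed. That is the trade-off that keeps asynchronous dispatch from blocking the request.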
Many core features make use of the event system as event producers, event listeners or both. In addition, many third-party apps rely on the event system to integrate with the app, or to respond to important app events.
Disruption or degradation of the event system can impact system performance, stability and availability.
Diagnostics monitors the event system for these acute problems, but also for symptoms that could cause problems under increased load.
The following issues are monitored by the diagnostics tool:
- EVENT-1001: An event was dropped because the event queue was full.
- EVENT-2001: A slow event listener was detected.
Refer to the full event descriptions below.
EVENT-1001: An event was dropped because the event queue was full (ERROR)
Problem
An asynchronous event could not be dispatched to registered event listeners because the event queue is full: too many events are awaiting dispatch. As a result, the event has not been processed and functionality, including core functionality, may be degraded. Which functionality is degraded depends on the event type.
This only affects the asynchronous dispatch of events.
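The following sketch shows how a bounded queue leads to dropped events. For illustration only, it assumes an asynchronous dispatcher backed by a `java.util.concurrent.ThreadPoolExecutor` with an `ArrayBlockingQueue`. `BoundedAsyncDispatcher` is a hypothetical class, and the real event system may queue and reject events differently. When every worker thread is busy and the queue is at capacity, the executor calls its `RejectedExecutionHandler`, which here counts the dropped event instead of blocking the caller.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical bounded async dispatcher that drops (and counts) events when its queue is full. */
final class BoundedAsyncDispatcher {
    private final AtomicInteger dropped = new AtomicInteger();
    private final ThreadPoolExecutor executor;

    BoundedAsyncDispatcher(int threads, int queueCapacity) {
        this.executor = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                // Invoked when all threads are busy and the queue is full:
                // record the drop instead of blocking or throwing.
                (task, pool) -> dropped.incrementAndGet());
    }

    /** Queues a task that dispatches one event to its listeners, or drops it if there is no room. */
    void publishAsync(Runnable dispatchToListeners) {
        executor.execute(dispatchToListeners);
    }

    int droppedCount() {
        return dropped.get();
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

Blocking the caller instead (for example with `ThreadPoolExecutor.CallerRunsPolicy`, which runs the task on the submitting thread) would turn asynchronous dispatch into synchronous dispatch under load, which is one reason dropping and alerting can be the safer choice.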
Alert Details
When EVENT-1001 is raised, the following data is collected to help diagnose the problem:
- eventType: the type of event that was dropped.
- queueLength: the capacity of the event queue.
- threadDumps: a list of thread dumps of all event processing threads that were busy when the event was dropped.
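As an illustration of the kind of data the threadDumps field contains, the sketch below uses the standard `java.lang.management` API to capture stack traces for threads whose names start with a given prefix. `EventThreadDumper` and the thread name prefix are assumptions for illustration. The actual names of the event processing threads, and how Diagnostics decides which threads were busy, are not described here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that dumps the stacks of threads matching a name prefix. */
final class EventThreadDumper {
    static List<String> dumpThreads(String threadNamePrefix) {
        List<String> dumps = new ArrayList<>();
        // false, false: skip locked monitor and synchronizer details to keep the dump cheaper.
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            if (info == null || !info.getThreadName().startsWith(threadNamePrefix)) {
                continue;
            }
            StringBuilder sb = new StringBuilder();
            sb.append('"').append(info.getThreadName()).append("\" ").append(info.getThreadState());
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\n\tat ").append(frame);
            }
            dumps.add(sb.toString());
        }
        return dumps;
    }
}
```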
Cause
The event queue can fill up if events are being raised faster than they can be processed. This is most commonly caused by one or more event listeners taking too long to process an event, or getting completely stuck.
If this is the case, one or more EVENT-2001 (slow event listener detected) alerts have likely been raised. These EVENT-2001 alerts can help identify the offending event listener, including the app that provided it.
If EVENT-2001 alerts do not clearly identify the cause, the thread dumps in the alert details will identify what the event processing threads were doing at the time of the alert.
Alternatively, the event queue may fill up if a component produces too many events in a short time.
Mitigation
If a slow or blocked event listener has been detected and the event listener is provided by a non-critical app, consider (temporarily) disabling that app. For Marketplace apps, report the problem to the vendor.
For Atlassian-provided apps, contact support at https://support.atlassian.com. For apps that were developed in-house, contact the relevant developer.
Configuration options
Thread dump cool-down period: Controls how often thread dumps should be generated for alerts relating to dropped events. Taking thread dumps can be computationally expensive and can produce a large amount of data when run frequently.
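A cool-down period of this kind can be implemented as a small gate that permits at most one thread dump per period. The sketch below is a hypothetical `CooldownGate`, not the Diagnostics implementation. It takes the current time as a parameter (for example from `System.nanoTime()`) so that it is easy to test, and uses `compareAndSet` so that only one of several concurrent callers is granted a dump.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical gate that allows at most one action (e.g. a thread dump) per cool-down period. */
final class CooldownGate {
    private static final long NEVER = Long.MIN_VALUE; // sentinel: no permit granted yet
    private final long cooldownNanos;
    private final AtomicLong lastPermitNanos = new AtomicLong(NEVER);

    CooldownGate(Duration cooldown) {
        this.cooldownNanos = cooldown.toNanos();
    }

    /** Returns true if the caller may take a thread dump now. */
    boolean tryAcquire(long nowNanos) {
        long last = lastPermitNanos.get();
        if (last != NEVER && nowNanos - last < cooldownNanos) {
            return false; // still cooling down
        }
        // Only one concurrent caller can move the timestamp forward from 'last'.
        return lastPermitNanos.compareAndSet(last, nowNanos);
    }
}
```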
EVENT-2001: A slow event listener was detected (WARNING)
Problem
An event was dispatched to an event listener, but the listener took a long time to process it. For synchronous events, this means that the user request that raised the event was blocked for that time before it could complete. For asynchronous events, this means that one of the event processing threads was unavailable to dispatch other events during that time.
EVENT-2001 does not by itself indicate an acute problem. Rather, it is a warning that performance may be degraded, most directly for synchronous events.
If EVENT-2001 happens frequently or if events are being raised at a high frequency (heavy load), asynchronous event processing will fall behind and the event queue can fill up. This can ultimately lead to EVENT-1001.
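The reasoning in the previous paragraph can be turned into a back-of-the-envelope estimate. For illustration, assume a fixed number of event processing threads, a steady arrival rate and a constant mean listener time per event. The threads can then process threads × 1000 / meanMillis events per second. If events arrive faster than that, the backlog grows by the difference every second, and an empty queue fills after capacity / (arrivals - throughput) seconds. Real traffic is burstier than this, so treat the result as a rough indication only. `QueueFillEstimate` is a hypothetical helper.

```java
/** Hypothetical steady-state estimate of how quickly the event queue fills up. */
final class QueueFillEstimate {
    /** Events per second the processing threads can handle, given the mean time per event. */
    static double throughputPerSecond(int threads, double meanMillisPerEvent) {
        return threads * 1000.0 / meanMillisPerEvent;
    }

    /** Seconds until an initially empty queue is full, or +infinity if it never fills. */
    static double secondsUntilFull(int queueCapacity, double arrivalsPerSecond,
                                   int threads, double meanMillisPerEvent) {
        double backlogGrowthPerSecond =
                arrivalsPerSecond - throughputPerSecond(threads, meanMillisPerEvent);
        return backlogGrowthPerSecond <= 0
                ? Double.POSITIVE_INFINITY
                : queueCapacity / backlogGrowthPerSecond;
    }
}
```

With hypothetical numbers, 4 threads and a 50 ms mean time per event give a throughput of 80 events per second. At 100 arriving events per second, the backlog grows by 20 events per second and a 1,000-slot queue fills in 50 seconds. If slow listeners doubled the mean time to 100 ms, throughput would halve to 40 events per second, which is how frequent EVENT-2001 alerts can lead to EVENT-1001.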
Alert Details
When EVENT-2001 is raised, the following data is collected to help diagnose the problem:
- trigger: the specific event listener that was slow, including the app that registered it.
- timeMillis: the time in milliseconds that the event listener spent processing the event.
- eventType: the type of event that was processed (slowly).
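The fields above suggest one way detection can work: time each listener invocation and report it when it exceeds the configured slow event listener limit. The sketch below is an assumption about the mechanism, not the Diagnostics implementation. `SlowListenerDetector` and `SlowListenerAlert` are hypothetical names, and using the event's class name as eventType is an illustrative choice. The record requires Java 16 or later.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical alert payload carrying the fields listed for EVENT-2001. */
record SlowListenerAlert(String trigger, long timeMillis, String eventType) {}

/** Hypothetical wrapper that times each listener invocation against a configured limit. */
final class SlowListenerDetector {
    private final long limitMillis;
    private final Consumer<SlowListenerAlert> alertSink;

    SlowListenerDetector(long limitMillis, Consumer<SlowListenerAlert> alertSink) {
        this.limitMillis = limitMillis;
        this.alertSink = alertSink;
    }

    /** Runs the listener on the current thread and reports it if it was slower than the limit. */
    void invoke(String trigger, Object event, Consumer<Object> listener) {
        long start = System.nanoTime();
        try {
            listener.accept(event);
        } finally {
            // Measured even if the listener throws, so slow failing listeners are reported too.
            long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            if (elapsedMillis > limitMillis) {
                alertSink.accept(new SlowListenerAlert(trigger, elapsedMillis, event.getClass().getName()));
            }
        }
    }
}
```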
Cause
Event listeners can be slow for a variety of reasons. Some of the most common causes are:
- The event listener performed a long-running operation on the event thread, e.g. indexing or automated analysis.
- The event listener performed blocking I/O, e.g. sent an HTTP request to an external system without configuring a sufficiently short timeout.
- The event listener was blocked trying to acquire a lock, e.g. a cluster lock, a local app lock or a database lock.
- A system-wide issue affected the event listener, causing it to be slow, for example:
  - The system had a long garbage collection (GC) pause.
  - The system ran out of database connections.
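The first three causes are properties of the listener itself, and the usual remedy is to keep the work done on the event thread short and bounded. The sketch below is a hypothetical listener, not an Atlassian API. It waits for a lock with a bounded `tryLock` timeout instead of blocking indefinitely, and hands the long-running part to a separate executor. The same principle applies to blocking I/O, for example by setting connect and request timeouts on an HTTP client.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.function.Consumer;

/** Hypothetical listener that keeps the event thread free: bounded lock wait, slow work offloaded. */
final class NonBlockingListener {
    private final Lock lock;
    private final ExecutorService backgroundWork;
    private final Consumer<Object> expensiveProcessing;
    private final long lockTimeoutMillis;

    NonBlockingListener(Lock lock, ExecutorService backgroundWork,
                        Consumer<Object> expensiveProcessing, long lockTimeoutMillis) {
        this.lock = lock;
        this.backgroundWork = backgroundWork;
        this.expensiveProcessing = expensiveProcessing;
        this.lockTimeoutMillis = lockTimeoutMillis;
    }

    /** Returns true if the event was handed off, false if the lock was not acquired in time. */
    boolean onEvent(Object event) throws InterruptedException {
        // Wait a bounded time for the lock rather than blocking the event thread indefinitely.
        if (!lock.tryLock(lockTimeoutMillis, TimeUnit.MILLISECONDS)) {
            return false; // a real listener might log this and retry later
        }
        try {
            // Keep the critical section short: it only submits the work, and the
            // long-running part (e.g. indexing) runs on the background executor.
            backgroundWork.execute(() -> expensiveProcessing.accept(event));
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

The background executor should itself be bounded; otherwise the backlog simply moves from the event queue to an unbounded work queue.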
Mitigation
Some of the causes listed in the previous section indicate that the event listener itself is at fault (a long-running operation, blocking I/O, or acquiring locks without a short timeout). If this is the case, report the problem to the app vendor.
For Marketplace apps, the app's Marketplace listing contains a 'Support' section with instructions on how best to report issues.
For apps provided by Atlassian, please raise an issue or contact Atlassian support if the problems are acute.
For apps that have been developed in-house, please contact the app developer (see Guidelines for Data Center App Development for advice on how to avoid slow event listeners).
If the triggering event listener is the cause of the issue, and the event listener is provided by a non-critical app, consider (temporarily) disabling the app.
Configuration options
Slow event listener limit: Controls when an alert is raised for a slow event listener. If an event listener is slower than the configured limit, an EVENT-2001 alert is raised.
Slow event listener limit overrides: Configures overrides for specific event listeners and/or specific apps. This setting can be used to suppress 'slow event listener detected' alerts for specific event listeners or plugins that have been determined not to be problematic. The value should be a comma-separated list of configurations for individual triggers, where a trigger is either the plugin-key, or the plugin-key followed by the event listener class name. Overrides are only considered if they specify a more lenient (higher) limit than the value of diagnostics.issues.event.slow.listener.time.millis.
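The override rule can be sketched as a lookup that ignores any override that is not more lenient than the global limit and otherwise prefers the most specific match. Two parts of this sketch are assumptions: that a listener-specific override takes precedence over an app-wide one, and the in-memory representation. The sketch uses maps rather than parsing the comma-separated setting, whose exact syntax is not shown here. `SlowListenerLimits` is a hypothetical class.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical resolution of per-app and per-listener slow listener limit overrides. */
final class SlowListenerLimits {
    private final long globalLimitMillis;
    private final Map<String, Long> pluginOverrides = new HashMap<>();
    private final Map<String, Map<String, Long>> listenerOverrides = new HashMap<>();

    SlowListenerLimits(long globalLimitMillis) {
        this.globalLimitMillis = globalLimitMillis;
    }

    void overridePlugin(String pluginKey, long limitMillis) {
        pluginOverrides.put(pluginKey, limitMillis);
    }

    void overrideListener(String pluginKey, String listenerClass, long limitMillis) {
        listenerOverrides.computeIfAbsent(pluginKey, k -> new HashMap<>()).put(listenerClass, limitMillis);
    }

    /** Overrides that are not more lenient than the global limit are not considered at all. */
    long effectiveLimitMillis(String pluginKey, String listenerClass) {
        Long listener = listenerOverrides.getOrDefault(pluginKey, Map.of()).get(listenerClass);
        if (listener != null && listener > globalLimitMillis) {
            return listener; // most specific lenient override (assumed precedence)
        }
        Long plugin = pluginOverrides.get(pluginKey);
        if (plugin != null && plugin > globalLimitMillis) {
            return plugin;
        }
        return globalLimitMillis;
    }
}
```

For example, with a global limit of 1000 ms, a 200 ms override for a listener is ignored because it is stricter, so that listener falls back to its app's override, if any, or to the global limit.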