
Statistical Analysis Plugin

Name Statistical Analysis Macro
Vendor Adaptavist.com Ltd (Website)
Authors Dan Hardiker
Homepage http://confluence.atlassian.com/display/CODEGEIST/Statistical+Analysis+Plugin
Issue Management http://tracker.adaptavist.com/browse/STATS
Continuous Integration n/a
Categories Administration Advanced Macros Content Macros
Most Recent Version 1.0
Availability Confluence v2.7 to v2.10
State Stable
Support Unsupported Plugins
License Freeware / Open Source (BSD)
Price Donate
Release Docs http://confluence.atlassian.com/display/CODEGEIST/Statistical+Analysis+Plugin
Java API Docs n/a
Download Source http://svn.atlassian.com/svn/public/contrib/confluence/stats-plugin/tags/2.0
Download JAR stats-plugin-2.0.jar

Description/Features

Confluence has lacked a cluster-ready, enterprise-scalable, remotely accessible statistics gathering and analysis plugin ... not any more!

Ready for the Enterprise
The primary objective for this plugin was to build something that can handle 1000 sessions an hour (a moderate load for any seriously sized organisation). That sounds like a reasonable number, and you would think it would be easy to deal with, until you crunch a few numbers.

  • 1000 sessions at a median of around 30 events per session
  • 30,000 events an hour
  • 720,000 events a day
  • 5,040,000 events a week
  • 262,800,000 events a year!
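As a quick sanity check on those figures, here is the same arithmetic in Java, the language the plugin is written in. It is only the calculation and not plugin code. The yearly figure assumes a 365-day year, which gives 262,800,000.

```java
// Arithmetic only: 1,000 sessions an hour at roughly 30 events per session.
final class EventVolume {
    static final long SESSIONS_PER_HOUR = 1_000;
    static final long EVENTS_PER_SESSION = 30; // the approximate median quoted above

    static long perHour() { return SESSIONS_PER_HOUR * EVENTS_PER_SESSION; }
    static long perDay()  { return perHour() * 24; }
    static long perWeek() { return perDay() * 7; }
    static long perYear() { return perDay() * 365; } // assumes a 365-day year
}
```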

Considering that we also have to be cluster-safe, I opted to store the data in the database, and that's a lot of data to store!

Intelligent Caching
After wiping the sweat from my brow from crunching those numbers (there's something nicely recursive about getting stats on the stats plugin), it was clear: we're going to need some funky caching to make this usable.

The concept of Data Widgets is explained later. What matters here is that each widget exposes a hash of its configuration, so that every interval it produces can be cached in either final or partial form. Partial forms (intervals that have not yet finished) can be cleaned up or updated, while final datasets stay in the cache forever.
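To illustrate the final/partial idea, here is a minimal Java sketch of a cache keyed by a widget's configuration hash plus the interval boundaries. The class `IntervalCache` and its parameters are hypothetical. They only mirror the `cacheRead`, `cacheWrite` and `cacheUpdatePartial` macro options. The plugin's own cache lives in the database and will differ. The sketch also assumes that an interval becomes final once its end lies in the past.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: results cached per (widget config hash, interval).
final class IntervalCache<R> {
    record Key(int configHash, long startMillis, long endMillis) {}
    record Entry<V>(V value, boolean isFinal) {}

    private final Map<Key, Entry<R>> entries = new HashMap<>();
    private int computations = 0; // how many times the expensive query actually ran

    R get(int configHash, long startMillis, long endMillis, long nowMillis,
          boolean cacheRead, boolean cacheWrite, boolean cacheUpdatePartial,
          Supplier<R> compute) {
        Key key = new Key(configHash, startMillis, endMillis);
        Entry<R> cached = cacheRead ? entries.get(key) : null;
        // A final entry (interval already over) never changes, so serve it forever.
        // A partial entry is served as-is only when partial updates are switched off.
        if (cached != null && (cached.isFinal() || !cacheUpdatePartial)) {
            return cached.value();
        }
        R value = compute.get();
        computations++;
        if (cacheWrite) {
            // Assumption: an interval is final once its end lies in the past.
            entries.put(key, new Entry<>(value, endMillis <= nowMillis));
        }
        return value;
    }

    int computations() { return computations; }
}
```

The key includes the configuration hash, so two widget runs with different space filters never share entries. Repeated runs of the same configuration only recompute the interval that is still open.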

  • On a 400,000-event dataset on my MacBook Pro laptop, an uncached widget processing the entire dataset ran 120 unique queries and took 36,450ms.
  • Running the same widget with the cache enabled, including updating any partial intervals, reduced this to just 3 queries and 783ms.
  • It would be even faster if I didn't let it update partial intervals!

Store now, Report later
This plugin listens to events fired in Confluence and stores them in a self-configuring, self-updating database. The information is gathered quickly, and the plugin grabs as much data as it can without impacting performance. It would rather capture and store too much than too little, so that the reports of tomorrow can use the data mined today. It defaults to using Confluence's own database, but larger installations can choose their own database schema.

Cross Database
The plugin works with many databases and has an architecture that allows individual database types to be tuned and optimised. Upgrades to the database schema are handled by a funky library called LiquiBase. The plugin is designed so that if you decide to use LiquiBase in your own plugins, it will play nicely.

Event Queue
Events are fired synchronously, so instead of being stored straight away they are queued and processed in the background every 5 minutes, or sooner if the queue grows too quickly. A full event queue manager lets administrators kick items off the queue if there are issues, or just be nosy!
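Here is a minimal sketch of that queueing pattern. It assumes an in-memory queue drained by a single background thread. `BackgroundEventQueue`, the flush threshold, `discard` and the `store` callback are hypothetical names, not the plugin's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch: listeners enqueue cheaply; a background thread stores batches.
final class BackgroundEventQueue<E> {
    private final LinkedBlockingQueue<E> queue = new LinkedBlockingQueue<>();
    private final int flushThreshold;
    private final Consumer<List<E>> store; // e.g. a batch INSERT into the stats table
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "stats-event-queue");
                t.setDaemon(true); // don't keep the JVM alive just for this queue
                return t;
            });

    BackgroundEventQueue(int flushThreshold, long periodMinutes, Consumer<List<E>> store) {
        this.flushThreshold = flushThreshold;
        this.store = store;
        // Regular background flush, e.g. every 5 minutes.
        scheduler.scheduleAtFixedRate(this::flush, periodMinutes, periodMinutes, TimeUnit.MINUTES);
    }

    // Called on the event thread: a cheap enqueue, no database work.
    void enqueue(E event) {
        queue.add(event);
        if (queue.size() >= flushThreshold) {
            scheduler.execute(this::flush); // flush early if the queue grows too large
        }
    }

    // Lets an administrator kick a problem item off the queue.
    boolean discard(E event) { return queue.remove(event); }

    // Drains everything queued so far and hands it to the store in one batch.
    synchronized void flush() {
        List<E> batch = new ArrayList<>();
        queue.drainTo(batch);
        if (!batch.isEmpty()) {
            store.accept(batch);
        }
    }

    int pending() { return queue.size(); }

    void shutdown() { scheduler.shutdown(); }
}
```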

Exception Queue
Should there be exceptions during processing (such as an SQL exception or a runtime exception), a similar exception manager captures them. You can then determine the association between each exception and the data being processed at the time. This should significantly reduce the time needed to diagnose errors.
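The exception queue can be pictured as a list of failures that keeps each exception alongside the item being processed. This is a hypothetical Java sketch, and `ExceptionQueue`, `Handler` and `Failure` are illustrative names. The custom `Handler` interface exists only so that checked exceptions such as `java.sql.SQLException` can be captured too.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: keep each failure together with the data that triggered it.
final class ExceptionQueue<E> {
    @FunctionalInterface
    interface Handler<T> { void handle(T item) throws Exception; } // permits checked SQLException

    record Failure<T>(Instant when, T data, Exception error) {}

    private final List<Failure<E>> failures = new ArrayList<>();

    // Process one item; on failure, record it with its data instead of losing the context.
    void process(E item, Handler<E> handler) {
        try {
            handler.handle(item);
        } catch (Exception ex) {
            synchronized (failures) {
                failures.add(new Failure<>(Instant.now(), item, ex));
            }
        }
    }

    List<Failure<E>> failures() {
        synchronized (failures) {
            return List.copyOf(failures);
        }
    }

    void discard(Failure<E> failure) {
        synchronized (failures) {
            failures.remove(failure);
        }
    }
}
```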

Externally Accessible
Wherever possible, data will be provided both through an HTML UI and through a REST API. The aim of this plugin is to make the information easily accessible, so that third-party applications can harness the power of the stats.

For example: exposing popular statistical information to JIRA Studio's dashboard.

Concept: Data Widgets
A data widget is the plugin's term for the business logic that processes the data. It is wrapped by generic functionality. This covers filtering each widget by start and end date, optionally repeating a given interval between those dates, filtering further by space (inclusive or exclusive), and finally selecting from a number of caching options.

For example: you might want Widget A to give you the data from the month of March, broken down into daily/weekly intervals.

There is a default DataWidgetRunner, accessible through the UI, which gives raw access to run the registered widgets. The default options are selectable and the result is output as a dynamic table. Pass in output=xml as well and it'll output XML instead of HTML; voila, an inst-o-matic RESTful API!

I've also exposed the runner through a {statsDataWidget} macro, which takes all the customisation options you'd expect, with sensible defaults.

Reports & Report Widgets (not yet implemented)
Report Widgets will take the output of a Data Widget and render it, possibly processing it further before turning it into something else, e.g. a chart of some form.

Reports are simply a collection of Report Widgets backed by a series of Data Widgets. The Report configuration sets global options that filter down into the data widgets.

Other

  • It is fully internationalised, allowing porting to any language.
  • There are database tools for managing the raw database.
  • Detailed debugging lets you target specific packages or functionality to debug and diagnose issues. This should make problems easier to diagnose with lower overhead, and therefore lead to faster bug fixing.
  • Readily accessible information (such as JVM Memory usage, the logged in user, the HTTP session, remote IP address, user agent, referrer etc) is all captured with each event, allowing non-event specific information to be reported on and used later.
  • Fully documented macro in the Notation Guide under Advanced macros.
This plugin is designed to be a solid foundation, and will likely need modifications to run on a platform we've not tested. Such is the joy of so-called database-agnostic SQL statements and structures.

Tested Environments

I have added a Generic database profile which uses standard unoptimised SQL statements. If this profile is used, a warning will be logged with the information needed to identify your database type.
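The Generic fallback could be selected along these lines. This is a hypothetical sketch, and `DatabaseProfile`, `PrefixProfile` and `ProfileSelector` are illustrative names. Matching on a product-name prefix is also an assumption. In JDBC the product name is available from `DatabaseMetaData.getDatabaseProductName()`.

```java
import java.util.List;
import java.util.Locale;
import java.util.logging.Logger;

// Hypothetical sketch: pick a tuned SQL profile, else fall back to Generic with a warning.
interface DatabaseProfile {
    String name();
    boolean supports(String databaseProductName);
}

record PrefixProfile(String name, String productPrefix) implements DatabaseProfile {
    @Override
    public boolean supports(String databaseProductName) {
        return databaseProductName.toLowerCase(Locale.ROOT).startsWith(productPrefix);
    }
}

final class ProfileSelector {
    private static final Logger LOG = Logger.getLogger(ProfileSelector.class.getName());
    static final DatabaseProfile GENERIC = new PrefixProfile("Generic", "");

    static DatabaseProfile select(String databaseProductName, List<DatabaseProfile> tuned) {
        for (DatabaseProfile profile : tuned) {
            if (profile.supports(databaseProductName)) {
                return profile;
            }
        }
        // Log the product name so users can report which database needs a tuned profile.
        LOG.warning("No tuned profile for database '" + databaseProductName
                + "'; falling back to the Generic (unoptimised SQL) profile");
        return GENERIC;
    }
}
```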

Please note: data widgets can do complex things and sometimes implement their own SQL, thus the support level may vary from widget to widget. Problems? Report them.

It is not recommended that you use HSQL for production systems!

Confluence v2.7.1, v2.8.0
Databases HSQL (lightly tested), MySQL (heavily tested)

Macro Parameters

{statsDataWidget}

Param | Value(s) | Default | Description
widget | Widget class name or FQCN | ReadWriteRatioWidget | The widget class to use (the standard package is prepended if no package is given). Pick from: ConcurrentSessionsWidget, MostEditedContentWidget, MostViewedContentWidget, ReadWriteRatioWidget, MemoryUsageWidget
startDate | Date matching "M/d/yyyy" | first date on record | The first date to include
endDate | Date matching "M/d/yyyy" | last date on record | The last date to include
intervalType | int | Calendar.MONTH (2) | Integer matching a java.util.Calendar field constant (see Common Interval Types)
intervalCount | int | 0 | Number of intervalType units to advance on each iteration (0 means a single iteration from startDate to endDate)
intervalTitles | title 1,title 2,... | none | Comma-separated titles; once the titles run out, the interval start date is used instead
spaces | spaceKey,spaceKey,... | none | Comma-separated space keys (empty means all spaces included / none excluded)
excludeSpaces | boolean | false | Exclude the listed space keys instead of including them (ignored if spaces is empty)
cacheRead | boolean | true | Read from the cache
cacheWrite | boolean | true | Write to the cache
cacheUpdatePartial | boolean | true | Update partial intervals in the cache
showHeader | boolean | false | Show widget information
theme | basic / horizontal | basic | The theme to use for the output template
intervalDateFormatter | SimpleDateFormat pattern | d-MMM-yyyy | The date format used for the interval labels
hideIntervals | boolean | false | Hide the interval dates

Widgets
See the Data Widgets section below for more details on each data widget.

Common Interval Types

Common values for the intervalType parameter are listed below.

Calendar Field Integer Value
YEAR 1
MONTH 2
WEEK_OF_YEAR 3
DAY_OF_YEAR 6
HOUR_OF_DAY 11

Common SimpleDateFormat Patterns

Some useful patterns for the intervalDateFormatter parameter are listed below. The results show 9th April 2009 at 10:28pm.

Pattern Result Description
d-MMM-yyyy 9-Apr-2009 Simple date
MM/d/yy HH:mm 04/9/09 22:28 Numeric date with 24hr time
MMMM ''yy April '09 Month only with abbreviated year
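The results in the table can be reproduced with `java.text.SimpleDateFormat`. The sketch below assumes an English locale, whereas the plugin may use the server's default locale instead. It builds the example date with `GregorianCalendar`. `IntervalLabels` is a hypothetical helper name.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical helper that formats interval labels the way the table describes.
final class IntervalLabels {
    // 9th April 2009 at 10:28pm, the example date used in the table.
    static Date exampleDate() {
        return new GregorianCalendar(2009, Calendar.APRIL, 9, 22, 28).getTime();
    }

    static String format(String pattern, Date date) {
        // Locale.ENGLISH so month names match the table regardless of server locale.
        return new SimpleDateFormat(pattern, Locale.ENGLISH).format(date);
    }
}
```

Note that `yy` always yields a two-digit year and `''` produces a literal apostrophe.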

Data Widgets

Data Widgets are the number crunchers of the plugin - they are what take the raw data and interpret it into something useful. The AbstractDataWidget API has been made deliberately extensible so that more widgets can be added over time; hopefully we'll end up with a nice collection enabling rich reports to be built on top of them.

When executed, a widget processes the startDate and endDate, dividing the work up into chunks called intervals. The size and quantity of intervals can be specified in the general widget parameters.
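Here is a minimal sketch of how `startDate`, `endDate`, `intervalType` and `intervalCount` could be turned into intervals with `java.util.Calendar`. It assumes each interval spans `intervalCount` units of the `intervalType` field and that the last interval is clipped to `endDate`. `Intervals.split` is a hypothetical helper, not the plugin's `AbstractDataWidget` code.

```java
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;
import java.util.List;

// Hypothetical helper: divide [startDate, endDate) into consecutive intervals.
final class Intervals {
    record Interval(Date start, Date end) {}

    static List<Interval> split(Date startDate, Date endDate, int intervalType, int intervalCount) {
        List<Interval> out = new ArrayList<>();
        if (intervalCount <= 0) {
            // 0 means one interval covering the whole range.
            out.add(new Interval(startDate, endDate));
            return out;
        }
        Calendar cursor = Calendar.getInstance();
        cursor.setTime(startDate);
        while (cursor.getTime().before(endDate)) {
            Date from = cursor.getTime();
            cursor.add(intervalType, intervalCount); // e.g. Calendar.DAY_OF_YEAR (6), +1
            Date to = cursor.getTime().after(endDate) ? endDate : cursor.getTime();
            out.add(new Interval(from, to));
        }
        return out;
    }
}
```

For example, March 2008 split with intervalType=6 (DAY_OF_YEAR) and intervalCount=1 yields 31 daily intervals. With intervalCount=0 it yields a single interval.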

Once the list of intervals has been created, the cache is optionally consulted and the remaining intervals are passed to the widget's bespoke logic for execution. This bespoke logic is described below.

In its bespoke logic, a widget implementation is expected to process the data for the given interval within the constraints provided and generate result data. This data can be simple (like a number or short text), or more complex, such as a collection of objects.

Read:Write Ratio
This widget takes all the Page/BlogPost/Attachment events for creation, editing and viewing and combines them into three columns. This should give you a good impression of your read:write ratio, as well as the overall usage of areas of your site.

Class ReadWriteRatioWidget
Written By Dan Hardiker (Adaptavist)
Custom Parameters none yet
Fields Outputted Creates (int), Edits (int), Views (int)
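The bucketing done by this widget can be sketched in memory as follows. The event model (`StatsEvent`, `EventAction`, `ContentKind`) is hypothetical. The real widget presumably counts rows in the database rather than iterating over objects. The sketch only shows how events fall into the Creates, Edits and Views columns.

```java
import java.util.List;

// Hypothetical event model for illustration only.
enum EventAction { CREATE, EDIT, VIEW }
enum ContentKind { PAGE, BLOG_POST, ATTACHMENT, COMMENT }
record StatsEvent(ContentKind kind, EventAction action) {}
record ReadWriteRatio(int creates, int edits, int views) {}

final class ReadWriteRatioSketch {
    static ReadWriteRatio compute(List<StatsEvent> events) {
        int creates = 0, edits = 0, views = 0;
        for (StatsEvent e : events) {
            // Only pages, blog posts and attachments count towards the ratio.
            if (e.kind() == ContentKind.COMMENT) {
                continue;
            }
            switch (e.action()) {
                case CREATE -> creates++;
                case EDIT -> edits++;
                case VIEW -> views++;
            }
        }
        return new ReadWriteRatio(creates, edits, views);
    }
}
```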

Concurrent Sessions
This widget counts the number of unique session IDs (regardless of event type) in each interval. This tells you how many unique visitors (not hits) you've had during that period.

Class ConcurrentSessionsWidget
Written By Dan Hardiker (Adaptavist)
Custom Parameters none
Fields Outputted Sessions (int)
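Counting unique sessions amounts to a distinct count over the session IDs recorded with each event in the interval. Below is a hypothetical in-memory version. In SQL this would typically be a `COUNT(DISTINCT ...)` over the session ID column, whose real name is not given here.

```java
import java.util.Collection;
import java.util.Objects;

// Hypothetical sketch: number of distinct session IDs among an interval's events.
final class UniqueSessions {
    static long count(Collection<String> sessionIds) {
        return sessionIds.stream()
                .filter(Objects::nonNull) // assumption: events without a session ID are skipped
                .distinct()
                .count();
    }
}
```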

Most Viewed & Edited Pages
This widget looks at all the viewed/edited pages and compiles a list of the top 10 content entity objects (CEOs) for that time period. It is typically run without intervals (i.e. intervalCount=0). It doesn't yet render well with the current themes, but the information is there.

Class MostViewedPagesWidget
Written By Dan Hardiker (Adaptavist)
Custom Parameters none
Fields Outputted List<CEOResult> (a list of the top 10 Pages)
Class MostEditedPagesWidget
Written By Dan Hardiker (Adaptavist)
Custom Parameters none
Fields Outputted List<CEOResult> (a list of the top 10 Pages)
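The top-10 list is essentially a count-and-sort over the content IDs of the matching events. In this hypothetical sketch, `TopResult` stands in for the plugin's `CEOResult`, whose fields are not documented here. Ties are broken by content ID so the ordering is deterministic.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-in for the plugin's CEOResult.
record TopResult(long contentId, long count) {}

final class TopContent {
    static List<TopResult> top(Collection<Long> contentIds, int limit) {
        // Count how many matching events each piece of content has.
        Map<Long, Long> counts = contentIds.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        // Highest count first; ties broken by content ID for a stable order.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<Long, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<Long, Long>comparingByKey()))
                .limit(limit)
                .map(e -> new TopResult(e.getKey(), e.getValue()))
                .collect(Collectors.toList());
    }
}
```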

Memory Usage Ratio
This widget was written in the last 30 minutes of Codegeist 2008 and demonstrates how quickly you can add a new processing widget (the atomic commit revision should be useful for coders wanting to explore). Its output is the average free, max and total memory across each interval.

Class MemoryUsageWidget
Written By Dan Hardiker (Adaptavist)
Custom Parameters none yet
Fields Outputted Free (int), Max (int), Total (int)
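The values this widget averages correspond to what `java.lang.Runtime` reports. Below is a hypothetical sketch of sampling and averaging them, where `MemorySample` and `MemoryAverages` are illustrative names. It assumes one sample is stored with each event, as described under Other. It uses `long` because byte counts on a large heap can exceed the `int` range listed above.

```java
import java.util.List;

// Hypothetical: one JVM memory sample, as captured alongside each event.
record MemorySample(long free, long max, long total) {
    static MemorySample now() {
        Runtime rt = Runtime.getRuntime();
        return new MemorySample(rt.freeMemory(), rt.maxMemory(), rt.totalMemory());
    }
}

final class MemoryAverages {
    // Average free, max and total memory over all samples in one interval.
    static MemorySample average(List<MemorySample> samples) {
        if (samples.isEmpty()) {
            throw new IllegalArgumentException("no samples in this interval");
        }
        long free = 0, max = 0, total = 0;
        for (MemorySample s : samples) {
            free += s.free();
            max += s.max();
            total += s.total();
        }
        int n = samples.size();
        return new MemorySample(free / n, max / n, total / n);
    }
}
```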

Example Usages

Want to put the read:write ratio output covering all the content in your system on a page? Easy!

{statsDataWidget}

Would you prefer to count the number of sessions you've ever had?

{statsDataWidget:widget=ConcurrentSessionsWidget}

Prefer to break down the sessions per day?

{statsDataWidget:widget=ConcurrentSessionsWidget|intervalType=6|intervalCount=1}

Prefer to break down the read:write ratio per month, formatting the interval time (if there is more than one) in "April '08" style, shown horizontally?

{statsDataWidget:widget=ReadWriteRatioWidget|intervalType=2|intervalCount=1|intervalDateFormatter=MMMM ''yy|theme=horizontal}

Want to put the above into a line chart?

{chart:type=line}
{statsDataWidget:widget=ReadWriteRatioWidget|intervalType=2|intervalCount=1|intervalDateFormatter=MMMM ''yy|theme=horizontal}
{chart}

or maybe tweaked to fit my test data set and be a bit prettier:

{chart:type=line|width=650}
{statsDataWidget:intervalType=6|intervalCount=1|theme=horizontal|intervalDateFormatter=dd/MM}
{chart}


or you can now go one better and specify the interval titles yourself, and include a nice chart:

{statsDataWidget:hideIntervals=true}

{chart:type=bar}
{statsDataWidget:theme=horizontal|intervalTitles=Read:Write Ratios}
{chart}


You can even get funky memory graphs, which will eventually evolve into a proper Confluence health monitoring toolkit! Here is an example which breaks down the statistical data into daily chunks and produces a nice graph:

{chart:type=line|width=650}
{statsDataWidget:widget=MemoryUsageWidget|intervalType=6|intervalCount=1|theme=horizontal|intervalDateFormatter=dd/MM}
{chart}


You may want to consult the documentation for the chart plugin too.

Enjoy!

Future Plans

This plugin is growing on a weekly basis; here is what we're looking to implement (in no particular order):

  • Background report creation (pre-caching of data)
  • Space tabs with the information you find in the Activity plugin
  • Report Widgets and Macros for them (using {chart} works well enough for now)
  • An option not to generate data and to only use the cache?
  • More system or non-event state information - such as the number of threads used, and a popular one: page generation time.
  • Get more events into Confluence (e.g. on RSS read) and improve the data available through the events (e.g. set the request & response on ServletActionContext in events originating outside of xwork - like attachment downloads)
  • Add some analytics which can identify wiki patterns (such as wiki gnomes, heavy users etc)

Version History

Version Date State License Price
1.0 (#1) 09 May 2008 Stable Freeware / Open Source (BSD) Donate



Other Adaptavist Entries

Synonym Plugin: A search extractor for Confluence to inject synonyms for acronyms, words or phrases into the index to aid with searching
Ranking Macro: Yet another macro for voting/rating/ranking pages; this one differs from the others by providing a macro for ranking pages with a 'was this page useful' style approach, tracking only positive answers
Insert Picture Plugin: An in-place image management widget for Confluence to help with image attachment manipulation
Custom News: An alternative to Confluence's blog posts macro to aid with customisation
User Security Management Plugin: An enhancement for the Confluence user management system, to prompt better security practices, including email verification and admin vetting of signups
Plugin Message Client: A library which, when included as an extracted dependency, allows Java communication between the classloaders of the installed plugins
Attachment Download Plugin: Adds a servlet so you can download attachments from a page without needing to know the ID.
Statistical Analysis Plugin: Confluence has lacked a cluster-ready, enterprise-scalable, remotely accessible statistics gathering and analysis plugin ... not any more!

Labels

codegeist_2008_vendor_adaptavist
codegeist_2008_confluence
  1. May 09, 2008

    Dan Hardiker says:

    It's not yet 11am GMT-10, but the page has been locked?

    Ah well ... I've just created a MemoryUsageWidget which you can use in this form, or similar:

    {chart:type=line|width=650}
    {statsDataWidget:widget=MemoryUsageWidget|intervalType=6|intervalCount=1|theme=horizontal|intervalDateFormatter=dd/MM}
    {chart}
    

    It was mainly done as a demonstration of how quickly you can add new widgets which process data in the current forms. This particular widget is very useful for people that don't have a dedicated monitoring platform and want to monitor their memory usage over periods of time.

  2. Jul 22

    Dimitar Dimitrov says:

    I'm experiencing STATS-29 with Confluence 2.8.0 on PostgreSQL, Solaris 10.

    Let me know if you need additional information.

  3. Oct 14

    David Matsumoto says:

    Installation of the plug-in goes fine, but it is apparently having trouble storing the data. I'm guessing the issue is that it may have tried to set up the proper tables and views and couldn't. Is it possible to get a copy of the schema so it can be manually implemented? I might be able to extract it from the code, but that seemed to be a bit dangerous. We are in an Enterprise environment, and IDs that are capable of modifying the schema aren't generally allowed for use in applications. I believe we are using Oracle 10g and Confluence 2.7.3, if that helps.

    We had exceptions occur with the Database Tools and Data Widget Runner actions, if that helps. Both referenced non-existent tables or views.

    Any support you could provide would be appreciated.

    1. Oct 14

      David Peterson [CustomWare] says:

      I found a couple of bugs with it also when I tried installing with HSQL. One was specific to that database, a couple of others were more general bugs (a bad group by query, etc). I've checked my fixes into SVN, but I don't think Adaptavist has cut a new release since then. Try grabbing the latest code from SVN, building it, and seeing if that helps...

      1. Oct 14

        David Matsumoto says:

        Are the tables created by the plugin when it is installed or are you saying that the bugs prevented it from accessing some of the tables correctly? I took a brief look at the code and found a few of the INSERT calls, but I didn't see any CREATE TABLE entries, either I missed them or the tables already exist in Confluence. Right now we are working under the assumption that the tables are unique to this plugin and thus are unlikely to have been created correctly, if that isn't the case then I'll be glad to download the current build and recompile.

        Thanks for the suggestion

        1. Oct 14

          Dan Hardiker says:

          Table management is done using http://www.liquibase.org/

          I plan on getting around to adding support for Oracle 10g in the coming months through other commissioned work.

          1. Oct 15

            David Matsumoto says:

            Thank you very much. I was initially a little confused by the response, but once I realized the definitions were XML, I found everything I needed including the field definitions, etc.

            I'm unsure if the issue is really Oracle 10g incompatibility or simply a permissions issue due to our environment. I'll talk to our admins and we will provide more feedback when we know more. I'll have to take a better look at the implementation to see if anything definitely won't work with Oracle. The biggest concern would be how the autoincrement is handled, I'll have to look at how LiquiBase handles that. The rest looked pretty standard and simple.

            I did confirm we are using Oracle 10g.

            Thanks again.

            1. Oct 22

              David Matsumoto says:

              Just an update on the situation. I did confirm that the ID used by Confluence cannot create new tables or indices, so that was contributing to part of the issue. I also confirmed that the SQL generated by LiquiBase, using the command line and updateSQL, wasn't really compatible with Oracle, particularly in the numeric types and the autoincrement (reported as unsupported). I made the translation for the field types (not the autoincrement yet) and created an instance. However, when I tried to connect it to the database and save the settings, I got an exception.

              com.thoughtworks.xstream.converters.reflection.ObjectAccessException: Invalid final field com.adaptavist.confluence.stats.model.StatsConfig.EVENT_DATABASE_TABLE
              at com.thoughtworks.xstream.converters.reflection.PureJavaReflectionProvider.validateFieldAccess(PureJavaReflectionProvider.java:150)
              

              I can provide more information directly if required. I'm unsure if this was due to my lack of autoincrement support or some sort of validation check that is being run and failing. I didn't implement all of the liquibase tables, is this required as well? I had delayed the autoincrement since I could implement in two ways, either use straight sequences and modify the insertion SQL slightly to handle or create sequences and triggers. Any preference from your side?

              I know you haven't planned to work on Oracle 10g compatibility yet, but as you can tell we are interested in making this work so we can help more with the widget side of things.

              Thanks for the support

  4. Oct 16

    Martin Mrazek says:

    Hi, above all, thanks for the wonderful plugin!

    We were exploring the plugin_stats_data table and the set of events being logged. It's good that the search event is logged (eventType=18). Unfortunately, nothing is logged when the user looks up documents using menu space > labels.

    Is there a way to extend the set of logged events to cover this case too?

    It seems rather important, because from Confluence 2.8 onwards users can easily look for documents having specified label(s). Searching by label is therefore becoming pretty popular among our customers.

    thanks
    Martin