Running Confluence Behind a Caching Proxy Server

Confluence 2.8 Documentation

One major concern is Confluence's ability to withstand a Slashdot, and someone told us that Atlassian had basically said that Confluence could not handle the load of such an event/attack.

Ideally I would want to put a Squid cache directly in front of Confluence, set the default policy to cache content of normal pages for ~5 minutes (at least), and then pass through the more dynamic pages (like the editor and such).

This is, in fact, the case. We don't have any deployed Confluence sites that have the requirement of being Slashdot-proof, but this is probably one of those chicken-and-egg things.

The problem is not one of simple scalability. We're currently working on "Confluence Massive", a clusterable Confluence that will scale to handle whatever load you feel like throwing at it. But if your aim is to protect the server against sudden, transient loads, throwing a cluster at the problem that will then spend 99% of its time not being utilised is probably a waste. Thus, the best solution is to have some kind of caching reverse-proxy that will divert load away from Confluence itself.

The main problem with the reverse-proxy solution is that every Confluence page is built dynamically for whichever user is currently accessing it. This affects obvious stuff like the "You are logged in as username" notice, less obvious stuff like the "edit" and "attachments" links that appear or disappear based on whether the user has permission to perform the action on the other end of the link, and even less obvious stuff like wiki-links to spaces the user can't see, or in-page macros that output their content based on the user's identity.

To run Confluence behind a caching reverse-proxy, you'd need one of:

  1. A proxy that understood the user's identity, or
  2. A Confluence site that removed all the personalised content for cacheable pages.

If you had (1), you could tell the proxy to cache content only for anonymous users (since all anon content is the same, and to survive a slashdotting you only really have to worry about the sudden influx of non-logged-in users). That said, (1) is quite tricky, as it relies on the existence of some SSO mechanism that both Confluence and Squid can be hooked into. If such a mechanism existed, though, it'd be a really neat solution.
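
As a rough illustration of option (1), the decision a proxy would have to make can be sketched in Python. This is not Squid configuration, and the cookie name `sso_token` is a hypothetical stand-in for whatever a shared SSO mechanism would set after login; the sketch only shows the rule "share cached copies with anonymous readers, pass everyone else through".

```python
from http.cookies import SimpleCookie

# Hypothetical name of the cookie an SSO system would set after login.
# This is an assumption for the sketch, not a real Confluence or Squid name.
SSO_COOKIE = "sso_token"


def is_anonymous(headers):
    """Return True if the request carries no sign of a logged-in user."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "authorization" in lowered:
        return False
    cookies = SimpleCookie()
    cookies.load(lowered.get("cookie", ""))
    return SSO_COOKIE not in cookies


def may_serve_from_cache(method, headers):
    """Only anonymous GET/HEAD requests are candidates for the shared cache."""
    return method.upper() in ("GET", "HEAD") and is_anonymous(headers)
```

Anything this check rejects would be forwarded to Confluence untouched, so logged-in users still see their personalised pages.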

In the absence of SSO, you've got (2), which involves:

  • Theme Confluence so that the 'view page', 'view blog post' and 'view mail' pages contain no personalised content: no profile link or user identity, and all links to other functions shown whether or not the user has permission to access them.
  • Ensure that all wiki pages on the server are meant to be visible to anonymous users.
  • Disable (or avoid the use of) macros that deliver different content based on user identity.
  • Introduce an interceptor into Confluence that would provide If-Modified-Since/Last-Modified conditional GET support for wiki pages.
  • Configure Confluence so the site root URL points to a page, rather than the dashboard.
  • Configure Squid to cache the 'view page' URLs (/display/*, /pages/viewpage.action, /pages/viewblogpost.action).
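
The last item in the list amounts to a simple URL classification. A minimal Python sketch of that rule follows, assuming Confluence is deployed at the root context path; the function name `is_view_url` is made up for illustration, and in a real deployment the same rule would live in the proxy's own configuration.

```python
from urllib.parse import urlsplit

# The three 'view page' URL shapes named in the list above.
CACHEABLE_PREFIXES = ("/display/",)
CACHEABLE_PATHS = {"/pages/viewpage.action", "/pages/viewblogpost.action"}


def is_view_url(url):
    """Return True if the URL is one of the cacheable 'view page' URLs."""
    # Only the path matters; the query string (e.g. ?pageId=...) is ignored.
    path = urlsplit(url).path
    return path in CACHEABLE_PATHS or path.startswith(CACHEABLE_PREFIXES)
```

Everything else, such as the editor or the dashboard, would fall through to Confluence uncached.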

This is assuming that only the site root or a regular wiki page would ever be the victim of a direct slashdotting, but I figure this is a reasonable enough assumption to make.
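
The conditional GET interceptor mentioned in the list could work roughly as follows. This is a language-neutral sketch written in Python rather than as a real Confluence interceptor; `conditional_get` is a hypothetical helper, and it assumes the caller can supply a timezone-aware last-modified time for the page.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(last_modified, if_modified_since=None):
    """Return (status, headers) for a page last changed at `last_modified`.

    `last_modified` must be a timezone-aware datetime; `if_modified_since`
    is the raw request header value, or None if the header was absent.
    """
    # HTTP dates have one-second resolution, so drop the microseconds.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # Unparseable header: treat it as absent.
        # Only compare when the parsed date carries a timezone.
        if since is not None and since.tzinfo is not None and last_modified <= since:
            return 304, headers  # Not Modified: the proxy's copy is current.
    return 200, headers  # The full page body must be sent.
```

A 304 response has no body, so answering a revalidation this way costs the server far less than rendering the page again.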

With conditional GET supported, you could configure Squid to query the server to see whether a page has changed, and put in sensible defaults for two settings. The first is the maximum time to cache any page: 5 minutes or so would be fine, since pages could contain dynamic content. The second is the minimum gap between If-Modified-Since queries: 15 seconds would easily prevent the server from being overloaded, while making sure that in regular use you wouldn't often edit a page and then be unable to see your own changes.
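
One way to read those two limits is as a three-way decision the proxy makes for each request. The Python sketch below is an interpretation, not Squid's actual algorithm; `CacheEntry` and `next_action` are hypothetical names, and the constants are the 5-minute and 15-second figures suggested above.

```python
from dataclasses import dataclass

MAX_AGE = 300.0            # seconds a cached copy may be kept (about 5 minutes)
MIN_REVALIDATE_GAP = 15.0  # minimum seconds between If-Modified-Since queries


@dataclass
class CacheEntry:
    stored_at: float     # when the page body was fetched from the server
    validated_at: float  # when the server last confirmed it was unchanged


def next_action(entry, now):
    """Decide how the proxy should answer a request at time `now`."""
    if entry is None or now - entry.stored_at >= MAX_AGE:
        return "fetch"       # no copy, or copy too old: get a fresh page
    if now - entry.validated_at < MIN_REVALIDATE_GAP:
        return "serve"       # checked recently: answer from the cache
    return "revalidate"      # send a conditional GET to the server
```

Under this reading, the server sees at most one conditional query per page every 15 seconds and one full render per page every 5 minutes, however many readers arrive.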

  1. Feb 03, 2006

    Jeff Turner says:

    Let's put this problem in perspective. Confluence does plenty of caching internally and is quite capable of handling heavy page loads:

    [jturner@atlassian01 jturner]$ ab -c 100 -n 1000 'http://confluence.atlassian.com/display/DOC/Running+Confluence+Behind+a+Caching+Proxy+Server'
    This is ApacheBench, Version 2.0.40-dev <$Revision: 1.121.2.1 $> apache-2.0
    Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Copyright (c) 1998-2002 The Apache Software Foundation, http://www.apache.org/
    
    Benchmarking confluence.atlassian.com (be patient)
    Completed 100 requests
    Completed 200 requests
    Completed 300 requests
    Completed 400 requests
    Completed 500 requests
    Completed 600 requests
    Completed 700 requests
    Completed 800 requests
    Completed 900 requests
    Finished 1000 requests
    
    
    Server Software:        Orion/2.0.2
    Server Hostname:        confluence.atlassian.com
    Server Port:            80
    
    Document Path:          /display/DOC/Running+Confluence+Behind+a+Caching+Proxy+Server
    Document Length:        17771 bytes
    
    Concurrency Level:      100
    Time taken for tests:   134.464524 seconds
    Complete requests:      1000
    Failed requests:        0
    Write errors:           0
    Total transferred:      18132000 bytes
    HTML transferred:       17771000 bytes
    Requests per second:    7.44 [#/sec] (mean)
    Time per request:       13446.452 [ms] (mean)
    Time per request:       134.465 [ms] (mean, across all concurrent requests)
    Transfer rate:          131.69 [Kbytes/sec] received
    
    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:        0    0   0.0      0       0
    Processing:  2841 12922 9676.2  10331   79456
    Waiting:     2841 12918 9676.4  10328   79456
    Total:       2841 12922 9676.2  10331   79456
    
    Percentage of the requests served within a certain time (ms)
      50%  10331
      66%  14326
      75%  16876
      80%  18574
      90%  25698
      95%  33259
      98%  40656
      99%  44256
     100%  79456 (longest request)
    
