Configuring Database Character Encoding

The database used with Confluence should be configured to use the same character encoding as Confluence. The recommended encoding is Unicode UTF-8 (the equivalent for Oracle databases is AL32UTF8).

There are two places where character encoding may need to be configured:

  • when creating the database
  • when connecting to the database (JDBC connection URL or properties).

The configuration details for each type of database are different. Some examples are below.

JDBC connection settings

MySQL

Append "useUnicode=true to your JDBC URL:

jdbc:mysql://hostname:port/database?useUnicode=true&characterEncoding=utf8

If you are modifying confluence.cfg.xml directly rather than via the Confluence Installation GUI, you'll need to escape out the & in the URL string as this is a reserved XML token and will break the syntax when the XML is parsed. An effective URL could be similar to:

<property name="hibernate.connection.url">jdbc:mysql://hostname:port/database?useUnicode=true&amp;characterEncoding=utf8</property>

Creating a UTF-8 database

MySQL

  1. Create a UTF-8 database with binary UTF-8 collation.
    (info)Binary UTF-8 provides case-sensitive collation.

    CREATE DATABASE confluence CHARACTER SET utf8 COLLATE utf8_bin;
    
  2. You will also need to set the Server Characterset to utf8. This can be done by adding the following in my.ini for Windows or my.cnf for other OS. It has to be declared in the Server section, which is the section after [mysqld]:

    [mysqld]
    default-character-set=utf8
    


    If the above option does not work, try using character_set_server=utf8 in lieu of default-character-set=utf8

  3. Use the status command to verify database character encoding information.

    Screenshot: Using the Status Command to Verify Database Character Encoding

  4. In some cases, the individual tables collation and character encoding may differ from the one that the database as a whole has been configured to use. Please use the command below to ensure all tables within your Confluence database are correctly configured to use UTF-8 character encoding and binary UTF-8 collation:

    use confluence;
    show table status;
    

    Check for the value listed under the Collation column, to ensure it has been set to utf8_bin (that is, case-sensitive) collation for all tables.
    If not, then this can be changed by the following command, executed for each table in the Confluence database:

    ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;
    

    Please substitute the <tablename> above, with each table within the confluence database.

Relevant MySQL manual for more detailed explanation:

PostgreSQL

CREATE DATABASE confluence WITH ENCODING 'UNICODE';

Or from the command-line:

$ createdb -E UNICODE confluence

For more information see the PostgreSQL documentation.

For PostgreSQL running under Windows

Please note that international characters sets are only fully supported and functional when using PostgreSQL 8.1 and above under Microsoft Windows.

For PostgreSQL running under Linux

Please make sure you check the following to ensure proper handling of international characters in your database

When PostgreSQL creates an initial database cluster, it sets certain important configuration options based on the host enviroment. The command responsible for creating the PostgreSQL environment initdb will check environment variables such as LC_CTYPE and LC_COLLATE (or the more general LC_ALL) for settings to use as database defaults related to international string handling. As such it is important to make sure that your PostgreSQL environment is configured correctly before you install Confluence.

To do this, connect to your PostgreSQL instance using pgsql and issue the following command:

SHOW LC_CTYPE;

If LC_CTYPE is set to either "C" or "POSIX" then certain string functions such as converting to and from upper and lower case will not work correctly with international characters. Correct settings for this value take the form <LOCALE>.<ENCODING> (en_AU.UTF8 for example).

If your LC_CTYPE is incorrect please check the PostgreSQL documentation for information on configuring database localisation. It is not easy to change these settings with a database that already contains data.

Updating existing database to UTF-8

MySQL database with existing data

For an existing database

If you're using a existing database, confirm the Character Encoding by executing the query:
SHOW VARIABLES LIKE 'character%'; and SHOW VARIABLES LIKE 'collation%';.
The results should be UTF-8.

Before proceeding with the following changes, please backup your database.

This example shows how to change your database from latin1 to utf8, where your database is named "confluence".

  1. Dump the database (except the plugindata table) to a text file using the mysqldump tool from the command-line :
    mysqldump -p --default-character-set=latin1 -u <username> --skip-set-charset --ignore-table='`confluence`.plugindata' --ignore-table='`confluence`.attachmentdata' confluence > confluence_database.sql
  2. Dump the plugindata and attachmentdata tables to a text file using mysqldump separately. This is done separately as the recode step below can corrupt the binary data in these tables :
    mysqldump -p --default-character-set=latin1 -u <username> --skip-set-charset confluence attachmentdata plugindata > confluence_blobtables.sql
  3. copy confluence_database.sql to confluence_utf8.sql
  4. Open confluence_utf8.sql in a text editor and change all character sets from 'latin1' to 'utf8'
  5. Encode all the latin1 characters as UTF-8:
    recode latin1..utf8 confluence_utf8.sql (the recode utility is described at http://directory.fsf.org/recode.html; it can actually be downloaded from http://recode.progiciels-bpi.ca/, and is available for Ubuntu via apt-get)

In MySQL:

  1. DROP DATABASE confluence;
  2. CREATE DATABASE confluence CHARACTER SET utf8 COLLATE utf8_bin;

Reimport the UTF-8 text file, and also the plugindata and attachmentdata dumps:

  1. mysql -u <username> -p --default-character-set=utf8 --max_allowed_packet=64M confluence < /home/confluence/confluence_utf8.sql
  2. mysql -u <username> -p --default-character-set=latin1 --max_allowed_packet=64M confluence < /home/confluence/confluence_blobtables.sql

To support large imports, the parameter '--max_allowed_packet=64M' used above sets the maximum size of an SQL statement to be very large. In some circumstances, you may need to increase it further, especially if attachments are stored in the database.

Finally, since the plugindata and attachmentdata tables were not actually converted before, you need to instruct MySQL to convert all the text fields manually:

  1. ALTER TABLE plugindata CONVERT TO CHARACTER SET utf8;
  2. ALTER TABLE attachmentdata CONVERT TO CHARACTER SET utf8;

Testing database encoding

See Troubleshooting Character Encodings for a number of tests you can run to ensure your database encoding is correct.

Last modified on Dec 2, 2015

Was this helpful?

Yes
No
Provide feedback about this article
Powered by Confluence and Scroll Viewport.