This documentation relates to the latest version of Confluence.
If you are using an earlier version, please go to the documentation home page and select the relevant version.

Configuring Database Character Encoding

Confluence 2.10 Documentation

The database used with Confluence should be configured to use the same character encoding as Confluence. The recommended encoding is Unicode UTF-8.

There are two places where character encoding may need to be configured:

  • when creating the database
  • when connecting to the database (JDBC connection URL or properties).

The configuration details for each type of database are different. Some examples are below.

JDBC connection settings

MySQL

Append "useUnicode=true&characterEncoding=utf8" to your JDBC URL:

jdbc:mysql://hostname:port/database?autoReconnect=true&useUnicode=true&characterEncoding=utf8
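If the URL is stored in an XML configuration file, each bare & has to be written as &amp;, otherwise the XML parser treats it as the start of an entity reference (one of the comments on this page reports exactly that error for confluence.cfg.xml). The following is a minimal sketch of the escaping, not an official tool: xml_escape_url is a hypothetical helper name, and it assumes the URL does not already contain escaped ampersands.

```shell
# xml_escape_url is a hypothetical helper, not part of Confluence or MySQL.
# It rewrites every bare "&" as "&amp;" so the URL is valid inside XML.
# Assumption: the input does not already contain "&amp;".
xml_escape_url() {
  # In a sed replacement, "\&" is a literal ampersand.
  printf '%s\n' "$1" | sed 's/&/\&amp;/g'
}

xml_escape_url 'jdbc:mysql://hostname:port/database?autoReconnect=true&useUnicode=true&characterEncoding=utf8'
```

For the example URL this prints jdbc:mysql://hostname:port/database?autoReconnect=true&amp;useUnicode=true&amp;characterEncoding=utf8, which is the form to paste into the XML file.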

Creating a UTF-8 database

MySQL

CREATE DATABASE confluence CHARACTER SET utf8 COLLATE utf8_general_ci;

Use the status command to verify database character encoding information.
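The status output can also be checked by a script. The sketch below assumes the output contains lines of the form "Server characterset: utf8", as in the sample posted in the comments on this page; charsets_all_utf8 is a hypothetical helper name, and the exact output format may differ between MySQL versions.

```shell
# charsets_all_utf8 is a hypothetical helper. It reads the output of the
# MySQL status command on stdin and succeeds only if at least one
# "characterset:" line is present and every such line reports a value
# that starts with "utf8".
charsets_all_utf8() {
  awk '/characterset:/ { seen = 1; if ($NF !~ /^utf8/) bad = 1 }
       END { exit (seen && !bad) ? 0 : 1 }'
}

# Demonstration with sample status output (no database needed).
# Against a real server the input would come from something like:
#   mysql -u USERNAME -p -e status confluence | charsets_all_utf8
charsets_all_utf8 <<'EOF' && echo "all character sets are utf8"
Server characterset:    utf8
Db     characterset:    utf8
Client characterset:    utf8
Conn.  characterset:    utf8
EOF
```

A non-zero exit status means that at least one of the server, database, client or connection character sets is not utf8, or that no character set lines were found at all.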

For more information see the MySQL documentation.

PostgreSQL

CREATE DATABASE confluence WITH ENCODING 'UNICODE';

Or from the command line:

$ createdb -E UNICODE confluence

For more information see the PostgreSQL documentation.

For PostgreSQL running under Windows

Please note that international character sets are only fully supported and functional when using PostgreSQL 8.1 and above under Microsoft Windows.

For PostgreSQL running under Linux

Please check the following to ensure proper handling of international characters in your database.

When PostgreSQL creates an initial database cluster, it sets certain important configuration options based on the host environment. The command responsible for creating the PostgreSQL environment, initdb, checks environment variables such as LC_CTYPE and LC_COLLATE (or the more general LC_ALL) for settings to use as database defaults for international string handling. It is therefore important to make sure that your PostgreSQL environment is configured correctly before you install Confluence.

To do this, connect to your PostgreSQL instance using psql and issue the following command:

SHOW LC_CTYPE;

If LC_CTYPE is set to either "C" or "POSIX" then certain string functions such as converting to and from upper and lower case will not work correctly with international characters. Correct settings for this value take the form <LOCALE>.<ENCODING> (en_AU.UTF8 for example).

If your LC_CTYPE is incorrect please check the PostgreSQL documentation for information on configuring database localisation. It is not easy to change these settings with a database that already contains data.
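The LC_CTYPE rule described above can be sketched as a small shell check. check_lc_ctype is a hypothetical helper name, and the three result words are chosen only for this illustration. It classifies the value only; it does not change any PostgreSQL setting.

```shell
# check_lc_ctype is a hypothetical helper that classifies an LC_CTYPE value:
#   unsuitable  "C" or "POSIX" (case conversion of international characters
#               will not work correctly)
#   utf8        <LOCALE>.<ENCODING> with a UTF-8 encoding, e.g. en_AU.UTF8
#   other       anything else; check the encoding part by hand
check_lc_ctype() {
  case "$1" in
    C|POSIX) echo "unsuitable" ;;
    *.[Uu][Tt][Ff]8|*.[Uu][Tt][Ff]-8) echo "utf8" ;;
    *) echo "other" ;;
  esac
}

# Against a running server the value could be fetched with psql, e.g.:
#   check_lc_ctype "$(psql -At -c 'SHOW LC_CTYPE;' confluence)"
check_lc_ctype en_AU.UTF8
```

The last line prints utf8. Passing C or POSIX prints unsuitable, which corresponds to the case where the localisation settings need to be fixed before Confluence is installed.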

Updating existing database to UTF-8

MySQL database with existing data

Before proceeding with the following changes, please back up your database.

This example shows how to change your database from latin1 to utf8.

  1. Dump the database to a text file using the mysqldump tool from the command line:
    mysqldump -p --default-character-set=latin1 -u <username> --skip-set-charset confluence > confluence_database.sql
  2. Open the SQL file in a text editor and change all character sets from 'latin1' to 'utf8'.
  3. Copy the file so that the original dump is kept unchanged:
    cp confluence_database.sql confluence_utf8.sql
  4. Encode all the latin1 characters as UTF-8:
    recode latin1..utf8 confluence_utf8.sql (Recode utility available from http://directory.fsf.org/recode.html)
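Steps 2 to 4 can also be scripted. The sketch below is one possible way to do it, not the procedure described above: it swaps recode for iconv and replaces the manual edit with sed. convert_dump is a hypothetical helper name, and the two sed patterns are assumptions about how character sets typically appear in a mysqldump file; check your own dump for other spellings.

```shell
# convert_dump is a hypothetical helper combining steps 2 to 4.
# Assumptions: iconv is installed and the dump contains ISO-8859-1 bytes.
#   $1 = latin1 dump written by mysqldump
#   $2 = UTF-8 file to create (the input file is left unchanged)
convert_dump() {
  iconv -f ISO-8859-1 -t UTF-8 "$1" |
    sed -e 's/CHARSET=latin1/CHARSET=utf8/g' \
        -e 's/CHARACTER SET latin1/CHARACTER SET utf8/g' > "$2"
}

# Usage, with the file names from the steps above:
#   convert_dump confluence_database.sql confluence_utf8.sql
```

The sed patterns only touch character set declarations, so the word latin1 is left alone if it happens to occur inside page content. MySQL's latin1 is close to Windows-1252 rather than strict ISO-8859-1, so CP1252 may be the better source encoding for some data.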

In MySQL:

  1. DROP DATABASE confluence;
  2. CREATE DATABASE confluence CHARACTER SET utf8 COLLATE utf8_general_ci;

Finally, reimport the UTF-8 text file:

  1. mysql -p --default-character-set=utf8 --max_allowed_packet=64M confluence < /home/confluence/confluence_utf8.sql

To support large imports, the parameter '--max_allowed_packet=64M' used above raises the maximum size of a single SQL statement to 64 MB. In some circumstances you may need to increase it further, especially if attachments are stored in the database.

Testing database encoding

See Troubleshooting Character Encodings for a number of tests you can run to ensure your database encoding is correct.

Related Documentation

Known Issues for MySQL

Labels

database, encoding, postgresql, mysql, confluence, unicode, utf8, db-setup, known-issues

  1. Jul 04, 2006

    Christoph Seyfert says:

    I had a problem with the character encoding.

    I have added &characterEncoding=UTF-8 to the JDBC-URL to solve the problem.

    Thanks to the Atlassian support team.

    1. Jul 05, 2006

      Matt Ryall (Atlassian) says:

      This is necessary with MySQL when the server's encoding is not UTF-8 and the database's is UTF-8. Regardless of the database encoding, it appears the MySQL JDBC drivers use the server's encoding for certain operations.

      If you have the choice, change the server's encoding to UTF-8 instead.

      You can check your server and database encoding with the status command in MySQL.

    2. Nov 16, 2007

      Anonymous says:

      you saved my life with simple change in jdbc my unicode support finally started working

    3. Nov 16, 2007

      Anonymous says:

      thank you, that works

  2. Jul 18, 2006

    Dan Hardiker says:

    Quoting:

    Finally, reimport the UTF-8 text file:

    1. mysql -p --default-character-set=utf8 --max_allowed_packet=64M confluence < /home/confluence/confluence_utf8.sql

    For large imports, add 'max_allowed_packet=32M' under [mysqld] in /etc/my.cnf.

    Is that right? For larger imports you want to try and reduce the maximum allowed packet size compared to the standard suggested command? Wouldn't the command line override the /etc/my.cnf file anyway?

    1. Jul 19, 2006

      Matt Ryall (Atlassian) says:

      Thanks for spotting that, Dan. I've updated it to be a bit clearer.

  3. Aug 03, 2006

    Bo Song says:

    I am setting up Confluence to use MS SQL Server 2000. How do I create a SQL Server 2000 database with UTF-8 encoding? I created a database using the default settings, but it failed the "Charactor Encoding test". I gather that Microsoft uses the term COLLATE instead of encoding, but I can't find Unicode or UTF8 as an option.

    Please advise, thanks.

    1. Aug 04, 2006

      Matt Ryall (Atlassian) says:

      Microsoft SQL Server supports Unicode by default in new databases, but you may need to fix your collation settings so the case-sensitivity test doesn't fail.

      I noticed you raised a support case for this issue. We will respond to you there.

      1. Sep 04, 2006

        David Soul [Atlassian] says:

        The cause of this issue has been patched. See CONF-6742 for details.

  4. Apr 12, 2007

    Mark says:

    MSSQL

    Can you break out a new MSSQL section under Creating a UTF-8 Database (saying it's default on new db creation) so that people looking for MSSQL information have a 'section' to look at.

    I barely caught your comments item here. Thanks

  5. Jul 14, 2007

    arno schmacher says:

    PostgreSQL and German Umlaut hint:

    I ran into some problems with the display and sorting of data containing German umlauts. The SQL lower function and the sorting were broken. Creating only the database as proposed above did not work on my site.

    I solved the problem by specifying the encoding de_DE.UTF-8 while creating the database cluster using the initdb command. Do not omit the UTF-8 part!

    initdb -U arno -W /server/database --lc-ctype=de_DE.UTF-8 --lc-collate=de_DE.UTF-8 
    
    pg_ctl -D /server/database -l logfile start
    createdb confluence -E UNICODE
    

    For more information please consult the Postgres Documentation

  6. Jun 26

    Anonymous says:

    I am setting up confluence to use Ms SQL 2000 server. How do I create a sql 2000 database with utf-8 encoding? I create a database using default setting, it failed in "Charactor Encoding test". I feel that Ms use COLLATE instead of encoding in terminology but I can't seem to find Unicode or UTF8 as an option.

    1. Jun 27

      Tony Cheah Tong Nyee says:

      Hi there,

      You may be interested to refer to the following page regarding some known Unicode issue when using MS SQL Server with Confluence:

      If you are still encountering some problems related to character encoding, feel free to raise a support issue at:

      From there, the support engineer will help to look into it further.

      Cheers,
      Tony

  7. Jul 31

    Anonymous says:

    When debugging a UTF-8 encoding problem (it was related to mod_jk and Tomcat), I saw the following effect:

    Database is already at UTF-8:

    mysql> status;
    --------------
    ...
    Connection:             Localhost via UNIX socket
    Server characterset:    utf8
    Db     characterset:    utf8
    Client characterset:    utf8
    Conn.  characterset:    utf8
    ...
    

    If I append either "&useUnicode=true&characterEncoding=utf8" or "&useUnicode=true" to the JDBC connection URL (line 22 of confluence.cfg.xml) as recommended, Tomcat won't start:

    You cannot access Confluence at present. Look at the table below to identify the reasons

    Time Level Type Description Exception
    <timestamp> (EventLevel: fatal) (EventType: bootstrap) Could not load bootstrap from environment No server id found. com.atlassian.config.bootstrap.BootstrapException: Unable to bootstrap application: Failed to parse config file: Error on line 22 of document : The reference to entity "useUnicode" must end with the ';' delimiter. Nested exception: The reference to entity "useUnicode" must end with the ';' delimiter.
    I thus had to omit the parameters (which was no problem since the connection is already utf-8).

    1. Jul 31

      Azwandi Mohd Aris says:

      Hi there,

      Would you be able to raise a support request at http://support.atlassian.com? Please attach all of this information, your logs and system information to that ticket. Thanks.

      Cheers,
      Azwandi

  8. Nov 20

    Chris Latimer says:

    Are there any issues using an Oracle database with AL32UTF8 instead of UTF8?  From what I've read it seems like it should work, but would like to verify before we purchase Confluence.

    Chris Latimer

    1. Nov 23

      James Fleming [Atlassian] says:

      Chris,

      I haven't tested this myself, but it appears that AL32UTF8 implements UTF8 properly, and would thus actually be a better idea than Oracle's UTF8, which sort-of-mostly implements it correctly. It seems that even "Oracle recommends that customers switch to AL32UTF8 for full supplementary character support," according to this article. There's a short summary here.

      Regards,
      James Fleming
