This documentation relates to the latest version of Confluence.
If you are using an earlier version, please go to the documentation home page and select the relevant version.

Configuring Database Character Encoding

Confluence 3.0 Documentation

The database used with Confluence should be configured to use the same character encoding as Confluence. The recommended encoding is Unicode UTF-8.

There are two places where character encoding may need to be configured:

  • when creating the database
  • when connecting to the database (JDBC connection URL or properties).

The configuration details for each type of database are different. Some examples are below.

JDBC connection settings

MySQL

Append "useUnicode=true&characterEncoding=utf8" to your JDBC URL:

jdbc:mysql://hostname:port/database?autoReconnect=true&useUnicode=true&characterEncoding=utf8
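As an illustration only, the URL above can be assembled from its parts. This is a minimal Python sketch; the function name build_jdbc_url and the host, port and database values are hypothetical, and only the three query parameters come from the example above.

```python
from urllib.parse import urlencode

# Connection parameters taken from the example URL above.
UNICODE_PARAMS = {
    "autoReconnect": "true",
    "useUnicode": "true",
    "characterEncoding": "utf8",
}


def build_jdbc_url(host, port, database, params=UNICODE_PARAMS):
    """Return a MySQL JDBC URL with the given query parameters appended.

    host, port and database are placeholders supplied by the caller.
    """
    query = urlencode(params)  # joins key=value pairs with '&'
    return f"jdbc:mysql://{host}:{port}/{database}?{query}"


# Hypothetical values, for illustration only.
print(build_jdbc_url("hostname", 3306, "confluence"))
```

Keeping the parameters in one dictionary makes it harder to add useUnicode and forget characterEncoding.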

Creating a UTF-8 database

MySQL

  1. Create a UTF-8 database
    CREATE DATABASE confluence CHARACTER SET utf8 COLLATE utf8_general_ci;
    
  2. You will also need to set the server character set to utf8. Do this by adding the following to my.ini on Windows or my.cnf on other operating systems. It must be declared in the server section, which is the section after [mysqld]:
    [mysqld]
    default-character-set=utf8
    
  3. Use the status command to verify database character encoding information.
  4. In some cases, the collation and character encoding of individual tables may differ from those that the database as a whole has been configured to use. Use the commands below to check that all tables within your Confluence database are correctly configured to use UTF-8 character encoding and collation:
    use confluence;
    show table status;
    

    Check the value listed under the Collation column to ensure it is set to utf8_general_ci for all tables.
    If it is not, change it with the following command, executed for each table in the Confluence database:

    ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
    

    Substitute tablename above with the name of each table in the confluence database.
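When many tables are involved, the per-table command can be generated rather than typed. The following is a rough Python sketch, assuming you have already copied the Name and Collation columns from the show table status output into a list of pairs; the function name and the table names are hypothetical.

```python
TARGET_COLLATION = "utf8_general_ci"


def alter_statements(table_collations):
    """Build ALTER TABLE statements for tables not yet using the target collation.

    table_collations: iterable of (table_name, collation) pairs, as read
    from the Name and Collation columns of 'show table status'.
    """
    return [
        f"ALTER TABLE {name} CONVERT TO CHARACTER SET utf8 COLLATE {TARGET_COLLATION};"
        for name, collation in table_collations
        if collation != TARGET_COLLATION
    ]


# Hypothetical table names and collations, for illustration only.
rows = [("table_a", "latin1_swedish_ci"), ("table_b", "utf8_general_ci")]
for statement in alter_statements(rows):
    print(statement)
```

Tables already set to utf8_general_ci are skipped, so the generated statements cover only the tables that still need converting.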

See the relevant MySQL manual pages for a more detailed explanation.

PostgreSQL

CREATE DATABASE confluence WITH ENCODING 'UNICODE';

Or from the command-line:

$ createdb -E UNICODE confluence

For more information see the PostgreSQL documentation.

For PostgreSQL running under Windows

Please note that international character sets are only fully supported and functional when using PostgreSQL 8.1 and above under Microsoft Windows.

For PostgreSQL running under Linux

Check the following to ensure proper handling of international characters in your database.

When PostgreSQL creates an initial database cluster, it sets certain important configuration options based on the host environment. The command responsible for creating the PostgreSQL environment, initdb, checks environment variables such as LC_CTYPE and LC_COLLATE (or the more general LC_ALL) for settings to use as database defaults related to international string handling. It is therefore important to make sure that your PostgreSQL environment is configured correctly before you install Confluence.

To do this, connect to your PostgreSQL instance using psql and issue the following command:

SHOW LC_CTYPE;

If LC_CTYPE is set to either "C" or "POSIX" then certain string functions such as converting to and from upper and lower case will not work correctly with international characters. Correct settings for this value take the form <LOCALE>.<ENCODING> (en_AU.UTF8 for example).

If your LC_CTYPE is incorrect, please check the PostgreSQL documentation for information on configuring database localisation. These settings are not easy to change in a database that already contains data.
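The rule described above for the SHOW LC_CTYPE result can be written as a small check. This is only a sketch in Python: the function name is hypothetical, and the pattern assumes Linux-style locale names such as en_AU.UTF8, so it is not a complete validator of locale strings.

```python
import re

# Values that break case conversion for international characters, per the text above.
UNSAFE_VALUES = {"C", "POSIX"}

# Assumed shape <LOCALE>.<ENCODING>, for example en_AU.UTF8 or de_DE.UTF-8.
LOCALE_WITH_ENCODING = re.compile(r"^[A-Za-z]{2,3}_[A-Za-z]{2}\.[A-Za-z0-9-]+$")


def lc_ctype_looks_correct(value):
    """Return True if value has the <LOCALE>.<ENCODING> form and is not C or POSIX."""
    value = value.strip()
    if value in UNSAFE_VALUES:
        return False
    return bool(LOCALE_WITH_ENCODING.match(value))


for candidate in ("C", "POSIX", "en_AU.UTF8"):
    print(candidate, lc_ctype_looks_correct(candidate))
```

The check only looks at the form of the value. It does not confirm that the locale is actually installed on the host.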

Updating existing database to UTF-8

MySQL database with existing data

Before proceeding with the following changes, please back up your database.

This example shows how to change your database from latin1 to utf8.

  1. Dump the database to a text file using the mysqldump tool from the command line:
    mysqldump -p --default-character-set=latin1 -u <username> --skip-set-charset confluence > confluence_database.sql
  2. Copy confluence_database.sql to confluence_utf8.sql.
  3. Open confluence_utf8.sql in a text editor and change all character sets from 'latin1' to 'utf8'.
  4. Encode all the latin1 characters as UTF-8:
    recode latin1..utf8 confluence_utf8.sql
    The recode utility is described at http://directory.fsf.org/recode.html. It can be downloaded from http://recode.progiciels-bpi.ca/ and is available for Ubuntu via apt-get.
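If recode is not available, steps 3 and 4 can be approximated in a few lines of Python. This is a sketch under stated assumptions, not a replacement for the procedure above: the function names are hypothetical, the dump is assumed to fit in memory, and only declarations of the form CHARSET=latin1 and CHARACTER SET latin1 are rewritten. Collation names such as latin1_swedish_ci are not touched and would still need editing by hand.

```python
import re

# Charset declarations as they commonly appear in a dump file (an assumption;
# inspect your own dump to confirm). Row data is left untouched.
CHARSET_DECLARATION = re.compile(r"(CHARSET=|CHARACTER SET )latin1\b")


def convert_dump(latin1_bytes):
    """Decode a latin1 dump, switch charset declarations to utf8, re-encode as UTF-8."""
    text = latin1_bytes.decode("latin-1")
    text = CHARSET_DECLARATION.sub(r"\g<1>utf8", text)
    return text.encode("utf-8")


def convert_file(source_path, target_path):
    """Read a latin1 dump from source_path and write the UTF-8 version to target_path."""
    with open(source_path, "rb") as source:
        converted = convert_dump(source.read())
    with open(target_path, "wb") as target:
        target.write(converted)


# Example call, using the file names from the steps above:
# convert_file("confluence_database.sql", "confluence_utf8.sql")
```

Unlike recode, which converts the file in place, this writes the result to a separate file, so the original dump is kept as a fallback.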

In MySQL:

  1. DROP DATABASE confluence;
  2. CREATE DATABASE confluence CHARACTER SET utf8 COLLATE utf8_general_ci;

Finally, reimport the UTF-8 text file:

  1. mysql -u <username> -p --default-character-set=utf8 --max_allowed_packet=64M confluence < /home/confluence/confluence_utf8.sql

To support large imports, the parameter '--max_allowed_packet=64M' used above sets the maximum size of an SQL statement to be very large. In some circumstances, you may need to increase it further, especially if attachments are stored in the database.

Testing database encoding

See Troubleshooting Character Encodings for a number of tests you can run to ensure your database encoding is correct.

RELATED TOPICS:

Character encodings in Confluence
Known Issues for MySQL

  1. Jul 04, 2006

    Christoph Seyfert says:

    I had a problem with the character encoding.

    I have added &characterEncoding=UTF-8 to the JDBC-URL to solve the problem.

    Thanks to the Atlassian support team.

    1. Jul 05, 2006

      Matt Ryall (Atlassian) says:

      This is necessary with MySQL when the server's encoding is not UTF-8 and the database's is UTF-8. Regardless of the database encoding, it appears the MySQL JDBC drivers use the server's encoding for certain operations.

      If you have the choice, change the server's encoding to UTF-8 instead.

      You can check your server and database encoding with the status command in MySQL.

    2. Nov 16, 2007

      Anonymous says:

      You saved my life. With a simple change in the JDBC URL, my Unicode support finally started working.

    3. Nov 16, 2007

      Anonymous says:

      thank you, that works

  2. Jul 18, 2006

    Dan Hardiker says:

    Quoting:

    Finally, reimport the UTF-8 text file:

    1. mysql -p --default-character-set=utf8 --max_allowed_packet=64M confluence < /home/confluence/confluence_utf8.sql

    For large imports, add 'max_allowed_packet=32M' under [mysqld] in /etc/my.cnf.

    Is that right? For larger imports you want to try and reduce the maximum allowed packet size compared to the standard suggested command? Wouldn't the command line override the /etc/my.cnf file anyway?

    1. Jul 19, 2006

      Matt Ryall (Atlassian) says:

      Thanks for spotting that, Dan. I've updated it to be a bit clearer.

  3. Aug 03, 2006

    Bo Song says:

    I am setting up Confluence to use MS SQL Server 2000. How do I create a SQL Server 2000 database with UTF-8 encoding? I created a database using the default settings, but it failed the "Character Encoding test". I believe Microsoft uses COLLATE rather than encoding in its terminology, but I can't find Unicode or UTF8 as an option.

     Please advise, thanks

    1. Aug 04, 2006

      Matt Ryall (Atlassian) says:

      Microsoft SQL Server supports Unicode by default in new databases, but you may need to fix your collation settings so the case-sensitivity test doesn't fail.

      I noticed you raised a support case for this issue. We will respond to you there.

      1. Sep 04, 2006

        David Soul [Atlassian] says:

        The cause of this issue has been patched. See CONF-6742 for details.

  4. Apr 12, 2007

    Mark says:

    MSSQL

    Can you break out a new MSSQL section under Creating a UTF-8 Database (saying it's default on new db creation) so that people looking for MSSQL information have a 'section' to look at.

    I barely caught your comments item here. Thanks

  5. Jul 14, 2007

    arno schmacher says:

    PostgreSQL and German Umlaut hint:

    I ran into some problems with the display and sorting of data containing German umlauts. The SQL lower function and the sorting were broken. Creating only the database as proposed above did not work on my site.

    I solved the problem by specifying the encoding de_DE.UTF-8 while creating the database cluster using the initdb command. Do not leave out the UTF-8 part!

    initdb -U arno -W /server/database --lc-ctype=de_DE.UTF-8 --lc-collate=de_DE.UTF-8 
    
    pg_ctl -D /server/database -l logfile start
    createdb confluence -E UNICODE
    

    For more information please consult the Postgres Documentation

  6. Jun 26, 2008

    Anonymous says:

    I am setting up Confluence to use MS SQL Server 2000. How do I create a SQL Server 2000 database with UTF-8 encoding? I created a database using the default settings, but it failed the "Character Encoding test". I believe Microsoft uses COLLATE rather than encoding in its terminology, but I can't find Unicode or UTF8 as an option.

    1. Jun 27, 2008

      Tony Cheah Tong Nyee says:

      Hi there,

      You may be interested to refer to the following page regarding some known Unicode issue when using MS SQL Server with Confluence:

      If you are still encountering some problems related to character encoding, feel free to raise a support issue at:

      From there, the support engineer will help to look into it further.

      Cheers,
      Tony

  7. Jul 31, 2008

    Anonymous says:

    When debugging a UTF-8 encoding problem (it was related to mod_jk and Tomcat), I saw the following effect:

    Database is already at UTF-8:

    mysql> status;
    --------------
    ...
    Connection:             Localhost via UNIX socket
    Server characterset:    utf8
    Db     characterset:    utf8
    Client characterset:    utf8
    Conn.  characterset:    utf8
    ...
    

    If I append either "&useUnicode=true&characterEncoding=utf8" or "&useUnicode=true" to the JDBC connection URL (line 22 of confluence.cfg.xml) as recommended, Tomcat won't start:

    You cannot access Confluence at present. Look at the table below to identify the reasons

    Time Level Type Description Exception
    <timestamp> (EventLevel: fatal) (EventType: bootstrap) Could not load bootstrap from environment No server id found. com.atlassian.config.bootstrap.BootstrapException: Unable to bootstrap application: Failed to parse config file: Error on line 22 of document : The reference to entity "useUnicode" must end with the ';' delimiter. Nested exception: The reference to entity "useUnicode" must end with the ';' delimiter.

    I thus had to omit the parameters (which was no problem since the connection is already utf-8).

    1. Jul 31, 2008

      Azwandi Mohd Aris says:

      Hi there,

      Would you be able to raise a support request at http://support.atlassian.com? Please attach all of this information, your logs and system information to that ticket. Thanks.

      Cheers,
      Azwandi

    2. Dec 13, 2008

      Anonymous says:

      This is an XML file that you're updating; an ampersand is a special character. Try this:

      "&useUnicode=true&characterEncoding=utf8"

      1. Dec 13, 2008

        Anonymous says:

        Heh ... one more time (sorry for the above post):

        &amp;useUnicode=true&amp;characterEncoding=utf8
        
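The escaping issue discussed in this thread can be reproduced outside Confluence. The Python sketch below is illustrative only: it assumes the URL sits inside a property element similar to the one in confluence.cfg.xml (the element and attribute names here are an assumption), and it uses Python's XML parser rather than the one Confluence uses, so the error text differs from the exception quoted above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

RAW_URL = (
    "jdbc:mysql://hostname:port/database"
    "?autoReconnect=true&useUnicode=true&characterEncoding=utf8"
)


def property_element(url_text):
    """Wrap url_text in a property element (element name is an assumption)."""
    return f'<property name="hibernate.connection.url">{url_text}</property>'


def parses(xml_text):
    """Return True if xml_text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A bare '&' starts an entity reference, so the unescaped URL is not well-formed.
print(parses(property_element(RAW_URL)))
# escape() turns each '&' into '&amp;', which the parser reads back as '&'.
print(parses(property_element(escape(RAW_URL))))
```

When the escaped form is parsed, the element text is the original URL again, so the JDBC driver still receives plain ampersands.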
  8. Nov 20, 2008

    Chris Latimer says:

    Are there any issues using an Oracle database with AL32UTF8 instead of UTF8?  From what I've read it seems like it should work, but would like to verify before we purchase Confluence.

    Chris Latimer

    1. Nov 23, 2008

      James Fleming [Atlassian] says:

      Chris,

      I haven't tested this myself, but it appears that AL32UTF8 implements UTF8 properly, and would thus actually be a better idea than Oracle's UTF8, which sort-of-mostly implements it correctly. It seems that even "Oracle recommends that customers switch to AL32UTF8 for full supplementary character support," according to this article. There's a short summary here.

      Regards,
      James Fleming

  9. Feb 25

    Robin Chow says:

    Just a bit confused; it's written that the recommended encoding is Unicode UTF-8, but the instruction for creating the database is:

    CREATE DATABASE confluence CHARACTER SET utf8 COLLATE utf8_general_ci;
    

    Which one should it be?

    1. Feb 26

      Zed Yap [Atlassian] says:

      Hi Robin,

      From the code:

      CREATE DATABASE confluence CHARACTER SET utf8 COLLATE utf8_general_ci;
      

      You can use that command as-is to create a database called confluence with UTF-8 encoding. Collation refers to a set of rules that determine how data is sorted and compared. To learn more about collation, please refer to this document:

      http://www.databasejournal.com/features/mssql/article.php/3302341/SQL-Server-and-Collation.htm

      If I have misinterpreted your question, feel free to add your comment here.

      Hope that helps.

      Best rgds,
      Zed

Except where otherwise noted, content in this space is licensed under a Creative Commons Attribution 2.5 Australia License.