Configuring Database Character Encoding
The database used with Confluence should be configured to use the same character encoding as Confluence. The recommended encoding is Unicode UTF-8 (the equivalent for Oracle databases is AL32UTF8).
There are two places where character encoding may need to be configured:
- when creating the database
- when connecting to the database (JDBC connection URL or properties).
The configuration details for each type of database are different. Some examples are below.
JDBC connection settings
MySQL
Append "useUnicode=true to your JDBC URL:
jdbc:mysql://hostname:port/database?useUnicode=true&characterEncoding=utf8
If you are modifying confluence.cfg.xml
directly rather than via the Confluence Installation GUI, you'll need to escape out the & in the URL string as this is a reserved XML token and will break the syntax when the XML is parsed. An effective URL could be similar to:
<property name="hibernate.connection.url">jdbc:mysql://hostname:port/database?useUnicode=true&characterEncoding=utf8</property>
Creating a UTF-8 database
MySQL
Create a UTF-8 database with binary UTF-8 collation.
Binary UTF-8 provides case-sensitive collation.CREATE DATABASE confluence CHARACTER SET utf8 COLLATE utf8_bin;
You will also need to set the
Server Characterset
toutf8
. This can be done by adding the following inmy.ini
for Windows ormy.cnf
for other OS. It has to be declared in the Server section, which is the section after[mysqld]
:[mysqld] default-character-set=utf8
If the above option does not work, try using character_set_server=utf8 in lieu of default-character-set=utf8- Use the
status
command to verify database character encoding information.
Screenshot: Using the Status Command to Verify Database Character Encoding
In some cases, the individual tables collation and character encoding may differ from the one that the database as a whole has been configured to use. Please use the command below to ensure all tables within your Confluence database are correctly configured to use UTF-8 character encoding and binary UTF-8 collation:
use confluence; show table status;
Check for the value listed under the Collation column, to ensure it has been set to
utf8_bin
(that is, case-sensitive) collation for all tables.
If not, then this can be changed by the following command, executed for each table in the Confluence database:ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;
Please substitute the <tablename> above, with each table within the confluence database.
Relevant MySQL manual for more detailed explanation:
- Specifying Character Sets and Collations documentation.
- Connection Character Sets and Collations.
- SHOW TABLE STATUS Syntax.
- ALTER TABLE Syntax.
PostgreSQL
CREATE DATABASE confluence WITH ENCODING 'UNICODE';
Or from the command-line:
$ createdb -E UNICODE confluence
For more information see the PostgreSQL documentation.
For PostgreSQL running under Windows
Please note that international characters sets are only fully supported and functional when using PostgreSQL 8.1 and above under Microsoft Windows.
For PostgreSQL running under Linux
Please make sure you check the following to ensure proper handling of international characters in your database
When PostgreSQL creates an initial database cluster, it sets certain important configuration options based on the host enviroment. The command responsible for creating the PostgreSQL environment initdb
will check environment variables such as LC_CTYPE
and LC_COLLATE
(or the more general LC_ALL
) for settings to use as database defaults related to international string handling. As such it is important to make sure that your PostgreSQL environment is configured correctly before you install Confluence.
To do this, connect to your PostgreSQL instance using pgsql
and issue the following command:
SHOW LC_CTYPE;
If LC_CTYPE is set to either "C" or "POSIX" then certain string functions such as converting to and from upper and lower case will not work correctly with international characters. Correct settings for this value take the form <LOCALE>.<ENCODING>
(en_AU.UTF8
for example).
If your LC_CTYPE is incorrect please check the PostgreSQL documentation for information on configuring database localisation. It is not easy to change these settings with a database that already contains data.
Updating existing database to UTF-8
MySQL database with existing data
For an existing database
If you're using a existing database, confirm the Character Encoding by executing the query:
SHOW VARIABLES LIKE 'character%'; and SHOW VARIABLES LIKE 'collation%';.
The results should be UTF-8.
Before proceeding with the following changes, please backup your database.
This example shows how to change your database from latin1 to utf8, where your database is named "confluence".
- Dump the database (except the plugindata table) to a text file using the mysqldump tool from the command-line :
mysqldump -p --default-character-set=latin1 -u <
username
> --skip-set-charset --ignore-table='`confluence`.plugindata' --ignore-table='`confluence`.attachmentdata' confluence > confluence_database.sql
- Dump the plugindata and attachmentdata tables to a text file using mysqldump separately. This is done separately as the recode step below can corrupt the binary data in these tables :
mysqldump -p --default-character-set=latin1 -u <
username
> --skip-set-charset confluence attachmentdata plugindata > confluence_blobtables.sql
- copy
confluence_database.sql
toconfluence_utf8.sql
- Open
confluence_utf8.sql
in a text editor and change all character sets from 'latin1' to 'utf8' - Encode all the latin1 characters as UTF-8:
recode latin1..utf8 confluence_utf8.sql
(the recode utility is described at http://directory.fsf.org/recode.html; it can actually be downloaded from http://recode.progiciels-bpi.ca/, and is available for Ubuntu viaapt-get
)
In MySQL:
DROP DATABASE confluence;
CREATE DATABASE confluence CHARACTER SET utf8 COLLATE utf8_bin;
Reimport the UTF-8 text file, and also the plugindata and attachmentdata dumps:
mysql -u <
username
> -p --default-character-set=utf8 --max_allowed_packet=64M confluence < /home/confluence/confluence_utf8.sql
mysql -u <
username
> -p --default-character-set=latin1 --max_allowed_packet=64M confluence < /home/confluence/confluence_blobtables.sql
To support large imports, the parameter '--max_allowed_packet=64M' used above sets the maximum size of an SQL statement to be very large. In some circumstances, you may need to increase it further, especially if attachments are stored in the database.
Finally, since the plugindata and attachmentdata tables were not actually converted before, you need to instruct MySQL to convert all the text fields manually:
ALTER TABLE plugindata CONVERT TO CHARACTER SET utf8;
ALTER TABLE attachmentdata CONVERT TO CHARACTER SET utf8;
Testing database encoding
See Troubleshooting Character Encodings for a number of tests you can run to ensure your database encoding is correct.