Database migration to PostgreSQL fails - Invalid byte sequence for encoding "UTF8": 0x00

Still need help?

The Atlassian Community is here for you.

Ask the community

Symptoms

  • When migrating Bitbucket Server's datastore to a PostgreSQL database, the following error is shown in the administration web interface:

    Bitbucket Server could not be migrated to the new database. PostgreSQL does not allow null characters (U+0000) in text columns. See the following knowledge base to solve the problem: https://confluence.atlassian.com/x/OwOCKQ
  • When restoring a backup to a Bitbucket Server instance that uses a PostgreSQL database, the restore fails and the following error appears in the atlassian-bitbucket.log:

    Caused by: org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2198) ~[postgresql-9.3-1102.jdbc41.jar:na]
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1927) ~[postgresql-9.3-1102.jdbc41.jar:na]
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255) ~[postgresql-9.3-1102.jdbc41.jar:na]
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:561) ~[postgresql-9.3-1102.jdbc41.jar:na]
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:419) ~[postgresql-9.3-1102.jdbc41.jar:na]
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:365) ~[postgresql-9.3-1102.jdbc41.jar:na]
        at com.jolbox.bonecp.PreparedStatementHandle.executeUpdate(PreparedStatementHandle.java:203) ~[bonecp-0.7.1.RELEASE.jar:0.7.1.RELEASE]
        at com.atlassian.bitbucket.internal.backup.liquibase.DefaultLiquibaseDao.insert(DefaultLiquibaseDao.java:272) ~[bitbucket-dao-impl-3.6.0-SNAPSHOT.jar:na]
        ... 39 common frames omitted

Cause

This problem occurs because PostgreSQL does not allow null characters (U+0000) in its text data types. As a result, when migrating or restoring a backup to a PostgreSQL database, the operation can fail with the error above. This problem is restricted to PostgreSQL.  Other databases supported by Bitbucket Server are not affected by null characters.

Resolution

Follow the steps below to sanitize the source database and then re-run the migration or restore.

  1. Stop Bitbucket Server.
  2. Find and remove the null characters (U+0000) in the source database text columns. Most likely candidates are comments (sta_comment table) or plugin settings (plugin_setting table).
    To remove the null characters on those tables, run the following SQL queries on the source database.
      1. If the source database is MySQL:

        SELECT * FROM sta_comment WHERE comment_text like concat('%', 0x00, '%');
        UPDATE sta_comment SET comment_text = replace(comment_text, 0x00, '') WHERE comment_text like concat('%', 0x00, '%');
        SELECT * FROM plugin_setting WHERE key_value like concat('%', 0x00, '%');
        UPDATE plugin_setting SET key_value = replace(key_value, 0x00, '') WHERE key_value like concat('%', 0x00, '%');
      2. If the source database is Oracle:

        SELECT * FROM sta_comment WHERE instr(comment_text, unistr('\0000')) > 0;
        UPDATE sta_comment SET comment_text = replace(comment_text, unistr('\0000')) WHERE instr(comment_text, unistr('\0000')) > 0;
        SELECT * FROM plugin_setting WHERE instr(key_value, unistr('\0000')) > 0;
        UPDATE plugin_setting SET key_value = replace(key_value, unistr('\0000')) WHERE instr(key_value, unistr('\0000')) > 0;
      3. If the source database is Microsoft SQL Server, execute the following T-SQL code (note that a custom function is used because the built-in REPLACE function cannot replace null characters):

        IF OBJECT_ID (N'dbo.removeNullCharacters', N'FN') IS NOT NULL
            DROP FUNCTION removeNullCharacters;
        GO
        CREATE FUNCTION dbo.removeNullCharacters(@s nvarchar(max))
        RETURNS nvarchar(max)
        AS
        BEGIN
                DECLARE @c nchar(1)
                DECLARE @p int
                DECLARE @ret nvarchar(max)
                IF @s is NULL
                        SET @ret = @s
                ELSE
                BEGIN
                        SET @p = 0
                        SET @ret = ''
                        WHILE (@p <= LEN(@s))
                        BEGIN
                                SET @c = SUBSTRING(@s, @p, 1)
                                IF @c <> nchar(0)
                                BEGIN
                                        SET @ret = @ret + @c
                                END
                                SET @p = @p + 1
                        END
                END
                RETURN @ret
        END;
        SELECT * FROM sta_comment WHERE cast(comment_text AS varchar) like '%' + char(0) +'%';
        UPDATE sta_comment SET comment_text = dbo.removeNullCharacters(comment_text) WHERE cast(comment_text AS varchar) like '%' + char(0) +'%';
        SELECT * FROM plugin_setting WHERE cast(key_value AS varchar) like '%' + char(0) +'%';
        UPDATE plugin_setting SET key_value = dbo.removeNullCharacters(key_value) WHERE cast(key_value AS varchar) like '%' + char(0) +'%';
      4. If the source database is HSQLDB, either:

        • Migrate the database to an intermediate external database (such as MySQL), or

        • Find the problematic rows using the following queries and manually edit them to remove the null characters (U+0000);

          SELECT * FROM sta_comment WHERE comment_text like U&'%\0000%';
          SELECT * FROM plugin_setting WHERE key_value like U&'%\0000%';

          Note: Before accessing Bitbucket Server's HSQLDB (internal database) with an external tool, ensure Bitbucket Server is not running.
          Note: Bitbucket Server's HSQLDB database (its internal database) can be opened by any database management tool that supports the JDBC protocol (such as DbVisualizer), using the following settings: 

          • Database driver: HSQLDB Server
          • Database driver location: BITBUCKET_INSTALL/atlassian-bitbucket/WEB-INF/lib/hsqldb-2.2.4.jar (where BITBUCKET_INSTALL is the path to the Bitbucket Server installation directory)

          • Database user: SA
          • JDBC URL: jdbc:hsqldb:file:BITBUCKET_HOME/shared/data/db;shutdown=true;hsqldb.tx=mvlocks (where BITBUCKET_HOME is the path to the Bitbucket home directory)

  3. Re-create the PostgreSQL database (using the settings highlighted here) used in the original migration if it is not empty (for safety reasons, Bitbucket Server blocks any migration to a non-empty database).
  4. Start Bitbucket Server.
  5. Initiate the migration or the restoration of the backup once more.
  6. If the migration or restoration still fails, use the following instructions to diagnose the cause:
    1. Turn on PostgreSQL statement logging.
    2. Recreate the target PostgreSQL database to ensure it is empty.
    3. Restart the migration or the backup restoration to trigger the error again.
    4. Consult the PostgreSQL statement log to determine which SQL INSERT failed. This will indicate which table still contains the null characters which have to be sanitized as described above.
    5. Restart from step (a) until the migration or restore succeeds.
Last modified on Apr 6, 2016

Was this helpful?

Yes
No
Provide feedback about this article
Powered by Confluence and Scroll Viewport.