Accented or extended UTF-8 characters cause "Malformed input or input contains unmappable characters" error


Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.

Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Problem

Extended UTF-8 or accented characters can cause unexpected behaviour in Bitbucket Data Center. For example, a branch whose name contains these characters can produce errors similar to the following in <Bitbucket-home>/mesh/log/atlassian-mesh.log:

java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: <repo path>/1052/refs/heads/大家好
        at java.base/sun.nio.fs.UnixPath.encode(UnixPath.java:145)
        at java.base/sun.nio.fs.UnixPath.<init>(UnixPath.java:69)
        at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:280)
        at java.base/java.io.File.toPath(File.java:2290)
        at com.atlassian.stash.internal.scm.git.RawGitAgent.execute(RawGitAgent.java:437)
        at com.atlassian.stash.internal.scm.git.RawGitAgent.execute(RawGitAgent.java:433)
        at com.atlassian.stash.internal.scm.git.RawGitAgent.resolveBranch(RawGitAgent.java:585)
        at com.atlassian.stash.internal.scm.git.RawGitAgent.resolveHead(RawGitAgent.java:222)
        at com.atlassian.stash.internal.scm.git.DefaultGitCommandFactory$2.call(DefaultGitCommandFactory.java:297)
        at com.atlassian.stash.internal.scm.git.DefaultGitCommandFactory$2.call(DefaultGitCommandFactory.java:293)
        at com.atlassian.stash.internal.repository.DefaultRefService.getDefaultBranch(DefaultRefService.java:191)
...

Diagnosis

Environment

  • Bitbucket hosted on Windows or macOS is unaffected.

  • Affects Bitbucket Server / Data Center 6.0+ installed on Linux servers when both of the following are true:
    • The Bitbucket application is running on Java 11 or above.
    • The LANG environment variable is set to a non-UTF-8 locale,
      OR
      the LC_CTYPE environment variable is set to a non-UTF-8 locale.
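The conditions above can be checked with a short script. This is a minimal sketch, not an official Atlassian tool: the function names effective_ctype and is_utf8_locale are made up for illustration, and the script assumes the standard POSIX precedence, where LC_ALL overrides LC_CTYPE and LC_CTYPE overrides LANG. Run it in the same environment that starts Bitbucket.

```shell
#!/bin/sh
# Resolve the effective character-type locale using the POSIX precedence:
# LC_ALL overrides LC_CTYPE, which overrides LANG.
effective_ctype() {
    # $1 = LC_ALL, $2 = LC_CTYPE, $3 = LANG (any of them may be empty)
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"
    elif [ -n "$3" ]; then
        printf '%s\n' "$3"
    else
        printf 'POSIX\n'   # nothing set means the default C/POSIX locale
    fi
}

# Return 0 if the locale name carries a UTF-8 codeset (".UTF-8", ".utf8", ...).
is_utf8_locale() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        *.utf-8*|*.utf8*) return 0 ;;
        *) return 1 ;;
    esac
}

current="$(effective_ctype "${LC_ALL:-}" "${LC_CTYPE:-}" "${LANG:-}")"
if is_utf8_locale "$current"; then
    echo "effective LC_CTYPE is $current (UTF-8)"
else
    echo "effective LC_CTYPE is $current (not UTF-8, this host may be affected)"
fi
```

A shell opened interactively may have a different locale from the one the Bitbucket service starts with, so the result only applies to the environment the script runs in.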

Cause

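Java 11 and above do not support setting sun.jnu.encoding to UTF-8 through a JVM argument in order to encode file paths as UTF-8. The argument is silently ignored and has no effect. The JVM instead derives the encoding used for file paths from the process locale, which is why the LANG and LC_CTYPE values matter.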
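To see which value a JVM actually picks up for sun.jnu.encoding, the launcher's -XshowSettings:properties option can be used. The following is a rough sketch: the helper name jvm_property is hypothetical, and it assumes that the java binary found on PATH is the same one Bitbucket runs with, which may not be true on your host.

```shell
#!/bin/sh
# Extract the value of one property from a `java -XshowSettings:properties`
# dump, which contains lines such as "    sun.jnu.encoding = UTF-8".
jvm_property() {
    # $1 = property name; the settings dump is read from stdin
    sed -n "s/^[[:space:]]*$1 = //p" | head -n 1
}

if command -v java >/dev/null 2>&1; then
    # The dump may go to stderr or stdout depending on the Java version,
    # so both streams are captured.
    value="$(java -XshowSettings:properties -version 2>&1 | jvm_property 'sun.jnu.encoding')"
    echo "sun.jnu.encoding = ${value:-unknown}"
else
    echo "java not found on PATH; run this as the Bitbucket user on the Bitbucket host"
fi
```

Running it once with the current locale and once with LANG=en_US.UTF-8 exported should show whether the locale, rather than a JVM argument, determines the reported value.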

Solution

  1. Update LANG to a UTF-8 locale:
    1. If Bitbucket is running as a service, set LANG="en_US.UTF-8" in /etc/init.d/atlbitbucket, where it will be honoured.
    2. Set LANG="en_US.UTF-8" in the environment of the user that starts Bitbucket.
  2. If this does not work, check the value of the LC_CTYPE environment variable. It should also be en_US.UTF-8.

    $ env | grep LC_CTYPE # If you did not set this configuration explicitly, then this command will return nothing.
    LC_CTYPE=en_US.UTF-8
    $ locale # Use this command to check whether all the locale settings are set to UTF-8
    LANG=en_US.UTF-8
    LANGUAGE=
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=
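
The commands above show the locale of the current shell, which can differ from the locale of the Bitbucket process that is already running. On Linux, the environment a process was started with can be read from /proc/<pid>/environ. The sketch below is an illustration under assumptions: proc_locale_vars is a made-up helper name, and BITBUCKET_PID is a placeholder you must set yourself to the PID of the Bitbucket (or Mesh) Java process. It is not a variable that Bitbucket provides.

```shell
#!/bin/sh
# Print the locale-related variables from a NUL-separated environ file,
# such as /proc/<pid>/environ on Linux.
proc_locale_vars() {
    # $1 = path to an environ file
    tr '\0' '\n' < "$1" | grep -E '^(LANG|LANGUAGE|LC_[A-Z_]+)=' || true
}

# BITBUCKET_PID is a hypothetical placeholder, not something Bitbucket sets.
if [ -n "${BITBUCKET_PID:-}" ] && [ -r "/proc/${BITBUCKET_PID}/environ" ]; then
    proc_locale_vars "/proc/${BITBUCKET_PID}/environ"
else
    echo "set BITBUCKET_PID to the PID of the Bitbucket Java process and run as that user or as root"
fi
```

If the output is empty or shows a non-UTF-8 value for LANG or LC_CTYPE, the process did not pick up the updated setting.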

Last modified on Jul 3, 2024
