Accented or extended UTF-8 characters cause "Malformed input or input contains unmappable characters" error
Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.
Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Problem
Extended UTF-8 or accented characters can cause unexpected behaviour in Bitbucket Data Center. For example, a branch whose name contains such characters can trigger errors similar to the following one in <Bitbucket-home>/mesh/log/atlassian-mesh.log:
java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: <repo path>/1052/refs/heads/大家好
at java.base/sun.nio.fs.UnixPath.encode(UnixPath.java:145)
at java.base/sun.nio.fs.UnixPath.<init>(UnixPath.java:69)
at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:280)
at java.base/java.io.File.toPath(File.java:2290)
at com.atlassian.stash.internal.scm.git.RawGitAgent.execute(RawGitAgent.java:437)
at com.atlassian.stash.internal.scm.git.RawGitAgent.execute(RawGitAgent.java:433)
at com.atlassian.stash.internal.scm.git.RawGitAgent.resolveBranch(RawGitAgent.java:585)
at com.atlassian.stash.internal.scm.git.RawGitAgent.resolveHead(RawGitAgent.java:222)
at com.atlassian.stash.internal.scm.git.DefaultGitCommandFactory$2.call(DefaultGitCommandFactory.java:297)
at com.atlassian.stash.internal.scm.git.DefaultGitCommandFactory$2.call(DefaultGitCommandFactory.java:293)
at com.atlassian.stash.internal.repository.DefaultRefService.getDefaultBranch(DefaultRefService.java:191)
...
Diagnosis
Environment
Bitbucket hosted on Windows or macOS is unaffected.
- Impacts Bitbucket Server / Data Center 6.0+ installed on Linux servers:
- Bitbucket application is running on Java 11 and above.
- LANG environment variable set to a non-UTF-8 locale, OR
- LC_CTYPE environment variable set to a non-UTF-8 locale.
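The environment conditions above can be checked from a shell. The following is a minimal sketch, not an official Atlassian tool: the function names effective_ctype and is_utf8_locale are hypothetical, and it assumes the standard POSIX precedence in which LC_ALL overrides LC_CTYPE, which in turn overrides LANG. Run it as the user that starts Bitbucket, since other users may have different settings.

```shell
#!/bin/sh
# Print the locale that decides character encoding, using POSIX precedence:
# LC_ALL overrides LC_CTYPE, which overrides LANG.
effective_ctype() {
  if [ -n "${LC_ALL:-}" ]; then
    printf '%s\n' "$LC_ALL"
  elif [ -n "${LC_CTYPE:-}" ]; then
    printf '%s\n' "$LC_CTYPE"
  else
    printf '%s\n' "${LANG:-}"
  fi
}

# Succeed if a locale name carries a UTF-8 codeset (e.g. en_US.UTF-8, en_US.utf8).
is_utf8_locale() {
  case "$1" in
    *.[Uu][Tt][Ff]-8*|*.[Uu][Tt][Ff]8*) return 0 ;;
    *) return 1 ;;
  esac
}

current=$(effective_ctype)
if is_utf8_locale "$current"; then
  echo "Effective locale '$current' is UTF-8"
else
  echo "Effective locale '${current:-unset}' is not UTF-8"
fi
```

An unset or empty result is reported as not UTF-8, because a process started without any locale variables typically falls back to the C/POSIX locale.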
Cause
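Java 11 does not support setting sun.jnu.encoding to UTF-8 through a JVM argument in order to encode file paths as UTF-8. The argument is silently ignored and has no effect, so the encoding used for file paths follows the locale of the environment in which Bitbucket is started.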
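To see which value the JVM actually ends up with, the launcher's -XshowSettings:properties option can be used to list the system properties. The sketch below is illustrative only: the helper name jnu_encoding is hypothetical, and it assumes the java binary on PATH is the same one Bitbucket uses and that the shell has the same environment as the Bitbucket process, which may not hold for a service.

```shell
#!/bin/sh
# Extract the value of sun.jnu.encoding from the text printed by
# `java -XshowSettings:properties -version` (read on stdin).
jnu_encoding() {
  sed -n 's/^[[:space:]]*sun\.jnu\.encoding[[:space:]]*=[[:space:]]*//p'
}

# Only query the JVM if a java binary is on PATH.
if command -v java >/dev/null 2>&1; then
  # The settings listing is written to stderr, hence 2>&1.
  encoding=$(java -XshowSettings:properties -version 2>&1 | jnu_encoding)
  echo "sun.jnu.encoding seen by the JVM: ${encoding:-unknown}"
fi
```

A value other than UTF-8 here, combined with a non-UTF-8 locale, would be consistent with the InvalidPathException shown in the Problem section.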
Solution
- Update LANG to UTF-8:
  - If Bitbucket is running as a service, set LANG="en_US.UTF-8" in /etc/init.d/atlbitbucket and it will be honoured.
  - Set LANG="en_US.UTF-8" in the environment of the user with which Bitbucket is started.
If this does not work, check the value of the LC_CTYPE environment variable. It should be en_US.UTF-8 as well.

# If you did not set this configuration explicitly, this command returns nothing.
$ env | grep LC_CTYPE
LC_CTYPE=en_US.UTF-8

# Use this command to check that all the locale settings are set to UTF-8.
$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
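Reading the full locale listing by eye is error-prone, so the check can be scripted. This is a rough sketch under stated assumptions: the function name non_utf8_categories is hypothetical, and it treats any value containing a .UTF-8 or .utf8 codeset as acceptable. LANGUAGE is skipped because it holds a language priority list rather than a locale with a codeset.

```shell
#!/bin/sh
# Read `locale` output on stdin and print every category whose value is set
# but does not name a UTF-8 codeset. Empty values (e.g. LC_ALL=) are skipped.
non_utf8_categories() {
  while IFS='=' read -r name value; do
    [ "$name" = "LANGUAGE" ] && continue          # no codeset in this variable
    value=$(printf '%s' "$value" | tr -d '"')     # strip the quotes locale adds
    [ -z "$value" ] && continue
    case "$value" in
      *.[Uu][Tt][Ff]-8*|*.[Uu][Tt][Ff]8*) ;;      # UTF-8, nothing to report
      *) printf '%s=%s\n' "$name" "$value" ;;
    esac
  done
}

# Only run the check if the locale command is available.
if command -v locale >/dev/null 2>&1; then
  offenders=$(locale 2>/dev/null | non_utf8_categories)
  if [ -n "$offenders" ]; then
    echo "Non-UTF-8 locale categories:"
    echo "$offenders"
  else
    echo "All set locale categories are UTF-8"
  fi
fi
```

Any category it reports, LC_CTYPE in particular, is a candidate for the fix described in this section.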