Accented or extended UTF-8 characters cause "Malformed input or input contains unmappable characters" error
Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.
Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Problem
Extended UTF-8 or accented characters can cause unexpected behavior in Bitbucket Server. For example, a branch whose name contains these characters can trigger an error similar to the following in atlassian-bitbucket.log:
java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: <repo path>/1052/refs/heads/大家好
at java.base/sun.nio.fs.UnixPath.encode(UnixPath.java:145)
at java.base/sun.nio.fs.UnixPath.<init>(UnixPath.java:69)
at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:280)
at java.base/java.io.File.toPath(File.java:2290)
at com.atlassian.stash.internal.scm.git.RawGitAgent.execute(RawGitAgent.java:437)
at com.atlassian.stash.internal.scm.git.RawGitAgent.execute(RawGitAgent.java:433)
at com.atlassian.stash.internal.scm.git.RawGitAgent.resolveBranch(RawGitAgent.java:585)
at com.atlassian.stash.internal.scm.git.RawGitAgent.resolveHead(RawGitAgent.java:222)
at com.atlassian.stash.internal.scm.git.DefaultGitCommandFactory$2.call(DefaultGitCommandFactory.java:297)
at com.atlassian.stash.internal.scm.git.DefaultGitCommandFactory$2.call(DefaultGitCommandFactory.java:293)
at com.atlassian.stash.internal.repository.DefaultRefService.getDefaultBranch(DefaultRefService.java:191)
..
Diagnosis
Environment
- Bitbucket hosted on Windows or macOS is unaffected.
- Affects Bitbucket Server 6.0+ installed on Linux servers, running Java 11 with the LANG environment variable set to a non-UTF-8 locale.
Cause
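Java 11 does not support setting sun.jnu.encoding to UTF-8 on the command line as a way to encode file paths in UTF-8. The setting is silently ignored and has no effect, so file paths are encoded using the charset of the process locale (LANG). When that charset cannot represent a character in a path, the JVM throws the InvalidPathException shown above.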
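The "unmappable characters" part of the error can be reproduced in isolation. The following is a minimal sketch, not Bitbucket code: the class and method names are hypothetical, and US-ASCII is only assumed as a stand-in for a non-UTF-8 locale charset. The string escapes spell the branch name from the stack trace above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Bitbucket.
class PathEncodingDemo {

    // Returns true if every character of the name can be encoded in the given charset.
    static boolean canEncode(String name, Charset charset) {
        return charset.newEncoder().canEncode(name);
    }

    public static void main(String[] args) {
        // Branch name from the stack trace, written with Unicode escapes
        // so the source file itself stays plain ASCII.
        String branch = "\u5927\u5bb6\u597d";

        // Assumption: US-ASCII stands in for the charset of a non-UTF-8 locale.
        System.out.println("US-ASCII can encode branch: "
                + canEncode(branch, StandardCharsets.US_ASCII));
        System.out.println("UTF-8 can encode branch:    "
                + canEncode(branch, StandardCharsets.UTF_8));
    }
}
```

Under these assumptions the first check reports false and the second reports true, which matches the situation where the locale charset cannot represent the branch name but UTF-8 can.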
Workaround
- If Bitbucket is running as a service, set LANG="en_US.UTF-8" in /etc/init.d/atlbitbucket; the service will honour this setting.
- Set LANG="en_US.UTF-8" in the environment of the user that starts Bitbucket.
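One way to confirm that the locale change reached the JVM is to run a small check as the same user, and with the same environment, that starts Bitbucket. This is a hedged sketch rather than an Atlassian-provided tool: the class name, method name, and sample ref path are hypothetical. It performs the same File.toPath() conversion that fails in the stack trace above.

```java
import java.io.File;
import java.nio.file.InvalidPathException;

// Hypothetical standalone check; not part of Bitbucket.
class LocaleWorkaroundCheck {

    // Returns true if the JVM can convert the name to a Path,
    // false if it throws InvalidPathException.
    static boolean isPathEncodable(String name) {
        try {
            new File(name).toPath();
            return true;
        } catch (InvalidPathException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Sample ref using the branch name from the stack trace (Unicode escapes).
        String ref = "refs/heads/\u5927\u5bb6\u597d";

        // Values this JVM picked up from its environment.
        System.out.println("LANG             = " + System.getenv("LANG"));
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));

        System.out.println(isPathEncodable(ref)
                ? "OK: non-ASCII path can be encoded"
                : "FAIL: non-ASCII path cannot be encoded; check LANG");
    }
}
```

If the check still reports a failure after the change, the process is most likely not inheriting the new LANG value.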