Uploading PDFs containing different Unicode Prime symbols results in "Malformed input or input contains unmappable characters" errors
Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.
Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
Uploading PDFs that contain Unicode prime symbols fails with a "Malformed input or input contains unmappable characters" error.
Environment
Java 17
Diagnosis
When you attach or upload a PDF containing Unicode prime symbols (e.g., ' `→) and save the page, Confluence shows a pink error box with the following message:
Error rendering macro 'view-file' Malformed input or input contains unmappable characters: <FILE-NAME>
Cause
Java picks up the wrong encoding (ANSI_X3.4-1968, an alias for US-ASCII) even when LANG=en_US.UTF-8 is set, as the Java runtime environment properties show:
<java-runtime-environment>
<confluence.child-macro.max-depth>4</confluence.child-macro.max-depth>
<java.specification.version>17</java.specification.version>
<sun.jnu.encoding>ANSI_X3.4-1968</sun.jnu.encoding>
...
<file.encoding>ANSI_X3.4-1968</file.encoding>
...
<native.encoding>ANSI_X3.4-1968</native.encoding>
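To illustrate why an ASCII default encoding cannot handle these characters, here is a minimal sketch that asks the US-ASCII and UTF-8 charsets whether they can encode the prime symbol U+2032. It is not part of Confluence: the class name PrimeEncodingCheck and its helper method are hypothetical, and U+2032 is only assumed to be representative of the characters that trigger the error.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class PrimeEncodingCheck {

    // U+2032 PRIME, used here as a representative non-ASCII character.
    static final String PRIME = "\u2032";

    // Returns true if the given charset can represent the text without loss.
    static boolean canEncode(Charset charset, String text) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // US-ASCII only covers code points 0-127, so the prime symbol is unmappable.
        System.out.println("US-ASCII can encode prime: "
                + canEncode(StandardCharsets.US_ASCII, PRIME));
        // UTF-8 can encode every Unicode code point.
        System.out.println("UTF-8 can encode prime: "
                + canEncode(StandardCharsets.UTF_8, PRIME));
    }
}
```

The first check reports false and the second reports true, which matches the "unmappable characters" wording of the error when the JVM falls back to an ASCII encoding.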
Background
This issue is similar to the one described in the Bitbucket KB article "Accented or extended UTF-8 characters cause "Malformed input or input contains unmappable characters" error".
To make the fix persistent in Confluence on Java 17, set the LC_ALL variable rather than LANG. With LANG, Java reverts to the previous encoding soon after the variable is exported.
When LANG=en_US.UTF-8 is exported, Java appears to ignore the setting and still reports US-ASCII:
@HKGGFCQWPG java % export LANG=en_US.UTF-8
@HKGGFCQWPG java % java getcharset.java
Default Charset: US-ASCII
Default Charset by InputStreamReader: ASCII
Default Charset: US-ASCII
On the other hand, with LC_ALL=en_US.UTF-8, which takes precedence over LANG and the individual LC_* variables, Java consistently uses UTF-8 as required:
@HKGGFCQWPG java % export LC_ALL=en_US.UTF-8
@HKGGFCQWPG java % java getcharset.java
Default Charset: UTF-8
Default Charset by InputStreamReader: UTF8
Default Charset: UTF-8
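The getcharset.java file used in these sessions is not included in this article. The following is a minimal sketch of what such a check might look like, assuming it prints the JVM default charset and the encoding picked up by an InputStreamReader. The class name GetCharset and the report() method are hypothetical, and the sketch covers only the first two output lines shown above.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical reconstruction; the original getcharset.java is not shown in this article.
class GetCharset {

    // Builds the report as a string so it can be printed or inspected.
    static String report() {
        // A reader created without an explicit charset uses the JVM default.
        InputStreamReader reader =
                new InputStreamReader(new ByteArrayInputStream(new byte[0]));
        return "Default Charset: " + Charset.defaultCharset()
                + System.lineSeparator()
                // getEncoding() returns the historical name, e.g. "UTF8" or "ASCII".
                + "Default Charset by InputStreamReader: " + reader.getEncoding();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Saved as getcharset.java, the sketch can be run the same way as in the sessions above (java getcharset.java), once with LANG and once with LC_ALL exported, to compare what the JVM resolves in each case.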
Solution
Set the LC_ALL=en_US.UTF-8 variable in the setenv.sh file, or in the profile of the user that runs Confluence (for example, with export LC_ALL=en_US.UTF-8), then restart Confluence.