Uploading PDFs containing different Unicode Prime symbols results in "Malformed input or input contains unmappable characters" errors


Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.

Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

Uploading PDFs that contain Unicode prime symbols fails with a "Malformed input or input contains unmappable characters" error.

Environment

Java 17

Diagnosis

When you attach or upload a PDF containing Unicode prime symbols (e.g. ' `→) and save the page, Confluence shows a pink error box with the following message:

Error rendering macro 'view-file' Malformed input or input contains unmappable characters: <FILE-NAME>



Cause

Java picks up the wrong default encoding (ANSI_X3.4-1968, the formal name for US-ASCII) even when LANG=en_US.UTF-8 is set. The Java runtime section of the system information shows it:

  <java-runtime-environment>
    <confluence.child-macro.max-depth>4</confluence.child-macro.max-depth>
    <java.specification.version>17</java.specification.version>
    <sun.jnu.encoding>ANSI_X3.4-1968</sun.jnu.encoding>
...
    <file.encoding>ANSI_X3.4-1968</file.encoding>
...
    <native.encoding>ANSI_X3.4-1968</native.encoding>
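
The underlying failure can be illustrated outside Confluence. The sketch below is not Confluence's code path; it only assumes a hypothetical file name containing the prime symbol (U+2032) and checks whether a strict encoder can represent it in US-ASCII (reported above as ANSI_X3.4-1968) and in UTF-8. The class and method names are made up for this example.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class PrimeEncodingCheck {

    // Returns true if every character of 'text' can be encoded in 'charset'.
    // REPORT makes the encoder fail instead of silently substituting '?'.
    static boolean canEncode(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            encoder.encode(CharBuffer.wrap(text));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical file name; \u2032 is the Unicode PRIME character.
        String fileName = "drawing-5\u2032.pdf";
        System.out.println("US-ASCII can encode: " + canEncode(fileName, StandardCharsets.US_ASCII));
        System.out.println("UTF-8 can encode:    " + canEncode(fileName, StandardCharsets.UTF_8));
    }
}
```

The prime symbol has no US-ASCII mapping, so the first check reports false and the second reports true. This is why the JVM has to run with UTF-8 encodings for such names to be handled.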


Background

Similar to the Bitbucket KB article: Accented or extended UTF-8 characters cause "Malformed input or input contains unmappable characters" error

To make the fix persistent for Confluence on Java 17, we set the LC_ALL variable instead of LANG. With LANG alone, the change does not stick: Java reverts to the previous encoding soon after the variable is exported. In POSIX locale resolution LC_ALL overrides both LANG and the individual LC_* variables, which is one possible reason the LANG setting does not take effect.

With only LANG=en_US.UTF-8 exported, Java appears to ignore the setting and still reports US-ASCII:

@HKGGFCQWPG java % export LANG=en_US.UTF-8
@HKGGFCQWPG java % java getcharset.java
Default Charset: US-ASCII
Default Charset by InputStreamReader: ASCII
Default Charset: US-ASCII


By contrast, with LC_ALL=en_US.UTF-8 exported, Java consistently uses UTF-8 as required:

@HKGGFCQWPG java % export  LC_ALL=en_US.UTF-8
@HKGGFCQWPG java % java getcharset.java
Default Charset: UTF-8
Default Charset by InputStreamReader: UTF8
Default Charset: UTF-8
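
The getcharset.java file used in these checks is not reproduced in this article. A minimal sketch of a similar check follows, assuming a hypothetical class named GetCharset; its output labels differ slightly from the transcripts above, and it also prints the three system properties shown in the Cause section.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class GetCharset {

    // Builds a short report of the encodings the running JVM has picked up.
    static String describe() {
        // An InputStreamReader created without an explicit charset uses the JVM default.
        InputStreamReader reader = new InputStreamReader(new ByteArrayInputStream(new byte[0]));
        return "Default Charset: " + Charset.defaultCharset() + "\n"
                + "Default Charset by InputStreamReader: " + reader.getEncoding() + "\n"
                + "file.encoding: " + System.getProperty("file.encoding") + "\n"
                + "sun.jnu.encoding: " + System.getProperty("sun.jnu.encoding") + "\n"
                + "native.encoding: " + System.getProperty("native.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Saved under any file name, it can be run directly with the java launcher in source-file mode (for example, java GetCharset.java) as the user that runs Confluence, before and after changing the locale variables.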

Solution

Set the LC_ALL=en_US.UTF-8 variable in the setenv.sh file or in the profile of the user that runs Confluence, then restart the application.


Last modified on Sep 18, 2024
