Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.

Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

Uploading a PDF that contains Unicode prime symbols results in a "Malformed input or input contains unmappable characters" error.

Environment

Java 17

Diagnosis

When you attach or upload a PDF containing Unicode prime symbols (e.g. ' `→) and save the page, Confluence shows a pink error box with the following message:

Error rendering macro 'view-file' Malformed input or input contains unmappable characters: <FILE-NAME>

Cause

Java picks up the wrong encoding (ANSI_X3.4-1968, an alias for US-ASCII) even though LANG=en_US.UTF-8 is set. The Java runtime properties show:

  <java-runtime-environment>
    <confluence.child-macro.max-depth>4</confluence.child-macro.max-depth>
    <java.specification.version>17</java.specification.version>
    <sun.jnu.encoding>ANSI_X3.4-1968</sun.jnu.encoding>
...
    <file.encoding>ANSI_X3.4-1968</file.encoding>
...
    <native.encoding>ANSI_X3.4-1968</native.encoding>
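
To illustrate why this encoding is a problem, the following is a minimal sketch (not Confluence code) that checks whether a name containing a prime symbol can be encoded in US-ASCII versus UTF-8. The class name PrimeEncodingCheck and the file name are hypothetical and chosen only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class PrimeEncodingCheck {

    // True if every character in the text can be encoded in the given charset.
    static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Hypothetical file name containing U+2032 (PRIME), written as an escape
        // so the source compiles regardless of the compiler's own encoding.
        String fileName = "drawing-5\u2032.pdf";

        // US-ASCII is the charset Java reports for ANSI_X3.4-1968.
        System.out.println("US-ASCII can encode: " + canEncode(fileName, StandardCharsets.US_ASCII));
        System.out.println("UTF-8 can encode:    " + canEncode(fileName, StandardCharsets.UTF_8));
    }
}
```

The first line prints false and the second prints true: the prime symbol has no representation in US-ASCII, which is consistent with the "unmappable characters" wording of the error.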

Background

To make the fix persistent for Confluence on Java 17, set the locale through the LC_ALL variable rather than LANG. LC_ALL takes precedence over LANG when the locale is resolved, so a LANG value on its own may not take effect.

With only LANG=en_US.UTF-8 exported, Java ignores the setting and still reports US-ASCII:

@HKGGFCQWPG java % export LANG=en_US.UTF-8
@HKGGFCQWPG java % java getcharset.java
Default Charset: US-ASCII
Default Charset by InputStreamReader: ASCII
Default Charset: US-ASCII

With LC_ALL=en_US.UTF-8 exported, on the other hand, Java consistently uses UTF-8 as required:

@HKGGFCQWPG java % export LC_ALL=en_US.UTF-8
@HKGGFCQWPG java % java getcharset.java
Default Charset: UTF-8
Default Charset by InputStreamReader: UTF8
Default Charset: UTF-8
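
The getcharset.java file used in the transcripts above is not included in this article. The following is a minimal sketch of what such a check might look like, assuming it only reads the JVM's default charset and the encoding-related system properties listed under Cause. The class name GetCharset and the report() helper are hypothetical, and the output is similar to the transcripts but not identical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

final class GetCharset {

    // Collects the encodings the running JVM has picked up from its environment.
    static List<String> report() {
        List<String> lines = new ArrayList<>();
        lines.add("Default Charset: " + Charset.defaultCharset());

        // An InputStreamReader created without an explicit charset uses the default one;
        // getEncoding() returns its historical name (for example UTF8 or ASCII).
        try (InputStreamReader reader =
                 new InputStreamReader(new ByteArrayInputStream(new byte[0]))) {
            lines.add("Default Charset by InputStreamReader: " + reader.getEncoding());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        // The same properties that appear in the runtime environment excerpt under Cause.
        for (String key : new String[] {"file.encoding", "sun.jnu.encoding", "native.encoding"}) {
            lines.add(key + ": " + System.getProperty(key));
        }
        return lines;
    }

    public static void main(String[] args) {
        report().forEach(System.out::println);
    }
}
```

Run it with the same java command as in the transcripts, as the user that runs Confluence, once with only LANG exported and once with LC_ALL exported, and compare the reported values.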

Solution

Set the LC_ALL=en_US.UTF-8 variable in the setenv.sh file or in the profile of the user that runs Confluence, then restart the application.
