Fisheye search not working on files with UCS-2 encoding

Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.

Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Problem

Fisheye search will not find text in files with UCS-2 encoding.

No errors are written in the logs.

Diagnosis

Environment

  • Reproduced with Subversion 1.9.5 (other version control systems were not tested)
  • Running Fisheye/Crucible 4.3.1

Diagnostic Steps

  • For example, let's use the HelloWorld.cs file attached to this article.
  • Download it and add it to Subversion version control:

    $ svn add HelloWorld.cs
    A  (bin)  HelloWorld.cs
  • Note that the output above shows that the file was added as binary. To double-check that:

    $ svn propget svn:mime-type HelloWorld.cs
    application/octet-stream
  • Change its mime-type to text/plain so that Fisheye can show its content:

    $ svn propset svn:mime-type text/plain HelloWorld.cs
    property 'svn:mime-type' set on 'HelloWorld.cs'
  • Commit the file:

    $ svn commit -m "Committing HelloWorld.cs with text/plain mime type"
    Sending        HelloWorld.cs
    Committing transaction...
    Committed revision 18.
    
  • Wait until the new revision is shown in Fisheye.
  • Navigate to the file's source code by opening http://localhost:8060/browse/SVN/trunk/HelloWorld.cs?r=18.
  • Use the browser's in-page search (COMMAND+F / CTRL+F) to look for the term WriteLine, and note that this term is found in the source.
  • Now use Fisheye's search box at the top right corner to search for the same term, and note that no results are found.

  • The problem persists even if the file's mime-type is set to text/plain;UTF-16.
  • Open any other committed file that is encoded as UTF-8, choose a term from it, and search for that term with Fisheye's search box. Note that the search works as expected.
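
Before changing anything, it can help to confirm that a file really is stored as UCS-2/UTF-16. The following is a minimal sketch, assuming the file begins with a byte order mark (not every UCS-2 file does) and that head, od and tr are available. The name has_utf16_bom is a hypothetical helper, not a Fisheye or Subversion command.

```shell
# Hypothetical helper: succeed if the file starts with a UTF-16/UCS-2
# byte order mark (FF FE for little endian, FE FF for big endian).
has_utf16_bom() {
    # Read the first two bytes as lowercase hex, e.g. "fffe".
    bom=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    case "$bom" in
        fffe|feff) return 0 ;;
        *)         return 1 ;;
    esac
}
```

For example, running has_utf16_bom HelloWorld.cs && echo "looks like UCS-2/UTF-16" prints the message only when a byte order mark is present. A file without one may still be UCS-2, so a negative result is not conclusive.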

Cause

The problem lies in Lucene indexing: during this step, files are always read as UTF-8. UCS-2 stores each character in two bytes, so the bytes of a UCS-2 file do not decode into the expected words when read as UTF-8, and each letter ends up being indexed separately.

This is being tracked at FE-4477.
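
The effect of the two-byte layout can be seen from the command line. This is a rough sketch, assuming iconv and tr are available. It only illustrates how the bytes are laid out and is not Fisheye's or Lucene's actual tokenizer. The name split_on_nul is a hypothetical helper.

```shell
# Hypothetical helper: encode a string as UTF-16LE (the byte layout UCS-2
# uses for these characters), then replace every NUL byte with a space to
# show how the letters get separated when the bytes are read one by one.
split_on_nul() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-16LE | tr '\000' ' '
}

split_on_nul "WriteLine"
```

Each ASCII letter is followed by a zero byte in this encoding, so a reader that expects UTF-8 sees isolated letters rather than the word WriteLine.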

Workaround

  • Change the file's encoding to UTF-8 and commit it again. This does not fix search for revisions that are already indexed, but future revisions will be indexed properly.
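
One possible way to do the conversion is with iconv. This is a minimal sketch, assuming iconv and mktemp are available and that the file is valid UTF-16/UCS-2. The name convert_to_utf8 is a hypothetical helper, not part of Subversion or Fisheye.

```shell
# Hypothetical helper: rewrite a UTF-16 (UCS-2) file as UTF-8 in place.
convert_to_utf8() {
    tmp=$(mktemp) || return 1
    # With "UTF-16" as the source encoding, iconv honors a leading
    # byte order mark if one is present.
    if iconv -f UTF-16 -t UTF-8 "$1" > "$tmp"; then
        # Overwrite the contents only, so the original file's
        # permissions are kept.
        cat "$tmp" > "$1"
        rm -f "$tmp"
    else
        # Conversion failed: leave the original file untouched.
        rm -f "$tmp"
        return 1
    fi
}
```

After running convert_to_utf8 HelloWorld.cs, the file can be committed again with svn commit as in the diagnostic steps, and the new revision should then be searchable.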

Resolution

  • There is no immediate solution for this problem.
Last modified on May 23, 2018
