UWC Hierarchy Builder Framework

Place to discuss the hierarchy builder framework

Overview

As of version 42 of the Universal Wiki Converter (UWC), the Page Hierarchy framework works as described here.

Pages in wikis often have an explicit hierarchy. Pages can be subpages (or child pages) of other pages. Some wikis allow pages to have the same pagename if they are part of a different hierarchy (that is, if they have different parent pages). While Confluence maintains a hierarchy of parent and child pages, it does not allow pages in the same space to have the same name, regardless of hierarchy. (See Namespaces for Pagenames for more info on that feature.) Since different wikis handle hierarchical info in (often subtly) different ways, handling a conversion of that data is philosophically complex. For a long time, the UWC did not provide a way to convert that information at all.

In Jan '07, Rolf Staflin kindly submitted code for a framework to handle these sorts of situations, and thus the Hierarchy Builder Framework made its way into the UWC.

Usage

There are currently three possible ways to handle hierarchy data conversion:

  • Use the HierarchyBuilder framework to automatically maintain parent and child relationships between pages.
  • Don't automatically maintain page relationships, but maintain the hierarchy within the page's name
  • Default - Don't maintain the data. Just preserve the pagename.

switch.hierarchy-builder value | Default? | Notes
UseBuilder                     | no       | This will turn on the Hierarchy Builder, which will automatically set parent-child relationships between pages.
UsePagenames                   | no       | This will make the pagename of each imported page use the filepath of the page to maintain parent-child data.
Default                        | yes      | This will ignore parent-child relationships.

Let's talk about those in more detail:

Use the HierarchyBuilder

Add two properties to the converter properties file (conf/converter.mywiki.properties). The two additional properties should look like this:

MyWiki.0001.switch.hierarchy-builder=UseBuilder
MyWiki.0002.classname.hierarchy-builder=com.atlassian.uwc.hierarchies.FilepathHierarchy

The critical bits in the first line are that the key end with switch.hierarchy-builder, and that the value be UseBuilder.
The critical bits in the second line are that the key end with hierarchy-builder, not have the word switch in it, and that the value be a class that implements the Interface: com.atlassian.uwc.hierarchies.HierarchyBuilder
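
The key rules above can be sketched in a few lines of Java. This is a minimal illustration only, not the UWC's own parsing code: the class HierarchyPropertyCheck and its methods are hypothetical names, and falling back to "Default" when no switch key is present is an assumption based on the options listed earlier.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of the UWC: it only mirrors the key rules described above.
class HierarchyPropertyCheck {

    /** Returns the value of the key ending in "switch.hierarchy-builder", or "Default" if there is none (assumption). */
    static String findSwitch(Properties props) {
        for (String key : props.stringPropertyNames()) {
            if (key.endsWith("switch.hierarchy-builder")) {
                return props.getProperty(key).trim();
            }
        }
        return "Default";
    }

    /** Returns the value of the key that ends in "hierarchy-builder" and does not contain "switch", or null. */
    static String findBuilderClass(Properties props) {
        for (String key : props.stringPropertyNames()) {
            if (key.endsWith("hierarchy-builder") && !key.contains("switch")) {
                return props.getProperty(key).trim();
            }
        }
        return null;
    }

    /** Parses properties-file text into a Properties object. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse(
            "MyWiki.0001.switch.hierarchy-builder=UseBuilder\n"
          + "MyWiki.0002.classname.hierarchy-builder=com.atlassian.uwc.hierarchies.FilepathHierarchy\n");
        System.out.println(findSwitch(props));       // UseBuilder
        System.out.println(findBuilderClass(props)); // com.atlassian.uwc.hierarchies.FilepathHierarchy
    }
}
```

A check like this can be handy for spotting a misspelled key before running a conversion, since a key that does not end as expected would simply not be found.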

List of Hierarchies

List of existing hierarchies currently provided and supported by the UWC:

Example

For an example, check out: UWC Hierarchy Framework - UseBuilder Example

Use the Pagenames to maintain a hierarchy

Add one property to the converter properties file (conf/converter.mywiki.properties). It should look like this:

MyWiki.0001.switch.hierarchy-builder=UsePagenames

The critical bits are that the key end with switch.hierarchy-builder, and that the value be UsePagenames

This will use the filepath of the page being converted to create a pagename that reflects its hierarchy.

Example

For an example, check out: UWC Hierarchy Framework - UsePagenames Example

Use the default, and ignore hierarchy data

Don't do anything. This is the default, so it will happen on its own.
The exception is the DokuWiki converter, where you should comment out the line that looks like

DokuWiki.0001.switch.hierarchy-builder=UsePagenames

Which wiki converters use this framework?

Currently, only the DokuWiki converter module uses the hierarchy builder framework. By default it uses pagenames to maintain the data, but the converter properties for switching it to the builder are already in place. To change, simply:

  1. uncomment the builder lines:
    # DokuWiki.0.hierarchy-builder=com.atlassian.uwc.hierarchies.FilepathHierarchy
    # Dokuwiki.0001.switch.hierarchy-builder=UseBuilder
    
  2. and comment the pagenames line:
    DokuWiki.0001.switch.hierarchy-builder=UsePagenames
    

Why would I want to use this?

Two main reasons:

  1. You want to preserve the parent-child relationships in your wiki pages automatically.
  2. Due to the hierarchical or directory-like structure of your previous wiki, there are pages with the same names (differentiated by their directory structure) that will be lost on conversion to Confluence.
    For example, you have this sort of structure in your wiki
    • April
      • Meetings
        • Product Meeting
    • May
      • Meetings
        • Free Donuts!
        • Product Meeting

The second 'Meetings' page would overwrite the first 'Meetings' page. Same with the 'Product Meeting' pages. In this instance, you'd want to "UsePagenames" so that these non-unique pages could co-exist in the same Confluence space.
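
To make the overwrite problem concrete, here is a minimal sketch that models a Confluence space as a map keyed by pagename. It is an illustration under stated assumptions, not UWC code: the class PagenameCollisionDemo is hypothetical, and the underscore used to join path segments is an arbitrary choice (the pagename format the UWC actually produces may differ).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo: a space is modelled as a map from pagename to source path.
class PagenameCollisionDemo {

    /** Name taken from the last path segment only, ignoring the hierarchy. */
    static String leafName(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** Name built from the whole path; the "_" separator is an assumption for illustration. */
    static String pathName(String path) {
        return path.replace("/", "_");
    }

    /** Imports the pages in order; a later page with the same name replaces the earlier one. */
    static Map<String, String> importPages(List<String> paths, boolean usePathNames) {
        Map<String, String> space = new LinkedHashMap<>();
        for (String path : paths) {
            String name = usePathNames ? pathName(path) : leafName(path);
            space.put(name, path);
        }
        return space;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
            "April", "April/Meetings", "April/Meetings/Product Meeting",
            "May", "May/Meetings", "May/Meetings/Free Donuts!", "May/Meetings/Product Meeting");
        System.out.println(importPages(paths, false).size()); // 5: two pages were overwritten
        System.out.println(importPages(paths, true).size());  // 7: every page survives
    }
}
```

With leaf names only, the April 'Meetings' and 'Product Meeting' pages are replaced by the May ones; with path-based names all seven pages keep distinct names.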

  1. Jun 27, 2007

    Erik Spears says:

    Hi Laura,

    Playing some with the UWC for a wiki that isn't already covered, and I have a question.  From the "What about the file name extension?" section, your second bullet reads:

    What if you don't want the extension to be part of the page? It's totally fine to change the extension via a converter, so long as all extensions are the same, even if that means there is no extension.

    So we might need to be doing a lot of this actually.  I'm currently working off an export/dump from the old wiki and everything comes out in flat file.  So if I 'rebuild' some hierarchy, so to speak, that's a good thing.  However, the page naming is definitely an issue.  Having to rename lots means lots of broken links etc.  So, could you (or someone who's done this) elaborate on changing the extension via a converter?  Are you referring to the regex stuff in the UWC, or is that a custom Java class, or...something else?  Any insights would be most appreciated.

    Thanks!

    1. Jun 28, 2007

      Laura Kolker says:

      Hi Erik,

      Those are really good questions.

      So, there are two issues here. I'll address them separately.

      1. handling the page so that the hierarchy is maintained
      2. preserving links to pages

      maintaining the hierarchy:
      So, the issue is simply that, because the FilepathHierarchy class uses the filepath to determine parent-child relationships, it needs to be able to create page objects for pages that were represented as a directory but might also have a corresponding file. In other words, the following use case:

      • You have two files with the following paths:
        • parent/page.txt
        • parent.txt

      In order to create the correctly named page parent.txt from the path parent/page.txt, the converter needs to know what extension to use, and the most expedient solution was to require that all leaf nodes have the same extension. So, point being, when the converter gets to the part where it needs to build parent-child relationships (right before it sends them to Confluence), the extensions all have to be the same.
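
The reason a single shared extension is needed can be seen in a minimal sketch of the path arithmetic. This is not the FilepathHierarchy source; the class ParentPathSketch and its method are hypothetical and only illustrate the idea of deriving the parent page's file name from a child's path.

```java
// Hypothetical sketch of the idea described above, not the actual FilepathHierarchy code.
class ParentPathSketch {

    /**
     * Returns the file path the parent page would have, or null for a top-level page.
     * The parent directory has no extension of its own, so the shared one must be supplied.
     */
    static String parentPagePath(String path, String sharedExtension) {
        int slash = path.lastIndexOf('/');
        if (slash < 0) {
            return null; // no directory component, so no parent page
        }
        return path.substring(0, slash) + sharedExtension;
    }

    public static void main(String[] args) {
        System.out.println(parentPagePath("parent/page.txt", ".txt")); // parent.txt
        System.out.println(parentPagePath("parent.txt", ".txt"));      // null
    }
}
```

If some pages ended in .txt and others in .wiki, there would be no single value to pass as sharedExtension, and the derived parent name might not match the real parent file.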

      maintaining links
      Let's say that for whatever reason, your old wiki can export its pages, and the result of that export is pages with an extension: .txt
      However, the pages themselves have names with no extension. The UWC will always start with the assumption that the pagename is the same as the filename used to represent the page. You realize that if you let the pages be imported to Confluence with the .txt in the pagename, that the links to pages will all be wrong. You are appropriately horrified by this possibility.

      It turns out that it's extremely simple to deal with this problem. The easiest solution is to reuse an existing converter that already fixes this problem for you. Here's one you might find useful:

      com.atlassian.uwc.converters.ChopPageExtensionsConverter 
      

      If you add this converter to your converter.XYZwiki.properties file, like so:

      XYZwiki.1000-remove-extension.class=com.atlassian.uwc.converters.ChopPageExtensionsConverter
      

      all pages that are converted will have the pagename's extension removed.
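
As a rough picture of what removing the extension means, here is a minimal standalone sketch. It is not the ChopPageExtensionsConverter source, whose exact rules may differ; the class ChopExtensionSketch is hypothetical, and cutting at the last dot is an assumption.

```java
// Hypothetical sketch: strips everything from the last dot onward.
class ChopExtensionSketch {

    /** Returns the name without its extension; names with no dot, or only a leading dot, are unchanged. */
    static String chopExtension(String name) {
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    public static void main(String[] args) {
        System.out.println(chopExtension("Product Meeting.txt")); // Product Meeting
        System.out.println(chopExtension("README"));              // README
    }
}
```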

      Alternatively, if this is not sufficient, you can create a converter class to make whatever changes you want to the pagename. You'll need the UWC source. You can get a read-only copy here:

      http://svn.atlassian.com/svn/public/contrib/confluence/universal-wiki-converter
      

      If you decided to do that, I would recommend looking at the ChopPageExtensionsConverter. It's a very simple class. It does only that one thing, and it's pretty straightforward. That being said, the basic steps are:

      1. create a class that implements BaseConverter
      2. override the convert(Page) method
      3. get the existing page name with page.getName()
      4. transform that name to what you'd prefer
      5. set the page name with page.setName(newName)
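
The steps above can be sketched as follows. To keep the example compilable on its own, Page and BaseConverter here are simplified stand-ins written for this sketch; in a real converter they come from the UWC source, and their actual shape (for example, whether BaseConverter is an abstract class, as assumed here) should be checked there. UnderscoreToSpaceConverter is a hypothetical example transformation.

```java
// Stand-in for the UWC page type: only the two methods named in the steps above.
class Page {
    private String name;

    Page(String name) { this.name = name; }

    public String getName() { return name; }

    public void setName(String name) { this.name = name; }
}

// Stand-in for the UWC base type (assumed shape, see the note above).
abstract class BaseConverter {
    public abstract void convert(Page page);
}

// Hypothetical converter: replaces underscores in the pagename with spaces.
class UnderscoreToSpaceConverter extends BaseConverter {
    @Override
    public void convert(Page page) {
        String oldName = page.getName();            // step 3: get the existing name
        String newName = oldName.replace('_', ' '); // step 4: transform it
        page.setName(newName);                      // step 5: set the new name
    }

    public static void main(String[] args) {
        Page page = new Page("Product_Meeting_Notes");
        new UnderscoreToSpaceConverter().convert(page);
        System.out.println(page.getName()); // Product Meeting Notes
    }
}
```

A converter like this would then be registered in the converter properties file in the same way as the ChopPageExtensionsConverter line shown earlier.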

      Hope that helps.

      Cheers,
      Laura