UWC Developer Documentation
Get the Source: The UWC is in Subversion.
Download the Latest version of the distribution*
- How to check out and build
- What are the developer features I can take advantage of?
- How to convert another wiki.
- How the UWC finds and sends attachments
- Check in your changes
- Architecture
- Wiki Exporters
- Tips
- Anatomy of a Conversion properties file
- *one suggested directory and file structure for UWC consumption
- Other Notes
- *Universal Wiki Converter - Original Docs and Discussion - useful if you want to get more of a background and understand how some of the ideas originated.
- Getting Help
How to check out and build
svn co https://svn.atlassian.com/svn/public/contrib/confluence/universal-wiki-converter
Use https if you have Subversion write access and want to check changes back in. Otherwise you can check out over plain http:
svn co http://svn.atlassian.com/svn/public/contrib/confluence/universal-wiki-converter
Building the project
- ant -p will give you all the targets
- ant will create a build that can run from here:
cd universal-wiki-converter/devel/target/uwc
./run_uwc_devel.sh
This is handy for a quicker devel cycle than using the package target.
- ant package will create the full uwc.jar and package up the entire distribution under universal-wiki-converter/dist
What are the developer features I can take advantage of?
- A wizard like UI already exists which assists users in walking through the steps necessary to convert their wiki
- The UI and framework take care of locating pages to convert and then sending the converted pages and attachments to Confluence
- There are already a few examples of exactly what needs to be written to successfully convert another wiki to Confluence.
- The UI allows users to easily swap out certain converters that could be causing unexpected results. This has proven useful.
- Everything you should need to create a set of wiki converters is already in the stack in terms of libraries.
- The ANT build system is up and running
- The system uses regular expressions. In many cases someone has already written the start of a set of regular expressions to convert their own wiki. The UWC was written with the intention of making it easy to leverage these pre-existing efforts.
- Several UWC specific developer frameworks
How to convert another wiki.
Have a look at this video: UWC Developer Video - 11 min.
- I'd recommend locating a wiki syntax page for your origin wiki and bookmarking since you'll be visiting it frequently.
- I'd recommend creating a test page which shows all the most popular syntaxes for the origin wiki that you'll be converting. This is a file you can then run directly through the UWC to test.
- If you can get your hands on some (or all) of the source files you'll be converting that's very handy to have around.
- Look for other existing converters from which you can borrow regular expressions. There might be some here under Confluence content converters or there might not.
- Under uwc/devel/conf you'll see a few files such as converter.pmwiki.properties. Copy one of these to something like converter.mediawiki.properties
- Then you can either start adding Perl regular expressions directly to 'converter.mediawiki.properties' or you can extend the BaseConverter class to run things through your own engine. (I really need to add something which will allow Java's built in regex package expressions to be added directly to the property files, since that engine is proving more robust and powerful than the Perl one.)
- There is an ANT build file which will build the project for you under the devel/target/uwc directory.
It's a bit sketchy but that's the gist
Feel free to email me any questions - brendan@atlassian.com
Version control and checking in your changes
If you make improvements please contribute them back. You'll need to check out the project via the https link above as opposed to the http link. You'll also need to acquire a Subversion login/password from Jonathan Nolen - jnolen@atlassian.com.
The source code is licensed under the Apache License, Version 2.0.
Architecture
How it works:
The Universal Wiki Converter is a client side application with a rich GUI. It converts files containing wiki markup from the origin wiki and then sends the converted pages directly to Confluence via XML-RPC.
What the framework provides:
- A GUI interface
- GUI allows user to select pages to convert
- GUI allows user to dynamically specify Confluence settings
- GUI provides a regular expression test tool for rapid development of regular expressions against the specific regex engines the UWC uses (regular expression engine implementations all seem to be slightly different)
- GUI provides a percent-complete progress bar while files are being converted and while they are being sent to Confluence
- A feedback window with which to send feedback to the user
- Currently one regular expression engine. The regular expression engines are pluggable. You need only extend 'BaseConverter' or have a look at the PERLConverter to see how (~5 lines of code).
- The BaseConverter can be extended to implement 'Java converter' classes to handle the trickier cases where a regular expression isn't quite up to the task.
- All the XMLRPC code necessary to send pages to Confluence, and upload attachments to their pages
- The ability to dynamically massage page names.
- Several of the systems have unit tests both to verify functionality and provide sample code
- Lots of sample regular expressions.
Wiki Exporters
The UWC reads in individual files on the hard drive. The names of those files become the wiki page names. The contents of the files are expected to be the content of the pages, including the markup that gets converted by the UWC.
In many cases wikis either store their contents in this format already or have built in features to export their content to this format of files.
However some wikis do not have facilities built in. Their data must be retrieved directly from a database or an XML file or converted into this format. In such cases the developer can define an 'exporter' for a wiki or multiple exporters.
There are two main components to an 'exporter'
1) A Java class which implements the com.atlassian.uwc.exporters.Exporter interface. This drives the behavior of the exporter.
2) An exporter properties file which is named exporter.some-existing-converter-name.properties, located in the conf/ directory
The UWC detects all such property files and:
- v45 and earlier: lists them on the UWC's 'exporter' tab.
- v46 and later: will enable the export button when the associated wiki is chosen in the drop-down menu.
The exporter properties file's contents will be passed into the class implementing the Exporter interface as a Map. The developer will thereby have access to all those settings in the properties file.
The only required property in the exporter.name.properties file is:
- Exporter fully qualified class name
exporter.class=com.atlassian.uwc.exporters.SomeWikiExporter
One example of the Exporter class is the MediaWikiExporter. This is used to query the database via JDBC and retrieve the MediaWiki's contents. Those contents are then written out to individual files corresponding to wiki pages, which is the format the UWC expects.
Another example might be to convert an XML export file into individual files.
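As a rough illustration of the exporter idea, the sketch below writes a couple of in-memory pages out as one file per page, which is the layout the UWC reads. It does not use the real com.atlassian.uwc.exporters.Exporter interface, since its method signatures are not shown on this page. The class name PageFileExporter, the exportPages method, the "output.dir" property key and the ".txt" extension are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: this is NOT the real UWC Exporter interface.
class PageFileExporter {

    /**
     * Writes each page to its own file under the directory named by the
     * (assumed) "output.dir" setting. The file name carries the page name.
     */
    static void exportPages(Map<String, String> settings,
                            Map<String, String> pagesByTitle) throws IOException {
        Path outputDir = Paths.get(settings.get("output.dir"));
        Files.createDirectories(outputDir);
        for (Map.Entry<String, String> page : pagesByTitle.entrySet()) {
            Path file = outputDir.resolve(page.getKey() + ".txt");
            Files.write(file, page.getValue().getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        // Stands in for the Map built from the exporter properties file.
        Map<String, String> settings = new LinkedHashMap<String, String>();
        settings.put("output.dir", Files.createTempDirectory("uwc-export").toString());

        // A real exporter would read these from a database or an XML export.
        Map<String, String> pages = new LinkedHashMap<String, String>();
        pages.put("Home", "Welcome to the '''old''' wiki.");
        pages.put("Release Notes", "* item one\n* item two");

        exportPages(settings, pages);
        System.out.println("Exported " + pages.size() + " pages to " + settings.get("output.dir"));
    }
}
```

Page titles containing path separators or other characters that are illegal in file names would need extra handling that this sketch leaves out.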
Tips
- To test new regular expression changes to the converter.wiki.properties file you do NOT need to restart the UWC. Simply reselect the target wiki with the "choose wiki" button and any changes to your regex file will be picked up. If this does not seem to be working it is probably because you are changing the converter.tikiwiki.properties (or whichever) file in a different location than where the UWC is picking it up.
- It seems that the UWC running Java 6 might be a little faster than jdk1.5.
- If you're developing with IDEA or Eclipse and running through the debugger in most cases code changes can be recompiled and reloaded by the IDE without restarting the UWC.
- *Regression Testing*
- It is a good idea to create a file which demonstrates all of the syntax you are trying to convert. New changes can unexpectedly affect things that used to be working. This is generally not too time consuming, but regression testing is key.
- Having an input file such as SampleTikiwiki-Input2.txt and an output file with the correctly converted text such as SampleTikiwiki-Expected2.txt is helpful both to you and to other developers. Please check such files in under sampleData/<wikiName>
- Within SampleTikiwiki-Input2.txt it can be helpful to wrap text that describes the syntax changes, or that shows the target Confluence syntax, in whatever that wiki's equivalent of {code} tags is. That way the text you don't really want the converter messing with comes through untouched. Otherwise the sample text will usually get changed, so the expected output file is not quite as clear as you'd like. This of course works best if you've written the converter for the tag which translates to {code} in Confluence.
- I've recently started writing all the regular expressions using Java's built in regex. This regex engine has proven extremely powerful, flexible and reliable. Additionally there is a nice demo/testing util checked in which makes developing the regular expressions much easier. I can't remember exactly where I found it online, but I've checked it in under the uwc/devel/tools dir and you can run it as demonstrated below.
- cd C:\projects\universal-wiki-converter-public\devel\tools\javaregex\classes (or whatever location you've checked things out to)
- java regexdemo.AppRegexDemo
- As you work through issues it's a good idea to track them all in a single file. Then you can use that file for regression testing.
- Regression testing - keep handy a 'test' file with all the syntax you're testing as well as a successful conversion of that file. Much like unit testing you'll feel much more confident refactoring as you can always run your test file through again and then do a diff against the 'successful' output file. I recommend some sort of visual diff tool. I use JEdit's jdiff plugin.
- There are at least three 'kinds' of conversion regular expressions I find myself writing.
1) conversions of things which shouldn't be touched by other regular expressions. These include links, code blocks and attachments, among others
2) escapes - when the original wiki uses characters that aren't meaningful in that wiki but ARE meaningful in Confluence you have to escape those characters, or Confluence will treat them as formatting and the page will generally look strange
3) other conversions which just kind of stack up against each other...bold, italics, tables
What's working well is to order the above conversion types as shown - 1) 2) 3). For the first type you generally want to tokenize the matches by using the TokenMap class or the built in tokenizing replacement. This way you convert something but then it gets tokenized so that it won't be touched by any other conversion until it is 'de-tokenized' at the end.
- Regular Expression Reference Links:
- http://regexlib.com/cheatsheet.aspx
- http://www.ilovejackdaniels.com/regular_expressions_cheat_sheet.pdf
- http://www.developer.com/lang/article.php/3330231
- http://juerd.nl/site.plp/perlcheat
- ASCII reference http://www.lookuptables.com/
- http://aspn.activestate.com/ASPN/docs/Komodo/3.5/komodo-doc-regex-intro.html
- When you're testing your converter, you do not need to import to Confluence. After you click Convert Pages to Confluence Syntax, a popup will ask you if you want to send the pages to Confluence. You can click No, and instead examine your converted pages in the output/output directory.
- It is very helpful to develop a 'test file' which distills all of the syntax you are trying to convert along with its correct conversion. The problem is after you convert this file it's not always easy to know what you're looking for because if you show the correctly converted text in the file it will probably get changed into something else.
So what is very helpful is to essentially say, "hey converter don't touch this". What makes sense is to look for how the origin wiki puts the equivalent of Confluence {code} tags around a block of text (so it won't be messed with), create that converter and then do the same.
So for PmWiki you have this syntax:
[@
some code you don't want parsed here
@]
The equivalent converter for this is:
PmWiki.0040_code-block.java-regex-tokenize=\[@(.*?)@\]{replace-multiline-with}{code}$1{code}
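The test-file workflow from the tips above (convert a sample input, then compare the result against a known-good expected file) can be sketched as a small command line check. This is only an illustration and not part of the UWC: the class name RegressionCheck is made up, and the comparison is a naive line-by-line one rather than a real diff, so a visual diff tool is still the better choice for reading larger changes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not shipped with the UWC.
class RegressionCheck {

    /** Describes every line position at which the two texts differ. */
    static List<String> diffLines(List<String> expected, List<String> actual) {
        List<String> differences = new ArrayList<String>();
        int max = Math.max(expected.size(), actual.size());
        for (int i = 0; i < max; i++) {
            String want = i < expected.size() ? expected.get(i) : "<missing>";
            String got = i < actual.size() ? actual.get(i) : "<missing>";
            if (!want.equals(got)) {
                differences.add("line " + (i + 1) + ": expected [" + want + "] but got [" + got + "]");
            }
        }
        return differences;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java RegressionCheck <expected-file> <converted-file>");
            return;
        }
        List<String> expected = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> actual = Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8);
        List<String> differences = diffLines(expected, actual);
        if (differences.isEmpty()) {
            System.out.println("OK: converted output matches the expected file");
        } else {
            for (String difference : differences) {
                System.out.println(difference);
            }
        }
    }
}
```

You would point the first argument at a file such as SampleTikiwiki-Expected2.txt and the second at the matching converted page in the output directory.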
Newline Tip
To match and replace ABC with a newline character:
SomeWiki.newline-replace-example.java-regex=ABC{replace-with}NEWLINE
The text NEWLINE (in upper case) now resolves to a system dependent newline character.
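For anyone curious what "resolves to a system dependent newline" amounts to, here is a minimal sketch of the idea using java.util.regex. It is not the UWC's actual code, and the class and method names are made up: the NEWLINE keyword in the replacement is swapped for the platform line separator before the search and replace is run.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of the NEWLINE keyword, not the UWC implementation.
class NewlineReplaceExample {

    static String apply(String input, String search, String replacement) {
        // Resolve the keyword first, then run an ordinary regex replace.
        String resolved = replacement.replace("NEWLINE", System.lineSeparator());
        return Pattern.compile(search).matcher(input).replaceAll(resolved);
    }

    public static void main(String[] args) {
        // Prints "one" and "two" on separate lines.
        System.out.println(apply("oneABCtwo", "ABC", "NEWLINE"));
    }
}
```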
Anatomy of a Conversion properties file
Here I'm going to describe naming conventions, particularly ones with meaning, and some special classes that can be used to help you with your conversions.
Property names
Your property name will look like:
Wikitype.xxxx-syntax_description.suffix
Let's go through that in order.
- The Wikitype section of the property is arbitrary, but for consistency, name it after the wiki you are converting from
- The xxxx section is a number. This is useful for keeping the converters in an understandable order. Essentially, the converters are run in ascending ASCII order of their property names. So, provided your Wikitype is the same for all converters, these numbers determine the order the converters get run in
- The syntax_description is just for ease of identifying what the converter does
- The suffix will tell the ConverterEngine what type of property this is. Choices are:
- class - Use this one if the converter will use a Java class that implements BaseConverter. See Classes.
- java-regex - Use this one if the converter will do a simple search and replace using a Java regular expression. See regular expressions for more info.
- perl - Use this one if the converter will use perlish search and replace syntax. See regular expressions for more info.
- java-regex-tokenizer - Use this one if the converter will do a search and replace, and then tokenize the results so that they are no longer available for conversions. See Tokenizing classes for more info.
- A non-converter property

Converters are run in ASCII alphabetical order by property name:
MyWiki.0100-stuff will get run before
MyWiki.0200-stuff which will get run before
MyWiki.0200-xyz
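To make the naming and ordering rules concrete, here is a small sketch that sorts property names the way plain string comparison does (which is ASCII order for ASCII names) and splits a converter property name into its three parts. The class and method names are hypothetical, and this is only an approximation of the idea, not the ConverterEngine's own code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only.
class ConverterOrder {

    /** Returns the property names in ascending String order (ASCII order for ASCII names). */
    static List<String> runOrder(Map<String, String> properties) {
        TreeMap<String, String> sorted = new TreeMap<String, String>(properties);
        return new ArrayList<String>(sorted.keySet());
    }

    /** Splits "Wikitype.xxxx-syntax_description.suffix" at its first and last dot. */
    static String[] splitName(String propertyName) {
        int first = propertyName.indexOf('.');
        int last = propertyName.lastIndexOf('.');
        if (first < 0 || first == last) {
            throw new IllegalArgumentException("Expected Wikitype.name.suffix but got: " + propertyName);
        }
        return new String[] {
            propertyName.substring(0, first),        // Wikitype
            propertyName.substring(first + 1, last), // xxxx-syntax_description
            propertyName.substring(last + 1)         // suffix
        };
    }

    public static void main(String[] args) {
        Map<String, String> properties = new TreeMap<String, String>();
        properties.put("MyWiki.0200-xyz.class", "...");
        properties.put("MyWiki.0100-stuff.class", "...");
        properties.put("MyWiki.0200-stuff.class", "...");
        for (String name : runOrder(properties)) {
            String[] parts = splitName(name);
            System.out.println(parts[1] + " (type: " + parts[2] + ")");
        }
    }
}
```

Note that ASCII order puts all upper case letters before lower case ones, which is another reason to keep the Wikitype prefix spelled identically on every line.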
Property values
Property values for syntax converters are either classes or regular expressions.
Property values for non-converter properties are tailored to the property in question (booleans, settings, classnames, etc.)
Classes
If it's a class, the property value should point to a Java class that implements BaseConverter.
MyWiki.0100-converting_stuff.class=com.atlassian.uwc.converters.ConvertingStuff
This class implements com.atlassian.uwc.converters.BaseConverter. The entry method is convert.
Basically, you should:
- get the original text
- Do something to it, maybe with a regular expression maybe not
- set the page's converted text
public void convert(Page page) {
    // get the original text
    String input = page.getOriginalText();
    // do something to it
    String converted = doSomething(input);
    // set the page's converted text
    page.setConvertedText(converted);
}
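Here is a slightly fuller sketch of the same three steps that compiles on its own. The Page class below is a minimal stand-in written for this example (only getOriginalText and setConvertedText come from the text above; getConvertedText is an assumption), and HeadingConverter is a hypothetical converter that does not extend the real BaseConverter. It turns PmWiki style "!!" headings into Confluence "h2." headings.

```java
// Minimal stand-in so the sketch is self-contained; in the UWC the framework
// supplies the real Page class.
class Page {
    private final String originalText;
    private String convertedText;

    Page(String originalText) { this.originalText = originalText; }

    String getOriginalText() { return originalText; }
    String getConvertedText() { return convertedText; } // assumed accessor
    void setConvertedText(String text) { this.convertedText = text; }
}

// Hypothetical converter; a real one would extend BaseConverter.
class HeadingConverter {

    public void convert(Page page) {
        // 1. get the original text
        String input = page.getOriginalText();
        // 2. do something to it: lines starting with exactly two '!' become h2 headings
        String converted = input.replaceAll("(?m)^!!(?!!)\\s*(.+)$", "h2. $1");
        // 3. set the page's converted text
        page.setConvertedText(converted);
    }

    public static void main(String[] args) {
        Page page = new Page("!! Getting started\nSome body text.");
        new HeadingConverter().convert(page);
        System.out.println(page.getConvertedText());
    }
}
```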
regular expressions
If it's a regular expression, provide a search and replace string. If you are using the java-regex converter type, use the delimiter {replace-with} between your search and replace strings.
The following java-regex example takes characters surrounded by <nowiki> tags and replaces those tags with Confluence noformat macros.
Mediawiki.0200-re_noformat.java-regex=<nowiki>((?s).*?)</nowiki>{replace-with}{noformat}$1{noformat}
Here's a perl example. It looks like a perl regex. This one converts underlined text.
DokuWiki.1underlined.perl=s/__([^_]+)__/+$1+/g
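If it helps to see what a java-regex property value boils down to, the sketch below splits a value at the {replace-with} delimiter and applies it with java.util.regex. The class and method names are hypothetical and this is not the UWC's own code, just the same search and replace expressed directly in Java.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of how a java-regex property value can be applied.
class JavaRegexProperty {

    private static final String DELIMITER = "{replace-with}";

    static String apply(String propertyValue, String input) {
        // Use indexOf rather than String.split, since the delimiter is not a regex.
        int split = propertyValue.indexOf(DELIMITER);
        if (split < 0) {
            throw new IllegalArgumentException("Missing " + DELIMITER + " in: " + propertyValue);
        }
        String search = propertyValue.substring(0, split);
        String replacement = propertyValue.substring(split + DELIMITER.length());
        return Pattern.compile(search).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String noformat = "<nowiki>((?s).*?)</nowiki>{replace-with}{noformat}$1{noformat}";
        System.out.println(apply(noformat, "See <nowiki>[[not a link]]</nowiki> here."));

        // The perl example above, rewritten as a java-regex value.
        String underline = "__([^_]+)__{replace-with}+$1+";
        System.out.println(apply(underline, "an __underlined__ word"));
    }
}
```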
Tokenizing classes
Let's say you want to convert something, but then not allow any further conversions. For example, converting the contents of <code> tags to {code} tags. You would then use the java-regex-tokenizer type. This would perform the search and replace and then tokenize those converted sections so that they were protected from further conversion.
Mediawiki.0095-re_code.java-regex-tokenizer=\<code\>((?s).*?)\<\/code\>{replace-with}{code}$1{code}
Tokenizer properties have a convenience option for when the developer wants "dotall" and "multiline" modes in effect. Use {replace-multiline-with} instead of {replace-with}:
Mediawiki.0095-re_code.java-regex-tokenizer=\<code\>(.*?)\<\/code\>{replace-multiline-with}{code}$1{code}
To detokenize, you would add the following class to the end of your converter.properties:
Mediawiki.2000-detokenize.class=com.atlassian.uwc.converters.DetokenizerConverter
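The tokenize, convert, detokenize cycle can be sketched in a few lines of plain Java. This is an illustration of the technique only: the class name, the token format and the method signatures are assumptions, and the real TokenMap and DetokenizerConverter classes may work differently. The DOTALL and MULTILINE flags mirror what {replace-multiline-with} is described as enabling.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of tokenizing; not the UWC TokenMap implementation.
class TokenizeSketch {

    private final Map<String, String> tokens = new LinkedHashMap<String, String>();
    private int counter = 0;

    /** Converts each match, then hides the converted text behind an opaque token. */
    String tokenize(String input, String search, String replacement) {
        Matcher matcher = Pattern.compile(search, Pattern.DOTALL | Pattern.MULTILINE).matcher(input);
        StringBuffer out = new StringBuffer();
        int lastEnd = 0;
        while (matcher.find()) {
            // Length of 'out' once the unmatched text before this match is appended.
            int keep = out.length() + (matcher.start() - lastEnd);
            matcher.appendReplacement(out, replacement); // expands $1 and friends
            String converted = out.substring(keep);
            out.setLength(keep);                         // drop the converted text again
            String token = "~UWCTOKEN" + (counter++) + "~";
            tokens.put(token, converted);
            out.append(token);
            lastEnd = matcher.end();
        }
        matcher.appendTail(out);
        return out.toString();
    }

    /** Puts the protected text back; run this after all other converters. */
    String detokenize(String input) {
        String result = input;
        for (Map.Entry<String, String> entry : tokens.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        TokenizeSketch sketch = new TokenizeSketch();
        String text = "'''bold''' and <code>'''not bold'''</code>";
        text = sketch.tokenize(text, "<code>(.*?)</code>", "{code}$1{code}");
        text = text.replaceAll("'''(.+?)'''", "*$1*"); // a later, ordinary converter
        System.out.println(sketch.detokenize(text));
    }
}
```

The bold converter in the middle never sees the contents of the code block, which is the whole point of running tokenizing converters early and the detokenizer last.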
Nonconverter properties
Non-converter properties are used to handle settings that would affect the conversion, but are not technically converters. They are often used to turn on or customize optional features.
Non-converter properties belong on top: We recommend that non-converter properties be set at the beginning of the properties file.
hierarchy
UWC Hierarchy Builder Framework
Description - The hierarchy framework provides functionality to allow the UWC to set parent-child relationships between pages.
Example
MyWiki.0001.switch.hierarchy-builder=UseBuilder
MyWiki.0002.classname.hierarchy-builder=com.atlassian.uwc.hierarchies.FilepathHierarchy
page histories
UWC Page History Framework
Description - The page histories framework provides the ability to maintain version histories for pages.
Example
MyWiki.0001.switch.page-history-preservation=true
MyWiki.0002.suffix.page-history-preservation=-#.txt
disabling illegal pagenames framework
UWC Illegal Pagenames Framework - Disabling
Description - The disabling illegal pagenames framework feature provides a way to turn off the default illegal pagenames handling.
Careful! Allowing illegal pagenames to be uploaded to your Confluence could produce unknown behavior.
Example
Mywiki.0001.illegal-handling=false
auto detect spacekeys
UWC Auto Detect Spacekeys Framework
Description - The Auto Detect Spacekeys framework will detect and create spaces on the fly for your new Confluence pages.
Example
Mywiki.0001.autodetect-spacekeys=true
Filename extension stripping class
If you do not want the pages that Confluence imports to have the filename extension in the page title, add this class to the end of your converter.properties:
Mediawiki.1000-remove-extension.class=com.atlassian.uwc.converters.ChopPageExtensionsConverter
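The effect on a page title is roughly what this small sketch does. It is not the ChopPageExtensionsConverter source, and the class and method names here are made up: it simply drops everything from the last dot onward.

```java
// Hypothetical illustration of extension stripping; not the UWC class.
class ChopExtensionSketch {

    static String stripExtension(String pageName) {
        int dot = pageName.lastIndexOf('.');
        // Names with no dot, or with only a leading dot, are left unchanged.
        return dot > 0 ? pageName.substring(0, dot) : pageName;
    }

    public static void main(String[] args) {
        System.out.println(stripExtension("Home.txt"));
        System.out.println(stripExtension("Release Notes"));
    }
}
```

A title that contains a dot but has no extension would be shortened by this naive version, so check a few real file names before relying on it.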
Conversion examples
Important Classes
TokenMap -
This is a helper class to create, store and retrieve tokens.
Certain elements such as links and code can be quite tricky to convert. One issue is that you need to escape text in some places but not others (like inside links).
Use this class for anything where you want to avoid syntax from being escaped. VERY HELPFUL.
Devel cycle notes:
- to devel just run 'ant' or 'ant all', which is the same. This does not create a dist, but does build all the classes and creates everything under 'target/uwc'.
- during the devel cycle you can run 'target/uwc/run_uwc_devel.sh'. this lets you run the UWC without packaging up the whole distribution
Other Notes
After learning the value of tokenizing I'm wondering how JavaCC might be leveraged to make development of new wiki conversions faster (actual runtime speed is of little relative importance as long as we're not doing something silly).
Don't use Base64 encoding or decoding. It is too slow to be practical.
Getting Help
Want to ask a developer a question? Try out the UWC Forum.

Comments (1)
Feb 27, 2007
Eric Sorenson says:
Here's a link I found and posted on the old Doc page; I found myself needing it again so maybe it'll help somebody else.
If you want to add your own regex converters but aren't sure what exactly is supported by the 'perl' engine, the underlying code is from the Jakarta ORO project and the regex flavor is documented here: http://jakarta.apache.org/oro/api/org/apache/oro/text/perl/Perl5Util.html
-Eric