Mediawiki Importer

Mediawiki to Confluence Translator

This project has been moved to:
http://intient.com/wiki/index.php/Mediawiki_to_Confluence_Translator

Full source code is now available at the above link.

See also the Universal Wiki Converter, which now has a Mediawiki translator.

Labels: contentconverter, wikiconverter, wikiimporter, plugin

  1. Mar 27, 2006

    Guy Fraser says:

    The GPL is not compatible in any way, shape or form with Confluence (or any other commercial environment for that matter). Could you release under BSD or ASL?

    1. Mar 27, 2006

      Bradley Beddoes says:

      Given this tool lives outside Confluence it won't really matter. ASL won't be a problem though, will modify above text.

  2. Mar 27, 2006

    Andrew Miller says:

    Thanks VERY much for this tool. I've verified that my mysql username/password/db info is correct but am getting this error (I created a separate read-only user).

    [root@theta mw2cf-translator-0.0.1]# ant exec
    Buildfile: build.xml
    
    exec:
         [java] Executing mediawiki to confluence conversion utility
         [java] com.mysql.jdbc.Driver
         [java] Connect failed bailing..
    
    BUILD SUCCESSFUL
    Total time: 1 second
    

    Any thoughts?

    1. Mar 27, 2006

      Bradley Beddoes says:

      A thousand apologies for that. I was simply too tired last night to see a mistake in the jar manifest (using absolute rather than relative paths on the Class-Path directive).

      That brings us to version 0.0.2, which should actually fire up this time. Essentially, your class loader couldn't find the mysql package.

      Let me know how that goes.

      Also guys I would be interested in your performance numbers when you run the tool.

  3. Mar 31, 2006

    Bradley Beddoes says:

    I will be releasing a new version of this tool this weekend.
    The new version includes

    • Parsing of all media content from mediawiki (actual files) to local directories ready for attachment to confluence pages
    • Various Bugfixes
    • Full source disclosure under ASL
    1. Mar 31, 2006

      Christain Stovall says:

      \o/ Looking forward to testing it!

  4. Apr 01, 2006

    Carlton Brown says:

    SWEET.

  5. Apr 03, 2006

    Christian Vollrath says:

    Nice tool!
    I have some problems with German umlauts. The file names are written in UTF-8 but the text files are not UTF-8 encoded. I can convert the filenames from UTF-8 to Latin-1 using recode, but I cannot convert the content with the same tool. Are you using different encodings?

    1. Apr 03, 2006

      Bradley Beddoes says:

      Hi,
      I will have a look into this tonight. I feel it's probably just an oversight on my behalf and won't be difficult to correct.

  6. Apr 03, 2006

    Carlton Brown says:

    A couple of things I noticed:

    1: I had to change the value of the distversion property in build.xml to 0.0.2.

    2: If the output directory specified in confluence.properties does not exist, the directory is not created, no output is saved, and no error is thrown. Either create the directory or throw an error that it doesn't exist.

    1. Apr 03, 2006

      Bradley Beddoes says:

      Hi
      Yes, the distversion property was incorrect; the latest builds correct that slight oversight.

      As for number 2, I noted this in the usage instructions: "Be sure that the output directory configured for confluence.properties is writeable and it exists, future revision will enforce this but that check is missing now."
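
The missing check discussed in this exchange could look roughly like the following. This is a minimal sketch and not the translator's actual code: the class and method names are hypothetical, and it relies on java.nio.file (Java 7 and later), which is newer than the Java 1.5 mentioned elsewhere in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class OutputDirCheck {
    /**
     * Creates the configured output directory if it is missing and fails
     * loudly if the result is not a writable directory.
     * (Hypothetical helper; the name is not taken from the tool.)
     */
    static Path ensureWritableDirectory(String configuredPath) {
        Path dir = Paths.get(configuredPath).toAbsolutePath().normalize();
        try {
            // No-op when the directory already exists; throws if a regular
            // file is in the way.
            Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot create output directory: " + dir, e);
        }
        if (!Files.isDirectory(dir) || !Files.isWritable(dir)) {
            throw new IllegalStateException("Output directory is not writable: " + dir);
        }
        return dir;
    }

    public static void main(String[] args) {
        // "./wiki" mirrors the outputDirectory value shown in this thread.
        System.out.println("Writing output to " + ensureWritableDirectory("./wiki"));
    }
}
```

Creating the directory and then verifying it covers both options Carlton suggests: the directory is created when possible, and an error is raised when it is not.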

  7. Apr 03, 2006

    Bradley Beddoes says:

    All, I apologise for missing the promised release over the weekend. Two quite serious issues popped up in my life outside programming which couldn't wait and had to be given priority.

    Now that those are sorted, I should be able to put the latest build up tonight on the intient svn server after I finish work today (hopefully with the above UTF-8 correction).

  8. Apr 04, 2006

    Bradley Beddoes says:

    I have put 0.0.3 up in the attachments section for you all to test until I get the svn server sorted out later this evening (attachment: mw2cf-translator-0.0.3.tar.gz).

    Corrections:

    • New config option to specify encoding of output file - requested by Christian Vollrath
    • Distversion corrected in build.xml - Carlton Brown
    • Parsing of all media content from mediawiki (actual files) to local directories ready for attachment to confluence pages - this is something which I think is pretty cool
    • Various minor bug fixes.

    I have been onto Atlassian today about a licence to continue development; we'll see what they come back to me with. My hope is that, if they grant it, I will be able to integrate some of the web services API to make the media and page import fully dynamic from end to end.

    1. Apr 04, 2006

      Christian Vollrath says:

      I have tested the new version. The output format of the filenames and text is UTF-8 now, but I am not able to change the encoding with encodingFormat=latin1. When setting processMediaFiles=true, the images attached to the MediaWiki pages are not exported to the directory set in mediawikiMediaDirectory. Did I do something wrong?

      Here is my configuration file confluence.properties:

      ####
      # Author: Bradley Beddoes
      # Created: Mar 25, 2006
      # Confluence configuration properties file
      ####
      
      # Directory to write output files to
      outputDirectory=./wiki
      
      # Do you want to process the mediawiki attachments/images directory - must be local disk copy
      processMediaFiles=true
      
      # Directory storing mediawiki attachments if processing above is true
      mediawikiMediaDirectory=./attachments
      
      # Directory to place media listings per page into
      mediaOutputDirectory=./media
      
      # Java compatible encoding format for output files
      encodingFormat=latin1
      
      # Extension to append to output files
      outputExtension=.cf
      

      And here is the output of ant:

      [root@urwebs21 mw2cf-translator-0.0.3]# ant exec
      Buildfile: build.xml
      
      exec:
           [java] ConfluenceLogic() - Determined values from confluence.properties:
      
           [java] outputDirectory: ./wiki
           [java] mediawikiMediaDirectory: ./attachments
           [java] mediaOutputDirectory: ./media
           [java] outputExtension: .cf
           [java] encodingFormat: latin1
           [java] media processing enabled
           [java] MediawikiLogic_15.connect(): Database connection established
           [java] MediawikiLogic_15.disconnect(): Database connection closed
           [java] Pages acquired: 66
           [java] Processing pages...
           [java] *** Processed 66 pages ( 87722 characters ) in: 2031 ms ***
           [java] These files may now be imported to confluence using the disk import utility in the space admin section
      
      BUILD SUCCESSFUL
      Total time: 3 seconds
      
      1. Apr 04, 2006

        Bradley Beddoes says:

        Hi
        Possible values for encodingFormat are as follows (this is in the new documentation I am writing, by the way; don't include the " characters):

        "US-ASCII" 	Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set
        "ISO-8859-1"   	ISO Latin Alphabet No. 1, a.k.a. ISO-LATIN-1
        "UTF-8" 	Eight-bit UCS Transformation Format
        "UTF-16BE" 	Sixteen-bit UCS Transformation Format, big-endian byte order
        "UTF-16LE" 	Sixteen-bit UCS Transformation Format, little-endian byte order
        "UTF-16" 	Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark
        

        Let me check up on the attachment process and get back to you.
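
The six names listed above are the standard charsets that every Java implementation is required to support. As a rough illustration of how a configured encodingFormat value might be validated and applied, here is a minimal sketch; the class and method names are hypothetical and it is not taken from the translator's source.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

final class EncodingCheck {
    /** Resolves a configured charset name, reporting a clear error for unknown names. */
    static Charset resolve(String encodingFormat) {
        try {
            return Charset.forName(encodingFormat.trim());
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            throw new IllegalArgumentException("Unsupported encodingFormat: " + encodingFormat, e);
        }
    }

    /** Encodes page text using the configured charset. */
    static byte[] encode(String text, String encodingFormat) {
        return text.getBytes(resolve(encodingFormat));
    }

    public static void main(String[] args) {
        String sample = "Gr\u00fc\u00dfe"; // German text with an umlaut and a sharp s
        for (String name : new String[] {"US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16"}) {
            System.out.println(name + " -> " + encode(sample, name).length + " bytes");
        }
    }
}
```

Failing fast on an unrecognised name makes a typo in confluence.properties visible instead of silently falling back to another encoding.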

      2. Apr 04, 2006

        Bradley Beddoes says:

        I have just re-tested the media migration paths with basically the same config as you gave above and all worked as expected.

        Can you give me any information on the platform you're running on?

        My original mediawiki media directory is formatted as such:

        beddoes@velociraptor:~/workspace/wiki-importer/mw2cf-translator-0.0.3$ ls -l attachments/
        total 17
        drwxr-x---   8 beddoes users   192 2005-06-14 10:44 0
        drwxr-x---  11 beddoes users   264 2006-03-14 17:13 1
        drwxr-x---   8 beddoes users   192 2005-05-11 09:50 2
        drwxr-x---  10 beddoes users   240 2006-01-17 10:12 3
        drwxr-x---   7 beddoes users   168 2005-08-09 12:25 4
        drwxr-x---   9 beddoes users   216 2006-01-25 11:21 5
        drwxr-x---   9 beddoes users   216 2005-07-20 12:10 6
        drwxr-x---  12 beddoes users   288 2006-01-17 13:21 7
        drwxr-x---   7 beddoes users   168 2005-12-05 12:28 8
        drwxr-x---  11 beddoes users   264 2005-09-20 08:25 9
        drwxr-x---  13 beddoes users   312 2005-09-23 10:24 a
        drwxr-x---  18 beddoes users   432 2005-02-03 15:10 archive
        -rw-r-x---   1 beddoes users 10191 2005-01-18 12:00 av-78.gif
        drwxr-x---  10 beddoes users   240 2005-12-05 14:48 b
        drwxr-x---   7 beddoes users   168 2005-04-26 11:50 c
        drwxr-x---  10 beddoes users   240 2006-02-02 09:38 d
        drwxr-x---  12 beddoes users   288 2006-01-17 10:13 e
        drwxr-x---  13 beddoes users   312 2005-08-11 15:05 f
        drwxr-x---   2 beddoes users   552 2005-03-10 16:09 math
        -rw-r-x---   1 beddoes users   202 2004-05-08 12:55 README
        drwxr-x---  18 beddoes users   432 2005-09-20 08:23 temp
        drwxr-x---  18 beddoes users   432 2005-09-07 13:14 thumb
        drwxr-x---   2 beddoes users    48 2005-03-10 16:09 tmp
        

        The directories are recursively searched for the media that is listed in [[Media:]] and [[Image:]] type tags in mediawiki. The search does not look at the archive, temp and thumb type directories.

        Also it should create a .cf file in the media directory for each known page with media which just lists the raw file names (I figured someone might want this data in a script).
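
To make the description above concrete, the following sketch shows one way to pull the referenced file names out of page text and to decide which directories to skip. It is an illustration only: the regular expression, class name and method names are assumptions rather than the translator's implementation, and the recursive directory walk itself is left out.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MediaReferences {
    // Matches [[Image:name]] and [[Media:name]], with optional |options after the name.
    private static final Pattern MEDIA_TAG =
            Pattern.compile("\\[\\[(?:Image|Media):([^|\\]]+)(?:\\|[^\\]]*)?\\]\\]");

    /** Returns the referenced file names in order of first appearance, without duplicates. */
    static Set<String> extract(String wikiText) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = MEDIA_TAG.matcher(wikiText);
        while (m.find()) {
            names.add(m.group(1).trim());
        }
        return names;
    }

    /** True for the directory names this thread says are not searched. */
    static boolean isSkippedDirectory(String directoryName) {
        return directoryName.equals("archive")
                || directoryName.equals("temp")
                || directoryName.equals("thumb");
    }

    public static void main(String[] args) {
        String page = "Intro [[Image:diagram.png|none]] and [[Media:spec.pdf]].";
        System.out.println(extract(page));
    }
}
```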

        Possibly architecture related? I haven't tested on Windows or OSX yet.

        1. Apr 04, 2006

          Christian Vollrath says:

          I made a mistake: I thought that your program would get the attachments out of MediaWiki by accessing the database. Now I have copied the images directory into the mw2cf directory and it works.

          Some other things:

          • The alternate text of images is not converted correctly to Confluence syntax; e.g. [[Image:Pricing_Set-anlegen-1.jpg|none]] results in !Pricing-Set-anlegen-1.jpg!none]]
          • The output encoding of the text files seems to be UTF-8. I converted them to ISO-8859-1 with recode and then imported the pages to Confluence. Some files were imported correctly and some with incorrectly encoded umlauts. Mysterious.
          1. Apr 04, 2006

            Bradley Beddoes says:

            Hi
            Yeah, I have no doubt we are going to find little things like this that will need to be tweaked in the regex configuration and possibly the translator functions themselves. I don't recall Confluence markup supporting alternate text for images, however. I found a few situations like this where MediaWiki had a feature that Confluence didn't. In the majority of these cases I chose to throw the data away, which is what I am busily documenting now.
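
As an illustration of the "throw the data away" approach for image options, the sketch below rewrites an [[Image:...]] tag to the !file! form and drops everything after the first pipe. The pattern and names are assumptions made for this example and are not the tool's regex configuration; captions containing nested links are not handled.

```java
import java.util.regex.Pattern;

final class ImageMarkup {
    // Group 1 is the file name; anything after the first | (alignment, caption) is discarded.
    private static final Pattern IMAGE_TAG =
            Pattern.compile("\\[\\[Image:([^|\\]]+)(?:\\|[^\\]]*)?\\]\\]");

    /** Replaces each MediaWiki image tag with a Confluence-style !file! reference. */
    static String toConfluence(String wikiText) {
        return IMAGE_TAG.matcher(wikiText).replaceAll("!$1!");
    }

    public static void main(String[] args) {
        System.out.println(toConfluence("[[Image:Pricing_Set-anlegen-1.jpg|none]]"));
    }
}
```

Because the optional group consumes the text between the first pipe and the closing brackets, no fragment such as none]] is left behind in the output.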

            Weird on the encoding: your JVM should output the file in the encoding format you specify in that confluence.properties file. Perhaps that's something we can play with when the source is 'in the wild'.

  9. Apr 26, 2006

    Tony McGivern says:

    Great translator Brad, thanks.
    Any idea when the source will be available?
    And has anyone got any experience or thoughts on how to bulk load the converted pages? I have a site of about 500 pages with lots of embedded images, and re-attaching all those images doesn't sound like a good time.

    1. Jan 09, 2008

      Matt Ryall (Atlassian) says:

      The Confluence Remote API is probably your best bet. Using your favourite scripting language you can pass the file contents to the addAttachment() method.

      On this site, we typically use 'attachment directories': a single page where we attach all the files for a space (example). For us it allows convenient reuse of attachments across pages, for you it could simplify the bulk-upload process.

  10. May 31, 2006

    Julie Driscoll says:

    Thanks for providing this tool. I have a site with 54 pages and many, many images that I would like to transfer to Confluence. I was able to translate the pages in a matter of seconds. I still have to read more of the threads above and figure out how to get the images over. I also have to clean up some odd characters (i.e., þÿ) that appear throughout the pages. And there's a lot of HTML that appears. Perhaps I have to enable the html macro so that it is rendered as HTML and not text.

  11. May 31, 2006

    archier says:

    This is a great addition, I'm wondering though if Confluence could

    1) understand mediawiki syntax by default

    2) have a mediawiki mode

    (both of these could be incomplete) 

    Mediawiki uses [[ and ]], puts images inside [[Image:...]] and uses ''' for bold text.   It seems like most of the basic mediawiki syntax is unique enough that it shouldn't trip up confluence software.   Mediawiki does have a couple of nice features -- the automatic table of contents and the automatic "floating" images.   These would require a little more fidgeting to get right but could increase confluence usability dramatically -- the Confluence-adept user base would grow by 100s of thousands overnight.

    It seems that 2) should be easy to implement, just plug this translator into the Confluence editor and add a tab labeled "Wikipedia-style markup" that invokes the translator when saving.

    1. Jun 01, 2006

      David Soul [Atlassian] says:

      Definitely a neat idea, though the difficulty of ensuring consistent round-trip conversions makes translating between the various markups and HTML technically demanding. While there would be significant rendering issues to overcome, any third-party efforts to contribute this functionality via a plugin would be quite welcome.

      Dave

    2. May 04, 2007

      tmt says:

      I'd love it if this were possible...  We've got a departmental mediawiki with several hundred pages on it that we're being pressured to migrate to a corporate confluence wiki.  By itself, no big deal, but our mediawiki installation leverages transclusion (and extensions like ParserFunctions) quite extensively.

      Having functional parity (at a 'we're all adults working together in the same company' every-user level, vs 'mother-may-I' admin-only level) would be a great thing ... but the ideal would be to have mediawiki syntax for transclusion supported ... maybe on a space-by-space basis only.

      So, Vote++ on mediawiki support!

  12. Jun 01, 2006

    Patrick Laverty says:

    I just installed the converter, configured the two properties files and ran ant and got:

     -bash-2.05b$ ant exec
    Buildfile: build.xml

    exec:
         [java] Exception in thread "main" java.lang.NoClassDefFoundError: while resolving class: com.intient.wiki.translator.Translator
         [java]    at java.lang.ClassLoader.resolveClass0(java.lang.Class) (/lib/ssa/libgcj.so.4.0.0)
         [java]    at java.lang.Class.initializeClass() (/lib/ssa/libgcj.so.4.0.0)
         [java]    at java.lang.Class.forName(java.lang.String, boolean, java.lang.ClassLoader) (/lib/ssa/libgcj.so.4.0.0)
         [java]    at java.lang.Class.forName(java.lang.String) (/lib/ssa/libgcj.so.4.0.0)
         [java]    at gnu.gcj.runtime.FirstThread.run() (/lib/ssa/libgcj.so.4.0.0)
         [java]    at _Jv_ThreadRun(java.lang.Thread) (/lib/ssa/libgcj.so.4.0.0)
         [java]    at _Jv_RunMain(java.lang.Class, byte const, int, byte const, boolean) (/lib/ssa/libgcj.so.4.0.0)
         [java]    at __libc_start_main (/lib/tls/libc-2.3.2.so)
         [java] Caused by: java.lang.ClassNotFoundException: java.lang.StringBuilder not found in [file:/home/plaverty/test/mw2cf-translator-0.0.3/dist/mw2cf-translator-0.0.3.jar, core:/]
         [java]    at java.net.URLClassLoader.findClass(java.lang.String) (/lib/ssa/libgcj.so.4.0.0)
         [java]    at gnu.gcj.runtime.VMClassLoader.findClass(java.lang.String) (/lib/ssa/libgcj.so.4.0.0)
         [java]    at java.lang.ClassLoader.loadClass(java.lang.String, boolean) (/lib/ssa/libgcj.so.4.0.0)
         [java]    at _Jv_FindClass(_Jv_Utf8Const, java.lang.ClassLoader) (/lib/ssa/libgcj.so.4.0.0)
         [java]    at _Jv_PrepareCompiledClass(java.lang.Class) (/lib/ssa/libgcj.so.4.0.0)
         [java]    at _Jv_WaitForState(java.lang.Class, int) (/lib/ssa/libgcj.so.4.0.0)
         [java]    at java.lang.ClassLoader.linkClass0(java.lang.Class) (/lib/ssa/libgcj.so.4.0.0)
         [java]    at java.lang.ClassLoader.resolveClass0(java.lang.Class) (/lib/ssa/libgcj.so.4.0.0)
         [java]    ...7 more
         [java] Java Result: 1

    BUILD SUCCESSFUL
    Total time: 0 seconds
    -bash-2.05b$

    Any guesses on what I'm doing wrong?  Also, we have a lot of MediaWikis, and I don't see a "pages" table in any of their databases.

     Thank you.

    1. Jul 12, 2006

      Robert Castley says:

      I had this problem too and fixed it by making sure I was:

      1) Using Ant 1.6.x

      2) Using Java 1.5

      Prior to my upgrades I was running Ant 1.5 and Java 1.4

      Hope this helps!

  13. Jun 15, 2006

    Peter R. says:

    Sorry, newbie question here: can I use this converter in a Windows environment, using the Standalone version of Confluence 2.2.2? I get lost right around the "execute '$>ant exec'" part, as when I unzip the files I don't have any executables...

    1. Jul 12, 2006

      Robert Castley says:

      It does work on Windows but you need to download Ant and also make sure you are running Java 1.5.

  14. Aug 15, 2006

    Daniel Hannum says:

    I'm still waiting for a source release... didn't see anything in the attachments. My install of MediaWiki (for some reason) uses different table names. I need the code if I have any hope of doing this conversion.

    1. Aug 28, 2006

      Bradley Beddoes says:

      Hi All,
      I got an email today from a user of the Confluence converter, who informed me that many of you were still finding this tool useful but that there had been a few issues with it.

      Essentially, after some discussions with Atlassian, I shelved the code while I worked on some other more important items.

      I am starting a small group of open source developers and consultants in the next month, and this code was slated to be released as a project then (hopefully with donations to the cause if people find it useful).

      I will let you all know the SVN location for the code then. Until then, what version of MediaWiki has the differing tables?

  15. Aug 23, 2006

    Carlton Brown says:

    Table data doesn't seem to convert correctly.  For example, a MW 1.5.7 dump that looks like this:

    {|
    |CMWO#
    |Platform
    |Start / End
    |Leader
    |Conf Bridge
    |Notes
    |-
    |
    |
    |
    |
    |
    |
    |-
    

    ends up getting rendered in Confluence like this:

    | CMWO# |
    | Platform |
    | Start / End |
    | Leader |
    | Conf Bridge |
    | Notes |
    | |
    | |
    | |
    | |
    | |
    | |
    

    Why would this be?  Is there any way I can pre-process the dump so your converter can read it correctly?  Or does the converter need to be updated?

    1. Aug 28, 2006

      Bradley Beddoes says:

      Hi Carlton,
      Unfortunately, what you have is yet another way of doing tables, which is part of the problem with MediaWiki in the first place: multiple ways to achieve the same outcome.

      Presently the regex works on the assumption that rows are actually represented as rows, so in your example:

      Unknown macro: {| |CMWO# |Platform | Start / End | Leader | Conf Bridge | Notes |- | | | | | | |- |}

      is considered to be valid syntax. We may be able to work together on some extra code to process tables in this way once the source is out; I will keep you updated.

    2. Aug 28, 2006

      Bradley Beddoes says:

      Forgot the noformat tags:

      {|
      |CMWO# |Platform | Start / End | Leader | Conf Bridge | Notes
      |-
      | | | | | |
      |-
      |}
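
For the one-cell-per-line form in Carlton's dump, a pre-processing pass along the following lines could collect cells until the next |- row separator and emit one Confluence row per group. This is a sketch under stated assumptions, not the converter's code: the class name is hypothetical, cell attributes such as style="..." are not handled, and the first row is not marked up as a header row.

```java
import java.util.ArrayList;
import java.util.List;

final class TableRows {
    /** Converts one-cell-per-line MediaWiki tables into one Confluence row per table row. */
    static String convert(String wikiText) {
        StringBuilder out = new StringBuilder();
        List<String> row = new ArrayList<>();
        boolean inTable = false;
        for (String raw : wikiText.split("\n")) {
            String line = raw.trim();
            if (line.startsWith("{|")) {
                inTable = true;                       // table start
            } else if (inTable && line.startsWith("|}")) {
                flush(row, out);                      // table end
                inTable = false;
            } else if (inTable && line.startsWith("|-")) {
                flush(row, out);                      // row separator
            } else if (inTable && line.startsWith("|")) {
                row.add(line.substring(1).trim());    // one cell
            } else if (inTable && line.isEmpty()) {
                // ignore blank lines inside a table
            } else {
                out.append(raw).append('\n');         // text outside tables passes through
            }
        }
        flush(row, out);                              // dump may end without a closing |}
        return out.toString();
    }

    /** Writes the collected cells as "| a | b |" and clears the row. */
    private static void flush(List<String> row, StringBuilder out) {
        if (row.isEmpty()) {
            return;
        }
        for (String cell : row) {
            out.append("| ");
            if (!cell.isEmpty()) {
                out.append(cell).append(' ');
            }
        }
        out.append("|\n");
        row.clear();
    }

    public static void main(String[] args) {
        System.out.print(convert("{|\n|CMWO#\n|Platform\n|-\n|\n|\n|-\n|}"));
    }
}
```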
      
  16. Sep 12, 2006

    Miles Duke says:

    What's the story on the source code release?

    We're about to convert 2800 pages, and it works pretty nicely. However, anything we can do to improve the fidelity of the translation will be worthwhile from our perspective. Here are some patterns we observed repeatedly:

    1. Two extra question marks ('?') at the beginning of several pages, sometimes fouling up the wiki markup for the first heading. Sometimes this would show up as plain question marks, other times as graphic characters:
      • ??h1.
      • ��
    2. More issues with tables. This time, the translation of the following source format made a row for each field, and the bold markups came out in plain text:
    |-
    | style="border-bottom:2px solid black;" |<b>Requirement</b>
    | style="border-bottom:2px solid black;" |<b>Status</b>
    | style="border-bottom:2px solid black;" |<b>Time Estimate</b>
    | style="border-bottom:2px solid black;" |<b>Owner</b>
    | style="border-bottom:2px solid black;" |<b>Other Information</b>
    |-
    
    |-style="background: #33ff33"
    |Prepare for work for this iteration
    |100%
    |16 Hours
    |Charley
    |Install New SDK, Study Docs
    |-
    

    In any case, we'd like to know one way or the other if the source code will become available. We're already looking at a substantial investment to reparent all of our pages into the appropriate spaces, and reattach our icon files.
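
One plausible explanation for the two stray characters at the start of pages, not confirmed anywhere in this thread, is a byte-order mark (BOM) read in the wrong encoding: the UTF-16 BOM bytes 0xFE 0xFF appear as þÿ when interpreted as ISO-8859-1. Under that assumption, a clean-up pass could drop a leading BOM from each page's text. The class and method names below are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class BomStripper {
    private static final char BOM = '\uFEFF';

    /** Removes a single leading byte-order mark from already-decoded text, if present. */
    static String stripLeadingBom(String text) {
        return (!text.isEmpty() && text.charAt(0) == BOM) ? text.substring(1) : text;
    }

    /** Shows how the first two bytes of Java's UTF-16 output look when read as ISO-8859-1. */
    static String utf16BomReadAsLatin1() {
        byte[] utf16 = "h1. Title".getBytes(StandardCharsets.UTF_16); // starts with 0xFE 0xFF
        return new String(utf16, 0, 2, StandardCharsets.ISO_8859_1);  // U+00FE U+00FF
    }

    public static void main(String[] args) {
        System.out.println(stripLeadingBom("\uFEFFh1. Title"));
        System.out.println(utf16BomReadAsLatin1());
    }
}
```

If the stray characters come from somewhere else, for example from the page content stored in the database, this helper will not remove them.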

  17. Sep 16, 2006

    Bradley Beddoes says:

    All,
    Source and binaries released on intient svn server as promised.

    http://intient.com/wiki/index.php/Mediawiki_to_Confluence_Translator

    Bradley

  18. Aug 21, 2008

    Kane Wang says:

    http://intient.com/wiki/index.php/Mediawiki_to_Confluence_Translator is...
  19. Jun 04

    Hellmut Adolphs says:

    Is there a working link to find the importer anywhere?

    thanks