UWC PmWiki Notes

Status

Very usable, still debugging for client.

Pre-processing the PmWiki data files

The UWC operates on the raw PmWiki data files, which contain the wiki text and markup. PmWiki also stores a lot of meta-data at the top of those files, and this needs to be stripped out first. The following script can do this.

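mkdir -p /tmp/wiki-text

for i in /var/www/html/wiki/wiki.d/*
do
    echo "$i"
    # Keep only the text= field, drop the prefix, and turn the ² newline marker back into newlines
    grep '^text=' "$i" | sed -e 's/^text=//' | tr '²' '\n' > "/tmp/wiki-text/$(basename "$i").txt"
done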
  1. Aug 31, 2006

    Carlton Brown says:

    I tried using this today to import a PMWiki into Confluence and it didn't work at all. Issues:

    1)  Only imported 18 of 275 pages

    2)  Page names all truncated (in an inconsistent fashion)

    3)  After importing, the content in a given page is the content for some other page

     I am kind of desperate to get this to work, how can I go about troubleshooting or getting some attention on this?

    1. Aug 31, 2006

      Jonathan Nolen says:

      Hi Carlton,

      Unfortunately, Brendan (the developer responsible for the UWC) is out of the office for three weeks. I'm going to try to get in touch with him to answer your questions, but we may not have immediate answers. We'll try to get you working as quickly as we can. In the meantime, you might also try posting to the Confluence forums (http://forums.atlassian.com) and asking if anyone there has successfully used the PMWiki Converter.

      Cheers,
      Jonathan

      1. Aug 31, 2006

        Carlton Brown says:

        Jon, thanks for the response. I identified some of the problems and was able to hack together a workaround script, so I can wait until Brendan gets back. Here are the areas of concern I've identified:

        1)  Don't know the format that UWC expects the PMWiki data to be in (raw PMWiki data doesn't work). Worked around by extracting the text field from the raw file. This needs to be documented.

        2)  Converter doesn't handle articles with non-alphanumeric filenames as far as I can tell.   Worked around by munging the file names and link names.  Don't know if it's a bug or a documentation issue.

        3)  Would be nice to have some documentation that the html macros have to be enabled in order for some of the codes to work.    Documentation issue.

        Thanks again for your help. 

        1. Sep 06, 2006

          Brendan Patterson says:

          Hi Carlton,

          Thanks for your interest in Confluence.

          I'm still on paternity leave but here are some quick answers.

          1) You're correct. This is a key point I forgot to add to the documentation. Basically, all of the PmWiki meta-data at the top of the files needs to be stripped out. The attachment meta-data is still used. Here is a script someone wrote to do this (obviously don't run this on your actual files, but rather copy them all to another location):

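          mkdir -p /tmp/wiki-text

          for i in /var/www/html/wiki/wiki.d/*
          do
              echo "$i"
              # Keep only the text= field, drop the prefix, and turn the ² newline marker back into newlines
              grep '^text=' "$i" | sed -e 's/^text=//' | tr '²' '\n' > "/tmp/wiki-text/$(basename "$i").txt"
          done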

          2) Ah you are correct. I haven't seen that before. What kind of non-alphanumeric filenames are you using? A script to go through and rename those is probably a good idea as you mention later.

          3) Good point about enabling the html macro. I'll add that to the documentation.
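
One way to answer the question in point 2 is to list the page files whose names contain anything other than letters, digits, and dots. This is only a sketch: the function name `list_odd_page_names` is hypothetical, and the directory in the usage comment is the wiki.d path from the script above.

```shell
# Hypothetical helper: print the names of page files that contain
# characters other than letters, digits, and the Group.Page dot.
list_odd_page_names() {
    wiki_dir=$1
    for page in "$wiki_dir"/*; do
        [ -f "$page" ] || continue
        name=$(basename "$page")
        case "$name" in
            *[!A-Za-z0-9.]*) printf '%s\n' "$name" ;;
        esac
    done
}

# Example: list_odd_page_names /var/www/html/wiki/wiki.d
```

The output can then be used to decide which files and links need renaming before the conversion.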

          1. Sep 06, 2006

            Carlton Brown says:

            Brendan, thanks for replying. Here are my responses, and some other things I've learned since I posted originally.

            1)  Regarding the non-alphanumeric filenames - UWC really needs to support all the page name characters supported by both PMWiki and Confluence. At an absolute minimum this means converting underscores to spaces (PMWiki, like most file-based wikis, supports spaces in page names but stores them on disk as underscores). Other common symbols are hyphen and period. It is very important to handle the spaces, at least, because if you don't, a lot of links in a lot of pages have to be munged.

            2)  It also came to my attention that link-relabeling doesn't get imported correctly. That is, if I have a link like [[The displayed text|TheActualPage]], then it is imported as a link labeled "TheActualPage" pointing to a page called "The displayed text".

            3)  On the formatting....

            I'm not well-versed in the PMWiki format but from what I've seen so far, that sed fragment seems rather naive, unless the converter makes use of data like:

            version=pmwiki-2.1.14 ordered=1 urlencoded=1
            agent=Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.5) Gecko/20060719 Firefox/1.5.0.5
            host=127.0.0.1
            name=MyArticle
            rev=1

            When I ran the importer, all that stuff showed up in the page.   So if I had simply just removed the text= string, you're saying the converter would have imported it as Confluence metadata rather than page text?

            Thanks for your attention; I will look for your response whenever you get time.

            1. Sep 28, 2006

              Brendan Patterson says:

              It appears that you're using version 2.1.14 of PmWiki, whereas the converter I wrote was for 2.0beta37. I wonder if that could cause some of the discrepancies.

              Also, the example you give of [[The displayed text|TheActualPage]] is turned around. Have a look at http://www.pmwiki.org/wiki/PmWiki/Links. The way PmWiki behaves means your example should read [[TheActualPage|The displayed text]], though Confluence does work as [The displayed text|TheActualPage].

              I'm not sure if that's where some confusion stemmed from or it was just a typo.

              See my other comments below.
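
To make the ordering concrete: PmWiki puts the target first, as in [[TheActualPage | The displayed text]], while Confluence puts the label first, as in [The displayed text|TheActualPage]. The following is a minimal sketch of that swap using `sed -E`. It is not how the UWC does it internally, and the function name `pmwiki_link_to_confluence` is hypothetical.

```shell
# Hypothetical helper: rewrite PmWiki [[target | label]] links on stdin
# as Confluence [label|target] links, trimming spaces around the pipe.
pmwiki_link_to_confluence() {
    sed -E 's/\[\[ *([^]|]*[^]| ]) *\| *([^]|]*[^]| ]) *\]\]/[\2|\1]/g'
}

# Example:
#   printf '%s\n' '[[TheActualPage | The displayed text]]' | pmwiki_link_to_confluence
```

Links without a pipe, such as [[TheActualPage]], pass through unchanged in this sketch.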

        2. Sep 28, 2006

          Brendan Patterson says:

          re: 3)

          I just added to the F.A.Q. the fact that the {html} macro is used for some conversions. Thanks! And of course the nice thing about trading these notes here is that this now serves as documentation as well.

  2. Aug 31, 2006

    Carlton Brown says:

    Also... what format is this expected to be in?  PMWiki doesn't mention anything about exporting in its docs, so I would assume UWC expects the raw data files.  But in the few pages that do import, I see literal PMWiki metadata in all of them.  So, color me confused.

  3. Aug 31, 2006

    Carlton Brown says:

    I've identified that the converter has some issues with unexpected article format names (i.e. it does not seem to play well with non-alphanumeric characters).  Also, it seems to have expected that the PMWiki source files will have been pre-processed or exported in some way, but as I mentioned above, your docs aren't clear what that might be. 

    In particular I found I had to strip out a number of %0a strings.
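
The %0a strings are most likely encoded newlines. As a sketch, and assuming the files were written by a PmWiki 2.1.x install with urlencoded=1 (where, to my understanding, newlines are stored as %0a, "<" as %3c, and "%" as %25), the text field could be decoded as follows. The function name `decode_pmwiki_text` is hypothetical.

```shell
# Hypothetical helper: decode the escapes that (by assumption) PmWiki 2.1.x
# uses inside the text= field. %25 is decoded last so that a literal
# "%0a" in the page, stored as "%250a", is not turned into a newline.
decode_pmwiki_text() {
    awk '{
        gsub(/%0a/, "\n")
        gsub(/%3c/, "<")
        gsub(/%25/, "%")
        print
    }'
}

# Example:
#   sed -n 's/^text=//p' Main.HomePage | decode_pmwiki_text > Main.HomePage.txt
```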

     Also, for some odd reason, the converter converts PMWiki indents to the string

    Unknown macro: {html}
    which is rendered thusly in Confluence:

  4. Sep 28, 2006

    Brendan Patterson says:

    There is a new version 26 of the UWC which fixes many of the issues with links. I rewrote everything regarding how links and attachments are handled.

    As far as supporting spaces and other non-alphanumerics goes, I'm not quite sure what the issue is. I don't doubt that you're seeing one, but here is my understanding:

    [[a link like this]]  really becomes [[ALinkLikeThis]]  with the like named file on 
    the server Main.ALinkLikeThis.txt   So spaces are something that the PmWiki parser 
    strips out, they only exist in the wiki text.
    
    This converter takes [[a link like this]] from PmWiki and turns it into [Main.ALinkLikeThis] 
    on Confluence so everything is consistent. The 'Main' will be replaced by the appropriate 
    PmWiki group if that page is part of another group.
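
The name mapping described above (capitalize each word, then drop the spaces) can be illustrated with a short sketch. It only mirrors the description in this comment, not the converter's actual code, and the function name `pmwiki_page_name` is hypothetical.

```shell
# Hypothetical helper: turn link text such as "a link like this"
# into a PmWiki-style page name such as "ALinkLikeThis".
pmwiki_page_name() {
    awk '{
        name = ""
        for (i = 1; i <= NF; i++)
            name = name toupper(substr($i, 1, 1)) substr($i, 2)
        print name
    }'
}

# Example:
#   echo "a link like this" | pmwiki_page_name
```

Prefixing the result with the group name and a dot gives the form used above, for example Main.ALinkLikeThis.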

    I'm not sure where underscores come into play.

    Perhaps you're using a different version of PmWiki where things behave differently than described as above? This conversion targets version 2.0Beta37

  5. Mar 06

    Jordan Wosnick says:

    All -- I am wondering what the current status of PMWiki conversion to Confluence is as of today (Feb 2008). We are currently using PMWiki to run a local Wiki, but the organization as a whole is migrating to Confluence shortly. I've tried running the Universal Wiki Converter on some files locally but am unable to get it to work (I can't find the output?) I'm in an all-Windows environment here and I can't run the batch script listed above.

    Anyone have step-by-step instructions to convert a large body of PMWiki pages to Confluence as painlessly as possible?

    Thanks.

    1. Mar 12

      Brendan Patterson says:

      Hi,

      The current state of the PmWiki converter is that it should work reasonably well. I ran quite a large conversion with about 7000 pages and several hundred attachments last year.

      There was an issue with the UWC where the run_uwc.bat file was missing a library preventing content being sent to Confluence, but that is now fixed.

      I'd suggest the following.

      1. Mar 13

        Jordan Wosnick says:

        Brendan -- the video really helps, thanks very much. Finding the space key was key (no pun intended) for me. The UWC is now uploading files. However, the markup is still problematic. I'll need to find a Linux box (or figure out Cygwin) to run the batch script provided above, but in the meantime, exactly what has to be stripped from the PmWiki file in order for it to be correctly handled by the UWC?

        1. Mar 13

          Brendan Patterson says:

          Thanks for the feedback. I'm very glad the video helped out.

          I also noticed that the field called 'Space' might be an issue and that has been updated to Space Key for the next release.

          As far as what's listed above I 'think' that the script is stripping out all the lines starting with 'text='. I didn't write the script myself nor run it.

          I wish you didn't have to go to so much trouble to run the script. But another option would be to grab the VMWare Player and then grab one of the free 'virtual appliances' which are preconfigured computers essentially that run for you inside of Windows. Maybe one like this?

          When doing this you can access your local drive from the virtual appliance but you might have to specify something to do so. Or you can mount a shared network drive.
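
For reference, reading the script at the top of the page literally, it keeps only the line that begins with text= (minus that prefix) and discards every other line of the page file. A minimal sketch of that step follows; the sample file contents and the function name `extract_page_text` are made up for illustration.

```shell
# Hypothetical sample page file; the text= content is invented.
sample=$(mktemp)
cat > "$sample" <<'EOF'
version=pmwiki-2.1.14 ordered=1 urlencoded=1
host=127.0.0.1
name=MyArticle
rev=1
text=This is the page body.
EOF

# Print only the text= field, without its prefix; all other lines are dropped.
extract_page_text() {
    sed -n 's/^text=//p' "$1"
}

extract_page_text "$sample"
rm -f "$sample"
```

Everything except the page body is removed, which matches what the pre-processing script feeds to the UWC.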

          1. Mar 14

            Jordan Wosnick says:

            Thanks Brendan -- actually, on one of the other fora here I was recommended to try Cygwin -- it rejects the script, but I'll try to work through that.

            In the meantime, I tried just extracting the PmWiki markup alone (from the Edit window of a simple page), pasting it into a text file, saving it, and running that through the UWC.

            The input (PmWiki markup, sitting alone in a text file called "AnotherPage"):

            This is the other page linked to from the [[test page]]. 


            What the resulting page on Confluence looks like, after running it through the UWC:

            This is the other page linked to from the _UWC_LINK_START_test page_UWC_LINK_END_.

            The output from the UWC Feedback Window is as follows:

             Converting Wiki: pmwiki
            Initializing Converters...
            Initializing Pages...
            Converting pages...
            Initializing Converters...
            Checking for illegal pagenames.
            Initializing Converters...
            Checking for links to illegal pagenames.
            Saving Pages to Filesystem
            Uploading Pages to Confluence...
            Uploaded 1 out of 1 page.
            CONVERTER_ERROR Exception thrown by converter PmWiki.4000-Link_converter.class on page AnotherPage. Continuing with next converter.
            
            ENCOUNTERED ERRORS - See uwc.log for more details

            The relevant entry in uwc.log looks like this:

             
            2008-03-14 09:58:16,845 INFO  [Thread-7] - Starting conversion.
            2008-03-14 09:58:16,845 INFO  [Thread-7] - Initializing Converters...
            2008-03-14 09:58:16,845 INFO  [Thread-7] - Initializing Pages...
            2008-03-14 09:58:16,860 INFO  [Thread-7] - Converting pages...
            2008-03-14 09:58:16,860 INFO  [Thread-7] - -------------------------------------
            2008-03-14 09:58:16,860 INFO  [Thread-7] - converting page file: AnotherPage
            2008-03-14 09:58:16,860 INFO  [Thread-7] - ::: total attachments found: 0
            2008-03-14 09:58:16,860 INFO  [Thread-7] - ::: total attachments NOT found: 0
            2008-03-14 09:58:16,860 ERROR [Thread-7] - Exception thrown by converter PmWiki.4000-Link_converter.class on page AnotherPage. Continuing with next converter.
            java.lang.StringIndexOutOfBoundsException: String index out of range: -1
                at java.lang.String.substring(Unknown Source)
                at com.atlassian.uwc.converters.PmWikiLinkAdjuster.prependWithGroupName(PmWikiLinkAdjuster.java:71)
                at com.atlassian.uwc.converters.PmWikiLinkAdjuster.convert(PmWikiLinkAdjuster.java:41)
                at com.atlassian.uwc.ui.ConverterEngine.convertPage(ConverterEngine.java:915)
                at com.atlassian.uwc.ui.ConverterEngine.convertPages(ConverterEngine.java:805)
                at com.atlassian.uwc.ui.ConverterEngine.convertPages(ConverterEngine.java:768)
                at com.atlassian.uwc.ui.ConverterEngine.convert(ConverterEngine.java:329)
                at com.atlassian.uwc.ui.ConverterEngine.convert(ConverterEngine.java:302)
                at com.atlassian.uwc.ui.ConverterEngine.convert(ConverterEngine.java:176)
                at com.atlassian.uwc.ui.UWCGuiModel.convert(UWCGuiModel.java:171)
                at com.atlassian.uwc.ui.listeners.ConvertListener$Worker.construct(ConvertListener.java:277)
                at com.atlassian.uwc.ui.SwingWorker$2.run(SwingWorker.java:110)
                at java.lang.Thread.run(Unknown Source)
            2008-03-14 09:58:16,860 INFO  [Thread-7] -                    time to convert 0ms
            2008-03-14 09:58:16,860 INFO  [Thread-7] - ::: total time to convert files: 0 seconds.
            2008-03-14 09:58:16,860 INFO  [Thread-7] - Initializing Converters...
            2008-03-14 09:58:16,860 INFO  [Thread-7] - Checking for illegal pagenames.
            2008-03-14 09:58:16,860 INFO  [Thread-7] - -------------------------------------
            2008-03-14 09:58:16,860 INFO  [Thread-7] - converting page file: AnotherPage
            2008-03-14 09:58:16,860 INFO  [Thread-7] - Converting Illegal Page Names - start
            2008-03-14 09:58:16,876 INFO  [Thread-7] - Converting Illegal Page Names - complete
            2008-03-14 09:58:16,876 INFO  [Thread-7] -                    time to convert 16ms
            2008-03-14 09:58:16,876 INFO  [Thread-7] - ::: total time to convert files: 0 seconds.
            2008-03-14 09:58:16,876 INFO  [Thread-7] - Initializing Converters...
            2008-03-14 09:58:16,876 INFO  [Thread-7] - Checking for links to illegal pagenames.
            2008-03-14 09:58:16,876 INFO  [Thread-7] - -------------------------------------
            2008-03-14 09:58:16,876 INFO  [Thread-7] - converting page file: AnotherPage
            2008-03-14 09:58:16,876 INFO  [Thread-7] - Converting Links Referencing Illegal Names - start
            2008-03-14 09:58:16,892 INFO  [Thread-7] - Converting Links Referencing Illegal Names - complete
            2008-03-14 09:58:16,892 INFO  [Thread-7] -                    time to convert 16ms
            2008-03-14 09:58:16,892 INFO  [Thread-7] - ::: total time to convert files: 0 seconds.
            2008-03-14 09:58:16,892 INFO  [Thread-7] - Saving Pages to Filesystem
            2008-03-14 09:58:16,892 INFO  [Thread-7] - Uploading Pages to Confluence...
            2008-03-14 09:58:17,079 INFO  [Thread-7] - UWC connected successfully with Confluence.
            2008-03-14 09:58:17,142 INFO  [Thread-7] - page added may already exist
            2008-03-14 09:58:17,251 INFO  [Thread-7] - Uploaded 1 out of 1 page.
            2008-03-14 09:58:17,251 INFO  [Thread-7] - Conversion Complete
            2008-03-14 09:58:17,251 ERROR [Thread-7] -
            Conversion Status... UNEXPECTED_ERROR
             

            Any ideas what's going on?

            Also, the UWC seems to break when a page has a name of the type Group.Page (standard PMWiki naming format). It tries to make two pages named "Group".

            1. Mar 14

              Brendan Patterson says:

              Are you getting that result for just that page, or for many pages?

              I would try checking out the source code using Subversion:
              svn co http://svn.atlassian.com/svn/public/contrib/confluence/universal-wiki-converter

              and then convert the sample file I have set up at this location (just as I did in the video)
              universal-wiki-converter/devel/sampleData/pmwiki/Main.PmWikiSampleFile.txt

              And see if that works. If it does there might be something specific about your file that the
              com.atlassian.uwc.converters.PmWikiLinkAdjuster is just not handling properly.

              I wish I had time to debug it but unfortunately don't at this juncture. You could also hire an Atlassian partner to help you with the conversion if that makes sense for your situation.

              1. Mar 18

                Jordan Wosnick says:

                I've sorted out the bug. It seems that the UWC requires PmWiki input file names to be in the format GroupName.PageName.Extension. PmWiki does not normally store files with an extension, so one has to be added (it doesn't seem to matter what the extension is). If there is no extension, the UWC interprets GroupName as the page name and PageName as the extension.

                Failure to have filenames in the GroupName.PageName.Extension format seems to cause the UWC to break all links in the pages themselves. 
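
A sketch of one way to prepare files in that format: copy the pages to a staging directory and append an extension to any name that has only the Group.Page form. The function name `stage_pages` and both directory arguments are hypothetical, and ".txt" is an arbitrary choice since the extension does not seem to matter.

```shell
# Hypothetical helper: copy page files from src_dir to dest_dir so that
# every name has the form GroupName.PageName.Extension.
stage_pages() {
    src_dir=$1
    dest_dir=$2
    mkdir -p "$dest_dir"
    for page in "$src_dir"/*.*; do
        [ -f "$page" ] || continue
        name=$(basename "$page")
        case "$name" in
            *.*.*) cp "$page" "$dest_dir/$name" ;;      # already has an extension
            *)     cp "$page" "$dest_dir/$name.txt" ;;  # Group.Page -> Group.Page.txt
        esac
    done
}

# Example: stage_pages /var/www/html/wiki/wiki.d /tmp/wiki-staged
```

Working on copies keeps the original wiki.d directory untouched.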

                1. Mar 19

                  Brendan Patterson says:

                  Thanks for working it out and posting an update.

  6. Apr 24

    Jordan Wosnick says:

    I'm running up against something very basic in the way the UWC handles links to external websites.

    Feeding the UWC a simple PMWiki page with two links, as follows:

    [[http://www.google.com]]
    [[https://www.google.com]]

    yields Confluence markup that looks like this:

    [http_-www.google.com]
    
    [https_-www.google.com]

    Obviously, the outgoing links are destroyed in this conversion, but I'm getting this problem only in some contexts (which I can't figure out yet). Has anyone else experienced this?

    1. Apr 28

      Brendan Patterson says:

      Hi Jordan,

      I did my best to have a quick look at this.

      The two converters related to this conversion are in
      conf/converter.pmwiki.properties
      they are:
      PmWiki.0680-LinkToHTTPorToFTP.java-regex-tokenize
      and
      PmWiki.4000-Link_converter.class

      I'm not sure what might have changed since I developed this. I am seeing your same results.

      I tried it on windows and mac with the same results.

      I wish I could debug this for you right now but it will take a couple of hours yet. I'll have to schedule it for next week or so.

      1. May 15

        Brendan Patterson says:

        Ack. I'm completely snowed under with work at the moment. I have this on my task list but not sure when I'll be able to get to it.

        Alternatively you could engage an Atlassian partner to help with your conversion.

  7. May 01

    Jordan Wosnick says:

    Yet another comment... something else I've noticed about the UWC is that it tries to convert anything in CamelCase (i.e. InitialCapitalsSeparatingWords) into a link, even if it is not itself linked in the PMWiki markup. Company names (e.g. WestJet, NanoSolar, etc.) get turned into "fake" links inadvertently. Is this the intended behaviour?

    1. May 15

      Brendan Patterson says:

      Hi Jordan,

      There are two considerations here:

      1)
      Yes. I believe it is interpreting CamelCase words as links because I think that's what PmWiki does too, unless you have it set to behave differently.

      However you can easily disable that functionality.

      You can do that by commenting out the following two lines in the file conf/converter.pmwiki.properties
      PmWiki.0770-LinkMatchForCamelCaseWithDot.java-regex-tokenize
      PmWiki.0780-LinkMatchForCamelCase.java-regex-tokenize

      Just put a # in front of the lines and restart the UWC

      2)
      Also you can turn camel case links on and off in Confluence.
      Administration -> General Configuration -> Camel Case Links

      Hope that helps.
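
The edit described in point 1 can also be scripted. The sketch below puts a # in front of the two CamelCase converter lines and keeps a backup copy of the properties file. The function name `disable_camelcase_links` is hypothetical, and it assumes each converter entry starts at the beginning of its line.

```shell
# Hypothetical helper: comment out the two CamelCase link converters
# in the given properties file, saving the original as <file>.bak.
disable_camelcase_links() {
    props=$1
    cp "$props" "$props.bak"
    sed -e 's/^PmWiki\.0770-LinkMatchForCamelCaseWithDot/#&/' \
        -e 's/^PmWiki\.0780-LinkMatchForCamelCase/#&/' \
        "$props.bak" > "$props"
}

# Example: disable_camelcase_links conf/converter.pmwiki.properties
```

Restart the UWC afterwards so the change is picked up.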