title: "The Twaddlebot has been unleashed"
date: 2004-06-07 18:40
status: published
tags: Mooquackwooftweetmeow, The Twaddle, The Twaddlebot, XSL, XML, the Web

<p>
Last night version 1.0 of <a href="http://www.thetwaddle.co.uk/">The Twaddle</a> went live. It uses arbitrary <abbr title="Extensible Markup Language">XML</abbr> and <abbr title="Extensible Stylesheet Language Transformations">XSLT</abbr> to generate valid <abbr title="Extensible Hypertext Markup Language">XHTML</abbr> pages... offline.
</p>
<p>
The idea of uploading bare-bones articles and an XSLT template, allowing the browser to generate pages as they're required, was <a href="/weblog028">a no-go</a>. But I managed to rig up the transformation offline, to be run as a batch.
</p>
<p>
Following the tradition of giving XML languages names that are barely-logical acronyms beginning with <q>X</q>, I call the language <abbr title="XML... Twaddle... something">XTw</abbr>, which stands for <q>XML... Twaddle... something</q>.
</p>
<p>
Here's how I worked the magic (borrowing liberally from a newsgroup posting I made on the subject):
</p>
<p>
This assumes no programming experience, but enough computer savvy to have created the XML and XSL files that need transforming in the first place, and a Windows (XP) machine.
</p>
<p>
First off, you'll need Xalan, available from http://xml.apache.org/xalan-j/ (and the requisite Java runtime, which you probably already have)
</p>
<p>
The actual file I downloaded was http://apache.rmplc.co.uk/dist/xml/xalan-j/xalan-j-current-bin.tar.gz
</p>
<p>
There's also http://apache.rmplc.co.uk/dist/xml/xalan-j/xalan-j-current-bin.zip if you prefer a zip.
</p>
<p>
The version I got was 2.6.0 (the Java version).
</p>
<p>
Unzip Xalan into a folder. I used C:\Program Files\xalan-j_2_6_0
</p>
<p>
Now the code from http://evc-cit.info/cit041x/batchfiles.html#transform:
</p>
<p>
<code>echo off
<br>java -cp h:\java\xmljar\xalan-j_2_5_1\bin\xml-apis.jar;h:\java\xmljar\xalan-j_2_5_1\bin\xercesImpl.jar;h:\java\xmljar\xalan-j_2_5_1\bin\xalan.jar;. org.apache.xalan.xslt.Process -IN %1 -XSL %2 -OUT %3 %4 %5 %6 %7 %8 %9</code>
</p>
<p>
The only line break should be after <q>echo off</q>.
</p>
<p>
Copy this into a plain text editor (e.g. Notepad), and save it as filename.bat. (I used ANSI encoding, if it matters.)
</p>
<p>
You should now have an MS-DOS Batch File.
</p>
<p>
(Apparently some versions of Notepad append <q>.txt</q> to filenames, even if they already have a file <q>extension</q>. In these cases, putting the filename in straight quotation marks, e.g. <code>"filename.bat"</code>, allegedly solves the problem.)
</p>
<p>
You'll most likely have to modify the code to point to the actual locations of your Xalan installation and files.
</p>
<p>
I only plan on using one XSL stylesheet with multiple files; the input files will be filename.xml. The output files will be filename.htm and will be kept in the folder above the one where the input and XSL files are kept. So, I modified the code a little:
</p>
<p>
<code>java -cp "c:\program files\xalan-j_2_6_0\bin\xml-apis.jar";"c:\program files\xalan-j_2_6_0\bin\xercesImpl.jar";"c:\program files\xalan-j_2_6_0\bin\xalan.jar";. org.apache.xalan.xslt.Process -IN %1.xml -XSL "c:\path\to\an\xsl\file\xsl.xml" -OUT ..\%1.htm</code>
</p>
<p>
This should all be on one line. <q>%1</q> in the code will be replaced by the first argument passed to the batch file, <q>%2</q> by the second argument, etc. <q>..\</q> means <q>up one folder</q>. The quotation marks around the filenames cause them to be treated as one item, despite their containing spaces.
</p>
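<p>
If the repeated path gets unwieldy, one option is to keep the Xalan folder in a variable. This is only a sketch of that idea, not part of the batch file above: the variable name <q>XALAN</q> is made up for illustration, and the paths are the same ones used above.
</p>

```bat
@echo off
rem Sketch only: XALAN is a made-up variable name; point it at your own Xalan folder.
rem setlocal stops the variable lingering in the console after the batch file ends.
setlocal
set XALAN=c:\program files\xalan-j_2_6_0\bin
java -cp "%XALAN%\xml-apis.jar;%XALAN%\xercesImpl.jar;%XALAN%\xalan.jar;." org.apache.xalan.xslt.Process -IN %1.xml -XSL "c:\path\to\an\xsl\file\xsl.xml" -OUT ..\%1.htm
endlocal
```

<p>
Here the whole classpath sits inside one pair of quotation marks, so the space in <q>program files</q> is still treated as part of a single item.
</p>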
<p>
You can add <q>@echo off</q> (without quotes) on a line of its own above the command, if you prefer not to have masses of textual output in the command console, e.g.:
</p>
<p>
<code>@echo off
<br>java -cp "c:...</code>
</p>
<p>
<q>echo off</q> turns off the display of subsequent commands; <q>@</q> hides the echo off command.
</p>
<p>
To perform the transformation, open a command console (Start > Run > <code>"cmd"</code>) and navigate to the location of your XML, XSL and batch files, by typing
</p>
<p>
<code>cd "c:\path\to\files"</code>
</p>
<p>
(including the quotes)
</p>
<p>
For simplicity's sake, I've shoved everything in the same folder, and used absolute paths for the programs. You could probably also mess around with relative paths or the path environment variable, but I can't be bothered.
</p>
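<p>
A middle road on paths, should you want one: a batch file can find its own folder through <q>%~dp0</q>, so it can switch there before doing anything else. The following is a sketch under that assumption, reusing the example command from earlier; nothing in these steps depends on it.
</p>

```bat
@echo off
rem %~dp0 expands to the drive and folder this batch file lives in.
rem cd /d changes drive as well as folder, so this works from any starting point.
cd /d "%~dp0"
java -cp "c:\program files\xalan-j_2_6_0\bin\xml-apis.jar";"c:\program files\xalan-j_2_6_0\bin\xercesImpl.jar";"c:\program files\xalan-j_2_6_0\bin\xalan.jar";. org.apache.xalan.xslt.Process -IN %1.xml -XSL "c:\path\to\an\xsl\file\xsl.xml" -OUT ..\%1.htm
```

<p>
With that in place, the relative paths in the command are resolved from the batch file's folder, whichever folder the console happens to be in.
</p>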
<p>
I ended up having to use <a href="http://tidy.sourceforge.net/">HTML Tidy</a> to contort the output into valid XHTML. My final batch file reads:
</p>
<p>
<code>java -cp "c:\program files\xalan-j_2_6_0\bin\xml-apis.jar";"c:\program files\xalan-j_2_6_0\bin\xercesImpl.jar";"c:\program files\xalan-j_2_6_0\bin\xalan.jar";. org.apache.xalan.xslt.Process -IN %1.xtw -XSL "XTw2XHTML.xsl" -OUT ..\thetwaddle\%1.htm
<br>
<br>"C:\Program Files\HTMLTidy\tidy.exe" -q -m -c --show-warnings no --output-xml yes --output-xhtml yes -latin1 --doctype strict --tidy-mark no --wrap 0 --ascii-chars no --drop-proprietary-attributes yes --fix-bad-comments no ..\thetwaddle\%1.htm
<br>
<br>echo Done %1.</code>
</p>
<p>
(Line breaks have been doubled for clarity.)
</p>
<p>
The input XML files are all labelled <q>filename.xtw</q>; the XSL stylesheet is <q>XTw2XHTML.xsl</q>, and the output files are cacked into the folder <q>thetwaddle</q>, a sibling of the folder where the batch file lives, and assigned a suffix of <q>.htm</q>.
</p>
<p>
Those options shown for Tidy are the result of trial and error, or rather, trial and testing and reading Tidy's <a href="http://tidy.sourceforge.net/docs/quickref.html">Quick Reference</a> - no warranty implied. The <q>echo</q> command prints out a message for each finished file.
</p>
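<p>
Tidy also reports how things went through its exit status: 0 for a clean run, 1 if there were warnings, 2 if there were errors. A sketch of a check that could follow the Tidy line, with message wording that is purely illustrative:
</p>

```bat
"C:\Program Files\HTMLTidy\tidy.exe" -q -m -c --show-warnings no --output-xml yes --output-xhtml yes -latin1 --doctype strict --tidy-mark no --wrap 0 --ascii-chars no --drop-proprietary-attributes yes --fix-bad-comments no ..\thetwaddle\%1.htm
rem Sketch: this check must come directly after the Tidy command,
rem before any other command resets the exit status.
rem "if errorlevel 2" is true when the exit status is 2 or higher, i.e. Tidy hit errors.
if errorlevel 2 echo Tidy reported errors in %1.htm
```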
<p>
This batch file is wrapped up in another one, which repeatedly calls the first, thus:
</p>
<p>
<code>@echo off
<br>echo Transforming XTw into XHTML...
<br>call xtw2xhtml afile
<br>call xtw2xhtml otherfiles
<br>echo Done.</code>
</p>
<p>
The text output is just to make the command console more interesting while the batch program is running. It also helps pinpoint any errors, such as typos, which show up as blobs of text in the command console.
</p>
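<p>
If the list of files grows, the wrapper could loop over every <q>.xtw</q> file instead of naming each one. A sketch, assuming the file names contain no spaces and that every <q>.xtw</q> file in the folder should be transformed:
</p>

```bat
@echo off
echo Transforming XTw into XHTML...
rem %%f is each matching file in turn; %%~nf is its name without the extension,
rem which is the form xtw2xhtml expects as its argument.
for %%f in (*.xtw) do call xtw2xhtml %%~nf
echo Done.
```

<p>
The doubled percent signs are needed inside a batch file; typed straight into the console, the same loop would use single ones (<q>%f</q> and <q>%~nf</q>).
</p>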
<p>
The result of all this fiddling is that I can change pages' contents more easily; I've already made, fairly painlessly, a few minor changes that would have taken real effort before. The final product lives <a href="http://www.thetwaddle.co.uk/">here</a>.
</p>
<p>
In semi-related news, it turns out that PURLs such as <a href="http://purl.org/mooquackwooftweetmeow">purl.org/mooquackwooftweetmeow</a>, without the trailing slash, are possible - it's just partial redirects that have to end with slashes. The Twaddle's now on PURLs, too - <a href="http://purl.org/thetwaddle/">purl.org/thetwaddle</a> - with or without the slash.
</p>
<p>
While uploading “Unleash The Twaddlebot!” (The Twaddle v1.0), I was reminded that we're approaching the 50-file limit; that's not including styles, which are kept in a separate account. This means we'll probably have to change hosts.
</p>
<p>
Fortunately, ntl provide 55 megabytes of space, so I'm planning to shift everything there. This shouldn't be too troublesome now that everything's on PURLs.
</p>