---
title: >
  I've Got A Brand New Content Management System And I'll Give You The Key
date: 2004-07-19 02:15
status: published
description: >
  The Mooquackwhatnotbot, what it does, how it works and what it ate for breakfast.
tags: Mooquackwooftweetmeow, Mooquackwhatnotbot
links:
  - url: http://gkn.me.uk/weblog035
    title: >
      The Twaddlebot has been unleashed
    description: >
      About the Twaddlebot, Mooquackwhatnotbot's older, simpler brother
    rel: related
    type: text/html
  - url: http://purl.org/thetwaddle/
    title: >
      El Twad
    description: >
      The Twaddle, produce of the Twaddlebot
    rel: related
    type: application/xhtml+xml
---
...if you ask nicely. Mooquackwooftweetmeow is now, like The Twaddle, generated from arbitrary XML using batch XSLT. If that's completely foreign to you, the rest probably will be (but I'll try my best).
At El Twad, everything is done in nice, distinct articles - there are no piddly little entries like here. So for The Twaddle, each article can be kept in its own file. These files are then each passed through the same XSLT template/filter, resulting in similarly structured pages, each with different content. The filter uses a little class- and id-based trickery to implement the differences between the front page, the Articles page, the other admin pages and the articles; these amount to a different set of meta-blurb around the outside of the main content.
Here, however, it'd be hopelessly impractical - or rather, inconvenient - to have a separate file for each entry, especially when an entry may be no more than a couple of lines. So everything must live in one big file - mqwtm.xml, The Big, Bad Source File. If I have one source file and want to create many pages (which would be nice), I need many XSLT filters.
The problem then becomes "How do I generate many filters, one for each entry?". The only difference there needs to be between these filters is the id of the entry it will apply to - the layout will not differ between entries.
XSL allows one to define variables which can be used later in the XSLT template. So, the entry's id, i.e. the entryid, is defined as an XSL variable which, surprisingly enough, I label entryid.
Later on in the template then, I can instruct the XSL transformer to concern itself with the entry whose id value matches the entry with which we are currently dealing. This means I can instruct it to display the current entry's title, the current entry's body text, etc.
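To make the selection step concrete, here is a minimal Python sketch of the same idea. It is not the actual stylesheet, and the element and attribute names (weblog, entry, id, title, body) are assumptions, since the real source format isn't shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the source file; the real element and
# attribute names are not given in this entry.
SOURCE = """
<weblog>
  <entry id="first"><title>First post</title><body>Hello.</body></entry>
  <entry id="second"><title>Second post</title><body>More twaddle.</body></entry>
</weblog>
"""

def current_entry(root, entryid):
    """Return the one entry whose id attribute equals entryid."""
    for entry in root.iter("entry"):
        if entry.get("id") == entryid:
            return entry
    raise KeyError(entryid)

root = ET.fromstring(SOURCE)
entryid = "second"  # the single value that differs between entrysheets
entry = current_entry(root, entryid)
title = entry.findtext("title")
body = entry.findtext("body")
print(title, "-", body)
```

Everything except the value of entryid stays the same from one entry to the next, which is the property the rest of the system leans on.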
So, the only difference between each entry's XSL stylesheet/template (henceforth referred to as an entrysheet) need be one string of text, once. But how does one go about that?
To perform the batch XSL transformation, I'm using Xalan-C, the precompiled C++ version of Xalan; I was using Xalan-J, the Java version, for the Twaddlebot, as I had correctly assumed that a Java program would "run anywhere". I'd also incorrectly assumed that I'd need some special software developers' tools to run the C++ version. In fact, the C++ version is simpler to use, in my (intentionally limited) experience, and much quicker - which is handy, as a lot of XSL transformations are required to pull this off.
Unfortunately, I know of no way to pass a variable to Xalan - to say, for example, "Transform mqwtm.xml using bigbadxsltemplate.xsl, with the keyword cheese". But I do know how to pass a variable to an MS-DOS batch program. Yup - those things.
The end result is incredibly hacky. The XSL stylesheet to be applied to all entries, to transform them from arbitrary, homebrew XML into world-famous XHTML, is kept in two chunks - the bit before the point where the entryid needs to be inserted, and the bit after. I then have an MS-DOS batch file which copies the start of the template to a file, appends the name of the current entry (passed to it as a variable), then appends the remainder of the template. Hacktastic.
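The splice itself is easy to picture. Below is a rough Python sketch of the start-plus-name-plus-end gluing described above; it stands in for the batch file rather than reproducing it, and the two template halves are invented placeholders (where exactly the real template is cut is an assumption).

```python
# Hypothetical halves of the master template. The real files are not
# shown in this entry; the cut is assumed to fall inside the value of
# the entryid variable.
TEMPLATE_START = '<xsl:variable name="entryid" select="\''
TEMPLATE_END = '\'"/>\n<!-- ...rest of the stylesheet... -->\n'

def compile_entrysheet(entry_name):
    """Glue start + entry name + end, as the batch file does with copy/append."""
    return TEMPLATE_START + entry_name + TEMPLATE_END

sheet = compile_entrysheet("cheese")
print(sheet)
```

Calling compile_entrysheet once per entry gives one entrysheet per entry, each differing by a single string.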
Next problem: I must pass the name of each entry, in turn, to the aforementioned batch program (entrysheetcompiler.bat). Solution: createentrysheetcompiler.xsl. Yes, XSLT can be used to rustle up text documents as well as XML documents. And those text documents can happen to form batch commands when read by MS-DOS. And they do. This XSL template, when run on The Big, Bad Source File, transforms each entry into a command which says "run entrysheetcompiler.bat with the variable [the entry's title]". This is done using the magic of <xsl:for-each/> - go on, look it up - you know you want to.
Applying this transformation to The Big, Bad Source File creates an MS-DOS batch program which runs entrysheetcompiler.bat on said Big, Bad Source File, using each entry's title in turn as a keyword/variable/parameter.
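As an illustration of turning entries into batch commands, here is a small Python sketch that loops over every entry and writes one command line per entry. It mimics the for-each step rather than showing the real XSL template; the XML layout and the use of call to invoke the batch file are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical source layout; single-word titles are used so that no
# quoting is needed on the command line.
SOURCE = """
<weblog>
  <entry><title>cheese</title></entry>
  <entry><title>toast</title></entry>
</weblog>
"""

def make_batch(root):
    """Emit one 'call entrysheetcompiler.bat <title>' line per entry."""
    lines = []
    for entry in root.iter("entry"):
        title = entry.findtext("title")
        lines.append(f"call entrysheetcompiler.bat {title}")
    return "\n".join(lines) + "\n"

batch = make_batch(ET.fromstring(SOURCE))
print(batch, end="")
```

The output is plain text that happens to be a valid batch program, which is all the generated .bat files need to be.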
Now, I have a stylesheet for each entry; each one will transform The Big, Bad Source File into a page displaying that entry. Let's do that then. I use the method I used to create entrysheetcompiler.bat to create a batch that will transform each entry, using its template, into a page; I entitle that batch entrytransformer.bat.
createentrytransformer.xsl, the template used to make entrytransformer.bat, also contains a few lines to transform the Front Page, the Archive and the Atom feed. They are transformed using separate, hand-made XSL templates for each, designed to be run on - you guessed it - The Big, Bad Source File. The Front Page's template, the Archive's template, and the master template for entries, are all fairly similar (hence the consistent visual style); I could use another level of back-end automation to generate all three from a common source, but at the moment I maintain them separately and manually.
The XHTML which is cacked out by entrytransformer.bat isn't the cleanest it might be, so I need to run HTML Tidy on every file. Same process - use <xsl:for-each/> (have you looked it up yet? it's very useful) to write, for each entry in the source, a batch command which performs the same action on that entry's file; the filenames differ only in the entry's id and published date (both of which can be readily extracted from TBBSF).
And that's basically it. All that's needed now is to run each of those components in order. I have a master batch file to do that, the Mooquackwhatnotbot.
First it runs createentrysheetcompiler.xsl, createentrytransformer.xsl and createxhtmlcleaner.xsl on TBBSF. This results in three new batch programs, entrysheetcompiler.bat, entrytransformer.bat and xhtmlcleaner.bat. These are then run in turn: entrysheetcompiler.bat generates a template for each entry, entrytransformer.bat transforms TBBSF using each of those templates in turn (that's what takes most of the time), then xhtmlcleaner.bat tidies up the output. The entrysheets and the three generated batch programs are then no longer needed - they'll be generated anew next time - so they are deleted.
I also back up The Big, Bad Source File at the start of each Mooquackwhatnotbot-ing session, in case I accidentally do something daft that erases it. This is done by another batch program, which simply copies mqwtm.xml to a location given by the last time TBBSF was updated; this batch program is generated from another (comparatively simple) XSLT template.
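For a feel of what such a generated backup command might look like, here is a hedged Python sketch. The backup folder name, the file-naming scheme and the timestamp format are all assumptions made for illustration; only mqwtm.xml and the idea of naming the copy after the last update come from the description above.

```python
def backup_command(last_updated, source="mqwtm.xml", backup_dir="backup"):
    """Build a DOS copy command whose target name encodes the update time.

    backup_dir and the naming scheme are hypothetical; the folder is
    assumed to exist already.
    """
    # Colons are not allowed in Windows file names, and spaces would
    # need quoting, so both are removed or replaced.
    stamp = last_updated.replace(":", "").replace(" ", "_")
    stem = source.rsplit(".", 1)[0]
    return f"copy {source} {backup_dir}\\{stem}-{stamp}.xml"

command = backup_command("2004-07-19 02:15")
print(command)
```

Because the target name changes whenever the source file's last-updated time changes, each session's backup lands somewhere new instead of overwriting the previous one.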
The whole process takes around five minutes on a good day, most of the time being taken up transforming the entries into pages. Xalan must be "phoning home" to check that the markup I've used is actually the valid XHTML it says it is - the process takes longer when my connection is clogged, and the text-mode transformation used to make the batch files is practically instantaneous by comparison.
Static, manually-updated stuff like the CSS stylesheets (they're applied by your web browser when you view a page and are uploaded with the page to the web - not to be confused with XSL stylesheets, which I apply before uploading) and images used in entries and throughout the site, is kept separately from auto-generated stuff on my computer. This way I know what I can delete safe in the knowledge that it'll be regenerated next time I run the Whatnotbot. When uploaded, everything intersperses, keeping the images and other gubbins with the appropriate pages.
Over at El Twad, I need to maintain separately a central list of articles, in order to tell the Twaddlebot what to transform - this is the drawback of using a decentralised storage system (read: separate files) for articles. I also need to manually maintain the list of articles to put on the main menu; I've managed to combine that list with the Articles page's, but it's not ideal and it's certainly not automatic. That's not too much of a problem, as articles at El Twad are much more of an event than entries are here. It just adds a little more work per item. The beauty of this system is that I just add the entry to The Big, Bad Source File, push the button, wait for the Whatnotbot to work its magic, then upload.
Incidentally, I use FileZilla to FTP everything up to my mum's ntl webspace - that, and the fact that I'm already well over FreeWebs' 50-file limit, is why Mooquackwhatnot is no longer at FreeWebs. Besides, FTP is just faster and cooler.
The benefits of keeping everything in one file are quite cool; for example, those Recent Entries there, in an intentionally vague relative position due to multiple-stylability, are made possible because of it. I think it's possible to draw in content from other XML documents, but it's easier to (i.e. I know how to) draw in content from the same document. Those Recent Entries use logic like: "for each entry, sorted by the date it was updated, in reverse chronological order, if an entry is within the first five, show its title, last updated date, etc.; if it's also the first one, show the first paragraph as well".
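That Recent Entries logic can be sketched in a few lines of Python. This is an illustration of the rule as stated, not the XSL that actually produces the list, and the entry records (title, updated, paragraphs) are invented sample data.

```python
from datetime import date

# Invented sample entries; only the sorting and selection rule is taken
# from the description above.
entries = [
    {"title": "One", "updated": date(2004, 7, 1), "paragraphs": ["First para of One.", "More."]},
    {"title": "Two", "updated": date(2004, 7, 19), "paragraphs": ["First para of Two."]},
    {"title": "Three", "updated": date(2004, 6, 2), "paragraphs": ["First para of Three."]},
    {"title": "Four", "updated": date(2004, 7, 10), "paragraphs": ["First para of Four."]},
    {"title": "Five", "updated": date(2004, 5, 5), "paragraphs": ["First para of Five."]},
    {"title": "Six", "updated": date(2004, 7, 15), "paragraphs": ["First para of Six."]},
]

def recent_entries(entries, limit=5):
    """Newest-first list of up to `limit` entries; only the newest gets its first paragraph."""
    ordered = sorted(entries, key=lambda e: e["updated"], reverse=True)
    recent = []
    for position, entry in enumerate(ordered[:limit]):
        item = {"title": entry["title"], "updated": entry["updated"]}
        if position == 0:
            item["first_paragraph"] = entry["paragraphs"][0]
        recent.append(item)
    return recent

recent = recent_entries(entries)
for item in recent:
    print(item["updated"], item["title"], item.get("first_paragraph", ""))
```

Note that the sort needs to see every entry at once, which is why this only works when there is a single central list to sort.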
This definitely couldn't be done without a central list of entries, and is much, much easier if all the entries are actually kept in the same document so Xalan, the XSL transformer, can look at them all at the same time. Those Featured and Perpetual Links only exist once in The Big, Bad Source File, but are propagated to every page (where they're needed).
If I wanted to completely overhaul the structure of the site (which I shouldn't need to - I've tried to use the most semantic structure possible) I'd only have to change three files - one for entries, one for the Front Page, and one for the Archive. It's not one file, but it could be a lot worse. I can change small details of every page quite easily.
This method is quite portable - I'm sure the batch programs could be ported to a Unix format. But more importantly, the end result requires absolutely no browser-side XSL (bloody Opera), no PHP, no SQL, no ASP, no Perl, definitely no .NET, and not even any JavaScript. This would work on GeoCities (if they didn't insist on attaching an HTML-invalid advert).
I haven't uploaded any samples of the Mooquackwhatnotbot, not out of principle or anything, but because you'd probably struggle to make sense of even the complete thing (I know I do). If you fancy a peek, I can send you all the gubbins and a sample Smaller, Not-Quite-As-Bad Source File - just email me.
I haven't made any mention of the actual resultant site, except to demonstrate its method of construction. This is intentional - I'll discuss the site's structure and styling (i.e. XHTML and CSS) in a forthcoming entry.