
Files: dbc49e1d77f4e79a1eae73e1331a73d9ecc5586f / content / brandnewcms.md


title: >
  I've Got A Brand New Content Management System And I'll Give You The Key
date: 2004-07-19 02:15
status: published
description: >
  The Mooquackwhatnotbot, what it does, how it works and what it ate for breakfast.

tags: Mooquackwooftweetmeow, Mooquackwhatnotbot

links:

- url: http://gkn.me.uk/weblog035
  title: >
    The Twaddlebot has been unleashed
  description: >
    About the Twaddlebot, Mooquackwhatnotbot's older, simpler brother
  rel: related
  type: text/html
- url: http://purl.org/thetwaddle/
  title: >
    El Twad
  description: >
    The Twaddle, produce of the Twaddlebot
  rel: related
  type: application/xhtml+xml

...if you ask nicely. Mooquackwooftweetmeow is now, like The Twaddle, generated from arbitrary <abbr title="Extensible Markup Language">XML</abbr> using batch <abbr title="Extensible Stylesheet Language Transformations">XSLT</abbr>. If that's completely foreign to you, the rest probably will be too (but I'll try my best).

At <a href="http://www.thetwaddle.co.uk/">El Twad</a>, everything is done in nice, distinct articles - there are no piddly little entries like here. So for The Twaddle, each article can be kept in its own file. These files are then each passed through the same XSLT template/filter, resulting in similarly structured pages, each with different content. The filter uses a little <code>class</code>- and <code>id</code>-based trickery to implement the differences between the front page, the Articles page, the other admin pages and the articles; these amount to a different set of meta-blurb around the outside of the main content. 

Here, however, it'd be hopelessly impractical - or rather, inconvenient - to have a separate file for each entry, especially when an entry may be no more than a couple of lines. So everything must live in one big file - <code>mqwtm.xml</code>, The Big, Bad Source File. If I have one source file and want to create many pages (which would be nice), I need many XSLT filters.

<a href="/brandnewcms/filters.png" title="View the diagram alone."><img src="/brandnewcms/filters.png" alt="El Twad uses one content file for each article, and one layout file, the same for every article; the result is one page for each article. Mooquackwhatnot, on the other hand, uses one content file containing many entries; and one layout file for each entry, which references only that entry; the result is still one page for each entry."></a>


The problem then becomes “How do I generate many filters, one for each entry?”. The only difference there needs to be between these filters is the <code>id</code> of the entry it will apply to - the layout will not differ between entries. 

XSL allows one to define variables which can be used later in the XSLT template. So, the entry's <code>id</code>, i.e. the <code>entryid</code> is defined as an XSL variable which, surprisingly enough, I label <code>entryid</code>. 

Later on in the template then, I can instruct the XSL transformer to concern itself with “the entry whose <code>id</code> value is <var>the entry with which we are currently dealing</var>”. This means I can instruct it to display “the <var>current</var> entry's title”, “the <var>current</var> entry's body text”, etc. 
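As a rough illustration of that lookup (in Python rather than XSLT, purely to show the logic), the sketch below picks "the current entry" out of a source document by comparing each entry's <code>id</code> against an <code>entryid</code> value. The element names, the sample content and the <code>current_entry</code> helper are all assumptions made for the example; the real <code>mqwtm.xml</code> format isn't shown in this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for mqwtm.xml; element and attribute names are assumed.
SOURCE = """
<weblog>
  <entry id="weblog035">
    <title>The Twaddlebot has been unleashed</title>
    <body>Older, simpler brother.</body>
  </entry>
  <entry id="brandnewcms">
    <title>A Brand New CMS</title>
    <body>XSLT and batch files.</body>
  </entry>
</weblog>
""".strip()


def current_entry(root, entryid):
    """Return the entry whose id attribute equals entryid."""
    for entry in root.iter("entry"):
        if entry.get("id") == entryid:
            return entry
    raise KeyError(entryid)


root = ET.fromstring(SOURCE)
entry = current_entry(root, "brandnewcms")
# With the entry in hand, "the current entry's title" is a simple lookup.
print(entry.findtext("title"))
```

In the XSLT version the same comparison would live in the template itself, which is why only the <code>entryid</code> value has to change from one entry to the next.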

So, the only difference between each entry's XSL stylesheet/template (henceforth referred to as an “entrysheet”) need be one string of text, once. But how does one go about that? 

To perform the batch XSL transformation, I'm using Xalan-C, the precompiled C++ version of Xalan; I was using Xalan-J, the Java version, for the Twaddlebot, as I had correctly assumed that a Java program would “run anywhere”. I'd also incorrectly assumed that I'd need some special software developers' tools to run the C++ version. In fact, the C++ version is simpler to use, in my (intentionally limited) experience, and much quicker - which is handy, as a lot of XSL transformations are required to pull this off. 

Unfortunately, I know of no way to pass a variable to Xalan - to say, for example, “Transform <code>mqwtm.xml</code> using <code>bigbadxsltemplate.xsl</code>, with the keyword ‘cheese’.”. But I do know how to pass a variable to an MS-DOS batch program. Yup - those things. 

The end result is incredibly hacky. The XSL stylesheet to be applied to all entries, to transform them from arbitrary, homebrew XML into world-famous <abbr title="Extensible Hypertext Markup Language">XHTML</abbr>, is kept in two chunks - the bit before the <code>entryid</code> needs to be inserted, and the bit after. I then have an MS-DOS batch file which copies the start of the template to a file, appends the name of the current entry (passed to it as a variable), then appends the remainder of the template. Hacktastic. 
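The head-plus-id-plus-tail trick can be sketched like this. It is a Python stand-in for the batch file, not the batch file itself, and the template text in <code>HEAD</code> and <code>TAIL</code> is invented for the example (the real template halves aren't reproduced here).

```python
# Assumed template halves: everything before the entry id, and everything after.
HEAD = (
    '<xsl:stylesheet version="1.0" '
    'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">\n'
    '  <xsl:variable name="entryid" select="\''
)
TAIL = (
    '\'"/>\n'
    '  <!-- layout shared by every entry would follow here -->\n'
    '</xsl:stylesheet>\n'
)


def compile_entrysheet(head, tail, entryid):
    """Glue the two template halves together around one entry's id."""
    return head + entryid + tail


sheet = compile_entrysheet(HEAD, TAIL, "brandnewcms")
print(sheet)
```

The only thing that varies between entrysheets is the string dropped into the gap, which is what makes plain concatenation good enough.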

Next problem: I must pass the name of each entry, in turn, to the aforementioned batch program (<code>entrysheetcompiler.bat</code>). Solution: <code>createentrysheetcompiler.xsl</code>. Yes, XSLT can be used to rustle up text documents as well as XML documents. And those text documents can happen to form batch commands when read by MS-DOS. And they do. This XSL template, when run on The Big, Bad Source File, transforms each entry into a command which says “run <code>entrysheetcompiler.bat</code> with the variable [the entry's title]”. This is done using the magic of <code>&lt;xsl:for-each/&gt;</code> - go on, look it up - you know you want to.

Applying this transformation to The Big, Bad Source File creates an MS-DOS batch program which runs <code>entrysheetcompiler.bat</code> on said Big, Bad Source File, using each entry's title in turn as a keyword/variable/parameter. 
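Here is a minimal sketch of that "one command per entry" idea, again in Python rather than text-mode XSLT. It assumes each entry carries an <code>id</code> attribute that is safe to use as a batch argument; the sample document and the <code>batch_commands</code> name are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature source file; the real structure is not shown in this post.
SOURCE = """
<weblog>
  <entry id="weblog035"><title>The Twaddlebot has been unleashed</title></entry>
  <entry id="brandnewcms"><title>A Brand New CMS</title></entry>
</weblog>
""".strip()


def batch_commands(source_xml):
    """Emit one batch line per entry, in document order (the for-each step)."""
    root = ET.fromstring(source_xml)
    lines = [
        "call entrysheetcompiler.bat " + entry.get("id")
        for entry in root.iter("entry")
    ]
    return "\n".join(lines) + "\n"


print(batch_commands(SOURCE), end="")
```

Saving that text with a <code>.bat</code> extension gives a program that invokes the compiler once per entry, without the list of entries ever being typed by hand.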

Now, I have a stylesheet for each entry; each one will transform The Big, Bad Source File into a page displaying that entry. Let's do that then. I use the method I used to create <code>entrysheetcompiler.bat</code>, to create a batch that will transform each entry, using its template, into a page; I entitle that batch “<code>entrytransformer.bat</code>”. 

<code>createentrytransformer.xsl</code>, the template used to make <code>entrytransformer.bat</code>, also contains a few lines to transform the Front Page, the Archive and the Atom feed. They are transformed using separate, hand-made XSL templates for each, designed to be run on - you guessed it - The Big, Bad Source File. The Front Page's template, the Archive's template, and the master template for entries, are all fairly similar (hence the consistent visual style); I could use another level of back-end automation to generate all three from a common source, but at the moment I maintain them separately and manually. 

The XHTML which is cacked out by <code>entrytransformer.bat</code> isn't the cleanest it might be, so I need to run <a href="http://tidy.sourceforge.net/" title="A utility to clean up HTML and XHTML">HTML Tidy</a> on every file. Same process - use <code>&lt;xsl:for-each/&gt;</code> (have you looked it up yet? it's very useful) to, for each entry in the source, write a batch command which performs a set action on many files, whose filenames differ only by containing a different entry's <code>id</code> and published date (all of which can be readily extracted from <abbr title="The Big, Bad Source File - I got sick of typing it">TBBSF</abbr>).

And that's basically it. All that's needed now is to run each of those components in order. I have a master batch file to do that, the Mooquackwhatnotbot. 

First it runs <code>createentrysheetcompiler.xsl</code>, <code>createentrytransformer.xsl</code> and <code>createxhtmlcleaner.xsl</code> on TBBSF. This results in three new batch programs, <code>entrysheetcompiler.bat</code>, <code>entrytransformer.bat</code> and <code>xhtmlcleaner.bat</code>. These are then run in turn: <code>entrysheetcompiler.bat</code> generates a template for each entry, <code>entrytransformer.bat</code> transforms TBBSF using each of those templates in turn (that's what takes most of the time), then <code>xhtmlcleaner.bat</code> tidies up the output. The entrysheets and the three generated batch programs are then no longer needed - they'll be generated anew next time - so they are deleted. 
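The running order can be written down as a plain list of steps. The sketch below only builds and prints that plan; it doesn't invoke Xalan or any batch file, and the <code>build_plan</code> helper and its step labels are invented for illustration. The file names are the ones mentioned above.

```python
# The three generator templates named in the post, in the order they are run.
GENERATORS = [
    "createentrysheetcompiler.xsl",
    "createentrytransformer.xsl",
    "createxhtmlcleaner.xsl",
]


def build_plan(generators):
    """Return the ordered (action, target) steps of one build."""
    # createfoo.xsl produces foo.bat
    batches = [name[len("create"):-len(".xsl")] + ".bat" for name in generators]
    plan = [("transform TBBSF with", xsl) for xsl in generators]
    plan += [("run", bat) for bat in batches]
    # Everything generated along the way is disposable.
    plan.append(("delete", "entrysheets"))
    plan += [("delete", bat) for bat in batches]
    return plan


for action, target in build_plan(GENERATORS):
    print(action, target)
```

Generating first, running second and deleting last means nothing generated survives between builds, so a stale entrysheet can never leak into the next run.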

I also back up The Big, Bad Source File at the start of each Mooquackwhatnotbot-ing session, in case I accidentally do something daft that erases it. This is done by another batch program, which simply copies <code>mqwtm.xml</code> to a location given by the last time TBBSF was updated; this batch program is generated from another (comparatively simple) XSLT template.

The whole process takes around five minutes on a good day, most of the time being taken up transforming the entries into pages. Xalan must be “phoning home” to check that the markup I've used is actually the valid XHTML it says it is - the process takes longer when my connection is clogged, whereas the text-mode transformation used to make the batch files is near-instantaneous.

Static, manually-updated stuff like the CSS stylesheets (they're applied by your web browser when you view a page and are uploaded with the page to the web - not to be confused with XSL stylesheets, which I apply before uploading) and images used in entries and throughout the site, is kept separately from auto-generated stuff on my computer. This way I know what I can delete safe in the knowledge that it'll be regenerated next time I run the Whatnotbot. When uploaded, everything intersperses, keeping the images and other gubbins with the appropriate pages. 

Over at El Twad, I need to maintain separately a central list of articles, in order to tell the Twaddlebot what to transform - this is the drawback of using a decentralised storage system (read: “separate files”) for articles. I also need to manually maintain the list of articles to put on the main menu; I've managed to combine that list with the Articles page's, but it's not ideal and it's certainly not automatic. That's not too much of a problem, as articles at El Twad are much more of an event than entries are here. It just adds a little more work per item. The beauty of this system is that I just add the entry to The Big, Bad Source File, push the button, wait for the Whatnotbot to work its magic, then upload. 

Incidentally, I use <a href="http://filezilla.sf.net" title="Open source FTP client, not related to Mozilla">FileZilla</a> to FTP everything up to my mum's ntl webspace - that, and the fact that I'm already well over FreeWebs' 50-file limit, is why Mooquackwhatnot is no longer at FreeWebs. Besides, FTP is just faster and cooler. 

The benefits of keeping everything in one file are quite cool; for example, those Recent Entries there, in an intentionally vague relative position due to <a href="/propermultiplestyleage" title="Proper Multiple Styleage">multiple-stylability</a>, are made possible because of it. I think it's possible to draw in content from other XML documents, but it's easier to (i.e. I know how to) draw in content from the same document. Those Recent Entries use logic like: “for each entry, sorted by the date it was updated, in reverse chronological order, if an entry is within the first five, show its title, last updated date, etc.; if it's also the first one, show the first paragraph as well.”. 
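That Recent Entries rule translates fairly directly into code. The sketch below is a Python rendering of the quoted logic, not the XSLT actually used; the dictionary keys, the sample entries and the <code>recent_entries</code> name are assumptions for the example.

```python
from datetime import date

# Made-up entries: six of them, so that one falls outside the "first five".
ENTRIES = [
    {
        "title": "Entry %d" % n,
        "updated": date(2004, 7, n),
        "paragraphs": ["Opening of entry %d." % n, "More."],
    }
    for n in range(1, 7)
]


def recent_entries(entries, limit=5):
    """Newest first, keep the first `limit`, add a teaser to the newest only."""
    ordered = sorted(entries, key=lambda e: e["updated"], reverse=True)
    result = []
    for position, entry in enumerate(ordered[:limit]):
        item = {"title": entry["title"], "updated": entry["updated"]}
        if position == 0:
            # Only the most recently updated entry shows its first paragraph.
            item["first_paragraph"] = entry["paragraphs"][0]
        result.append(item)
    return result


for item in recent_entries(ENTRIES):
    print(item["updated"], item["title"], item.get("first_paragraph", ""))
```

The sort needs every entry's date at once, which is exactly what having all the entries in one document provides.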

This definitely couldn't be done without a central list of entries, and is much, much easier if all the entries are actually kept in the same document so Xalan, the XSL transformer, can look at them all at the same time. Those Featured and Perpetual Links only exist once in The Big, Bad Source File, but are propagated to every page (where they're needed).

If I wanted to completely overhaul the structure of the site (which I shouldn't need to - I've tried to use the most semantic structure possible) I'd only have to change three files - one for entries, one for the Front Page, and one for the Archive. It's not one file, but it could be a lot worse. I can change small details of every page quite easily.

This method is quite portable - I'm sure the batch programs could be ported to Unix shell scripts. But more importantly, the end result requires absolutely no browser-side XSL (<a href="/weblog027" title="XSL + Opera = Eugh">bloody Opera</a>), no <abbr title="PHP: Hypertext Preprocessor - that's right, PHP stands for itself">PHP</abbr>, no <acronym title="Some... Queer Language? No bloody idea what it stands for, but it's fancy">SQL</acronym>, no <acronym title="Active Server Pages - finally, a proper acronym!">ASP</acronym>, no Perl, definitely no .NET, and not even any JavaScript. This would work on GeoCities (if they didn't insist on attaching an HTML-invalid advert).

I haven't uploaded any samples of the Mooquackwhatnotbot, not out of principle or anything, but because you'd probably struggle to make sense of even the complete thing (I know I do). If you fancy a peek, I can send you all the gubbins and a sample Smaller, Not-Quite-As-Bad Source File - just email me. 

I haven't made any mention of the actual resultant site, except to demonstrate its method of construction. This is intentional - I'll discuss the site's structure and styling (i.e. XHTML and CSS) in a forthcoming entry.


Grey the earthling / gkn.me.uk

Tree: dbc49e1d77f4e79a1eae73e1331a73d9ecc5586f
