Using XSLT with Bad HTML

We have a PHP CMS with a lot of poorly written HTML in the client-contributed content. This kept causing my XSL template system to throw XML errors. I got around the problem by:

  1. Wrapping content in CDATA tags
  2. Checking whether the content is valid XML with xml_parse() in PHP; if it isn't, I add a CDATA tag and try again
  3. Stripping out bad characters that may have crept in from Word
  4. Processing the XSL and XML using xsl:value-of tags with disable-output-escaping="yes"
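
As a rough sketch of steps 1 and 2, assuming PHP 7 or later with the xml extension enabled: the function names below are hypothetical, and only xml_parser_create() and xml_parse() are PHP built-ins. The dummy root element is my own assumption, added so that fragments with several top-level tags can still pass the check.

```php
<?php
// Hypothetical helpers; not taken from the CMS described above.

/** Returns true when $fragment is well-formed XML inside a dummy root element. */
function is_well_formed_fragment(string $fragment): bool
{
    $parser = xml_parser_create('UTF-8');
    // xml_parse() returns 1 on success and 0 on failure.
    return xml_parse($parser, '<root>' . $fragment . '</root>', true) === 1;
}

/** Wraps $html in a CDATA section so the XML parser treats it as plain text. */
function wrap_in_cdata(string $html): string
{
    // A literal "]]>" would end the section early, so split it across two sections.
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $html) . ']]>';
}

/** Leaves well-formed content alone and wraps everything else in CDATA. */
function prepare_content(string $html): string
{
    return is_well_formed_fragment($html) ? $html : wrap_in_cdata($html);
}

echo prepare_content('<p>Unclosed <br> tag</p>'), "\n";
```

Content that already parses is passed through untouched here, so the stylesheet can still match its elements; only the broken fragments end up as opaque text.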

Using CDATA tags around unpredictable HTML helps prevent problems with the XML parser. Without the final step, the output contains the original markup escaped as HTML entities (&lt;p&gt; rather than <p>).
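
To see the difference the final step makes, here is a minimal sketch that assumes PHP 5.3 or later with the dom and xsl extensions. The element names (page, body, section) and the sample content are made up for the illustration.

```php
<?php
// Hypothetical input: broken HTML carried inside a CDATA section.
$xml = new DOMDocument();
$xml->loadXML('<page><body><![CDATA[<p>Unclosed <br> tag</p>]]></body></page>');

$xslSource = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="no"/>
  <xsl:template match="/page">
    <section>
      <!-- Default behaviour: the markup is escaped into entities. -->
      <div class="escaped"><xsl:value-of select="body"/></div>
      <!-- With disable-output-escaping the markup is written out as-is. -->
      <div class="raw"><xsl:value-of select="body" disable-output-escaping="yes"/></div>
    </section>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xsl = new DOMDocument();
$xsl->loadXML($xslSource);

$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);

// Serialise straight to a string so the unescaped text reaches the output.
$html = $proc->transformToXml($xml);
echo $html;
```

The first div shows the entity-escaped version, the second the original HTML. Note that the raw output is only as valid as the content that went in.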

In PHP, mb_convert_encoding($string, 'ASCII') has proven very useful for handling text that users paste from applications like Word. PHP has to be compiled with --enable-mbstring for this function to work. It stops strings in a different encoding from the one declared in your XML from confusing the XML parser.
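
A small sketch of that call, assuming PHP 7 or later with mbstring and assuming the pasted text arrives as UTF-8. The helper name to_ascii() is hypothetical, and passing the source encoding explicitly is my own addition rather than something the call above does.

```php
<?php
// Hypothetical wrapper around mb_convert_encoding().
function to_ascii(string $text, string $from = 'UTF-8'): string
{
    // Characters with no ASCII equivalent are replaced by mbstring's
    // substitute character (configurable with mb_substitute_character()).
    return mb_convert_encoding($text, 'ASCII', $from);
}

// Curly quotes and apostrophes are typical of text pasted from Word.
$pasted = "It\u{2019}s a \u{201C}smart\u{201D} quote";
echo to_ascii($pasted), "\n";
```

The conversion is lossy: typographic quotes and accented letters are substituted rather than transliterated, so it suits content where plain ASCII is acceptable.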

Read More →

Processing Textile Text with XSLT

Textile provides a simple way of writing human-friendly text that can easily be translated to XHTML. HTML tags are simplified into a set of phrase and block modifiers; even tables and attributes can be created.

I was looking at the PHP code for this and wondering if I could create an XSL file that could translate similar text into XHTML. I created some XML to contain my text:


This should be in *bold*.


And then used the following recursive algorithm to process it in XSLT:
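
The full listing is in the complete post. Purely as an illustration of the recursive idea (this is not the original stylesheet), here is a minimal sketch that handles only the *bold* modifier. It assumes PHP 5.3 or later with the dom and xsl extensions; the text element and the template name bold are hypothetical.

```php
<?php
// Hypothetical wrapper element; the post's own XML may use different names.
$xml = new DOMDocument();
$xml->loadXML('<text>This should be in *bold*.</text>');

$xslSource = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="no"/>

  <xsl:template match="/text">
    <p><xsl:call-template name="bold"><xsl:with-param name="s" select="string(.)"/></xsl:call-template></p>
  </xsl:template>

  <!-- Emits the text before a *pair*, wraps the inside in strong, then recurses on the rest. -->
  <xsl:template name="bold">
    <xsl:param name="s"/>
    <xsl:variable name="rest" select="substring-after($s, '*')"/>
    <xsl:choose>
      <xsl:when test="contains($s, '*') and contains($rest, '*')">
        <xsl:value-of select="substring-before($s, '*')"/>
        <strong><xsl:value-of select="substring-before($rest, '*')"/></strong>
        <xsl:call-template name="bold">
          <xsl:with-param name="s" select="substring-after($rest, '*')"/>
        </xsl:call-template>
      </xsl:when>
      <!-- No complete pair left: output the remainder unchanged. -->
      <xsl:otherwise><xsl:value-of select="$s"/></xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xsl = new DOMDocument();
$xsl->loadXML($xslSource);

$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);
$result = trim($proc->transformToXml($xml));
echo $result, "\n";
```

XSLT 1.0 has no loops over strings or regular expressions, which is why the template calls itself on whatever follows each matched pair.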

Read More →

Processing Multiple XML Files with XSLT in PHP

I often use a message class for a lot of things with PHP. This allows me to build messages to display to the user for errors, successes and general feedback.

I create a CSS class for each of these three message types to style the output, and then use a simple PHP class for appending messages to arrays.
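
A message class along these lines might look like the sketch below. The class name, method names and the three type keys are all hypothetical; the only assumption carried over from the text is that each type has its own array and a matching CSS class.

```php
<?php
// Hypothetical message collector; one array per message type.
class Messages
{
    private $items = array(
        'error'    => array(),
        'success'  => array(),
        'feedback' => array(),
    );

    /** Appends $text to the array for $type. */
    public function add($type, $text)
    {
        if (!isset($this->items[$type])) {
            throw new InvalidArgumentException("Unknown message type: $type");
        }
        $this->items[$type][] = $text;
    }

    /** Returns the messages stored for one type. */
    public function get($type)
    {
        return $this->items[$type];
    }

    /** Renders every message as a paragraph whose CSS class is the type. */
    public function render()
    {
        $html = '';
        foreach ($this->items as $type => $texts) {
            foreach ($texts as $text) {
                $html .= '<p class="' . $type . '">' . htmlspecialchars($text) . "</p>\n";
            }
        }
        return $html;
    }
}

$messages = new Messages();
$messages->add('error', 'Name is required');
$messages->add('success', 'Profile saved');
echo $messages->render();
```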

I wanted to include this in some XML data I was processing with XSLT, but I found it difficult to combine multiple XML files with PHP’s XSLT processor. However, in PHP4, you can do it like this:
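
The PHP 4 listing is in the complete post. For readers on PHP 5.3 or later, a different technique gives a similar result: merge the documents with DOM before transforming. This sketch is not the PHP 4 approach referred to above; it assumes the dom and xsl extensions, and the page and messages documents are made up.

```php
<?php
// Hypothetical documents: the page data and a separate set of messages.
$page = new DOMDocument();
$page->loadXML('<page><title>Home</title></page>');

$messages = new DOMDocument();
$messages->loadXML('<messages><message type="error">Name is required</message></messages>');

// importNode() copies the node into $page's document; true requests a deep copy.
$copy = $page->importNode($messages->documentElement, true);
$page->documentElement->appendChild($copy);

$xslSource = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="no"/>
  <xsl:template match="/page">
    <div>
      <h1><xsl:value-of select="title"/></h1>
      <!-- The merged messages are now ordinary children of page. -->
      <xsl:for-each select="messages/message">
        <p class="{@type}"><xsl:value-of select="."/></p>
      </xsl:for-each>
    </div>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xsl = new DOMDocument();
$xsl->loadXML($xslSource);

$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);
$output = trim($proc->transformToXml($page));
echo $output, "\n";
```

Merging up front keeps the stylesheet unaware of where each piece of data came from, at the cost of copying nodes on every request.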

Read More →