When is XHTML not XHTML?

Well, you may think it’s just a matter of rendering some well-formed markup and setting your doctype…


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">


Right? Wrong!


Different browsers render XHTML in different ways. What’s common, however, is that unless your response’s content type is “application/xhtml+xml”, you may not pass Go and collect $200: the browser simply treats your page as plain old HTML. You can get the browser to recognise the content type in two ways (in addition to the DOCTYPE declaration above):



  1. Name your file with an .xhtml extension (not really a solution, as IE says ‘what?’)
  2. Set Response.ContentType = "application/xhtml+xml" in your server-side (ASP/ASP.NET) code
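The ASP/ASP.NET one-liner has an equivalent in any server-side stack. Here’s a minimal sketch using Python’s built-in http.server (an assumption, not the setup behind this post) that adds a little content negotiation: it only sends “application/xhtml+xml” to browsers whose Accept header lists it, and falls back to text/html for browsers that don’t ask for XHTML (older versions of IE among them). The page, handler and helper names are made up for illustration, and the Accept parsing is deliberately simple.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

XHTML_TYPE = "application/xhtml+xml"

# A tiny hypothetical page; the DOCTYPE matches the one above.
PAGE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>XHTML test</title></head>
  <body><p>Served as real XHTML where the browser asks for it.</p></body>
</html>
"""


def choose_content_type(accept_header):
    """Return application/xhtml+xml if the Accept header lists it with q > 0, else text/html."""
    for item in (accept_header or "").split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        if media_type == XHTML_TYPE and q > 0:
            return XHTML_TYPE
    return "text/html"


class XhtmlHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        content_type = choose_content_type(self.headers.get("Accept"))
        body = PAGE.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", content_type + "; charset=utf-8")
        self.send_header("Vary", "Accept")  # caches must key on the Accept header
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=8000):
    """Serve the page on localhost until interrupted."""
    HTTPServer(("localhost", port), XhtmlHandler).serve_forever()
```

Call run() and load http://localhost:8000/ in Firefox and in IE: Firefox’s Accept header mentions application/xhtml+xml, so it gets the XML treatment, while a browser that only sends */* gets text/html.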

The DOCTYPE is really an additional information item: modern browsers use it to choose between standards and quirks rendering, but on its own it doesn’t make them parse the page as XML. So how different are the browsers’ XHTML implementations?


Internet Explorer, for instance, takes the ‘all comers’ approach: it does the best it can with the markup and degrades gracefully if it encounters errors. “Great!” I hear you say, “that saves me from actually testing this thing!”


Firefox, on the other hand, switches to a completely different (XML) parser once it knows you want to serve well-formed markup. The good thing about this is that you see errors immediately: if the markup isn’t well-formed, the page won’t render and you get a parse error instead. IE, meanwhile, continues to let you believe you’re an XHTML master (note: it doesn’t recognise the .xhtml extension).
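Python’s standard XML parser is just as unforgiving as Firefox’s XML mode, so it’s a handy way to see the difference for yourself. This is a sketch of the idea only (Firefox obviously uses its own parser): two pages that differ only in how the line break is written, where the unclosed <br> that IE would shrug off is a fatal error. Note that this checks well-formedness, not validity against the DTD.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(markup):
    """Return None if markup is well-formed XML, otherwise the parser's error message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return str(err)
    return None


# Identical pages except for how the line break is written.
GOOD = ('<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
        '<body><p>Line one<br />Line two</p></body></html>')
BAD = ('<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
       '<body><p>Line one<br>Line two</p></body></html>')

print(well_formedness_error(GOOD))  # None
print(well_formedness_error(BAD))   # a "mismatched tag" error: <br> is never closed
```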



Mozilla has some good resources on which features are supported in XHTML documents. A common one that catches people out is JavaScript’s document.write. This isn’t allowed in XHTML because the string it writes can’t be guaranteed to be well-formed XML.
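The usual XHTML-friendly alternative to document.write is to build nodes through the DOM (document.createElement, appendChild and friends in the browser) rather than handing the parser a raw string. Here’s the same contrast sketched in Python, with xml.dom.minidom standing in for the browser DOM; the helper names and the sample headline are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.dom.minidom import getDOMImplementation

XHTML_NS = "http://www.w3.org/1999/xhtml"


def paragraph_by_string(text):
    """document.write style: concatenate strings and hope the result is XML."""
    return '<p xmlns="' + XHTML_NS + '">' + text + "</p>"


def paragraph_by_dom(text):
    """DOM style: create nodes and let the serializer escape the text."""
    doc = getDOMImplementation().createDocument(XHTML_NS, "p", None)
    p = doc.documentElement
    p.setAttribute("xmlns", XHTML_NS)  # minidom doesn't emit namespace declarations itself
    p.appendChild(doc.createTextNode(text))
    return p.toxml()


def is_well_formed(markup):
    """True if the standard XML parser accepts markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


headline = "Fish & Chips <today only>"
print(is_well_formed(paragraph_by_string(headline)))  # False: the raw & and < break the XML
print(is_well_formed(paragraph_by_dom(headline)))     # True: they are written as &amp; and &lt;
```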


Safari’s support isn’t quite as advanced as Firefox’s yet, but it too parses your markup strictly and reports well-formedness errors.


If you really want to stick your neck out, place a link on your site to the W3C Markup Validation Service. You’re likely to see plenty of errors, just like this page!


Other things to watch out for are content management systems that let editors enter non-compliant HTML in their text editors, and JavaScript blocks that aren’t wrapped in CDATA sections.
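Why the CDATA sections matter: in XML, a bare < or && inside a script block is markup, not JavaScript. This sketch (again using Python’s strict XML parser as a stand-in for the browser’s) parses the same hypothetical script with and without the CDATA wrapper. The // in front of the CDATA markers is the common convention for hiding them from browsers that treat the page as HTML.

```python
import xml.etree.ElementTree as ET

UNWRAPPED = """<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title>
<script type="text/javascript">
  if (a < b && b > 0) { alert("ok"); }
</script></head><body></body></html>"""

WRAPPED = """<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title>
<script type="text/javascript">
//<![CDATA[
  if (a < b && b > 0) { alert("ok"); }
//]]>
</script></head><body></body></html>"""


def parses_as_xml(markup):
    """True if the strict XML parser accepts the page."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def script_source(markup):
    """Return the text content of the first XHTML <script> element."""
    root = ET.fromstring(markup)
    return root.find(".//{http://www.w3.org/1999/xhtml}script").text


print(parses_as_xml(UNWRAPPED))  # False: "a < b" looks like the start of a tag
print(parses_as_xml(WRAPPED))    # True: CDATA content is passed through as plain text
```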


 

Efficient XPath Queries

This is something I get asked about quite a bit, as I had a misspent youth with XSLT…


One of my pet hates is people always using the search operator ‘//’ (short for the descendant-or-self axis) in XPath queries. It’s the SELECT * of XML, and shouldn’t be used unless you actually want to ‘search’ your document.


If you’re writing SQL, you’d SELECT fields explicitly rather than use SELECT *, wouldn’t you? 🙂 That’s because:



  1. If the schema changes (new fields inserted), your existing code has less chance of breaking
  2. It performs better: less server pre-processing and fewer catalog lookups
  3. It’s more declarative, so the code is easier to read and maintain

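With XML (and the standard DOM-style parsers) you’re working on a document tree and accessing nodes loaded into that tree. A ‘//’ query typically has to walk every node below its starting point, whereas an explicit path only needs to look at the children along the route you’ve named.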


Consider the following example. Say your car dealership sells new cars and the cars currently on sale are serialised in an XML document:


<xml>
 <Sales>
  <Cars>
   <Car Make="Ford" Model="Territory" />
   <Car Make="Ford" Model="Focus" />
  </Cars>
 </Sales>
</xml>


To get all cars you can simply use the XPath ‘//Car’. This searches the whole document from the root for Car elements (and finds 2).
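To try this out, here’s a quick sketch with Python’s xml.etree.ElementTree (my choice of tool, not necessarily what you’d use in production). ElementTree supports only a subset of XPath and always evaluates paths relative to an element, so ‘//Car’ is written as ‘.//Car’ starting from the <xml> document element.

```python
import xml.etree.ElementTree as ET

NEW_CARS = """<xml>
 <Sales>
  <Cars>
   <Car Make="Ford" Model="Territory" />
   <Car Make="Ford" Model="Focus" />
  </Cars>
 </Sales>
</xml>"""

root = ET.fromstring(NEW_CARS)  # the <xml> document element

# './/Car' plays the role of '//Car': check every descendant for a Car tag.
cars = root.findall(".//Car")
print(len(cars))                       # 2
print([c.get("Model") for c in cars])  # ['Territory', 'Focus']
```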


A more efficient way would be ‘/*/*/*/Car’, as we know Car elements only exist at the 4th level of the document (xml, Sales, Cars, Car).


A yet more efficient way would be ‘/xml/Sales/Cars/Car’, as this makes a direct hit on the depth and the parent elements in the document.


You can also mix and match with ‘/*//Car’ to directly access the parts of the DOM you’re certain of and search on the parts you’re not.
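A quick sanity check that all of these expressions pick out the same two cars. Because ElementTree evaluates paths relative to an element, each absolute XPath is rewritten relative to the <xml> document element; the mapping table below is my own translation, not something ElementTree provides.

```python
import xml.etree.ElementTree as ET

NEW_CARS = """<xml>
 <Sales>
  <Cars>
   <Car Make="Ford" Model="Territory" />
   <Car Make="Ford" Model="Focus" />
  </Cars>
 </Sales>
</xml>"""

root = ET.fromstring(NEW_CARS)

# Absolute XPath from the text -> equivalent path relative to the <xml> element.
equivalents = {
    "//Car": ".//Car",                      # search everything
    "/*/*/*/Car": "*/*/Car",                # fixed depth, any element names
    "/xml/Sales/Cars/Car": "Sales/Cars/Car",  # direct hit on every parent
    "/*//Car": ".//Car",                    # known root, search below it
}

results = {}
for xpath, et_path in equivalents.items():
    results[xpath] = [car.get("Model") for car in root.findall(et_path)]
    print(f"{xpath:22} -> {et_path:16} {results[xpath]}")
```

Every line prints ['Territory', 'Focus']; the expressions differ in how much of the tree they have to look at, not in what they return.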


Now let’s say you go into the used car business and refactor your XML format as follows:


<xml>
 <Sales Type="New">
  <Cars>
   <Car Make="Ford" Model="Territory" />
   <Car Make="Ford" Model="Focus" />
  </Cars>
 </Sales>
 <Sales Type="Used">
  <Cars>
   <Car Make="Honda" Model="Civic" />
   <Car Make="Mazda" Model="3" />
  </Cars>
 </Sales>
</xml>


If you want to get all cars (new and used), you can still use any of the XPaths above. If you want to isolate the new cars from the used ones, you’re going to have to change your queries in any case.


‘//Car’ is obviously going to pick up all 4 Car elements.


‘/xml/Sales[@Type='New']/Cars/Car’ is probably the most efficient in this case, but that will vary with the complexity of the condition (in []) and the complexity and size of the document.
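The refactored document, checked the same way with ElementTree. Its predicate support is limited but does include [@attr='value'], so the attribute filter translates directly (again written relative to the <xml> element).

```python
import xml.etree.ElementTree as ET

ALL_CARS = """<xml>
 <Sales Type="New">
  <Cars>
   <Car Make="Ford" Model="Territory" />
   <Car Make="Ford" Model="Focus" />
  </Cars>
 </Sales>
 <Sales Type="Used">
  <Cars>
   <Car Make="Honda" Model="Civic" />
   <Car Make="Mazda" Model="3" />
  </Cars>
 </Sales>
</xml>"""

root = ET.fromstring(ALL_CARS)

all_cars = root.findall(".//Car")                          # like '//Car'
new_cars = root.findall("Sales[@Type='New']/Cars/Car")     # like "/xml/Sales[@Type='New']/Cars/Car"
used_cars = root.findall("Sales[@Type='Used']/Cars/Car")   # the same filter for used stock

print(len(all_cars))                       # 4
print([c.get("Model") for c in new_cars])  # ['Territory', 'Focus']
print([c.get("Model") for c in used_cars]) # ['Civic', '3']
```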


It’s important to note that the effects of optimising your XPath queries won’t really be felt until you’re operating with:



  • Large documents (n MB+)
  • Deep documents (n levels deep, where n varies with the document size)
  • Heavy load and high demand for throughput of requests

In other words, don’t expect efficient XPaths to solve all your problems, but they shouldn’t be the limiting factor in a business application either. The other thing to say is that if your XPath queries are getting really complicated, your schema probably needs some attention as well.
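To see why the gains only show up at scale, here’s a deliberately simplified cost model: it counts how many elements a naive evaluator examines for ‘//Car’ (every element in the tree) versus ‘/xml/Sales/Cars/Car’ (only the children along the path). This isn’t how any particular XPath engine works, and real processors may optimise ‘//’; the generated document, its hypothetical <Notes> padding and the n_* parameters are all made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_dealership(n_sales, n_cars, n_extras):
    """Hypothetical document: n_sales <Sales> blocks, each with n_cars cars and
    n_extras unrelated <Note> elements that a '//' search still has to wade through."""
    root = ET.Element("xml")
    for s in range(n_sales):
        sales = ET.SubElement(root, "Sales", Type="New" if s % 2 == 0 else "Used")
        cars = ET.SubElement(sales, "Cars")
        for c in range(n_cars):
            ET.SubElement(cars, "Car", Make="Ford", Model=f"M{s}-{c}")
        notes = ET.SubElement(sales, "Notes")
        for k in range(n_extras):
            ET.SubElement(notes, "Note").text = f"note {k}"
    return root


def descendant_search(root, tag):
    """Model of '//tag': examine every element in the tree, root included."""
    found, visited = [], 0
    for el in root.iter():
        visited += 1
        if el.tag == tag:
            found.append(el)
    return found, visited


def child_path(root, steps):
    """Model of '/xml/step1/step2/...': at each step examine only the children
    of the elements matched by the previous step."""
    current, visited = [root], 0
    for step in steps:
        next_level = []
        for el in current:
            for child in el:
                visited += 1
                if child.tag == step:
                    next_level.append(child)
        current = next_level
    return current, visited


root = build_dealership(n_sales=10, n_cars=5, n_extras=100)
cars_a, cost_a = descendant_search(root, "Car")
cars_b, cost_b = child_path(root, ["Sales", "Cars", "Car"])
print(len(cars_a), cost_a)  # 50 1081
print(len(cars_b), cost_b)  # 50 80
```

Both approaches find the same 50 cars, but in this model the descendant search pays for every <Note> too, so the gap grows with document size; on a document as small as the car example above it is negligible.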