Efficient XPath Queries

This is something I get asked about quite a bit as I had a misspent youth with XSLT…

One of my pet hates is people always using the ‘//’ search operator (the abbreviated descendant-or-self axis) in XPath queries.  It’s the SELECT * of XML, and shouldn’t be used unless you actually want to ‘search’ your document.

If you were writing SQL you’d SELECT fields explicitly rather than SELECT *, wouldn’t you? 🙂  The same reasons apply:

  1. If the schema changes (new fields inserted) then your existing code has less chance of breaking
  2. It performs better – less server pre-processing and catalog lookups
  3. It’s more declarative, and the code is easier to read and maintain

With XML (and the standard DOM-style parsers) you’re working on a document tree, and accessing nodes loaded into that tree. 

Consider the following XML as an example: say your car dealership sells new cars, and current prices are serialised in an XML document:

   <Sales>
     <Cars>
       <Car Make="Ford" Model="Territory" />
       <Car Make="Ford" Model="Focus" />
     </Cars>
   </Sales>

In order to get all cars you can easily use the following XPath: ‘//Car’.  This searches from the root to find all Car elements (and finds 2).

A more efficient way would be ‘/*/*/Car’, as we know Car elements only exist at the 3rd level in the document.

A yet more efficient way would be ‘/Sales/Cars/Car’ as this makes a direct hit on the depth and parent elements in the document.

You can also mix and match with ‘/*//Car’ to directly access the parts of the DOM you’re certain of and search on the parts you’re not.
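In .NET you’d feed these expressions to XmlNode.SelectNodes; as a rough cross-check of the ‘search vs. direct hit’ idea, here’s a sketch using Python’s ElementTree (an assumption of convenience here, not what the original code used – note its XPath subset is relative to the root element, so the leading ‘/Sales’ step is implicit):

```python
import xml.etree.ElementTree as ET

doc = """
<Sales>
  <Cars>
    <Car Make="Ford" Model="Territory" />
    <Car Make="Ford" Model="Focus" />
  </Cars>
</Sales>
"""

root = ET.fromstring(doc)  # root is the <Sales> element

# '//Car' equivalent: search every descendant node for Car elements
searched = root.findall(".//Car")

# '/Sales/Cars/Car' equivalent: a direct hit on the known structure
direct = root.findall("./Cars/Car")

assert len(searched) == len(direct) == 2
```

The ‘//’ form has to walk every node in the tree; the direct path only visits the levels it names, which is where the saving comes from on large documents.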

Now let’s say you go into the used car business and refactor your XML format as follows:

 <Sales Type="New">
   <Cars>
     <Car Make="Ford" Model="Territory" />
     <Car Make="Ford" Model="Focus" />
   </Cars>
 </Sales>
 <Sales Type="Used">
   <Cars>
     <Car Make="Honda" Model="Civic" />
     <Car Make="Mazda" Model="3" />
   </Cars>
 </Sales>

If you want to get all Cars (new and used) you could still use any of the XPaths above.  If you want to isolate the New from the Used, then you’re going to have to make some changes in any case.

‘//Car’ is obviously going to pick up all 4 Car elements.

‘/Sales[@Type="New"]/Cars/Car’ is probably the most efficient in this case, but it will vary based on the complexity of the condition (in []) and the complexity and size of the document.
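Again as an ElementTree sketch (the two Sales elements are wrapped in a made-up <Dealership> root purely so the fragment parses as one document):

```python
import xml.etree.ElementTree as ET

doc = """
<Dealership>
  <Sales Type="New">
    <Cars>
      <Car Make="Ford" Model="Territory" />
      <Car Make="Ford" Model="Focus" />
    </Cars>
  </Sales>
  <Sales Type="Used">
    <Cars>
      <Car Make="Honda" Model="Civic" />
      <Car Make="Mazda" Model="3" />
    </Cars>
  </Sales>
</Dealership>
"""

root = ET.fromstring(doc)

all_cars = root.findall(".//Car")                         # '//Car' - searches everything
new_cars = root.findall("./Sales[@Type='New']/Cars/Car")  # predicate filters the Sales step

assert len(all_cars) == 4 and len(new_cars) == 2
```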

It’s important to note that the effects of optimising your XPath queries won’t really be felt until you’re operating with:

  • Large documents (n MB+)
  • Deep documents (n levels deep) – n varies with the document size
  • Heavy load and high demand for throughput of requests

This means you shouldn’t expect efficient XPaths to solve all your problems, but they shouldn’t be a limiting factor in a business application either.  The other thing to say is that if your XPath queries are getting really complicated then your schema is probably in need of some attention as well.

SQL Server – Save all DTS Packages

Here’s another useful proc I found to back up all your DTS packages (from I can’t remember where).  Put this in your master database: 

Create procedure usp_SavePackages
    @Path varchar(128)
as
set nocount on

declare @PackageName varchar(128)
declare @rc int
declare @ServerName varchar(128)
declare @FileName varchar(128)
declare @FilePath varchar(128)
declare @cmd varchar(2000)
declare @objPackage int

select  @ServerName = @@ServerName

if right(@Path, 1) <> '\'
    select @Path = @Path + '\'
select  @FilePath = @Path

-- create output directory - will fail if already exists but ...
select  @cmd = 'mkdir ' + @FilePath
exec master..xp_cmdshell @cmd

create table #packages (PackageName varchar(128))
insert  #packages
select  distinct name
from    msdb..sysdtspackages

select  @PackageName = ''
while @PackageName < (select max(PackageName) from #packages)
begin
    select  @PackageName = min(PackageName) from #packages where PackageName > @PackageName
    select  @FileName = @FilePath + @PackageName + '.dts'

    exec @rc = sp_OACreate 'DTS.Package', @objPackage output
    if @rc <> 0
        raiserror('failed to create package rc %d', 16, -1, @rc)

    -- 256 = DTSSQLStgFlag_UseTrustedConnection
    exec @rc = sp_OAMethod @objPackage, 'LoadFromSQLServer', null,
        @ServerName = @ServerName, @Flags = 256, @PackageName = @PackageName
    if @rc <> 0
        raiserror('failed to load package rc %d, package %s', 16, -1, @rc, @PackageName)

    -- delete old file
    select @cmd = 'del ' + @FileName
    exec master..xp_cmdshell @cmd, no_output

    exec @rc = sp_OAMethod @objPackage, 'SaveToStorageFile', null, @FileName
    if @rc <> 0
        raiserror('failed to save package rc %d, package %s', 16, -1, @rc, @PackageName)

    exec @rc = sp_OADestroy @objPackage
end

drop table #packages

Microsoft CRM – Data Migration Framework (DMF) Bugs / Fixes

I’ve been migrating data for a client recently, and part of this has required the use of the MS Data Migration Framework (DMF).  I thought I’d post a couple of bugs I found and (thankfully) how I managed to get past them with the use of SQL Profiler!

The Migration Framework works – albeit slowly, as it uses the CRM Web Service to stuff the records into the CRM database.  The CRM structure I’ve been trying to migrate to contains a large number of custom fields.  The DMF calls many stored procs to get its data (from the CDF database); these are structured by entity, e.g. p_dm_GetContact for the Contact entity.


I was having a problem with Contacts, and after tracing a migration attempt with cut-down data (for speed), I found that the issue was that the lid was being blown off a VARCHAR(8000) in p_dm_GetContact, giving the following SQL error:

Error: 207, Severity: 16, State: 3

The error is ‘Invalid Column Name’.  Further investigation revealed that a slightly dodgy piece of code (replicated in all the other similar procs) effectively truncates the dynamically created field list in the ‘SELECT’ statement…

-- if there are extended fields, then attached them to the end 
--   of the base field list

if len(@metadatafields) > 0
    set @entityfields = @entityfields + ', ' + rtrim(@metadatafields)

@entityfields contains a comma-delimited list of fields from the ‘cdf_Contact’ table, and @metadatafields contains fields from ‘cdf_Contact_Ext’ (custom fields).  Concatenating the two into a single VARCHAR(8000) means anything beyond 8000 characters is silently truncated, so with a large number of custom fields you’re obviously asking for trouble.

The following shows the ‘fix’ to free up a bit more room (it’s still not very good – but as the source code isn’t available we can’t make a nicer fix right now):

CREATE procedure p_dm_GetContact
as
set nocount on

declare @entity            varchar(100)
declare @entityfields      varchar(8000)
declare @metadatafields    varchar(8000)

-- set target entity name
set @entity = 'Contact'

-- query base field list for targeted entity
exec p_dm_GetFields @entity, 'c', @entityfields output

-- query extended field list for targeted entity
exec p_dm_GetMetaDataFields @entity, 'ext', @metadatafields output

-- if there are fields, then attach them to the end 
--   of the base field list    
if len(@entityfields) > 0
    set @entityfields = ', ' + rtrim(@entityfields)

-- if there are extended fields, then attach them to the end 
--   of the base field list
----    if len(@metadatafields) > 0
----        set @entityfields = @entityfields + ', ' + rtrim(@metadatafields)

-- fix: keep the extended fields in their own variable rather than
-- concatenating into @entityfields and busting the VARCHAR(8000) limit
if len(@metadatafields) > 0
    set @metadatafields = ', ' + rtrim(@metadatafields)

-- build/execute query statement
-- the status code must be null
exec (
'select top 1 '''' As CRMUpdateStatus, 
        c.ContactId As entityid,
        c.StateCode As statecode,
        dbo.p_fn_GetID(l.DestinationId) As originatingleadid, 
        dbo.p_fn_GetID(p.DestinationId) As defaultpricelevelid, 
        dbo.p_fn_GetID(a.DestinationId) As accountid, 
        dbo.p_fn_GetID(su.DestinationId) As owninguser'
+ @entityfields + @metadatafields + '
   from cdf_Contact c
        inner join cdf_Contact_info info 
            on c.ContactId = info.ContactId
        left outer join cdf_Contact_ext ext 
            on c.ContactId = ext.ContactId
        left outer join cdf_Lead_info l 
            on c.OriginatingLeadId = l.LeadId
        left outer join cdf_PriceLevel_info p 
            on c.DefaultPriceLevelId = p.PriceLevelId
        left outer join cdf_Account_info a 
            on c.AccountId = a.AccountId
        inner join cdf_SystemUser_info su 
            on c.OwningUser = su.SystemUserId
  where (info.StatusCode is null)
    and (info.DestinationId is null)'
)



SQL Server – Get Matrix for two different result sets

I had a need today to splice 2 different result sets together and effectively give all permutations of the two tables.  I’d seen different examples of doing this with cursors and loops etc., but I thought – there ‘must’ be an easier way – there are lots of different join types, right?

The first issue is that there are no common fields between the two tables I need to merge – otherwise a ‘simple’ FULL OUTER JOIN would do the biz.

In the end it didn’t amount to much more than that, because a slightly dodgy hack (but functional nonetheless) to guarantee full coverage from the join is to simply place the same literal on both sides of the join condition 🙂

With 2 sets of ‘different’ data (let’s use Northwind as I’m not about to bore you with my own data).  Say you wanted to generate a list of all products for all regions given the following result sets…

SELECT ProductName
FROM Products

SELECT RegionDescription
FROM Region

You can splice them together with this….

SELECT  R.RegionDescription,
        P.ProductName
FROM    Products P
        INNER JOIN Region R
            ON 1 = 1

and you get something like this… (snippet)

RegionDescription                                  ProductName                             
————————————————– —————————————-
Eastern                                            Valkoinen suklaa
Eastern                                            Vegie-spread
Eastern                                            Wimmers gute Semmelknödel
Eastern                                            Zaanse koeken
Western                                            Alice Mutton
Western                                            Aniseed Syrup
Western                                            Boston Crab Meat

If there’s actually a common field then your join can obviously use that.  Any other filtering you want to do can also go into the WHERE clause as required.
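Joining on a literal like this is what SQL calls a CROSS JOIN (a Cartesian product), and the same ‘matrix’ idea is easy to sanity-check outside SQL – a minimal Python sketch, with made-up stand-in rows for the Region and Products result sets:

```python
from itertools import product

# Abbreviated stand-ins for the two unrelated result sets
regions = ["Eastern", "Western"]
products = ["Alice Mutton", "Aniseed Syrup", "Boston Crab Meat"]

# INNER JOIN ... ON 1 = 1 (i.e. a CROSS JOIN) is just the Cartesian product:
# every region paired with every product
matrix = list(product(regions, products))

assert len(matrix) == len(regions) * len(products)
```

Writing `FROM Products P CROSS JOIN Region R` gives the identical result set, without needing the literal trick.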

This is another reminder of the lesson that if you’re tempted to use a CURSOR, then resist that urge until you’ve exhausted your Google finger!

SQLServerCentral and SQLMag often have the answer you’re looking for.

Cryptography – Using Base64 to encrypt/decrypt Strings rather than Unicode or ASCII – Error – ‘Length of the data to decrypt is invalid’

I recently did a conversion from .NET 1.1 to .NET 2.0 for a particular project.  The framework classes were still in 1.1 (where our TripleDES encryption library lives).  Unit Tests still run in 1.1 and all pass.

A production problem then started to show ‘Length of the data to decrypt is invalid’ and I was horribly confused as I’d inherited some of the code and all seemed good…

I thankfully found the following … http://blogs.msdn.com/shawnfa/archive/2005/11/10/491431.aspx which explains why, even though you’re only encrypting ASCII characters, you shouldn’t use a 7- or 8-bit text encoding to turn the resulting ciphertext back into a string (or to turn stored text back into bytes for decryption).  The key is that the encrypted byte sequence isn’t guaranteed to be valid Unicode or ASCII – Base64, which can represent any byte sequence, is the safe choice.
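A quick sketch of the underlying problem in Python (the fixed bytes here are just a made-up stand-in for TripleDES output, which can be any byte values):

```python
import base64

# Ciphertext is arbitrary bytes - it is NOT guaranteed to be valid ASCII/Unicode
ciphertext = bytes([0x00, 0x9C, 0xFF, 0x41, 0xD2, 0x7F, 0x80, 0x13])

# Round-tripping ciphertext through a text encoding fails (or silently corrupts)
try:
    ciphertext.decode("ascii")
    text_safe = True
except UnicodeDecodeError:
    text_safe = False  # bytes like 0x9C and 0xFF have no ASCII meaning

# Base64 round-trips any byte sequence exactly
encoded = base64.b64encode(ciphertext).decode("ascii")
assert base64.b64decode(encoded) == ciphertext
```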

Why do the unit tests pass?  Because the Cryptography classes in the .NET Framework were revamped for v2.0 and validation was tightened up.  As Shawn says – it’s better that it doesn’t ‘successfully’ decrypt into an invalid string.

This leaves a bit of a tidy-up of course, as I’ve now got to re-stuff all encrypted data into the database and patch the apps to ensure that the correct encoding is used.  I’ve also got to find a way to support existing files and strings encrypted with the class, as some code is still happily using this in 1.1-land and some is clearly ‘not’ working in 2.0-land – fun!

I guess it’s a lesson that today’s code won’t necessarily work tomorrow – and you shouldn’t discount breakages from framework changes when you’re investigating issues.