Efficient XPath Queries

This is something I get asked about quite a bit as I had a misspent youth with XSLT…


One of my pet hates is people always using the ‘//’ descendant search in XPath queries.  It’s the SELECT * of XML, and shouldn’t be used unless you actually want to ‘search’ your document.


If you were writing SQL you’d SELECT fields explicitly rather than SELECT *, wouldn’t you? 🙂


You would, because:



  1. If the schema changes (new fields are inserted), your existing code has less chance of breaking
  2. It performs better – less server pre-processing and fewer catalog lookups
  3. It’s more declarative, and the code is easier to read and maintain

With XML (and the standard DOM-style parsers) you’re working on a document tree, and accessing nodes loaded into that tree.  A ‘//’ search has to visit every node in that tree to see whether it matches, whereas an explicit path only walks the branches you name.


Consider the following XML fragment as an example: say your car dealership sells new cars, and the current stock is serialised in an XML document:


<xml>
 <Sales>
  <Cars>
   <Car Make="Ford" Model="Territory" />
   <Car Make="Ford" Model="Focus" />
  </Cars>
 </Sales>
</xml>


In order to get all cars you can easily use the following XPath: ‘//Car’.  This searches from the root to find all Car elements (and finds 2).


A more efficient way would be '/*/*/*/Car', as we know Car elements only exist at the fourth level of the document.


A yet more efficient way would be '/xml/Sales/Cars/Car', as this makes a direct hit on the depth and parent elements in the document.


You can also mix and match with '/*//Car' to directly access the parts of the DOM you’re certain of and search on the parts you’re not.
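Incidentally, if you ever find yourself shredding a document like this inside SQL Server, OPENXML takes exactly the same kind of row pattern, so the same advice applies.  A rough sketch (SQL Server 2000 onwards) using the dealership document above:

declare @doc int
declare @xml varchar(2000)

set @xml = '<xml><Sales><Cars>
              <Car Make="Ford" Model="Territory" />
              <Car Make="Ford" Model="Focus" />
            </Cars></Sales></xml>'

-- parse the document once
exec sp_xml_preparedocument @doc output, @xml

-- explicit row pattern rather than a '//Car' search
select Make, Model
from openxml(@doc, '/xml/Sales/Cars/Car', 1)
     with (Make varchar(50), Model varchar(50))

exec sp_xml_removedocument @doc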


Now let’s say you go into the used car business and refactor your XML format as follows:


<xml>
 <Sales Type="New">
  <Cars>
   <Car Make="Ford" Model="Territory" />
   <Car Make="Ford" Model="Focus" />
  </Cars>
 </Sales>
 <Sales Type="Used">
  <Cars>
   <Car Make="Honda" Model="Civic" />
   <Car Make="Mazda" Model="3" />
  </Cars>
 </Sales>
</xml>


If you want to get all Cars (new and used) you could still use any of the XPaths above.  If you want to isolate the new from the used, then you’re going to have to make some changes in any case.


'//Car' is obviously going to pick up all 4 Car elements.


'/xml/Sales[@Type="New"]/Cars/Car' is probably the most efficient in this case, but it will vary based on the complexity of the condition (inside the []) and the complexity and size of the document.


It’s important to note that the effects of optimising your XPath queries won’t really be felt until you’re operating with:



  • Large documents (in the MB range and up)
  • Deep documents (many levels of nesting – how many counts as ‘deep’ varies with the size of the document)
  • Heavy load and a high demand for request throughput

This means you shouldn’t expect efficient XPaths to solve all your problems, but they shouldn’t be a limiting factor in a business application.  The other thing to say is that if your XPath queries are getting really complicated, then your schema is probably in need of some attention as well.

SQL Server – Save all DTS Packages

Here’s another useful proc I found (I can’t remember where from) to back up all your DTS packages.  Put this in your master database:


Create procedure usp_SavePackages
    @Path varchar(128)
as
/*
    Saves every DTS package stored in msdb out to a .dts structured
    storage file in the directory given by @Path.
*/
set nocount on

declare @objPackage     int
declare @PackageName    varchar(128)
declare @rc             int
declare @ServerName     varchar(128)
declare @FileName       varchar(128)
declare @FilePath       varchar(128)
declare @cmd            varchar(2000)

select  @ServerName = @@servername,
        @FilePath   = @Path

-- make sure the output path ends with a backslash
if right(@FilePath, 1) <> '\'
begin
    select @FilePath = @FilePath + '\'
end

-- create output directory - will fail if it already exists but ...
select @cmd = 'mkdir ' + @FilePath
exec master..xp_cmdshell @cmd

create table #packages (PackageName varchar(128))

insert  #packages (PackageName)
select  distinct name
from    msdb..sysdtspackages

select @PackageName = ''
while @PackageName < (select max(PackageName) from #packages)
begin
    select @PackageName = min(PackageName) from #packages where PackageName > @PackageName

    select @FileName = @FilePath + @PackageName + '.dts'

    exec @rc = sp_OACreate 'DTS.Package', @objPackage output
    if @rc <> 0
    begin
        raiserror('failed to create package rc %d', 16, -1, @rc)
        return
    end

    -- load the package from msdb (flag 256 = trusted connection)
    exec @rc = sp_OAMethod @objPackage, 'LoadFromSQLServer', null,
        @ServerName = @ServerName, @Flags = 256, @PackageName = @PackageName
    if @rc <> 0
    begin
        raiserror('failed to load package rc %d, package %s', 16, -1, @rc, @PackageName)
        return
    end

    -- delete any old file for this package
    select @cmd = 'del ' + @FileName
    exec master..xp_cmdshell @cmd, no_output

    exec @rc = sp_OAMethod @objPackage, 'SaveToStorageFile', null, @FileName
    if @rc <> 0
    begin
        raiserror('failed to save package rc %d, package %s', 16, -1, @rc, @PackageName)
        return
    end

    exec @rc = sp_OADestroy @objPackage
end

GO
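A quick usage example – the directory name below is just an illustration; point it at any path the SQL Server service account can write to:

exec master..usp_SavePackages @Path = 'C:\DTSBackup'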



Microsoft CRM – Data Migration Framework (DMF) Bugs / Fixes

I’ve been migrating data for a client recently and part of this has required the use of the MS Data Migration Framework (DMF).  I thought I’d post a couple of bugs I found and (thankfully) how I managed to get past them with the use of SQL Profiler!


The Migration Framework works – albeit slowly, as it uses the CRM web service to stuff the records into the CRM database.  The CRM structure I’ve been trying to migrate to contains a large number of custom fields.  The DMF calls many stored procs to get its data (from the CDF database); these are structured by entity, e.g.


p_dm_GetContact
p_dm_GetContract


I was having a problem with Contacts, and after tracing a migration attempt with cut-down data (for speed), I found that the issue was that the lid was being blown off a VARCHAR(8000) in p_dm_GetContact, giving the following SQL error:


Error: 207, Severity: 16, State: 3


The error is ‘Invalid Column Name’.  Further investigation revealed that a slightly dodgy piece of code (replicated in all the other similar procs) effectively truncates the dynamically created field list in the ‘SELECT’ statement…


-- if there are extended fields, then attached them to the end
--   of the base field list

if len(@metadatafields) > 0
    set @entityfields = @entityfields + ', ' + rtrim(@metadatafields)


@entityfields contains a comma-delimited list of fields from the ‘cdf_Contact’ table, and @metadatafields contains fields from ‘cdf_Contact_Ext’ (custom fields).  If you concatenate the two you’re obviously asking for trouble: the result gets assigned back into a VARCHAR(8000), so anything beyond 8,000 characters is silently chopped off – usually in the middle of a column name, hence the ‘Invalid Column Name’ error.
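As a contrived illustration of the silent truncation (the lengths and the column name here are made up purely to tip the concatenation over the 8,000-character limit):

declare @entityfields   varchar(8000)
declare @metadatafields varchar(8000)

set @entityfields   = replicate('x', 7995)   -- stand-in for the base field list
set @metadatafields = 'new_customfieldname'  -- stand-in for a custom field

-- the result is squeezed back into a varchar(8000), so the tail of the
-- list (usually the middle of a column name) is silently chopped off
set @entityfields = @entityfields + ', ' + rtrim(@metadatafields)

select len(@entityfields)   -- 8000, not 8016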


The following shows the ‘fix’, which keeps the two lists in separate variables and only concatenates them inside the EXEC() call, buying a bit more room (it’s still not very good – but as the source isn’t available we can’t make a nicer fix right now):


CREATE procedure p_dm_GetContact
as
begin
    set nocount on

    declare @entity          varchar(100)
    declare @entityfields    varchar(8000)
    declare @metadatafields  varchar(8000)

    -- set target entity name
    set @entity = 'Contact'

    -- query base field list for targeted entity
    exec p_dm_GetFields @entity, 'c', @entityfields output

    -- query extended field list for targeted entity
    exec p_dm_GetMetaDataFields @entity, 'ext', @metadatafields output

    -- if there are base fields, prefix the list with a leading comma
    if len(@entityfields) > 0
        set @entityfields = ', ' + rtrim(@entityfields)

    -- original code: concatenating the extended fields onto the base list
    -- blows the lid off the varchar(8000)
    ----    if len(@metadatafields) > 0
    ----        set @entityfields = @entityfields + ', ' + rtrim(@metadatafields)

    -- if there are extended fields, prefix that list with a leading comma too
    if len(@metadatafields) > 0
        set @metadatafields = ', ' + rtrim(@metadatafields)

    -- build/execute query statement
    -- the status code must be null
    exec (
    'select top 1 '''' As CRMUpdateStatus,
            c.ContactId As entityid,
            c.StateCode As statecode,
            dbo.p_fn_GetID(l.DestinationId) As originatingleadid,
            dbo.p_fn_GetID(p.DestinationId) As defaultpriceLevelid,
            dbo.p_fn_GetID(a.DestinationId) As accountid,
            dbo.p_fn_GetID(su.DestinationId) As owninguser
            ' + @entityfields + @metadatafields + '
    from cdf_Contact c
    inner join cdf_Contact_info info
            on c.ContactId = info.ContactId
    left outer join cdf_Contact_ext ext
            on c.ContactId = ext.ContactId
    left outer join cdf_Lead_info l
            on c.OriginatingLeadId = l.LeadId
    left outer join cdf_PriceLevel_info p
            on c.DefaultPriceLevelId = p.PriceLevelId
    left outer join cdf_Account_info a
            on c.AccountId = a.AccountId
    inner join cdf_SystemUser_info su
            on c.OwningUser = su.SystemUserId
    where (info.StatusCode is null)
      and (info.DestinationId is null)'
    )
end
GO




SQL Server – Get a matrix of two different result sets

I had a need today to splice two different result sets together and effectively produce every combination of rows from the two tables.  I’d seen various examples doing this with cursors and loops etc., but I thought – there ‘must’ be an easier way – there are lots of different join types, right?


The first issue I had is that there are no common fields between the two tables I need to merge; otherwise a ‘simple’ FULL OUTER JOIN would do the biz.


In the end it didn’t amount to much more than that, because a slightly dodgy (but functional nonetheless) hack to guarantee full coverage from the join is to simply place the same literal on both sides of the join condition 🙂


Take two sets of ‘different’ data (let’s use Northwind, as I’m not about to bore you with my own data).  Say you wanted to generate a list of all products for all regions, given the following result sets…


SELECT ProductName
FROM Products


SELECT RegionDescription
FROM Region


You can splice them together with this….


SELECT  R.RegionDescription,
        P.ProductName
FROM Products P
FULL OUTER JOIN Region R
    ON 1 = 1


and you get something like this… (snippet)


RegionDescription                                  ProductName                             
————————————————– —————————————-
Eastern                                            Valkoinen suklaa
Eastern                                            Vegie-spread
Eastern                                            Wimmers gute Semmelknödel
Eastern                                            Zaanse koeken
Western                                            Alice Mutton
Western                                            Aniseed Syrup
Western                                            Boston Crab Meat


If there’s actually a common field then your join can obviously use that.  Any other filtering you want to do can also go into the WHERE clause as required.
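Incidentally, a CROSS JOIN expresses the same Cartesian product without needing the dummy ON 1 = 1 condition – a sketch against the same Northwind tables:

-- equivalent Cartesian product of the two result sets
SELECT  R.RegionDescription,
        P.ProductName
FROM Products P
CROSS JOIN Region R

The one behavioural difference worth noting is with empty inputs: the FULL OUTER JOIN version still returns rows from the non-empty side (padded with NULLs) if the other table is empty, whereas a CROSS JOIN returns nothing.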


This is another reminder of the lesson that if you’re tempted to use a CURSOR, then resist that urge until you’ve exhausted your Google finger!


SQLServerCentral and SQLMag often have the answer you’re looking for.

Cryptography – Use Base64 to encode encrypted strings rather than Unicode or ASCII – Error: ‘Length of the data to decrypt is invalid’

I recently did a conversion from .NET 1.1 to .NET 2.0 for a particular project.  The framework classes were still in 1.1 (where our TripleDES encryption library lives).  Unit Tests still run in 1.1 and all pass.


A production problem then started to show ‘Length of the data to decrypt is invalid’ and I was horribly confused as I’d inherited some of the code and all seemed good…


I thankfully found the following post … http://blogs.msdn.com/shawnfa/archive/2005/11/10/491431.aspx … which explains the problem: even if you’re only encrypting ASCII characters, you shouldn’t use a 7- or 8-bit text encoding to turn the encrypted bytes back into a string.  The ciphertext is an arbitrary sequence of bytes and isn’t guaranteed to be valid Unicode or ASCII, so round-tripping it through a text encoding can corrupt it (and change its length) – hence the error above.  Base64 is designed for exactly this job.


Why do the unit tests still pass?  Because they run against 1.1; the cryptography classes in the .NET Framework were revamped for 2.0 and the validation was tightened up.  As Shawn says, it’s better to fail than to ‘successfully’ decrypt into an invalid string.


This leaves a bit of a tidy-up of course, as I’ve now got to re-stuff all the encrypted data into the database and patch the apps to ensure the correct encoding is used.  I’ve also got to find a way to support existing files and strings encrypted with the old class, as some code is still happily using this in 1.1-land and some is clearly ‘not’ working in 2.0-land – fun!


I guess it’s a lesson that today’s code won’t necessarily work tomorrow – and you shouldn’t discount breakages from framework changes when you’re investigating issues.