Refactoring the inefficient data loop

OK. Last one for today.  I’m not going to go into too much detail, except to say that the offending piece of code loads all items in a ‘matrix’ database table and builds a business object collection hierarchy.  This table has 4 fields, one or more of which may be NULL – the NULL pattern determines what each row represents.  I don’t particularly like this type of approach, but changing it right now would involve quite a bit of work.


The table looks something like:

  • Level1ID
  • Level2aID
  • Level2bID
  • Level2cID


This ultimately translates to a business object hierarchy of:

  • Rows with only Level1ID populated (all other fields NULL) are Level1 objects
  • Rows with only Level1ID and Level2aID populated are Level2a objects

…and so on for Level2b and Level2c.

The matrix table just contains the primary keys for each object, so the code loads each one in the style below:

Load matrix
for each row in matrix
{
    if row type == Level1
        Add to Level1Collection
    else if row type == Level2a
    {
        Load Level2a object
        Add Level2a object to Level1[Level1ID].Level2aCollection
    }
    else if row type == Level2b
    {
        Load Level2b object
        Add Level2b object to Level1[Level1ID].Level2bCollection
    }
    else if row type == Level2c
    {
        Load Level2c object
        Add Level2c object to Level1[Level1ID].Level2cCollection
    }
}

This seems reasonable enough (logical anyway) given the way the data’s being retrieved.


This does, however, load another several hundred rows via further stored proc calls – one for each new object loaded into the child collections.  The whole thing consistently takes around 8 seconds.

The Refactoring

A wise man once told me that if you can be sure the bulk of your operation is data access, then lump it all together if you can, and do the loop or other grunt processing on the client.



  1. I created a new Stored Procedure to return 4 result sets.  These come back as 4 tables in the target DataSet.  Each result set is qualified by the same master criteria, and just uses joins to get a set of data that can be loaded directly into the collections.  The original stored proc is no longer required, and this is now the only data access call.
  2. I changed my collection classes slightly to allow a LoadData method which takes in a dataset, a tablename and a parent key.  This means I can add Level2a objects to the appropriate Level1 collection.  The pseudo code now looks like…

Load multiple results

if Level1 results present
    LoadData on Level1Collection

if Level2a results present
    for each Level1 row
        LoadData on Level1[Level1ID].Level2aCollection

if Level2b results present
    for each Level1 row
        LoadData on Level1[Level1ID].Level2bCollection

if Level2c results present
    for each Level1 row
        LoadData on Level1[Level1ID].Level2cCollection


As I said at the beginning, there are some definite improvements to be made by changing the data structure, and a lot of this code could look much nicer using Typed DataSets with relationships defined.

The new approach completes in less than 100ms.  I couldn’t quite believe it myself, so I killed connections, cleared caches etc. to make sure the database was coming in cold.  Still the same.


This demonstrates that for data-heavy operations, things really start to hurt when you’re making repeated client round-trips, however small each call.  The change amounts to a roughly 99% saving in load time for this business object.

The resulting page is also really snappy now, and I’m sure the customer won’t even notice 🙂


Refactoring the blindingly obvious – Enums and Value Names

I know many people have written about this one, but it’s cropped up yet again in the app I’m maintaining: the old chestnut of setting a string based on an enum value – when the enum names are identical to the string values (most often with one-word names).



public enum ItemStatus
{
    Draft = 1,
    Pending = 2,
    Released = 3,
    Recalled = 4,
    Rejected = 5
}

switch (item.ItemStatus)
{
    case ItemStatus.Draft:
        statusLabel.Text = "Draft";
        break;
    case ItemStatus.Pending:
        statusLabel.Text = "Pending";
        break;
    case ItemStatus.Released:
        statusLabel.Text = "Released";
        break;
    case ItemStatus.Recalled:
        statusLabel.Text = "Recalled";
        break;
    case ItemStatus.Rejected:
        statusLabel.Text = "Rejected";
        break;
}


//or alternatively just take the 'name' as a string and get rid of the switch altogether
statusLabel.Text = item.ItemStatus.ToString("g");





There are other ways to do this if you want more of a ‘description’ (e.g. multiple words).  The StringEnum class could do it for you, or you could provide your own implementation with a Description attribute on the values – e.g. [Description("My extended value name")].
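
As a hedged sketch of the attribute approach (my own helper – not from the post or the StringEnum class):

using System;
using System.ComponentModel;
using System.Reflection;

public static class EnumHelper
{
    // Returns the [Description] text for an enum value, falling back to the plain name
    public static string GetDescription(Enum value)
    {
        FieldInfo field = value.GetType().GetField(value.ToString());
        DescriptionAttribute[] attributes =
            (DescriptionAttribute[])field.GetCustomAttributes(typeof(DescriptionAttribute), false);
        return attributes.Length > 0 ? attributes[0].Description : value.ToString();
    }
}

Usage is then just statusLabel.Text = EnumHelper.GetDescription(item.ItemStatus);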

Refactoring below the surface. Things aren’t always as they seem

Continuing the refactoring from last week, I spotted another great example this morning that I ‘almost’ didn’t improve at all.


My ‘entry point’ into this refactoring was trying to eradicate dynamic SQL and also the dreaded SELECT *, so this snippet popped up in my search.
My immediate reaction was to refactor the dynamic SQL to a Stored Procedure, but then I took a step back and had a further look at the code.


The following has been changed slightly (naming) to protect the innocent. 


string listOfIDs = "";
foreach (Organisation organisation in dataObject.DistributionList.Organisations.Values)
{
    listOfIDs += "'" + organisation.OrganisationID + "',";
}

string sql = "";
if (listOfIDs.Length > 0)
{
    listOfIDs = listOfIDs.Substring(0, listOfIDs.Length - 1);
    sql = "SELECT * FROM tblOrganisations WHERE OrganisationID IN (" + listOfIDs + ") ORDER BY [Name]";
}
else
{
    sql = "SELECT * FROM tblOrganisations WHERE OrganisationID = '' ORDER BY [Name]";
}

Database database = Database.GetDefaultConnection();
DataSet dsOrganisations = database.Execute(sql, Database.CommandExecutionType.TSQLDS);
dgrConfirmDistributionList.DataSource = dsOrganisations;
dgrConfirmDistributionList.DataBind();

int rowCount = 0;
foreach (Organisation organisation in dataObject.DistributionList.Organisations.Values)
{
    // Get the datagrid row we are working on
    DataGridItem dataGridItem = dgrConfirmDistributionList.Items[rowCount];

    // Bind Target Divisions & Select Targeted Divisions
    Divisions divisions = new Divisions();
    divisions.Load(dataGridItem.Cells[0].Text);
    DataGrid dgrConfirmTargetDivisions =
        (DataGrid) dataGridItem.Cells[2].FindControl("dgrDivisions");
    //more…..



What this code is actually doing is taking an already-populated collection of Organisations and performing a nasty dynamic SQL call (using a constructed IN clause) back to the database, to retrieve the same data again just so it can bind against a datagrid.


It then goes back to using the original collection further on and does some more binding through the object hierarchy by linking to child controls in the grid.


It’s clear here that we’ve got more going on than just the need to refactor the SQL code (dodgy table names etc.).  It turns out I’d touched this code quite recently, refactoring what was a ‘pseudo-not-quite’ collection into a generic dictionary.  The original poor collection implementation was probably confusing enough to put the developer off trying to use it for data binding.  The grid can now, of course, be bound directly against the original collection, with no need to go back to the database.


We end up with the following:


dgrConfirmDistributionList.DataSource = dataObject.DistributionList.Organisations.Values;
dgrConfirmDistributionList.DataBind();

int rowCount = 0;
foreach (Organisation organisation in dataObject.DistributionList.Organisations.Values)
{
    // Get the datagrid row we are working on
    DataGridItem dataGridItem = dgrConfirmDistributionList.Items[rowCount];

    // Bind Target Divisions & Select Targeted Divisions
    Divisions divisions = new Divisions();
    divisions.Load(dataGridItem.Cells[0].Text);
    DataGrid dgrConfirmTargetDivisions =
        (DataGrid) dataGridItem.Cells[2].FindControl("dgrDivisions");
    //more…..

This not only removed bad and unnecessary code that could have hurt performance, it also made me focus on ‘real’ refactoring opportunities.  It turned out that 90% of the remaining dynamic SQL calls were actually duplicates of this code in other ASP.NET pages, so those could be removed too.


Refactoring can be quite like peeling an onion.  I’d been in this code a few days ago, and due to a small improvement made then (with the collection), I was able to come back and do some more.  The moral here is ‘patience’.  If you’re serious about improving all code you touch then you learn to make small, manageable changes that you can test before moving on to the next.  There’s still work to do on this application but the code smells are starting to fade.

Feelgood Friday – Refactoring with .NET Generics


As it’s Friday I thought – let’s do some cleaning up.  The following exercise is refactoring a class from an app I’m maintaining at the moment.  The app’s fine (functional etc), and was originally a .NET 1.1 build.  A combination of this, a previous, straight conversion to .NET 2.0 (with the minimum of mods) and a variety of historical developers (mostly junior) means that the code has a certain amount of ‘odour’ in numerous places. 


Lack of consistency isn’t the main problem; it’s more about doing things smartly rather than taking the path of least resistance with your current knowledge.  The latter generally means you don’t learn much, don’t get much satisfaction or pride from your work, and you may end up a tad confused as to why you always end up with the ‘less glamorous’ projects.


Here’s the original code.  A simple collection class containing custom MenuItem objects. 


using System.Collections;

/// <summary>
/// Represents a collection of MenuItems.
/// </summary>
public class MenuItems
{
    private readonly ArrayList palMenuItems = new ArrayList();

    // Add a MenuItem to the Collection
    public void Add(MenuItem objMenuItem)
    {
        palMenuItems.Add(objMenuItem);
    }

    // Retrieve a MenuItem from the Collection
    public MenuItem Item(string strTitle)
    {
        if (palMenuItems.Count > 0)
        {
            foreach (MenuItem objMenuItem in palMenuItems)
            {
                if (objMenuItem.Title.ToLower() == strTitle.ToLower())
                {
                    return objMenuItem;
                }
            }
        }
        return new MenuItem();
    }

    // Returns True if the specified Key exists in the collection
    public bool Exists(string strTitle)
    {
        if (palMenuItems.Count > 0)
        {
            foreach (MenuItem objMenuItem in palMenuItems)
            {
                if (objMenuItem.Title.ToLower() == strTitle.ToLower())
                {
                    return true;
                }
            }
        }
        return false;
    }

    // Remove a MenuItem from the Collection
    public void Remove(string strTitle)
    {
        if (palMenuItems.Count > 0)
        {
            foreach (MenuItem objMenuItem in palMenuItems)
            {
                if (objMenuItem.Title.ToLower() == strTitle.ToLower())
                {
                    palMenuItems.Remove(objMenuItem);
                    break;
                }
            }
        }
    }

    // Clear the collection's contents
    public void Clear()
    {
        palMenuItems.Clear();
    }

    // Return the Number of MenuItems in the Collection
    public int Count()
    {
        return palMenuItems.Count;
    }

    // Access to the enumerator
    public MenuItemEnumerator GetEnumerator()
    {
        return new MenuItemEnumerator(this);
    }

    #region MenuItem Enumeration Class

    //=============================================================================
    // Inner Enumeration Class:
    //=============================================================================
    public class MenuItemEnumerator
    {
        private readonly MenuItems pMenuItems;
        private int position = -1;

        public MenuItemEnumerator(MenuItems objMenuItems)
        {
            pMenuItems = objMenuItems;
        }

        // Declare the Current property required by IEnumerator:
        public MenuItem Current
        {
            get { return (MenuItem) pMenuItems.palMenuItems[position]; }
        }

        // Declare the MoveNext method required by IEnumerator:
        public bool MoveNext()
        {
            if (position < pMenuItems.palMenuItems.Count - 1)
            {
                position++;
                return true;
            }
            else
            {
                return false;
            }
        }

        // Declare the Reset method required by IEnumerator:
        public void Reset()
        {
            position = -1;
        }
    }

    #endregion
}






OK – here are the issues I see with the code.



  1. You can spot some room for improvement immediately with the variable naming (a weird variation of Hungarian notation) – palMenuItems, objMenuItems etc.
  2. This is a simple wrapper implementation.  In conjunction with the naming, you may conclude this is an ex-VB programmer who hasn’t quite grasped object-oriented concepts yet (an observation – not a criticism).  If the developer knew about inheritance then they would hopefully have just inherited from one of the many applicable classes in System.Collections.  This would also have removed the need for implementing a custom Enumerator class.  Note that neither class actually inherits from any useful base class.  This is copy/paste pseudo-inheritance, and is rarely a wise idea.
  3. The Exists and Item methods use a simple loop to find the item they’re looking for.  This is potentially wasteful with many items, and goes back to the ‘path of least resistance’ thought.  It’s also using the inefficient ‘ToLower’ approach to compare what should be a key.  In practice the match doesn’t need to be case-insensitive, and if it did then a String.Compare should be used with the appropriate ‘ignorecase’ argument (see the one-liner after this list).  The developer clearly isn’t familiar (or maybe just not comfortable) with dictionaries or hashtables, where access to items via a key is supported and far more efficient.
  4. The Count implementation is a method rather than a property (aargh!)
  5. It could be argued that the MenuItem class itself (not shown here) is redundant, as the ASP.NET framework already defines a similar MenuItem class.  In reality the whole implementation could be replaced with a sitemap, a menu control and some custom code, but I’ll leave that out of scope for now.
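
If a case-insensitive match really were required, something like the following one-liner (names illustrative) would beat the double ToLower:

if (String.Compare(menuItem.Title, title, StringComparison.OrdinalIgnoreCase) == 0)
{
    // titles match, ignoring case
}
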
That gives some pointers as to what ‘could’ have been done in .NET 1.1.  .NET 2.0 of course introduced generics and so let’s follow that path.

The MenuItems class is used in a few places:



  • A menu control (that renders menu items)
  • Many pages that load items into the control (I already refactored some of this into inherited menu controls, where all pages share the same ‘dynamic’ menu), but there are plenty more improvements to be made.

The first thing is to say…


I don’t need a collections class…


The following is now in the Menu Control:


//private MenuItems menuItems = new MenuItems(); – out with the old
private Dictionary<string, MenuItem> menuItems = new Dictionary<string, MenuItem>(); //in with the new

public Dictionary<string, MenuItem> MenuItems
{
    get { return menuItems; }
    set { menuItems = value; }
}





This simply uses a generic Dictionary with a string key (as we find items by title).  Anywhere that references this dictionary will now need to supply a key in certain places (e.g. the Add method), and the Item method will no longer work as that’s gone – those call sites change to use the dictionary’s indexer.
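
The knock-on changes at the call sites look roughly like this (a sketch – using the menu item’s Title as the key is my assumption, based on the old Item method):

// Adding now requires a key
menuItems.Add(menuItem.Title, menuItem);       // was: menuItems.Add(menuItem);

// Exists becomes ContainsKey
bool exists = menuItems.ContainsKey("Home");   // was: menuItems.Exists("Home");

// Item(...) becomes the indexer
MenuItem home = menuItems["Home"];             // was: menuItems.Item("Home");

// Count is now a property, and Remove works by key
int count = menuItems.Count;
menuItems.Remove("Home");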


Fortunately Visual Studio’s find and replace can cope with all of this as the code is largely consistent…



A quick compile should show all the patterns (from the errors) if there are any inconsistencies.  There were probably about 200 references to patch here, with about 4 different variations of naming and scope – but it took about 5 minutes to patch (including writing this text :-))


We then have to patch all the ‘Item’ references.  These are almost always setting a ‘selected’ property (Stopwatch on…)


As there’s only a few references here (about 20), I could patch them manually, but this could be done entirely by using regular expressions to do the find and replace.  http://msdn.microsoft.com/en-us/library/2k3te2cs.aspx shows all the options available.  I tend to only use regexp for large volume stuff where I get a return on my time investment to work out the expression! 
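
For the record, in newer versions of Visual Studio that support .NET regular expressions in Find and Replace, the conversion could be done with something like (pattern illustrative):

Find what:     \.MenuItems\.Item\((.+?)\)
Replace with:  .MenuItems[$1]

Older versions of Visual Studio used their own regex dialect – hence the MSDN link above.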


I’m just doing it in two steps without regexp as follows:



The code


            Menu1.MenuItems.Item(UserMenu.MyCommunicationItems).Selected = true;

becomes


            Menu1.MenuItems[UserMenu.MyCommunicationItems].Selected = true;


For extra points you then use Resharper to rename all of the obj and pal references to something reasonable, e.g. objMenuItem becomes menuItem.


Oh – and use Resharper’s ‘Find Usages’ on the MenuItems class to confirm it can be deleted.


And relax…..



In hindsight I would have used Resharper to ‘change signature’ on the methods in MenuItems (to patch all the references first), then converted to the generic Dictionary.  Could have been a bit quicker.


Hope this gives someone some courage to get in there and improve their code.  I firmly believe you’ve just got to ‘want’ to do it, and then your biggest problem will be stopping yourself!



Efficient XPath Queries

This is something I get asked about quite a bit as I had a misspent youth with XSLT…


One of my pet hates is people always using the search identifier ‘//’ in XPath queries.  It’s the SELECT * of XML, and shouldn’t be used unless you actually want to ‘search’ your document.


If you were writing SQL you’d SELECT fields explicitly rather than SELECT *, wouldn’t you? 🙂  That’s because:



  1. If the schema changes (new fields inserted) then your existing code has less chance of breaking
  2. It performs better – less server pre-processing and catalog lookups
  3. It’s more declarative, and the code is easier to read and maintain

With XML (and the standard DOM-style parsers) you’re working on a document tree, and accessing nodes loaded into that tree. 


Consider the following XML fragment as an example: say your car dealership sells new cars, and current prices are serialised in an XML document:


<xml>
 <Sales>
  <Cars>
   <Car Make="Ford" Model="Territory" />
   <Car Make="Ford" Model="Focus" />
  </Cars>
 </Sales>
</xml>


In order to get all cars you can easily use the XPath '//Car'.  This searches the whole document from the root for Car elements (and finds 2).


A more efficient way would be '/*/*/*/Car', as we know Car elements only appear at the 4th level of the document.


A yet more efficient way would be '/xml/Sales/Cars/Car', as this makes a direct hit on the depth and parent elements in the document.


You can also mix and match with '/*//Car' to directly access the parts of the DOM you’re certain of and search the parts you’re not.
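
To put that in code – a minimal System.Xml sketch (the file name is illustrative) using the explicit path:

using System;
using System.Xml;

XmlDocument document = new XmlDocument();
document.Load("cars.xml");  // illustrative file name

// Direct path to the Car elements – no document-wide search
XmlNodeList cars = document.SelectNodes("/xml/Sales/Cars/Car");
foreach (XmlNode car in cars)
{
    Console.WriteLine("{0} {1}", car.Attributes["Make"].Value, car.Attributes["Model"].Value);
}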


Now let’s say you go into the used car business and refactor your XML format as follows:


<xml>
 <Sales Type="New">
  <Cars>
   <Car Make="Ford" Model="Territory" />
   <Car Make="Ford" Model="Focus" />
  </Cars>
 </Sales>
 <Sales Type="Used">
  <Cars>
   <Car Make="Honda" Model="Civic" />
   <Car Make="Mazda" Model="3" />
  </Cars>
 </Sales>
</xml>


If you want to get all Cars (new and used) you could still use any of the XPaths above.  If you want to isolate the New from the used, then you’re going to have to make some changes in any case.


'//Car' is obviously going to pick up all 4 elements.


The XPath /xml/Sales[@Type='New']/Cars/Car is probably the most efficient in this case, but that will vary based on the complexity of the condition (inside the []) and the complexity and size of the document.
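
In the same sketch as above, that query is just:

// Only the new cars – predicate on the Sales element
XmlNodeList newCars = document.SelectNodes("/xml/Sales[@Type='New']/Cars/Car");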


It’s important to note that the effects of optimising your XPath queries won’t really be felt until you’re operating with:



  • Large documents (n MB+)
  • Deep documents (n levels deep) – n is variable based on the document size 
  • Heavy load and high demand for throughput of requests

This means you shouldn’t expect efficient XPaths to solve all your problems, but equally they shouldn’t be a limiting factor in a business application.  The other thing to say is that if your XPath queries are getting really complicated, then your schema is probably in need of some attention as well.