xml – Codebureau – Matt Simner

Using RSS.NET to re-write an existing feed XML file

Matt Simner | C#, SubVersion, Tools | June 1, 2008 | April 20, 2012 | .net, c#, feed, rss, rss.net, xml

Don’t know what happened to RSS.NET. Looks like it’s trying to go commercial, but still very quiet. Code examples are few and far between (with some simple ones on the site). Also on InformIT.

I’m writing a quick RSS console app (as I lost the last one I wrote!) that I can use to write an RSS feed from a SubVersion hook. Most people use a pre-existing tool for Python (can’t remember the name), but I thought I’d give RSS.NET another go just for kicks.

The example above works fine assuming you’re maintaining state somewhere other than the feed XML file itself. The example also assumes you’re serving this up to the web, but RssFeed.Write() has an overload that takes a filename.

If you run the same code again (assuming you’ve written to a file) it will simply overwrite the file with one item rather than add to it. This isn’t what I wanted so…

You need to:

Read the file back in if it’s there – otherwise create it
Add your item to the existing channel if it’s there – otherwise create it
Fix the date behaviour, as RSS.NET always assumes UTC dates and appends ‘GMT’. The problem here is that if you’re in Australia (like me), reading and rewriting the same items will effectively add several hours to existing items every time: each date is read and parsed into a region-specific (local) date, then written back as is. There are two ways to fix this:
1. Before you add your new item, loop through all items and change the item.PubDate to item.PubDate.ToUniversalTime(). This effectively sets it back to the ‘correct’ date.
2. Change the RSSWriter class in RSS.NET to convert ToUniversalTime for the Item.PubDate, Channel.PubDate etc. This seems like a better option, but it has potentially more knock-on effects in RSS.NET. I’m here to achieve a result, not change the behaviour (possibly adversely) of RSS.NET, so I chose option 1.
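
To see why the dates drift, here is a minimal sketch that doesn’t use RSS.NET at all. It assumes the library behaves roughly like the plain .NET calls below: parsing a ‘GMT’ date yields a local DateTime, and the RFC 1123 ("r") format appends ‘GMT’ without converting. The class and method names are made up for illustration.

```csharp
using System;
using System.Globalization;

public static class PubDateRoundTrip
{
    // Simulates one read/rewrite cycle of an RFC 1123 date string.
    // DateTime.Parse sees the "GMT" marker and returns the equivalent *local* time.
    // The "r" format prints whatever value it is given and appends "GMT" without
    // converting, so an unconverted local time drifts by the machine's UTC offset.
    public static string Rewrite(string pubDate, bool convertToUtc)
    {
        DateTime parsed = DateTime.Parse(pubDate, CultureInfo.InvariantCulture);

        if (convertToUtc)
        {
            // Option 1 from the list above: put the value back into UTC before writing.
            parsed = parsed.ToUniversalTime();
        }

        return parsed.ToString("r", CultureInfo.InvariantCulture);
    }
}
```

Calling PubDateRoundTrip.Rewrite("Sun, 01 Jun 2008 02:00:00 GMT", true) hands back the same string on any machine. With false, a machine set to UTC+10 would write the item back ten hours later, and again on every subsequent run.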

So here’s the code. It’s not finished yet and rough around the edges, but it works as I need. The intention is to avoid config files and any feed-specific configuration. I just want a library function that’s called by a console app. The web serving will simply rely on the location of the file, with a folder in IIS pointing at it.

        //Requires: using System; using System.IO; using Rss; (Rss is assumed to be the RSS.NET namespace)
        private static void WriteFeed(string feedFileName, string feedName, string feedDescription,
            string feedURL, string itemTitle, string itemDescription,
            DateTime itemPublishDate, string itemURL)
        {
            //Try to open the feed first (to see if it already exists)
            RssFeed feed = null;
            try
            {
                feed = RssFeed.Read(feedFileName);
            }
            catch (FileNotFoundException)
            {
                //No file yet, so start a new feed
                feed = new RssFeed();
            }
            catch (Exception ex)
            {
                //Anything else is reported and we bail out without touching the file
                WriteError(ex);
                return;
            }

            RssChannel channel = null;

            //Loop through all channels and if we’ve got the same title reuse
            for (int i = 0; i < feed.Channels.Count; i++)
            {
                if (feed.Channels[i].Title == feedName)
                {
                    channel = feed.Channels[i];
                    break;
                }
            }

            if (channel == null)
            {
                channel = new RssChannel();
                feed.Channels.Add(channel); //might blow up if already there?
            }

            RssItem item = new RssItem();

            item.Title = itemTitle;
            item.Description = itemDescription;
            item.PubDate = itemPublishDate.ToUniversalTime();
            item.Link = new Uri(itemURL);

            //To ensure we don’t screw up existing dates – convert to UTC
            foreach (RssItem existingItem in channel.Items)
            {
                existingItem.PubDate = existingItem.PubDate.ToUniversalTime();
            }

            //Now add our new item
            channel.Items.Add(item);

            channel.Title = feedName;
            channel.Description = feedDescription;
            //channel.LastBuildDate = channel.Items.LatestPubDate();
            channel.PubDate = DateTime.UtcNow;
            channel.Link = new Uri(feedURL);

            feed.Write(feedFileName);
        }


Efficient XPath Queries

Matt Simner | XML | July 31, 2007 | April 20, 2012 | performance, refactoring, xml, xpath

This is something I get asked about quite a bit as I had a misspent youth with XSLT…

One of my pet hates is people always using the search identifier ‘//’ in XPath queries. It’s the SELECT * of XML, and shouldn’t be used unless you actually want to ‘search’ your document.

If you’re performing SQL you’d SELECT fields explicitly rather than SELECT * wouldn’t you? 🙂

because:

If the schema changes (new fields inserted) then your existing code has less chance of breaking
It performs better, with less server pre-processing and fewer catalog lookups
It’s more declarative, and the code is easier to read and maintain

With XML (and the standard DOM-style parsers) you’re working on a document tree, and accessing nodes loaded into that tree.

Consider the following XML fragment as an example:

Say your car dealership sells new cars and current prices are serialised in an xml document:

In order to get all cars you can easily use the following XPath: ‘//Car’. This searches from the root to find all Car elements (and finds 2).

A more efficient way would be ‘/*/*/Car’, as we know Cars only exist at the 3rd level in the document.

A yet more efficient way would be ‘/Sales/Cars/Car’ as this makes a direct hit on the depth and parent elements in the document.

You can also mix and match with ‘/*//Car’ to directly access the parts of the DOM you’re certain of and search on the parts you’re not.
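
As a rough sketch, the four paths can be tried against a small document with System.Xml. The markup here is an assumption made up to fit the description (a Sales root, a Cars element, two Car elements); the Make and Price attributes and the CarQueries class are hypothetical too.

```csharp
using System;
using System.Xml;

public static class CarQueries
{
    // Hypothetical document: Car elements sit at the 3rd level, under Sales/Cars.
    public const string SampleXml =
        "<Sales>" +
          "<Cars>" +
            "<Car Make=\"Ford\" Price=\"20000\" />" +
            "<Car Make=\"Holden\" Price=\"25000\" />" +
          "</Cars>" +
        "</Sales>";

    // Loads the XML into a DOM and returns how many nodes the XPath selects.
    public static int Count(string xml, string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlNodeList nodes = doc.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }
}
```

All four of ‘//Car’, ‘/*/*/Car’, ‘/Sales/Cars/Car’ and ‘/*//Car’ select the same two elements here. The difference is only in how much of the tree the parser has to visit to find them.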

Now let’s say you go into the used car business and refactor your XML format as follows:

If you want to get all Cars (new and used) you could still use any of the XPaths above. If you want to isolate the New from the used, then you’re going to have to make some changes in any case.

‘//Car’ is obviously going to pick up 4 elements.

‘/Sales[@Type='New']/Cars/Car’ is probably the most efficient in this case, but it will vary based on the complexity of the condition (in []) and the complexity and size of the document.
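
Here is a small sketch of the predicate in action. The layout is an assumption: two Sales elements told apart by a Type attribute, wrapped in a hypothetical Dealership root, because an XmlDocument needs a single root element. Under that assumption the direct paths gain one leading step. The car makes and the class name are invented for illustration.

```csharp
using System;
using System.Xml;

public static class NewAndUsedCarQueries
{
    // Hypothetical layout: one Sales element per type under an assumed Dealership root.
    public const string SampleXml =
        "<Dealership>" +
          "<Sales Type=\"New\"><Cars>" +
            "<Car Make=\"Ford\" /><Car Make=\"Holden\" />" +
          "</Cars></Sales>" +
          "<Sales Type=\"Used\"><Cars>" +
            "<Car Make=\"Toyota\" /><Car Make=\"Mazda\" />" +
          "</Cars></Sales>" +
        "</Dealership>";

    // Returns how many nodes the XPath selects in the sample document.
    public static int Count(string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(SampleXml);

        XmlNodeList nodes = doc.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }
}
```

With this document, ‘//Car’ selects all four cars, while ‘/Dealership/Sales[@Type='New']/Cars/Car’ selects only the two new ones and never descends into the used branch.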

It’s important to note that the effects of optimising your XPath queries won’t really be felt until you’re operating with:

Large documents (n MB+)
Deep documents (n levels deep) – n is variable based on the document size
Heavy load and high demand for throughput of requests

This means you shouldn’t expect efficient XPaths to solve all your problems, but they shouldn’t be a limiting factor in a business application. The other thing to say is that if your XPath queries are getting really complicated, then your schema is probably in need of some attention as well.