I thought I would do a "best of" post, as many others seem to be doing.

Most viewed posts (in order):

  1. IoC Container Benchmark - Unity, Windsor, StructureMap and Spring.NET
  2. NHibernate 2.0 Events and Listeners
  3. JQuery and Seperation of Concerns
  4. Cleanup your html with JQuery
  5. Breadcrumb menu using JQuery and ASP.NET MVC
  6. Url Routing Fluent Interface
  7. Creating a WatiN DSL using MGrammar
  8. View Model Inheritance
  9. NHibernate 2.0 Statistics and a MonoRail filter

This was actually my first year of blogging. I had had ambitions to start blogging for years but never got around to actually doing it, mainly for two reasons. First, I thought that if I was going to start blogging I would need to set up a custom domain, fix hosting, install and configure some blogging software, and so on. All that initial setup was a road block; I could not find the time for it because of a heavy workload and a lot of overtime. The second reason was that if I started blogging I wanted it to be a serious blog with good, valuable content, and for that I felt I needed to mature and gain some more experience.

But in retrospect I regret that I did not start sooner. During 2005-2007 I was a lead developer on a multi-tenant B2B system where I did some interesting work with URL rewriting in WebForms (before there was much information about it), started using NHibernate and Castle Windsor, implemented SAP integration logging using AOP, and more. In short, I had a lot to blog about that could have been valuable and helpful to the .NET community.

I am very glad that I eventually started blogging, because it has been a fun, rewarding and educational experience. One of the reasons I finally started was the extremely easy setup that Google's Blogger service provides: you can register a domain and start blogging in a matter of minutes. Blogger has been a great way to get started; the only problems are the bad comment system and some lack of flexibility. I will probably move to a hosted solution where I can run and configure the blogging software myself during 2009, but right now it is not a top priority.

I want to thank everyone who subscribes to or reads this blog. I have been very pleasantly surprised by the number of people who subscribe; it is very motivating to see that people find value in the things I write, and it keeps me wanting to write more and better.

Merry Christmas & Happy New Year

If you have worked with DTS or SSIS packages you probably know that they can quickly become painful to work with, especially when it comes to versioning, logging and, above all, deployment.

I recently tried Ayende's open source ETL framework RhinoETL, which tackles the ETL problem from a completely different angle than DTS/SSIS. At its heart RhinoETL is a very simple .NET framework for handling an ETL process; the key components are processes, pipelines and operations.

The process that I needed was a very simple one, namely to update a text table stored in multiple databases. The updates could be of different types, for example swapping every occurrence of a text translation for another, or deleting or updating a specific row. In previous releases this was handled by writing manual update scripts. The release I am currently working on, however, requires extensive changes to the texts in these tables spread over many databases, and writing repetitive SQL scripts was not something I felt like doing. It felt like a good opportunity to try RhinoETL.

I began by writing this input operation:

public class ReadWordList : InputCommandOperation
{
    public ReadWordList(string connectionStringName)
        : base(connectionStringName) { }

    protected override Row CreateRowFromReader(IDataReader reader)
    {
        return Row.FromReader(reader);
    }

    protected override void PrepareCommand(IDbCommand cmd)
    {
        cmd.CommandText = "SELECT * FROM Wordlists";
    }
}

This is the first operation; its responsibility is to fill the pipeline with rows from the Wordlists table. The next operation is the one updating the rows. It takes as input a list of ITextChange instances, the objects that perform the changes.

public class TextChangeOperation : AbstractOperation
{
    private readonly IList<ITextChange> changes;

    public TextChangeOperation(IList<ITextChange> changes)
    {
        this.changes = changes;
    }

    public override IEnumerable<Row> Execute(IEnumerable<Row> rows)
    {
        foreach (var row in rows)
        {
            foreach (var change in changes)
            {
                if (change.IsValidFor(row))
                    change.Perform(row);
            }
            yield return row;
        }
    }
}

There is a class hierarchy representing the different types of text changes:
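The shape of the hierarchy follows from how the operations use it: every change type implements a small ITextChange interface. A minimal sketch, inferred from the usage above (Row is RhinoETL's row type):

```csharp
// Inferred from usage: each change can test whether it applies
// to a given row, and then mutate that row in place.
public interface ITextChange
{
    bool IsValidFor(Row row);
    void Perform(Row row);
}
```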



The code for the GeneralTextSwap class is very simple:

public class GeneralTextSwap : ITextChange
{
    public string TextOld { get; set; }
    public string TextNew { get; set; }

    public bool IsValidFor(Row row)
    {
        var text = ((string) row["Text"]).Trim();
        return text == TextOld;
    }

    public void Perform(Row row)
    {
        row["TextOld"] = row["Text"];
        row["Text"] = TextNew;
    }
}

The point of storing the old text value is that I have an operation after the TextChangeOperation that logs all changes to a csv file. The changes for a specific release are then just defined as a static list on a static class. For example:

public class ReleaseChanges_For_Jan09
{
    public static IList<ITextChange> List;

    static ReleaseChanges_For_Jan09()
    {
        List = new List<ITextChange>
        {
            new GeneralTextSwap
            {
                TextOld = "some old link value",
                TextNew = "some new link value"
            },
            new UpdateTextRow
            {
                DbName = "DN_NAME",
                WorldList = "USER_PAGE",
                Name = "CROSS_APP_LINK",
                TextNew = "New link value"
            },
            new DeleteTextRow
            {
                DbName = "DN_NAME_2",
                WorldList = "SOME_PAGE",
                Name = "SOME_LINK"
            }
        };
    }
}

The above is just an example; in reality I have hundreds of general and specific changes (and yes, this is a legacy system which handles text and links very strangely). The above could easily be handled by a simple SQL script with about the same number of lines of TSQL code; the benefit of placing this inside an ETL process written in C# is that I can easily reuse the change logic over multiple databases, and I also get great logging and tracing of all changes. The point of this post is to show how easy it can be to use OOP to model and abstract an ETL process using RhinoETL. Even though this ETL process is very simple, it serves as a good example of how a TSQL script, DTS or SSIS package can be rewritten and simplified using the power of an object oriented language.
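For completeness, here is roughly how the operations are tied together into a process. This is a minimal sketch assuming RhinoETL's EtlProcess base class, where operations are registered in pipeline order inside Initialize; the process name and connection string name are made up for illustration:

```csharp
using Rhino.Etl.Core;

public class TextChangeProcess : EtlProcess
{
    protected override void Initialize()
    {
        // Operations execute as a pipeline, in registration order:
        // read the rows, apply the release's changes to each row.
        Register(new ReadWordList("TargetDb"));
        Register(new TextChangeOperation(ReleaseChanges_For_Jan09.List));
        // A final output operation would write the rows back to the
        // database and log the old/new values to the csv file.
    }
}
```

Running it against another database is then just a matter of constructing the process with a different connection string, which is where the reuse over multiple databases comes from.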

Fredrik Normén recently posted about how crosscutting concerns are implemented with boring and often duplicated code, which can be handled by using aspect oriented programming (AOP). I have used AOP with great success in previous projects to handle scenarios like logging, transactions and change tracking.

The problem with AOP is that it is not well supported or integrated into the CLR or the .NET framework and toolset. Sure, there are great AOP frameworks that solve the problem via dynamic proxies, but such solutions have some big restrictions: for example, they only work with virtual functions, and you need to instantiate the objects via a proxy creator. PostSharp handles AOP differently; it is a framework and a .NET post compiler that injects your crosscutting concerns at compile time. The problem I have found with PostSharp is that the post compile step is kind of slow, which is an issue if you do TDD (because you are constantly recompiling). Besides the compile time issue, PostSharp is a great tool and framework with great flexibility and power.
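The virtual-member restriction is easy to demonstrate with a hand-rolled proxy, which is essentially what a dynamic proxy framework generates at runtime. The class and method names below are made up for illustration:

```csharp
using System;

public class OrderService
{
    // Virtual: a generated subclass proxy can intercept this call.
    public virtual void PlaceOrder(string id)
    {
        Console.WriteLine("Placing order " + id);
    }

    // Non-virtual: a subclass proxy cannot override it, so any
    // crosscutting logic is silently skipped for this method.
    public void CancelOrder(string id)
    {
        Console.WriteLine("Cancelling order " + id);
    }
}

// What a dynamic proxy framework generates, written by hand.
public class LoggingOrderServiceProxy : OrderService
{
    public override void PlaceOrder(string id)
    {
        Console.WriteLine("LOG: entering PlaceOrder");
        base.PlaceOrder(id);
        Console.WriteLine("LOG: leaving PlaceOrder");
    }
    // No way to intercept CancelOrder from here.
}

public static class Program
{
    public static void Main()
    {
        // The caller must receive the proxy type instead of doing
        // new OrderService() directly, which is why proxy-based AOP
        // requires a factory or an IoC container.
        OrderService service = new LoggingOrderServiceProxy();
        service.PlaceOrder("42");   // logged
        service.CancelOrder("42");  // not logged
    }
}
```

This also shows the second restriction from the paragraph above: the logging only happens because the caller got the proxy, not the raw class.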

I was kind of disappointed that Microsoft didn't present any plan to make AOP scenarios easier and more natural on the .NET platform at this year's PDC. I am not sure how, but I just feel that the AOP experience on .NET could be vastly improved :)

A couple of weeks ago I was invited to a presentation by Scott Ambler on "Scaling Agile Software Development: Strategies for Applying Agile in Complex Situations"; it was an interesting talk. He showed a lot of diagrams and stats from a 2008 agile adoption rate survey; you can find the info here. The survey covers, for example, agile adoption rates, success rates, and increases/decreases in code quality and costs. The problem with the survey is that it does not include any definition of what qualifies as agile. There are many different definitions of what makes software development agile, and many of them, like the manifesto (if you can even call that a definition), are very vague. I am not saying there is anything wrong with the manifesto, or that agile needs to be better defined. What I am saying is that because agile can mean a lot of things to different people, and many think they are doing agile while others might heartily disagree, surveys like this are very hard to interpret.

For example, the part of the survey that covers modelling and documentation practices concludes that:

  • Agile teams are more likely to model than traditional teams.
  • Traditional teams are equally as likely to create deliverable documentation.
  • For all the talk in the agile community about acceptance test driven development, few teams are actually doing it in practice.
  • etc..

Without some minimum criteria for what qualifies as "doing agile", these surveys don't tell me that much. Even if I disagreed with the chosen criteria, having them would make surveys like this a lot more interesting.

I am not saying that the surveys are worthless; they do tell you something, and they are a great resource for convincing management of agile practices, so it is great that someone is taking the time to make them.

I am very interested in how natural languages work and evolve (see my review of the book The Language Instinct), and ever since I began playing with MGrammar I have wanted to see if it was possible to define English sentence structure using it.

Let's begin with a simple sentence:

The boy likes the girl.

This sentence is composed of the noun phrase (NP) "the boy" and the verb phrase (VP) "likes the girl". So let's begin with this MGrammar syntax:

syntax Main = S*;        
syntax S = NP VP ".";

If we look at the noun phrase "the boy", it is composed of a determiner followed by a noun; likewise, if we look at the verb phrase "likes the girl" we see that it is composed of a verb followed by a noun phrase. The MGrammar should then be:

syntax NP = Det? N;
syntax VP = V NP;

Then we just need to add some determiners, verbs and nouns:

syntax Det = "a" | "the" | "one";
syntax N = "boy" | "girl" | "dog" | "school" | "hair";
syntax V = "likes" | "bites" | "eats" | "discuss";

If you add an interleave rule to skip whitespace, the sentence should be parsed correctly. That was a really simple sentence, so let's add an adjective.

The nerdy boy likes the girl.

We need to modify the noun phrase rule. Before the noun, an optional number of adjectives (A*) can be placed. This is a simple change: just add A* to the noun phrase rule and add some adjectives.

syntax NP = Det? A* N;
syntax A = "happy" | "lucky" | "tall" | "red" | "nerdy";

That was simple, so let's add something more to the sentence, for example:

The nerdy boy likes the girl from school with red hair.

I added a nested prepositional phrase (PP). A prepositional phrase is, according to Wikipedia, composed of a preposition (P) and a noun phrase.

syntax NP = Det? A* N PP?;
syntax PP = P NP;
syntax P = "on" | "in" | "from" | "with";
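For reference, here is the whole grammar as I would wrap it up, with the rules inside a module and language and an interleave rule so whitespace between words is skipped. The module and language names are my own, and I am going from memory on the exact M syntax:

```
module EnglishGrammar
{
    language Sentence
    {
        syntax Main = S*;
        syntax S = NP VP ".";

        syntax NP = Det? A* N PP?;
        syntax VP = V NP;
        syntax PP = P NP;

        syntax Det = "a" | "the" | "one";
        syntax N = "boy" | "girl" | "dog" | "school" | "hair";
        syntax V = "likes" | "bites" | "eats" | "discuss";
        syntax A = "happy" | "lucky" | "tall" | "red" | "nerdy";
        syntax P = "on" | "in" | "from" | "with";

        // Skip whitespace between tokens anywhere in the input.
        interleave Whitespace = " " | "\t" | "\r" | "\n";
    }
}
```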

The recursive nature of the PP rule makes it possible to nest an infinite number of prepositional phrases inside each other. Here is an illustration of the syntax tree for "girl from school with red hair":


I think I will stop here, because this post is turning into an English grammar lesson and I don't want to lose all my subscribers :) Defining the English sentence structure in MGrammar is pretty pointless unless you are building a grammar checker, in which case you are still out of luck, as it will probably be impossible to define a grammar for how words are built, and you will run into trouble with ambiguity (which most natural languages have). But it was a fun experiment, and it is a good example for showing how recursive rules are parsed.

If you missed Martin Fowler's post on Oslo, it is a good read; I like how he defines it as a Language Workbench.

PS. I have started twittering. I know I am late to the game; I just didn't get the point of Twitter. I have been using it for two days now and I am beginning to see the light. Oh, and please skip pointing out the irony of the inevitable grammatical errors in this post :)