Inject and other goodness in C#

October 3rd, 2006

Two of my favourite C# 2.0 features, generics and anonymous delegates, come together in something of a perfect storm in [a set of static methods][static_methods] on the Array class. These generic methods accept delegates that you can use to search and transform arrays in a nice, strongly typed way. Many of them are inspired by features in functional and dynamic languages like Lisp, Smalltalk, Ruby and Python, and Martin Fowler discusses them in that context at [http://www.martinfowler.com/bliki/CollectionClosureMethod.html][fowler].

[static_methods]: http://msdn2.microsoft.com/en-us/library/system.array_methods.aspx
[fowler]: http://www.martinfowler.com/bliki/CollectionClosureMethod.html

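The methods include gems like Find, FindAll, Exists, TrueForAll, the amazingly useful ConvertAll, and an overload of Sort that accepts a generic Comparison<T> delegate to compare items (BinarySearch gets generic overloads too, though they take an IComparer<T> rather than a delegate).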

Find and the related methods can be used like this:

Track[] allTracks = GetAllMyTracks();
Track[] greatMusic = Array.FindAll(allTracks, delegate(Track t) {
	return t.Rating == 5;
});
Track firstToDelete = Array.Find(allTracks, delegate(Track t) {
	return t.Rating == 1;
});
bool someTracksUnrated = Array.Exists(allTracks, delegate(Track t) {
	return t.Rating == 0;
});
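The snippets above don't show TrueForAll or the delegate overload of Sort, so here is a minimal sketch of both. It's written with C# 3.0-style lambdas and modern top-level statements for brevity, and the anonymous-typed array is only a stand-in for the hypothetical Track class and GetAllMyTracks() method.

```csharp
using System;

// Stand-in data: anonymous-typed "tracks" replacing the hypothetical Track class
// and GetAllMyTracks() from the post.
var tracks = new[]
{
    new { Name = "Intro", Rating = 3 },
    new { Name = "Anthem", Rating = 5 },
    new { Name = "Filler", Rating = 1 },
};

// TrueForAll: true only if every element satisfies the predicate.
bool allRated = Array.TrueForAll(tracks, t => t.Rating > 0);

// Sort<T>(T[], Comparison<T>) sorts in place, asking the delegate to compare pairs.
// Comparing b with a (instead of a with b) gives descending order by rating.
Array.Sort(tracks, (a, b) => b.Rating.CompareTo(a.Rating));

Console.WriteLine(allRated);        // True
Console.WriteLine(tracks[0].Name);  // Anthem
```

The Comparison<T> delegate just returns a negative number, zero or a positive number, which is why reversing the operands is all it takes to flip the sort order.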

ConvertAll is a real workhorse and very similar to the [map or collect][map] function in dynamic languages.

[map]: http://en.wikipedia.org/wiki/Map_(higher-order_function)

Track[] allTracks = GetAllMyTracks();
ListViewItem[] items = Array.ConvertAll<Track, ListViewItem>(allTracks, delegate(Track t) {
	return new ListViewItem(new string[] { t.Name, t.Album, t.Rating.ToString() });
});
string[] albums = Array.ConvertAll<Track, string>(allTracks, delegate(Track t) {
	return t.Album;
});

These examples don’t do map/collect justice. Once you start using map/collect you realise it solves a basic problem you run into all the time. Unfortunately the C# syntax is a little long-winded, and it isn't helped by the need to state the generic parameters to the method explicitly: in most cases the compiler can infer them for you, but C# 2.0 doesn't use an anonymous delegate's return type for inference, so it falls down here. In Ruby the second example would be:

albums = tracks.map{|t| t.album}

In fact the long-windedness is a bit of a problem with all of these methods, particularly when compared with their dynamic language counterparts. Luckily, the situation improves a LOT with C# 3.0, due late next year as part of Visual Studio Orcas.

C# 3.0 takes the anonymous delegates introduced in version 2.0 and makes them much more compact with lambda expressions, relying on the improved type inference coming in Orcas. Writing the albums code in C# 3.0 looks like this:

albums = Array.ConvertAll(tracks, t => t.Album);

One method that seems to be missing from both the static methods on Array and the current LINQ bits, though, is inject. Inject is a tricky little feller, and I must admit I was completely stumped by it when I first encountered it in Ruby. Basically, inject lets you build a single accumulated value from the entries in a list. A great example is summing numbers:

int[] numbers = new int[] { 1, 2, 3 };
int sum = Inject(numbers, 0, delegate(int total, int number)
{
	return total + number;
});

Here, the zero we pass as the second parameter to the Inject method is the initial value of our accumulator. Inject then calls our delegate for each member of the numbers array, passing both the accumulator (total in this case) and the array member. The delegate then returns a new value for the accumulator.
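To make the mechanics concrete, here is a minimal sketch that records the accumulator at each step of the summing example. It uses a local generic Inject function built on Func<T2, T1, T2> rather than the Injector delegate defined further down, plus modern C# syntax for brevity; treat it as an illustration, not a drop-in implementation.

```csharp
using System;
using System.Collections.Generic;

// Local version of Inject (illustrative): folds the items into an accumulator, left to right.
static T2 Inject<T1, T2>(IEnumerable<T1> items, T2 initialValue, Func<T2, T1, T2> func)
{
    T2 accumulator = initialValue;
    foreach (T1 item in items)
        accumulator = func(accumulator, item);
    return accumulator;
}

int[] numbers = { 1, 2, 3 };
var steps = new List<string>();

int sum = Inject(numbers, 0, (total, number) =>
{
    int next = total + number;
    steps.Add($"{total} + {number} = {next}");   // record each call to the delegate
    return next;
});

Console.WriteLine(string.Join(", ", steps));  // 0 + 1 = 1, 1 + 2 = 3, 3 + 3 = 6
Console.WriteLine(sum);                        // 6
```

Each call receives whatever the previous call returned, which is the whole trick: the initial value only matters for the first call.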

In C# 3.0, we again get a much nicer syntax:

int sum = Inject(numbers, 0, (total, number) => total + number);

While summing is the classic inject example, once you get the hang of it you see how useful this simple operation is. Here’s max:

int max = Inject(numbers, int.MinValue, (current, number) => Math.Max(current, number));

If you want to count how many times a particular value (here, 1) occurs, you can use:

int count = Inject(numbers, 0, (total, number) => number == 1 ? total + 1 : total);
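For comparison, the final .NET 3.5 release does include an equivalent operator, Enumerable.Aggregate in System.Linq. Here is a minimal sketch of the sum, max and count examples above rewritten with its seed-plus-function overload, which behaves like Inject(numbers, seed, func).

```csharp
using System;
using System.Linq;

int[] numbers = { 1, 2, 3 };

// Aggregate(seed, func) folds left to right, just like Inject(numbers, seed, func).
int sum = numbers.Aggregate(0, (total, number) => total + number);
int max = numbers.Aggregate(int.MinValue, (current, number) => Math.Max(current, number));
int ones = numbers.Aggregate(0, (total, number) => number == 1 ? total + 1 : total);

Console.WriteLine($"{sum} {max} {ones}");  // 6 3 1
```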

Building up a hash or dictionary is also a great use of Inject, though it’s a little clunky in C#, even in version 3.0. This counts the number of times each value occurs in the array “a”.

Dictionary<int, int> frequencies = a.Inject(new Dictionary<int, int>(), (dict, i) =>
{
	dict[i] = (dict.ContainsKey(i) ? dict[i] + 1 : 1);
	return dict;
});

It’s much neater in Ruby:

frequencies = a.inject({}){|hsh,i| hsh[i] ||= 0; hsh[i] += 1; hsh }

Ruby’s more concise syntax also means you can do a “group by” very easily:

# a is an array of customers, let's group them by city
by_city = a.inject({}){|hsh,c| hsh[c.city] ||= []; hsh[c.city] << c; hsh}

In C#, this would be:

Dictionary<string, List<Customer>> byCity = a.Inject(new Dictionary<string, List<Customer>>(), (dict, c) =>
{
	if (!dict.ContainsKey(c.City))
		dict[c.City] = new List<Customer>();
	dict[c.City].Add(c);
	return dict;
});

Inject is pretty easy to write, though in C# 2.0 you’ll have to define a delegate for it as well:

public delegate T2 Injector<T1, T2>(T2 accumulator, T1 item);

public static T2 Inject<T1, T2>(IEnumerable<T1> enumerable, T2 initialValue, Injector<T1, T2> func)
{
	T2 accumulator = initialValue;
	foreach (T1 item in enumerable)
	{
		accumulator = func(accumulator, item);
	}

	return accumulator;
}

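Just for good measure, here are both inject and map in C# 3.0, this time written as extension methods on IEnumerable<T>. Map is an iterator (it uses yield return), so it's lazily evaluated; Inject has to walk the whole sequence to produce its single result, so it runs straight away: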

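using System;
using System.Collections.Generic;

// Extension methods must be declared in a non-generic static class.
public static class EnumerableExtensions
{
	public static T2 Inject<T1, T2>(this IEnumerable<T1> enumerable, T2 accumulator, Func<T2, T1, T2> func)
	{
		foreach (T1 item in enumerable)
		{
			accumulator = func(accumulator, item);
		}

		return accumulator;
	}

	// Iterator method: each item is converted only when the caller asks for it.
	public static IEnumerable<T2> Map<T1, T2>(this IEnumerable<T1> enumerable, Func<T1, T2> func)
	{
		foreach (T1 item in enumerable)
		{
			yield return func(item);
		}
	}
}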
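To see that the laziness comes from the iterator rather than from being an extension method, here is a minimal sketch using a standalone local copy of Map (same body, no extension syntax) and a log of when work actually happens. The log and the doubling lambda are purely illustrative.

```csharp
using System;
using System.Collections.Generic;

// Standalone copy of Map as a local iterator function, so the snippet runs on its own.
static IEnumerable<T2> Map<T1, T2>(IEnumerable<T1> items, Func<T1, T2> func)
{
    foreach (T1 item in items)
        yield return func(item);
}

var log = new List<string>();
int[] numbers = { 1, 2, 3 };

IEnumerable<int> doubled = Map(numbers, n => { log.Add($"doubling {n}"); return n * 2; });
int workBeforeEnumerating = log.Count;   // 0: calling Map ran none of its body

foreach (int d in doubled)
    log.Add($"got {d}");

Console.WriteLine(workBeforeEnumerating);
Console.WriteLine(string.Join("; ", log));
// doubling 1; got 2; doubling 2; got 4; doubling 3; got 6
```

The interleaved log shows each item being converted only as the foreach asks for it. Inject can't work this way, because it can't hand back an answer until it has seen every item.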

Conditional Breakpoints, oh yeah

July 27th, 2006

I’ve known they existed for a while, but this week is the first time I’ve really used the new Conditional Breakpoints in Visual Studio 2005. Conditional breakpoints are a straightforward enough concept: you can set a condition on a breakpoint using the same C#-like (or I guess VB-like) syntax you use in the VS Watch panel. If the condition is true, the breakpoint is hit; otherwise execution just carries on past it.

Most of the time this isn’t hugely useful. If not every breakpoint hit is interesting, you can just F5 through those that aren’t. Right now though I’m working on a project that loops through a set of complex data, and maybe 1% of the time I want to break into the debugger. Suddenly I absolutely love conditional breakpoints.

There are a couple of (minor) drawbacks. For example, the condition isn’t evaluated until the breakpoint is reached, so if you’ve used some dodgy syntax you won’t find out immediately. There’s no IntelliSense either, but again, it’s not a major problem: you’re not going to be typing long sections of code here, just a simple condition.

If you haven’t tried these out, give them a whirl. Just right-click the little red breakpoint and choose Condition… There are a couple of other fun things in there too: filters to break only when you’re running a particular thread or process, hit counts so you can stop after a given number of hits, and a nice When Hit option so you can trace when the breakpoint is reached without actually stopping.

Jim Gries has a great blog with debugger tips, though sadly it hasn’t been updated since December 2005. There’s still some great stuff there however: check it out at http://blogs.msdn.com/jimgries/default.aspx.

Visual Studio Team Suite: great for TDD (flame me :)

December 2nd, 2005

I was at the INDA meeting last night where Robert Scoble gave a talk on the rise of blogging and a panel answered questions from the floor about the future of .NET programming from Vista and beyond.

Pretty quickly the discussion got round to Visual Studio 2005, with all its little imperfections. One thing that really got people going was the recent MSDN article which attempted to describe how to use Team System to write code in a test-driven way. The article was widely ridiculed, and for good reason: there was nothing remotely test-driven about the approach it described. It used tests, sure, but being test driven is about a lot more than just having tests. I disliked the article as much as anyone else, and added my own 1/9 vote. I’m glad they took it offline, because I wouldn’t want people to be misled about the aims of test-driven development.

That said, the tool itself has actually got a lot of features that make it an efficient IDE for writing test-driven code. When it comes down to it, the core tool you need for unit testing just calls a bunch of methods and displays their results — it doesn’t get much simpler (Kent Beck even says you should write your own). On this measure, Team System performs perfectly well (how could it not?), as does NUnit.

Team System, though, has a couple of extra features that improve on the regular NUnit experience. For one thing, you get first-rate IDE integration. Once you set the test project as the startup project, Ctrl-F5 runs the tests and F5 on its own runs them in the debugger. You can do much the same with testdriven.net, or configure NUnit to start up in a similar way, but having everything a keystroke away is essential once you get hooked on the red/green/refactor rhythm.

The other killer feature is code coverage. Just by selecting an option in the build properties you get full analysis of what lines of code your tests hit, with red/green highlighting of each line and percentages for each namespace, class and method. Again, NCover does the same thing (particularly when you combine it with NCover Browser), but IDE integration brings it that little bit further.

Add all this to the base Team System functionality, with its check-in policies, work item management and proper source control (not to mention the rest), and you’ve got a great tool for using (and enforcing) TDD in teams.

Yeah, they’ve included a lot of stuff that isn’t of interest to TDD fans. If you’re test driven, you won’t be interested in creating “tests” from code you’ve already written, but then you won’t be interested in the support for manual test scripts either. Just ignore it: someone else wants those features even if you don’t. Most of all though, don’t reject the tool just because someone wrote a crappy article about it. Give it a chance, and you might be pleasantly surprised.

My new favourite Firefox extensions

November 7th, 2005

I’m normally pretty conservative with the Firefox extensions I use — the base features do 90% of what I want and most extensions seem more hassle than they’re worth. I’ve recently installed two though that are among the best I’ve tried.

IE Tab provides an icon in the status bar that you can click to change the rendering engine to IE6’s… No new windows, just an IE page inside a FF tab. What’s even better is that you can configure particular sites to always be rendered in IE. Grab it at addons.mozilla.org.

The other is Tab Preview: once it’s installed you just have to mouse over a tab to show a small preview of its contents. If you have a load of tabs open, all with similar page titles, this can save loads of time. The only frustrating thing is that it doesn’t work with IE Tab; the contents of IE tabs just show as an empty space… Tab Preview is at http://ted.mielczarek.org/code/mozilla/tabpreview/.

PDC05: Expression Quartz

September 15th, 2005

I’m really looking forward to using Quartz, one of the three Expression graphic design tools announced yesterday at the PDC. Quartz is the XHTML/CSS designer, and the very idea of Microsoft releasing such a standards-based tool would have been unthinkable not long ago. I think at this stage, though (with schema-checking in VS 2005 and the cross-browser Atlas project), it’s fair to say Microsoft is finally embracing web standards and cross-browser compatibility. Quartz provides a designer based entirely on CSS and semantic markup, and uses external CSS stylesheets which are automatically updated when you change the properties of an item.

Most (all?) standards-based web developers are currently forced to use a text editor and constant alt-tabbing to a separate web browser to create their sites, and even though it’s possible to be quite productive this way, it can be pretty painful, and slow. With only a quick demo at the keynote it’s difficult to say how well this will turn out, but for more details check out the Quartz features page.

PDC05: Integrated query and C# 3.0

September 15th, 2005

The big news for C# developers at PDC has definitely been Language Integrated Query (the LINQ project) and the new features in C# 3.0, and the largest room in the convention centre was packed, with an overflow session scheduled for tomorrow for those that didn’t make it in time.

There’s far too much to describe in a blog post, but the headline features of version 3.0 include lambda expressions, extension methods (a little like mixins in Ruby), anonymous types, and local type inference. A lot of these are designed to support the new LINQ features, which make it possible to write query-like expressions in C# 3.0 or VB9 code. Queries can be run against anything that implements IEnumerable, which means you can query arrays with where clauses, group by, and order the results however you want.

The really great thing is that all of this is strongly-typed – Anders and his team have managed to take some of the most compelling features of dynamic scripting languages like Ruby and Python and deliver them in C# along with compile-time type checking and IntelliSense.

The syntax takes some getting used to:

Customer[] customers = GetCustomers();
var results =
   from c in customers
   where c.City == "London"
   select new { FullName = c.FirstName + " " + c.LastName, c.CompanyName };

This little snippet shows a number of things. First there’s the var keyword, which is not a variant, but rather tells the compiler to infer the type of the results local variable. Writing “var i = 3”, for example, is exactly the same as writing “int i = 3”, and this shorthand becomes useful later in the statement.

Most of the rest of the statement is really syntactic sugar for a number of regular method calls. In fact the statement could be written in C# 2.0 as something like:

private class CustSummary { ... }
IEnumerable<CustSummary> results = customers
   .Where(delegate(Customer c) { return c.City == "London"; })
   .Select(delegate(Customer c) {
      return new CustSummary(c.FirstName + " " + c.LastName, c.CompanyName);
   });

If you haven’t tried out generics and anonymous delegates in C# 2.0, the code above is going to be just as impenetrable as the 3.0 version, and as Anders pointed out, all of the features planned for 3.0 make heavy use of the innovations in the 2.0 release of the CLR. In fact, C# 3.0 does not require a new version of the CLR: everything works on VS 2005 today once you install the LINQ technical preview.

There are a couple of things here, however, that are difficult to express in version 2.0. First of all, where did the CustSummary class come from? In the 3.0 version its place is taken by an anonymous type, created just to support the statement. It’s a real type, defined in your assembly, and you get full IntelliSense support for its members; you just can’t refer to it by name, since the name is generated by the compiler (so it’ll actually be called type0001 or something). This is why we used the var keyword to declare the results local: the compiler can give results the correct type for us, since it knows the name of the anonymous type.

The other strange thing about the code is the Where and Select methods. Where did they come from? We seem to be calling them on the Customer array, but they aren’t part of the Array class. In fact these are extension methods, which become available on anything that implements IEnumerable<T> once their namespace is in scope, even though they aren’t part of the interface itself. You get to use them simply by adding a using directive for System.Query, which includes a class that defines the extra methods.
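Here is a minimal sketch of that translation using the names that eventually shipped in .NET 3.5 (the preview's System.Query namespace became System.Linq). The customers are anonymous-typed stand-ins for the hypothetical Customer class and GetCustomers() method; the point is that the query expression and the explicit Where/Select calls produce the same sequence.

```csharp
using System;
using System.Linq;

// Stand-in data in place of the hypothetical Customer[] returned by GetCustomers().
var customers = new[]
{
    new { FirstName = "Ann", LastName = "Smith", City = "London", CompanyName = "Acme" },
    new { FirstName = "Bob", LastName = "Jones", City = "Paris", CompanyName = "Globex" },
    new { FirstName = "Cat", LastName = "Brown", City = "London", CompanyName = "Initech" },
};

// Query expression syntax...
var fromQuery =
    from c in customers
    where c.City == "London"
    select new { FullName = c.FirstName + " " + c.LastName, c.CompanyName };

// ...and the extension-method calls the compiler turns it into.
var fromMethods = customers
    .Where(c => c.City == "London")
    .Select(c => new { FullName = c.FirstName + " " + c.LastName, c.CompanyName });

// Both projections share one anonymous type (same property names, types and order),
// and anonymous types compare by value, so the sequences can be compared directly.
bool same = fromQuery.SequenceEqual(fromMethods);
Console.WriteLine(same);                         // True
Console.WriteLine(fromQuery.First().FullName);   // Ann Smith
```

Note that var is doing real work here: there is no name you could write in place of it for fromQuery or fromMethods.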

Extension methods are a little scary since they allow you to add methods to any predefined class. By using a special syntax you can define these as static methods in a separate class, but make them available on the class they extend. Want to add a ToTitleCase() method to all strings? No problem. Add ToBase64() to all byte arrays? Easy. Anders demonstrated adding a ToXml method to IEnumerable, which created an XML stream for all arrays, lists and anything else that implements that interface. Powerful stuff, though as he made clear, the potential for abuse is frightening :)
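As an illustration of the ToTitleCase idea, here is a minimal sketch written against today's compiler. The StringExtensions class and its method are my own hypothetical names, not something from the preview bits; the actual work is delegated to the framework's TextInfo.ToTitleCase.

```csharp
using System;
using System.Globalization;

Console.WriteLine("the quick brown fox".ToTitleCase());  // The Quick Brown Fox

// Extension methods live in a static class; the "this" modifier on the first
// parameter is what lets ToTitleCase be called as if it were a string method.
public static class StringExtensions
{
    public static string ToTitleCase(this string value)
    {
        // The invariant culture keeps the result independent of the machine's locale.
        return CultureInfo.InvariantCulture.TextInfo.ToTitleCase(value);
    }
}
```

The call site looks like an instance method, but the compiler simply rewrites it as StringExtensions.ToTitleCase("the quick brown fox"), which is also why the potential for confusion (and abuse) is real.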

If you want to find out more about these features, the best place to start is the LINQ technical preview, available on the MSDN site. All the source code for the extension methods is included as well, so you can dive straight in and start playing.

PDC05: Double-check locking fixed in 2.0

September 15th, 2005

After a nice introductory session by Jan Gray, Joe Duffy delivered a much more in-depth talk on multithreading issues in .NET. Joe is a PM on the CLR team with responsibility for the System.Threading namespace, so if anyone knows the details on this, he does. He made one interesting point about the double-check locking pattern, which kind of did and kind of didn’t work in 1.1. Double-check locking is an attempt to avoid the overhead of acquiring a lock by checking a value before entering the lock. The value is then retested inside the lock.

if (instance == null)
{
   lock (creationLock)
   {
      if (instance == null)
         instance = new Singleton();
   }
}
return instance;

The idea is that, once the instance is created, you never need to take the lock. In theory this is a pretty clever trick, but there’s a problem: modern processors can reorder instructions for performance reasons, which means that some of the assumptions made by the double-check pattern will not always hold. In the singleton example, if there are instructions to execute inside the constructor, instance may be set to a non-null value before the constructor completes, and therefore before the lock is exited. Another thread can then see the non-null value, skip the lock entirely, and return a reference to a partially constructed object to its caller. For more details on this, read this great post by Chris Brumme.

Processors make various optimizations like these, but for the most part programmers are shielded from this complexity by a memory model defined by the language or runtime which states which optimizations can occur, regardless of what the processor wants to do.

The 1.1 version of the Common Language Infrastructure spec had a relatively weak memory model which did not prevent this specific optimization from taking place, so the double-check pattern was not guaranteed to work. In practice, however, x86 processors (the only ones supported by the official .NET 1.1 release) don’t reorder writes in this way, so the double-check pattern did in fact work on that platform.

The good news is that the 2.0 CLR implements a stricter memory model than the CLI spec requires, which clears up all this confusion and ensures the double-check pattern will work.
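A commonly recommended way to write the pattern so that it doesn't rely on the stronger CLR 2.0 or x86 behaviour at all is to mark the field volatile. Here is a minimal sketch, using a hypothetical Singleton class and a quick multi-threaded check that every caller gets the same instance.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hammer Instance from several threads; every caller should get the same object.
var seen = new Singleton[16];
Parallel.For(0, seen.Length, i => seen[i] = Singleton.Instance);
Console.WriteLine(seen.All(s => ReferenceEquals(s, seen[0])));  // True

// Hypothetical class used only to illustrate the pattern.
public sealed class Singleton
{
    // volatile gives writes release semantics (the constructor's writes can't be moved
    // after the write that publishes the reference) and reads acquire semantics (a thread
    // that sees the reference also sees the finished object).
    private static volatile Singleton instance;
    private static readonly object creationLock = new object();

    private Singleton() { }

    public static Singleton Instance
    {
        get
        {
            if (instance == null)              // first check: skip the lock once created
            {
                lock (creationLock)
                {
                    if (instance == null)      // second check: another thread may have got here first
                        instance = new Singleton();
                }
            }
            return instance;
        }
    }
}
```

On .NET 4 and later, Lazy<T> or a plain static readonly field initializer is usually the simpler choice, but the volatile version shows what the double check actually needs.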

PDC05: Vista and Office 12

September 14th, 2005

Phew! The first day at the PDC was really busy, starting with a long (very long) double keynote from Bill Gates and Jim Allchin. I guess it’s what you expect with these events: the speakers themselves weren’t particularly engaging, but the demos really made up for it.

First up was a combined Vista and Office 12 demo. Microsoft are really keen to promote these together, hoping corporate customers will buy both as part of a long-overdue upgrade cycle, and they are the focus of a lot of the sessions here. Visual Studio 2005, in comparison, is assumed to be the platform of choice, even though it hasn’t been released yet.

The Vista demo was impressive, with lots of nice eye candy and a few changes since Beta 1 was released. Beta 2 is a few months off it seems, but the sidebar is back, RSS is everywhere, search is everywhere and IE has of course finally embraced tabs. There’s actually a really nice feature in IE7 that shows the scaled contents of all open tabs in a window, making it easy to navigate between them or close those you no longer need. If you have the habit as I do of steadily building up a collection of tabs over a long browsing session, this is going to be really useful.

Of course a lot of these features aren’t new, at least when you consider non-Microsoft software. Search everywhere looks exactly like Apple’s Spotlight technology, as does a new app switching feature (alt-tab now shows live previews of all open windows, a lot like Exposé). Where Microsoft excels, though, is in improving work others have started. The film producer Julia Phillips had a saying, “If you can’t be best, be first. If you can’t be first, be best,” which seems pretty appropriate.

Having heard some positive things about Office 12 on various blogs before the conference, I wasn’t sure what to expect. Office has had more features than sense for the past 8 years, and Office 97 was the last upgrade to really mean anything. The new version (due next summer I think) has loads of new features of course, but much more important is the new UI, which tries to make all that power more accessible.

Seven or eight tabs now appear where the old toolbars were, and each tab has a number of groups of three or four buttons. It’s a little hard to describe, but it really looks a lot like the old interface, except the grouping of related items really makes a big difference. By reducing the number of items the user has to choose between, it becomes easier to conceptualize and navigate around a complex interface. The hierarchy allows you to ignore anything you’re not interested in and go directly to what you need. This is a pretty common UI approach – the copy of iTunes I’m running, for example, has fourteen buttons on the main window, but you wouldn’t realise it at first. There’s one section to move between tracks, another to search, another to manage playlists, and so on. It seems a small change, but it makes a big difference in managing the complexity of features that modern software offers. Office 12 also uses larger icons for key buttons, which makes them easier to distinguish and provides mini “landmarks” to use when scanning a toolbar with a number of button groups.

Overall, these products have a real feel of quality, even though they haven’t left (or, for Office, even reached) an initial beta. The first public release of Office will be in a few months’ time, and I’ll definitely be getting a copy and having a closer look.

Getting asynchronous web services to work

July 6th, 2005

Clemens Vasters gave a great presentation at TechEd Europe last week describing how and when to use asynchronous messaging. Synchronous calls, which represent direct method calls, are easy to make and easy to understand — you get to find out easily if the call succeeded, you don’t have to deal with threading issues and, crucially, a set of related operations can follow each other in a single method.

Asynchronous programming, on the other hand, is tough. You have to send a message off somewhere and wait until the server is ready to respond. You don’t immediately know whether a call was successful, or even whether it’s been attempted. What might be a simple set of operations when you’re programming synchronously is now controlled by a number of callbacks, which probably don’t run on the same thread and can’t share state through local variables.

So why bother? Scalability, for one thing. Synchronous calls work easily and perform just fine when you have enough resources to deal with the traffic you’re experiencing. When usage spikes, things don’t work quite so well. If you’re lucky, your server can handle it and all you experience is a slowdown. At some point, however, you’ll run out of memory, the database will time out or clients will simply give up on you and assume the call has failed.

Once you stop guaranteeing an immediate response and make clients call you asynchronously, life becomes easier. You can process requests at a comfortable pace within the limits of what your hardware can reliably handle, grabbing the next request from a queue whenever you’re ready.
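The shape of this can be sketched in-process with BlockingCollection<T> (a .NET 4 class, so well after this post), standing in for MSMQ purely to show the idea: the front end enqueues and returns, and a worker drains the queue at its own pace. It's an assumption-laden toy: there's no durability, no transactions and no network here, which are exactly the things MSMQ adds.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// In-memory stand-in for a message queue (not MSMQ: nothing survives a crash).
var queue = new BlockingCollection<string>(boundedCapacity: 100);
var processed = new List<string>();

// Consumer: takes the next request only when it's ready, at its own pace.
Task worker = Task.Run(() =>
{
    foreach (string request in queue.GetConsumingEnumerable())
    {
        processed.Add($"handled {request}");   // only this task touches 'processed'
    }
});

// Producer: the front end just enqueues each request and moves on.
for (int i = 1; i <= 5; i++)
{
    queue.Add($"request-{i}");
}
queue.CompleteAdding();   // no more requests; lets the consumer's loop finish
worker.Wait();

Console.WriteLine(processed.Count);   // 5
```

The bounded capacity is the in-process analogue of the hardware limits mentioned above: once 100 requests are waiting, Add blocks instead of letting the backlog grow without limit.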

As one of his slides made very clear, Clemens loves MSMQ. Message Queuing is available on all modern versions of Windows, fast (enough) and uses a simple programming model. The most important features it introduces though are reliability and transactions. Used properly, you can submit a message to a queue happy in the knowledge that it will stay there until it’s ready to be processed. Similarly, you can read an item from the queue within a transaction: if a problem occurs, you can rollback the transaction and the item will be ready and waiting whenever the problem is resolved.

So asynchronous programming is great, message queuing is a great way to achieve it, and that’s what we should all use to build our distributed apps these days, right? Well, unfortunately it’s not that easy. While MSMQ works great within an enterprise, it’s not an option if you want an interoperable service available over the web.

Part of the solution can be to provide a SOAP endpoint and immediately push incoming requests onto a queue. This seems a pretty good approach, but what happens if the web service call fails? Benjamin Mitchell posted last year (!) about a similar talk given by Clemens at TechEd 2004, where it seems the idea was that any exception thrown by the WS proxy could be taken as an indicator that the call had not succeeded. By using queues at both ends (i.e. on both the client and server sides), failed calls could simply be retried later.

I didn’t see that talk so I might be missing something, but that brief description just prompts more questions. The trouble is that a failure could occur after the request was added to the server-side queue. Maybe IIS runs out of memory while it’s generating the SOAP response. Maybe there’s a power failure, maybe a router dies. The client will presume the entire call has failed and retry later, which could well mean the same request gets processed twice.

WS-ReliableMessaging is supposed to tackle some of these problems but, at least in the .NET world, isn’t yet available (it’s supposed to arrive with WSE 3.0 later this year). An idempotent web service, idempotency being a core principle of both SOA and REST, might do the trick as well. If duplicate requests can be dealt with at the application layer, we might be able to ignore the unreliability of the web services infrastructure, just as TCP is able to work around the unreliability of the lower layers of the network stack.
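Here is a minimal sketch of what dealing with duplicates at the application layer can look like, assuming (hypothetically) that every request carries a client-generated ID that stays the same across retries. The HandleDeposit operation and the in-memory set are illustrative only; a real service would need to record the ID in the same transaction as the work itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical idempotent handler: the server remembers which request IDs it has applied.
var processedIds = new HashSet<Guid>();
decimal balance = 0m;

// Returns true if the request was applied, false if it was a duplicate.
bool HandleDeposit(Guid requestId, decimal amount)
{
    if (!processedIds.Add(requestId))   // Add returns false when the ID is already present
        return false;                   // duplicate: acknowledge it, but don't apply it again
    balance += amount;
    return true;
}

Guid id = Guid.NewGuid();
bool first = HandleDeposit(id, 100m);
// The response never reached the client (say a router died), so it retries with the same ID.
bool retry = HandleDeposit(id, 100m);

Console.WriteLine($"{first} {retry} {balance}");  // True False 100
```

With this in place, the "retry later" strategy becomes safe: the client can resend as often as it likes, and the deposit is applied exactly once.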

Whatever solutions are available, this kind of messaging isn’t easy today. Indigo will make transport-layer reliability easier, but that’s only part of the picture. As always, if you want to build scalable, reliable systems, you’re going to have to find ways of dealing with failure and responding in a manner that suits your business.

Great new Enterprise Services book

June 23rd, 2005

Christian Nagel’s new Enterprise Services book has just been released, and from my first impressions, it looks really good. Enterprise Services in .NET is one of those areas that a lot of developers seem happy to avoid, I think partly because there’s been a lack of a solid reference — the kind of book you can depend on to answer the big questions.

C# COM Programming is one of the oldest books on the subject, and while it’s definitely a worthwhile read, it does tend to skip some detail in a few areas.

Coming from Addison-Wesley, you just know this is going to be good. AW have got to be the best .NET publisher out there, with classics like Fritz Onion’s ASP.NET book, Keith Ballinger’s Web Services and Shawn Wildermuth’s ADO.NET book. I don’t know if Christian’s book is up to that high standard yet, but it’s certainly in good company.