Among the LINQ extension methods that came with .NET 3.5 is one called Except. This method takes two lists (first and second).

The MSDN docs say:

This method returns those elements in first that do not appear in second. It does not also return those elements in second that do not appear in first.

This appears to be a lie. Review the code below and guess the output:


The User object overrides the Equals and GetHashCode methods, which the Except method uses to determine equality. Since only one object in list2 is "equal" to an object in list1, I would expect (assuming the MSDN docs are correct) that list3 would contain three users with the id zero.

The actual result? list3 will only contain ONE user object (with id zero). When I debug I see that Equals is called to compare objects in list1 with each other.
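
The scenario can be reproduced with a minimal sketch along these lines. The User class, its Id property, and the contents of the lists are assumptions reconstructed from the description above, not the original listing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical User type: equality is based solely on Id (an assumption).
public class User
{
    public int Id { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as User;
        return other != null && other.Id == Id;
    }

    public override int GetHashCode()
    {
        return Id;
    }
}

public static class ExceptDemo
{
    public static List<User> Run()
    {
        // Three users with id zero, plus one user that also occurs in list2.
        var list1 = new List<User>
        {
            new User { Id = 0 },
            new User { Id = 0 },
            new User { Id = 0 },
            new User { Id = 1 }
        };
        var list2 = new List<User> { new User { Id = 1 } };

        return list1.Except(list2).ToList();
    }

    public static void Main()
    {
        var list3 = Run();
        Console.WriteLine(list3.Count); // prints 1, not 3
    }
}
```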

Using Reflector I can see why: Except is implemented using the internal class System.Linq.Set:


It starts by adding all items from list2 to the set. Then, for each item from list1, it tries to add the item to the set and yield returns it only if the add succeeds. This filters out all items in list1 that are equal to an item in list2, BUT it also filters out all items in list1 that are equal to any earlier item in list1!
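
The behavior described above can be approximated with the sketch below. It is not the decompiled framework code: HashSet<T> stands in for the internal System.Linq.Set class, and the names SetBasedExcept and ExceptSketch are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetBasedExcept
{
    // Assumption: HashSet<T> plays the role of the internal System.Linq.Set.
    public static IEnumerable<T> ExceptSketch<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        var seen = new HashSet<T>();

        // Step 1: put every item of the second sequence into the set.
        foreach (var item in second)
        {
            seen.Add(item);
        }

        // Step 2: yield an item of the first sequence only if it could be added.
        // Add returns false when an equal item is already in the set, which
        // covers items from second AND earlier duplicates from first.
        foreach (var item in first)
        {
            if (seen.Add(item))
            {
                yield return item;
            }
        }
    }

    public static void Main()
    {
        var result = ExceptSketch(new[] { 0, 0, 0, 1 }, new[] { 1 });
        Console.WriteLine(string.Join(", ", result)); // prints 0
    }
}
```

Because the same set records both the excluded items and the items already returned, duplicates in the first sequence collapse into a single result.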

This may not be a common usage scenario, and I don't recommend overriding Equals and GetHashCode in this manner. Still, I was frustrated by this because I burnt an hour debugging before I figured out what was to blame.


Jeremy Gray said...

That is a perfectly legitimate equality implementation, specifically one that is commonly used for referential equality of entity types.

That said, it still is a bit odd how Except was implemented. Why they didn't just memoize the list of exceptions and do a straight foreach/yield over the source enumerable I do not know. :( Thanks for that tidbit of information!
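
The alternative suggested here (memoize the exceptions, then do a plain foreach/yield over the source) might look roughly like the sketch below. The extension method name ExceptKeepingDuplicates is hypothetical, and using HashSet<T> for the memoized exceptions is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExceptExtensions
{
    // Hypothetical helper: removes items found in 'exceptions' but keeps
    // duplicates that occur within 'source'.
    public static IEnumerable<T> ExceptKeepingDuplicates<T>(
        this IEnumerable<T> source, IEnumerable<T> exceptions)
    {
        // Memoize the items to exclude once, up front.
        var excluded = new HashSet<T>(exceptions);

        // The set is only read here, never extended, so repeated
        // items in the source are all yielded.
        foreach (var item in source)
        {
            if (!excluded.Contains(item))
            {
                yield return item;
            }
        }
    }
}

public static class ExceptExtensionsDemo
{
    public static void Main()
    {
        var result = new[] { 0, 0, 0, 1 }.ExceptKeepingDuplicates(new[] { 1 });
        Console.WriteLine(string.Join(", ", result)); // prints 0, 0, 0
    }
}
```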

Torkel Ödegaard said...

Maybe overriding Equals and GetHashCode like this isn't such an uncommon or bad idea; I just haven't seen it much.

I agree, it is a very odd implementation. I will register a bug on Connect, but I doubt they can fix it, as a fix could break existing applications.

Anonymous said...

If you override Equals like in the example, you logically do not have 3 users in the list with the same id 0, but 1 user 3 times in the list.

And that one user with id 0 is not in the second list, so the behavior is correct even if it is somewhat unexpected because of there being 3 instances of the same user in the first list.

This identity problem will keep you up at night if you use id 0 for a new unsaved entity, and you have more than one of them, and want to compare them (or put them in a set). I solved this problem by always returning false from Equals() if the id is 0 unless Object.ReferenceEquals() returns true.
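
A minimal sketch of that approach, assuming a User class with an integer Id where 0 means "not yet saved". The class shape and the GetHashCode choice are assumptions; only the Equals rule comes from the comment above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity: Id == 0 marks a new, unsaved instance (an assumption).
public class User
{
    public int Id { get; set; }

    public override bool Equals(object obj)
    {
        // The same instance is always equal to itself.
        if (ReferenceEquals(this, obj)) return true;

        var other = obj as User;
        if (other == null) return false;

        // Two distinct unsaved instances are never considered equal.
        if (Id == 0 || other.Id == 0) return false;

        return Id == other.Id;
    }

    public override int GetHashCode()
    {
        // All unsaved instances share hash 0; Equals tells them apart.
        // Note that the hash changes once an Id is assigned.
        return Id;
    }
}

public static class UnsavedEntityDemo
{
    public static int CountAfterExcept()
    {
        var list1 = new List<User>
        {
            new User { Id = 0 },
            new User { Id = 0 },
            new User { Id = 0 },
            new User { Id = 1 }
        };
        var list2 = new List<User> { new User { Id = 1 } };

        return list1.Except(list2).Count();
    }

    public static void Main()
    {
        Console.WriteLine(CountAfterExcept()); // prints 3
    }
}
```

With this rule, the three unsaved users survive Except because none of them is equal to any other instance.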