Marius Schulz

Why Enumerable.Except() Might Not Work the Way You Might Expect

Enumerable.Except is one of the useful extension methods within the System.Linq namespace that shipped with .NET 3.5. According to the documentation, Enumerable.Except "produces the set difference of two sequences".

The static System.Linq.Enumerable class contains two overloads of the Except method:

  1. Enumerable.Except<TSource>(IEnumerable<TSource>, IEnumerable<TSource>)
  2. Enumerable.Except<TSource>(IEnumerable<TSource>, IEnumerable<TSource>, IEqualityComparer<TSource>)

Overload #1 — Using the Default Equality Comparer

The first overload uses the default equality comparer to compare values. Take a minute and think about what the following code snippet will output:

string[] fruits = { "apple", "apricot", "banana", "strawberry" };
string[] fruitsWithLongNames = { "strawberry" };

IEnumerable<string> fruitsWithShortNames = fruits.Except(fruitsWithLongNames);

Console.WriteLine("Using the default equality comparer:");
foreach (string fruit in fruitsWithShortNames)
{
    Console.WriteLine(" - {0}", fruit);
}

Most likely, the output will match your expectations:

Using the Default Equality Comparer

Overload #2 — Using a Custom Equality Comparer

Now let's look at the overload accepting an IEqualityComparer<T>. We pass in an instance of StringLengthEqualityComparer, a custom IEqualityComparer<string> that considers two strings equal if their character count equals. Again — take some time and reflect about what you expect the output to be (you should really do that):

string[] fruits = { "apple", "banana", "cherry", "strawberry" };
string[] fruitsWithLongNames = { "strawberry" };

var stringLengthComparer = new StringLengthEqualityComparer();
IEnumerable<string> fruitsWithShortNames = fruits
    .Except(fruitsWithLongNames, stringLengthComparer);

Console.WriteLine("Using our custom equality comparer:");
foreach (string fruit in fruitsWithShortNames)
{
    Console.WriteLine(" - {0}", fruit);
}

And here's the StringLengthEqualityComparer:

class StringLengthEqualityComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return x.Length == y.Length;
    }

    public int GetHashCode(string obj)
    {
        return obj.Length;
    }
}

Since our custom StringLengthEqualityComparer compares the length of two strings, I would intuitively consider fruitsWithShortNames to contain all fruits but those with the same string length as strawberry. Because fruits solely contains one element with a matching string length of 10 characters, namely strawberry itsself, I expected the snippet above to output apple, banana and cherry. I ran the program — and learned that I was wrong:

Using a Custom Equality Comparer

Besides strawberry, the element cherry got removed as well although its string length does not equal 10 (but 6). Why is that? To answer this question, we need to have a look at how the Except extension method is implemented.

Analyzing the Implementation of Enumerable.Except

Decompiling the framework code using .NET Reflector 7 shows the following implementation:

public static IEnumerable<TSource> Except<TSource>(this IEnumerable<TSource> first,
IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
    if (first == null)
    {
        throw Error.ArgumentNull("first");
    }
    if (second == null)
    {
        throw Error.ArgumentNull("second");
    }
    return ExceptIterator<TSource>(first, second, comparer);
}

Here's the private ExceptIterator<TSource> method:

private static IEnumerable<TSource> ExceptIterator<TSource>(IEnumerable<TSource> first,
IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
    Set<TSource> iteratorVariable0 = new Set<TSource>(comparer);
    foreach (TSource local in second)
    {
        iteratorVariable0.Add(local);
    }
    foreach (TSource iteratorVariable1 in first)
    {
        if (!iteratorVariable0.Add(iteratorVariable1))
        {
            continue;
        }
        yield return iteratorVariable1;
    }
}

[Update — May 6, 2014] Now that the .NET Framework is open source, we can take a look at the actual implementation of ExceptIterator.

The ExceptIterator<TSource> method makes use of the internal Set<TSource> class representing a set, a collection of distinct objects. It is comparable to the HashSet<T> class living in the System.Collections.Generic namespace. The Set<TSource>.Add<TSource> method returns true if the passed item was successfully added to the set and returns false if the item was already present; in that case, the item is not added. To determine if two items are considered equal, the Set<TSource> class uses an IEqualityComparer<TSource>. This is where our custom StringLengthEqualityComparer comes into operation.

Tracking Down the Cherry Issue

As we can see in the first 4 lines of ExceptIterator<TSource>, the items of second are added one by one to the set using the Set<TSource>.Add<TSource> method that makes sure the set only contains distinct items. After that, each item of first is added the same way.

Let's have a look at our example and find out why cherry is no part of the resulting collection:

  1. second only contains one item, strawberry, which gets added to the set.
  2. The first element of first is apple. The set does not contain any item considered equal to apple using our custom StringLengthEqualityComparer. From this it follows that apple gets added to the set and is returned by yield return.
  3. The same goes for the next element, banana. Neither strawberry nor apple equals banana; thus, banana gets added to the set and is being returned. The set now contains the elements strawberry, apple and banana, the resulting collection contains apple and banana.
  4. The next element, cherry, is neither equal to strawberry nor apple; however, it equals banana in that its string length is 6, too. Since iteratorVariable0.Add(iteratorVariable1) returns false, the condition is true and continue passes control to the next iteration of the enclosing foreach loop. yield return is not being called; hence, banana is not returned and therefore no part of the resulting collection.
  5. The last element of first, strawberry, is already present in the set and is, because of that, no part of the resulting collection. The foreach loop terminates and results in apple and banana being the only elements of the resulting collection.

Conclusion

ExceptIterator<TSource> compares each element of first to each element of second and to each previous element of first. The thing you need to keep in mind when using the Except extension method is: If first contains multiple elements considered equal, the resulting collection only contains the first of these elements.

If you do not want to remove elements of first that do not equal any element of second but any element of first, you can use the Without extension method (have a look at ExtraLINQ, a class library of mine providing additional extension methods for LINQ to Objects).

Similar Posts:

Marius Schulz

Computer science student, developer, and blogger. I love C# and JavaScript, enjoy regular expressions, and probably drink too much coffee.

Next post: ExtraLINQ: NuGet Package Available!
Previous post:

1 Comment

Matthew Iskra

Thanks for the information about the Enumerable Except. I am unable to duplicate the quirk/error, possibly because I am using Visual Studio 2013 and the .NET framework 4.5. Still, this ancient DBA loves the fact that set-style commands have crept into C#. Now if only coders would not design databases, I would be happy. (CodeFirst... <shudder>)

--Matthew

Leave a comment

Marius Schulz on Twitter Marius Schulz on GitHub RSS Feed