tags:

views:

59

answers:

4

I have an array of input strings that contains either email addresses or account names in the form of domain\account. I would like to build a List of string that contains only email addresses. If an element in the input array is of the form domain\account, I will perform a lookup in the dictionary. If the key is found in the dictionary, that value is the email address. If not found, that won't get added to the result list. The code below will makes the above description clear:

private bool where(string input, Dictionary<string, string> dict)
{
    if (input.Contains("@"))
    {                
        return true;
    }
    else
    {
       try
       {
           string value = dict[input];             
           return true;
       }
       catch (KeyNotFoundException)
       {
           return false;
       }
    }
}

private string select(string input, Dictionary<string, string> dict)
{
    if (input.Contains("@"))
    {                
        return input;
    }
    else
    {
        try
        {
            string value = dict[input];                    
            return value;
        }
        catch (KeyNotFoundException)
        {
            return null;
        }
    }
}
public void run()
{
    Dictionary<string, string> dict = new Dictionary<string, string>()
    {
        { "gmail\\nameless", "[email protected]"}
    };    

    string[] s = { "[email protected]", "gmail\\nameless", "gmail\\unknown" };
    var q = s.Where(p => where(p, dict)).Select(p => select(p, dict));
    List<string> resultList = q.ToList<string>();
}

While the above code works (hope I don't have any typo here), there are 2 problems that I do not like with the above:

  1. The code in where() and select() seems to be redundant/repeating.
  2. It takes 2 passes. The second pass converts from the query expression to List.

So I would like to add to the List resultList directly in the where() method. It seems like I should be able to do so. Here's the code:

private bool where(string input, Dictionary<string, string> dict, List<string> resultList)
{
    if (input.Contains("@"))
    {                
        resultList.Add(input);  //note the difference from above
        return true;
    }
    else
    {
       try
       {
           string value = dict[input];
           resultList.Add(value); //note the difference from above             
           return true;
       }
       catch (KeyNotFoundException)
       {
           return false;
       }
    }
}

The my LINQ expression can be nicely in 1 single statement:

List<string> resultList = new List<string>();
s.Where(p => where(p, dict, resultList));

Or

var q = s.Where(p => where(p, dict, resultList)); //do nothing with q afterward

Which seems like perfect and legal C# LINQ. The result: sometime it works and sometime it doesn't. So why doesn't my code work reliably and how can I make it do so?

A: 

You don't generally want to have side-effects on an unrelated object like your list. It makes it difficult to understand, debug, and refactor. I wouldn't worry about optimizing the query until you know it's not performing well.

So, what's wrong with your original expression? You don't need both the select and the where. You only need the Where() call. This will return a list of email addresses, which you can stick into a HashSet. The HashSet will provide the uniqueness you seem to desire. This will add execution time, so if you don't need it, don't use it.

You should only really need something like:

var s = new[] {"[email protected]", "me_not_at_me.com", "not_me"};
var emailAddrs = s.Where( a => a.Contains("@")); // This is a bad email address validator; find a better one.
var uniqueAddrs = new HashSet<string>(emailAddrs);

(Note, I've not dealt with HashSet, so the constructor might not take an Enumerable. This would be an exercise for the reader.)

CWF
Agree with the email address validator and well taken, but that wasn't the essence of the question and the above is just an example.
Khnle
Good point about the side-effects.
Khnle
+1  A: 

It sounds like what you want is an iterator. By making your own iterator you can filter the list and produce output at the same time.

public static IEnumerable EmailAddresses(IEnumerable<string> inputList,
    Dictionary<string, string> dict)
{
    foreach (string input in inputList)
    {
        string dictValue;
        if (input.Contains("@"))
            yield return input;
        else if (TryGetValue(input, out dictValue)
            yield return dictValue;
        // else do nothing
    }
}

List<string> resultList = EmailAddresses(s, dict).ToList();
Gabe
+1 Iterators rock.
Bryan Watts
A: 

Here is one way you could approach it with LINQ. It groups the values by whether or not they are email addresses, resulting in 2 groups of strings. If a group is the email address group, we select directly from it, otherwise we look up the emails and select from those:

public static IEnumerable<string> SelectEmails(
    this IEnumerable<string> values,
    IDictionary<string, string> accountEmails)
{
    return
        from value in values
        group value by value.Contains("@") into valueGroup
        from email in (valueGroup.Key ? valueGroup : GetEmails(valueGroup, accountEmails))
        select email;
}

private static IEnumerable<string> GetEmails(
    IEnumerable<string> accounts,
    IDictionary<string, string> accountEmails)
{
    return
        from account in accounts
        where accountEmails.ContainsKey(account)
        select accountEmails[account];
}

You would use it like this:

var values = new string[] { ... };
var accountEmails = new Dictionary<string, string> { ... };

var emails = values.SelectEmails(accountEmails).ToList();

Of course, the most straightforward way to implement this extension method would be @gabe's approach.

Bryan Watts
+2  A: 

If you reverse the where and the select you can convert unknown domain accounts to null first, then just filter them out.

private string select(string input, Dictionary<string, string> dict)
{
    if (input.Contains("@"))
    {                
        return input;
    }
    else
    {
        if (dict.ContainsKey(input))
            return dict[input];
    }
    return null;
}

var resultList = s
    .Select(p => select(p, dict))
    .Where(p => p != null)
    .ToList()

This takes care of your duplicate code.

It takes 2 passes. The second pass converts from the query expression to List.

Actually this is only one pass as LINQ is lazy evaluated. This is why your last statements only work sometimes. The filter is only applied and your list generated if the LINQ query is evaluated. Otherwise the Where statement is never run.

Cameron MacFarland
Happy with your answer which solves both of my concerns. I thought I was crazy when I saw it works only part of the time. The LINQ lazy evaluation explains it.
Khnle