ansaurus

Question

Answer 1

+1 A:

Something like this should work:

string FilterAllValuesFromIgnoreList(string someStringToFilter)
{
  return ignoreList.Aggregate(someStringToFilter, (str, filter)=>str.Replace(filter, ""));
}

George Mauer 2010-09-14 19:47:46

I suspect this is correct, and yet I don't actually know.

Steven Sudit 2010-09-14 19:51:10

I might have swapped around the parameters to the second lambda, but this will definitely work. Aggregate is an incredibly powerful method; it's lame that people don't use it very often.

George Mauer 2010-09-14 19:52:17

It should be noted that I doubt calling Replace multiple times is the most performant way of doing this. Building the contents of the list into a static RegEx and using that to replace would probably be faster, but I suspect the difference won't matter in this case.

George Mauer 2010-09-14 19:54:49

This is not correct because it uses `string.Replace` which can't match only on a word boundary. If you're going to use a RegEx, though, it should use a single compiled one.

Gabe 2010-09-14 20:06:30

Good point @Gabe the example is more about the usage of Aggregate than of Replace.

George Mauer 2010-09-14 20:10:46

Answer 2

A:

public static string Trim(string text)
{
   var rv = text;
   foreach (var ignore in ignoreList)
      rv = rv.Replace(ignore, "");
   return rv;
}

Updated For Gabe

public static string Trim(string text)
{
   var rv = "";
   var words = text.Split(' ');
   foreach (var word in words)
   {
      var present = false;
      foreach (var ignore in ignoreList)
         if (word == ignore)
            present = true;
      if (!present)
         rv += word;
   }
   return rv;
}

Umair Ashraf 2010-09-14 19:47:51

No LINQ, no RegExp, yet it's correct. The only thing I'd change is the use of an empty string literal.

Steven Sudit 2010-09-14 19:49:03

No, not correct. This will turn "123 Northampton" into "123 ampton".

Gabe 2010-09-14 19:50:52
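
To make the objection concrete, here is a minimal sketch of the substring problem. The class name, method name, and the contents of the ignore list are assumptions for illustration; the question's actual list is not shown in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ReplaceDemo
{
    // Hypothetical ignore list; the real one is not shown in the thread.
    static readonly List<string> IgnoreList =
        new List<string> { "North", "South", "East", "West" };

    // string.Replace removes every occurrence of the substring,
    // including occurrences inside longer words.
    public static string RemoveSubstrings(string text)
    {
        return IgnoreList.Aggregate(text, (current, ignore) => current.Replace(ignore, ""));
    }
}
```

With this list, `ReplaceDemo.RemoveSubstrings("123 Northampton")` returns `"123 ampton"`, because "North" is stripped out of the middle of "Northampton".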

Close...now you need to make sure that you put back the space between words.

Gabe 2010-09-14 22:29:16
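
One way to put the spaces back, sketched under the same assumptions as the answer (a plain list of ignore words, words separated by single spaces), is to collect the surviving words and join them at the end. The class and method names here are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class WordFilter
{
    // Hypothetical ignore list for illustration.
    static readonly List<string> IgnoreList =
        new List<string> { "North", "South", "East", "West" };

    public static string RemoveIgnoredWords(string text)
    {
        var kept = new List<string>();
        foreach (var word in text.Split(' '))
        {
            // Compare whole words only, so "Northampton" is left alone.
            if (!IgnoreList.Contains(word))
                kept.Add(word);
        }
        // Joining restores exactly one space between the remaining words.
        return string.Join(" ", kept.ToArray());
    }
}
```

`WordFilter.RemoveIgnoredWords("14th Avenue North")` gives `"14th Avenue"`, and `"123 Northampton"` comes back unchanged.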

Answer 3

+2 A:

What's wrong with a simple for loop?

string street = "14th Avenue North";
foreach (string word in ignoreList)
{
    street = street.Replace(word, string.Empty);
}

Albin Sunnanbo 2010-09-14 19:48:22

Nothing wrong with the loop, I just thought there was another way of doing it.

Hugo Migneron 2010-09-14 19:50:43

Answer 4

A:

If you have a list, I think you're going to have to touch all the items. You could create a massive RegEx with all your ignore keywords and replace to String.Empty.

Here's a start:

(^|\s+)(North|South|East|West){1,2}(ern)?(\s+|$)

If you have a single RegEx for ignore words, you can do a single replace for each phrase you want to pass to the algorithm.

Brad 2010-09-14 19:48:29

I guess we could. Do we really want to, though?

Steven Sudit 2010-09-14 19:50:32

This is a good start. Now make it so that it only matches whole words.

Gabe 2010-09-14 19:52:05

We used this approach to flag a huge list of customers as business or residential based on RegEx keywords generated from looking at the data.

Brad 2010-09-14 20:15:28

Answer 5

+6 A:

Regex r = new Regex(string.Join("|", ignoreList.Select(s => Regex.Escape(s)).ToArray()));
string s = "14th Avenue North";
s = r.Replace(s, string.Empty);

Bob 2010-09-14 19:50:14

If there are special characters, you should escape the stuff in ignoreList: string.Join("|", ignoreList.Select(s => Regex.Escape(s)).ToArray())

Frank Schwieterman 2010-09-14 19:54:21

Since odds are the list will contain words like `"St."`, escaping is advised. And you have to look only for whole words.

Gabe 2010-09-14 20:04:37

@Frank Correct . . . though it isn't really specified where the list comes from. It would probably be easiest to just write the correct regular expression in the first place rather than to convert it from a list, unless the list is really necessary.

Bob 2010-09-14 20:15:19

Yeah, building a Regex dynamically is only really worthwhile if the list contents might change. Using a Regex in general is only useful if this function is used a lot, as it's potentially faster than N string replacements.

Frank Schwieterman 2010-09-14 20:59:37

Answer 6

A:

Why not just Keep It Simple?

public static string Trim(string text)
{
   var rv = text.Trim();
   foreach (var ignore in ignoreList) {
      // Only strip the keyword when the text ends with it
      if (rv.EndsWith(ignore)) {
         rv = rv.Replace(ignore, string.Empty);
      }
   }
   return rv;
}

Vash 2010-09-14 19:52:35

Answer 7

+1 A:

If it's a short string as in your example, you can just loop through the strings and replace one at a time. If you want to get fancy you can use the LINQ Aggregate method to do it:

address = ignoreList.Aggregate(address, (a, s) => a.Replace(s, String.Empty));

If it's a large string, that would be slow. Instead you can replace all strings in a single run through the string, which is much faster. I made a method for that in this answer.

Guffa 2010-09-14 19:53:31

Thanks a lot for that. My ignore list will obviously be much longer than what I posted here, but not sure if it will be long enough to use your method. I will profile it and see though.

Hugo Migneron 2010-09-14 19:57:04

Answer 8

+6 A:

How about this:

string.Join(" ", text.Split().Where(w => !ignoreList.Contains(w)));

or, for .NET 3.5 (where string.Join only accepts an array):

string.Join(" ", text.Split().Where(w => !ignoreList.Contains(w)).ToArray());

Note that this method splits the string up into individual words so it only removes whole words. That way it will work properly with addresses like Northampton Way #123 that string.Replace can't handle.

Gabe 2010-09-14 19:54:01

*sip* - tastes like perl!

George Mauer 2010-09-14 19:59:43

This is a great solution, both shorter and clearer than the regex versions.

AHM 2010-09-14 20:00:49

You might as well split by the words - `text.Split(ignoreList.ToArray(), StringSplitOptions.None)`. That said, it is easier to adapt your approach to ignore case.

Kobi 2010-09-14 20:05:38

What about punctuation before or after words?

Mark Byers 2010-09-14 20:07:51

Kobi: `text.Split(ignoreList.ToArray())` doesn't work for the same reason all the `string.Replace` methods don't work.

Gabe 2010-09-14 20:09:48

Mark: Presumably he would want to consider punctuation to be word-breakers. It's up to him, but I'd guess he'd want `text.Split(new[]{' ','.',',','-'})` but he can tweak it to support whatever algorithm he has.

Gabe 2010-09-14 20:13:32

@Gabe: Then it won't match words containing punctuation, such as `St.`.

Mark Byers 2010-09-14 20:50:07

Of course, not sure how I've missed that.

Kobi 2010-09-14 21:22:23

Mark: I would expect that if he wants to ignore `St.` and he wants `.` to be a word-breaker, he would just put `St` in his ignore list.

Gabe 2010-09-14 22:25:33

Thanks a lot, this is a great solution. Very clean and readable.

Hugo Migneron 2010-09-15 00:51:25
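
As noted in the comments, the split-and-filter approach is easy to adapt to ignore case. A minimal sketch, assuming a hypothetical ignore list and that a case-insensitive `HashSet<string>` is acceptable in place of the `List<string>` from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CaseInsensitiveFilter
{
    // Hypothetical ignore list; the comparer makes "north" and "NORTH" match too.
    static readonly HashSet<string> IgnoreSet = new HashSet<string>(
        new[] { "North", "South", "East", "West" },
        StringComparer.OrdinalIgnoreCase);

    public static string RemoveIgnoredWords(string text)
    {
        var kept = text
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries) // skip empty entries from repeated spaces
            .Where(word => !IgnoreSet.Contains(word));
        return string.Join(" ", kept.ToArray());
    }
}
```

`CaseInsensitiveFilter.RemoveIgnoredWords("14th avenue NORTH")` returns `"14th avenue"`. Punctuation is still treated as part of a word here, so the question raised above about `St.` applies unchanged.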

Answer 9

+2 A:

If you know that the list of words contains only characters that do not need escaping inside a regular expression, then you can do this:

string s = "14th Avenue North";
Regex regex = new Regex(string.Format(@"\b({0})\b",
                        string.Join("|", ignoreList.ToArray())));
s = regex.Replace(s, "");

Result:

14th Avenue

If there are special characters you will need to fix two things:

Use Regex.Escape on each element of ignore list.
The word-boundary \b will not match a whitespace followed by a symbol or vice versa. You may need to check for whitespace (or other separating characters such as punctuation) using lookaround assertions instead.

Here's how to fix these two problems:

Regex regex = new Regex(string.Format(@"(?<= |^)({0})(?= |$)",
    string.Join("|", ignoreList.Select(x => Regex.Escape(x)).ToArray())));

Mark Byers 2010-09-14 19:55:35

It's a pretty good bet that his words *will* need escaping, because they'll be like `"St.", "Blvd.", "Rd."`

Gabe 2010-09-14 20:03:32

That's a great way to handle the space problem raised in another comment.

Hugo Migneron 2010-09-14 20:03:59

This is very clever and it seems like it would work on all the words. I will write some tests for it and try it out properly.

Hugo Migneron 2010-09-14 20:15:54
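
For anyone wanting to try this out, here is a small self-contained sketch of the escaped, lookaround-based version. The ignore list, the class name, and the whitespace clean-up step are assumptions added for illustration; they are not part of the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class RegexFilter
{
    // Hypothetical ignore list, including entries that need escaping.
    static readonly string[] IgnoreList = { "North", "St.", "Blvd." };

    // Built once and reused: each entry is escaped, and the lookarounds
    // require a space or the string boundary on both sides.
    static readonly Regex IgnorePattern = new Regex(
        string.Format(@"(?<= |^)({0})(?= |$)",
            string.Join("|", IgnoreList.Select(x => Regex.Escape(x)).ToArray())),
        RegexOptions.Compiled);

    public static string RemoveIgnoredWords(string text)
    {
        string stripped = IgnorePattern.Replace(text, "");
        // The lookarounds do not consume the surrounding spaces,
        // so collapse runs of spaces and trim the ends afterwards.
        return Regex.Replace(stripped, " {2,}", " ").Trim();
    }
}
```

With this list, `RegexFilter.RemoveIgnoredWords("14th St. North")` returns `"14th"`, while `"123 Northampton Way"` is returned unchanged.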

Answer 10

A:

You can do this using an expression if you like, but it's easier to turn it around than to use Aggregate. I would do something like this:

string s = "14th Avenue North";
ignoreList.ForEach(i => s = s.Replace(i, ""));
//result is "14th Avenue "

Øyvind Bråthen 2010-09-14 19:58:20

Answer 11

+1 A:

LINQ makes this easy and readable. This requires normalized data though, particularly in that it is case-sensitive.

List<string> ignoreList = new List<string>()
{
    "North",
    "South",
    "East",
    "West"
};    

string s = "123 West 5th St"
        .Split(' ')  // Separate the words to an array
        .ToList()    // Convert array to List<string>
        .Except(ignoreList) // Remove ignored keywords
        .Aggregate((s1, s2) => s1 + " " + s2); // Reconstruct the string

Phil Gilmore 2010-09-14 21:30:38

The `.ToList()` is unnecessary.

Gabe 2010-09-14 22:28:35
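
One caveat worth checking before relying on `Except`: it is a set operation, so repeated words in the input are collapsed as well. Also, `Aggregate` without a seed throws on an empty sequence, which would happen if every word were ignored. The sketch below compares `Except` with a `Where` filter; the class, method names, and sample list are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ExceptDemo
{
    // Hypothetical ignore list for illustration.
    static readonly List<string> IgnoreList =
        new List<string> { "North", "South", "East", "West" };

    // Set difference: ignored words are dropped, and so are duplicates.
    public static string WithExcept(string text)
    {
        return string.Join(" ", text.Split(' ').Except(IgnoreList).ToArray());
    }

    // Plain filter: ignored words are dropped, duplicates are preserved.
    public static string WithWhere(string text)
    {
        return string.Join(" ", text.Split(' ').Where(w => !IgnoreList.Contains(w)).ToArray());
    }
}
```

For the input `"12 West 12 St"`, `WithExcept` returns `"12 St"` while `WithWhere` returns `"12 12 St"`. Using `string.Join` in both also sidesteps the empty-sequence case.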


string replace using a List<string>
