Since nobody seems to have given a satisfactory answer, I came up with one. Here's a string-based solution (.Net 4):
public static string RemoveRepeatedSpaces(this string s)
{
return s[0] + string.Join("",
s.Zip(
s.Skip(1),
(x, y) => x == y && y == ' ' ? (char?)null : y));
}
However, this is just a general case of removing repeated elements from a sequence, so here's the generalized version:
public static IEnumerable<T> RemoveRepeatedElements<T>(
this IEnumerable<T> s, T dup)
{
return s.Take(1).Concat(
s.Zip(
s.Skip(1),
(x, y) => x.Equals(y) && y.Equals(dup) ? (object)null : y)
.OfType<T>());
}
Of course, that's really just a more specific version of a function that removes all consecutive duplicates from its input stream:
public static IEnumerable<T> RemoveRepeatedElements<T>(this IEnumerable<T> s)
{
return s.Take(1).Concat(
s.Zip(
s.Skip(1),
(x, y) => x.Equals(y) ? (object)null : y)
.OfType<T>());
}
And obviously you can implement the first function in terms of the second:
public static string RemoveRepeatedSpaces(this string s)
{
return string.Join("", s.RemoveRepeatedElements(' '));
}
BTW, I benchmarked my last function against the regex version (Regex.Replace(s, " +", " ")
) and they were were within nanoseconds of each other, so the extra LINQ overhead is negligible compared to the extra regex overhead. When I generalized it to remove all consecutive duplicate characters, the equivalent regex (Regex.Replace(s, "(.)\\1+", "$1")
) was 3.5 times slower than my LINQ version (string.Join("", s.RemoveRepeatedElements())
).
I also tried the "ideal" procedural solution:
public static string RemoveRepeatedSpaces(string s)
{
StringBuilder sb = new StringBuilder(s.Length);
char lastChar = '\0';
foreach (char c in s)
if (c != ' ' || lastChar != ' ')
sb.Append(lastChar = c);
return sb.ToString();
}
This is more than 5 times faster than a regex!