I want to compare one string with many strings. How is that done in C#?
If you want to check whether a string is contained in a list of strings, you can use the Contains extension method from System.Linq:
using System.Linq;

bool isStringContainedInList =
    new[] { "string1", "string2", "string3" }.Contains("some string");
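A related case: without extra arguments, Contains uses the default string equality, which is ordinal and case-sensitive. Enumerable.Contains also has an overload that takes an IEqualityComparer<string>, so a case-insensitive check is a small change. A minimal sketch, assuming top-level statements (C# 9 or later); ContainsIgnoreCase is a hypothetical helper name, not a framework method:

```csharp
using System;
using System.Linq;

// Hypothetical helper name; it only wraps Enumerable.Contains with a comparer.
static bool ContainsIgnoreCase(string[] items, string value) =>
    items.Contains(value, StringComparer.OrdinalIgnoreCase);

string[] candidates = { "string1", "string2", "string3" };

// Without a comparer, Contains uses the default string equality (ordinal, case-sensitive).
bool exact = candidates.Contains("STRING2");
// With StringComparer.OrdinalIgnoreCase, differences in case are ignored.
bool ignoreCase = ContainsIgnoreCase(candidates, "STRING2");

Console.WriteLine($"{exact} {ignoreCase}"); // False True
```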
string[] comparisonList = { "a", "b", "c" };
// Query syntax: select the elements that are equal to "b"
var matches = from s in comparisonList where s == "b" select s;
If you want to compare two strings, use String.Compare.
If you want to find a string in a list, use Contains, or the equivalent method (or LINQ query) for your list type.
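For a List<string> specifically, the list type has its own lookup methods, so LINQ is not required. A minimal sketch, assuming top-level statements (C# 9 or later) and made-up sample data:

```csharp
using System;
using System.Collections.Generic;

var names = new List<string> { "Ford", "Arthur", "Trillian" };

// List<T>.Contains and List<T>.IndexOf use the default (ordinal, case-sensitive) equality.
bool hasArthur = names.Contains("Arthur");
int arthurIndex = names.IndexOf("Arthur");

// FindIndex takes a predicate, so the comparison rule can be chosen explicitly.
int trillianIndex = names.FindIndex(
    n => string.Equals(n, "TRILLIAN", StringComparison.OrdinalIgnoreCase));

// For a plain array, Array.IndexOf plays the same role; it returns -1 when absent.
string[] array = names.ToArray();
int missing = Array.IndexOf(array, "Zaphod");

Console.WriteLine($"{hasArthur} {arthurIndex} {trillianIndex} {missing}"); // True 1 2 -1
```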
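I like to use the static String.Compare() method because it lets you make everything explicit, in particular which StringComparison rule (culture-aware or ordinal, case-sensitive or not) is used. This matters because string comparisons are notorious for subtle bugs.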
For example:
using System;
using System.Collections.Generic;

// Populate with your strings
List<string> manyStrings = new List<string>();
string oneString = "target string";
foreach (string current in manyStrings)
{
    // For a culture-aware, safe comparison:
    int compareResult = String.Compare(current, oneString,
        StringComparison.CurrentCulture);

    // OR, for a higher-performance comparison, use this line instead:
    // int compareResult = String.Compare(current, oneString,
    //     StringComparison.Ordinal);

    if (compareResult == 0)
    {
        // Strings are equal
    }
}
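If you need the matching entries rather than a test per item, the loop can be wrapped in a small helper that takes the StringComparison as a parameter, which keeps the comparison rule explicit at the call site. This is a sketch assuming top-level statements (C# 9 or later); FindEqual and the sample data are hypothetical, and OrdinalIgnoreCase is used only to show how the choice of rule changes the result:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns every candidate that String.Compare reports as
// equal (result 0) to 'target' under the chosen StringComparison.
static List<string> FindEqual(IEnumerable<string> candidates, string target,
                              StringComparison comparison)
{
    var matches = new List<string>();
    foreach (string current in candidates)
    {
        if (String.Compare(current, target, comparison) == 0)
        {
            matches.Add(current);
        }
    }
    return matches;
}

var manyStrings = new List<string> { "Target String", "target string", "other" };

// Ordinal: compares the UTF-16 code units exactly, so case matters.
List<string> ordinalMatches =
    FindEqual(manyStrings, "target string", StringComparison.Ordinal);
// OrdinalIgnoreCase: ordinal comparison that ignores case.
List<string> ignoreCaseMatches =
    FindEqual(manyStrings, "target string", StringComparison.OrdinalIgnoreCase);

Console.WriteLine(ordinalMatches.Count);    // 1
Console.WriteLine(ignoreCaseMatches.Count); // 2
```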
If you just want to know whether oneString occurs as a substring of a larger string, you can use the following inside the foreach loop over manyStrings above:
int indexPos = current.IndexOf(oneString, StringComparison.Ordinal);
if (indexPos >= 0)
{
    // oneString was found in current
}
Note that IndexOf accepts the same StringComparison enumeration, so you can choose culture-aware or ordinal matching here too.
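The same IndexOf test can filter a whole list down to the entries that contain a fragment. A minimal sketch, assuming top-level statements (C# 9 or later); FindContaining and the sample data are hypothetical:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns the candidates that contain 'fragment' as a substring
// under the chosen StringComparison.
static List<string> FindContaining(IEnumerable<string> candidates, string fragment,
                                   StringComparison comparison)
{
    var hits = new List<string>();
    foreach (string current in candidates)
    {
        // IndexOf returns -1 when the fragment does not occur in 'current'.
        if (current.IndexOf(fragment, comparison) >= 0)
        {
            hits.Add(current);
        }
    }
    return hits;
}

var manyStrings = new List<string> { "The Target String", "no match here", "a TARGET" };

List<string> caseSensitive =
    FindContaining(manyStrings, "target", StringComparison.Ordinal);
List<string> caseInsensitive =
    FindContaining(manyStrings, "target", StringComparison.OrdinalIgnoreCase);

Console.WriteLine(caseSensitive.Count);   // 0
Console.WriteLine(caseInsensitive.Count); // 2
```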
To find the strings that appear in your list more than once, you can put the strings into a HashSet one at a time and, for each one, check whether it is already in the set.
For example:
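using System.Collections.Generic;

// myList is your list of strings, e.g. a List<string>
HashSet<string> hashSet = new HashSet<string>();
foreach (string item in myList)
{
    if (hashSet.Contains(item))
    {
        // already in the list: item is a duplicate, handle it here
    }
    else
    {
        // not seen yet, so put it into the hash set
        hashSet.Add(item);
    }
}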
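Since HashSet<T>.Add returns false when the item is already in the set, the Contains check and the Add can be folded into one call. The sketch below also passes an IEqualityComparer<string>, so you can decide whether "Zaphod" and "zaphod" count as duplicates. It assumes top-level statements (C# 9 or later); FindDuplicates and the sample list are hypothetical, and the second set is there only so that each duplicate is reported once:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns each string that occurs more than once, reported
// once, in the order of its second occurrence.
static List<string> FindDuplicates(IEnumerable<string> items, IEqualityComparer<string> comparer)
{
    var seen = new HashSet<string>(comparer);
    var reported = new HashSet<string>(comparer);
    var duplicates = new List<string>();
    foreach (string item in items)
    {
        // seen.Add returns false if the item was already seen;
        // reported.Add returns false if we already listed it as a duplicate.
        if (!seen.Add(item) && reported.Add(item))
        {
            duplicates.Add(item);
        }
    }
    return duplicates;
}

var myList = new List<string> { "Zaphod", "Ford", "zaphod", "Zaphod", "Ford", "Zaphod" };

Console.WriteLine(string.Join(", ", FindDuplicates(myList, StringComparer.Ordinal)));
// Zaphod, Ford
Console.WriteLine(string.Join(", ", FindDuplicates(myList, StringComparer.OrdinalIgnoreCase)));
// zaphod, Ford
```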
I recommend that you look at the Wikipedia article on the longest common substring problem.
I recall from undergrad that one strategy for finding the longest common substring is to start with a shorter common substring and then extend it from there (and repeat). This works because if "abcd" is a common substring, then so are "abc" and "ab".
This lends itself to an iterative algorithm: first find all the 2-letter substrings that appear in all of your strings (I am not bothering with one-letter substrings because for a large dataset they'll include the whole alphabet). Then iterate again to find all 3-letter substrings, and so on.
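The iterative approach described above can be sketched as follows. It relies on the property just mentioned: a substring can only be common to all strings if its shorter prefix is common too, so each round only extends the start positions that survived the previous round. This is a simple sketch under several assumptions, not an optimized algorithm (faster methods based on suffix trees or dynamic programming exist): it uses ordinal matching, takes candidates from the first string, starts from one-letter substrings so that a single common character is still found, and LongestCommonSubstring is a hypothetical name. It also assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns one longest substring common to all strings,
// or "" if there is none. Uses ordinal (exact, case-sensitive) matching.
// Level-wise search: a substring of length len can only be common if its prefix
// of length len - 1 is common, so each round only extends surviving start positions.
static string LongestCommonSubstring(IReadOnlyList<string> strings)
{
    if (strings.Count == 0) return "";
    string first = strings[0];

    // Round 1 considers every start position in the first string.
    List<int> starts = Enumerable.Range(0, first.Length).ToList();
    string best = "";

    for (int len = 1; starts.Count > 0; len++)
    {
        var survivors = new List<int>();
        foreach (int i in starts)
        {
            if (i + len > first.Length) continue; // would run past the end of 'first'
            string candidate = first.Substring(i, len);
            // Keep the position only if the candidate occurs in every string.
            if (strings.All(s => s.IndexOf(candidate, StringComparison.Ordinal) >= 0))
            {
                survivors.Add(i);
                best = candidate;
            }
        }
        starts = survivors; // an empty list ends the search
    }
    return best;
}

string[] words = { "xabcdy", "zabcdw", "abcq" };
Console.WriteLine(LongestCommonSubstring(words)); // abc
```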
To compare all the strings in a collection with each other to find duplicates, a Dictionary is an efficient choice: lookups take O(1) time on average, so counting needs only a single pass over the collection:
using System;
using System.Collections.Generic;

string[] strings = { "Zaphod", "Trillian", "Zaphod", "Ford", "Arthur" };
var count = new Dictionary<string, int>();
foreach (string s in strings) {
    if (count.ContainsKey(s)) {
        count[s]++;
    } else {
        count.Add(s, 1);
    }
}
foreach (var item in count) {
    Console.WriteLine("{0} : {1}", item.Key, item.Value);
}
Output:
Zaphod : 2
Trillian : 1
Ford : 1
Arthur : 1
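A variant of the counting loop, as a sketch: TryGetValue replaces the ContainsKey check, and passing a StringComparer to the Dictionary constructor controls which strings count as the same key. The sample array is changed here (one "zaphod" in lower case) only to show the comparer's effect, and top-level statements (C# 9 or later) are assumed.

```csharp
using System;
using System.Collections.Generic;

string[] strings = { "Zaphod", "Trillian", "zaphod", "Ford", "Arthur" };

// The comparer passed to the constructor decides which keys count as equal;
// with OrdinalIgnoreCase, "Zaphod" and "zaphod" share one entry.
var count = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
foreach (string s in strings)
{
    // TryGetValue sets n to 0 (default(int)) when the key is missing,
    // so no separate ContainsKey/Add branches are needed.
    count.TryGetValue(s, out int n);
    count[s] = n + 1;
}

foreach (var item in count)
{
    Console.WriteLine("{0} : {1}", item.Key, item.Value);
}
```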
You can also do it using LINQ methods:
// Requires using System.Linq;
var count =
    strings
        .GroupBy(s => s)
        .Select(g => new { Key = g.Key, Value = g.Count() });
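If you only want the strings that occur more than once, the LINQ version can filter the groups before counting. A minimal sketch, assuming top-level statements (C# 9 or later); the explicit StringComparer.Ordinal is there only to make the comparison rule visible:

```csharp
using System;
using System.Linq;

string[] strings = { "Zaphod", "Trillian", "Zaphod", "Ford", "Arthur" };

// Group equal strings, keep only groups with more than one member,
// and materialize the result as a string -> count dictionary.
var duplicates = strings
    .GroupBy(s => s, StringComparer.Ordinal)
    .Where(g => g.Count() > 1)
    .ToDictionary(g => g.Key, g => g.Count());

foreach (var pair in duplicates)
{
    Console.WriteLine("{0} : {1}", pair.Key, pair.Value);
}
// Zaphod : 2
```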