Let's say I have a list of 10 strings (call them "str1", "str2", ..., "str10"). I want to generate all pairs from this list: ("str1", "str2"), ("str1", "str3"), and so on up to ("str9", "str10"). That is easy with two loops. How can I do the same thing with a million strings? Is there any way to put them in a table and run a query?
views: 46
answers: 3
+4
A:
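Put them in a table and use this self-join:
SELECT T1.StringValue, T2.StringValue
FROM StringsTable T1
INNER JOIN StringsTable T2
    ON T1.StringValue < T2.StringValue
The < comparison returns each pair exactly once; use <> instead if you also want the reversed pairs.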
Be warned: if you run this against a million strings in a query analyzer or GUI, you're setting yourself up for some hurt, because the result set is enormous.
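To try the table approach on a small scale, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table and column names follow the query above; the function name unordered_pairs is hypothetical, and the join uses < so that each pair comes back once. Iterating over the cursor streams rows rather than loading the whole result at once, though that does not make a million-string join cheap.

```python
import sqlite3


def unordered_pairs(strings):
    """Yield each unordered pair once via a self-join on an in-memory table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE StringsTable (StringValue TEXT)")
        conn.executemany(
            "INSERT INTO StringsTable (StringValue) VALUES (?)",
            [(s,) for s in strings],
        )
        cursor = conn.execute(
            "SELECT T1.StringValue, T2.StringValue "
            "FROM StringsTable T1 "
            "INNER JOIN StringsTable T2 "
            "ON T1.StringValue < T2.StringValue "
            "ORDER BY T1.StringValue, T2.StringValue"
        )
        # Rows are fetched lazily as the cursor is iterated.
        for row in cursor:
            yield row
    finally:
        conn.close()


for pair in unordered_pairs(["str1", "str2", "str3"]):
    print(pair)
```

With three strings this prints the three pairs ('str1', 'str2'), ('str1', 'str3') and ('str2', 'str3').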
Raj More
2010-08-30 14:32:23
if it will ever be returned :)
devnull
2010-08-30 14:41:29
+1
A:
In C# (Java would be similar; C++ only a bit different):
for (int i = 0; i < ArrayOfString.Length - 1; ++i)
    for (int j = i + 1; j < ArrayOfString.Length; ++j)
        ListOfPairs.Add(new Pair(ArrayOfString[i], ArrayOfString[j]));
James Curran
2010-08-30 14:32:36
@James: If you can run that code on your computer with 1M strings I'd like to buy your computer.
Albin Sunnanbo
2010-08-30 17:35:28
A:
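If you want to create all those pairs from one million strings, you will get almost one trillion ordered pairs (about half that if each pair is counted only once regardless of order).
To store one trillion pairs somewhere you need approximately 20 TB, based on 20 bytes per string pair.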
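The arithmetic behind those figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name pair_counts is hypothetical, and the 20 bytes per pair is the assumption stated above, not a measured value.

```python
def pair_counts(n, bytes_per_pair=20):
    """Return (ordered pairs, unordered pairs, bytes to store the ordered pairs)."""
    ordered = n * (n - 1)      # every string paired with every other string
    unordered = ordered // 2   # each pair counted once, order ignored
    return ordered, unordered, ordered * bytes_per_pair


ordered, unordered, size_in_bytes = pair_counts(1_000_000)
print(ordered)        # 999999000000
print(unordered)      # 499999500000
print(size_in_bytes)  # 19999980000000, i.e. roughly 20 TB
```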
If you want to make all those pairs you should consider a generative approach that generates the pairs on the fly instead of storing them somewhere.
In C# it would look something like this:
private IEnumerable<Tuple<string, string>> GetPairs(IEnumerable<string> strings)
{
    foreach (string outer in strings)
    {
        foreach (string inner in strings)
        {
            if (outer != inner)
            {
                yield return Tuple.Create(outer, inner);
            }
        }
    }
    yield break;
}
The call
string[] strings = new string[] { "str1", "str2", "str3" };
foreach (var stringPairs in GetPairs(strings))
{
    Console.WriteLine("({0},{1})", stringPairs.Item1, stringPairs.Item2);
}
generates the expected result if you care about the order of the items in the pair (each pair appears in both orders):
(str1,str2)
(str1,str3)
(str2,str1)
(str2,str3)
(str3,str1)
(str3,str2)
Expect it to take a while with 1M strings.
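If the order within a pair does not matter, the same on-the-fly idea can be sketched in Python with itertools.combinations, which yields each pair once and lazily. The function name get_pairs is hypothetical, and this is a swapped-in technique rather than a translation of the C# above. Note that combinations keeps the input strings in memory (fine for a million strings) but never materializes the pairs.

```python
from itertools import combinations, islice


def get_pairs(strings):
    """Lazily yield each unordered pair exactly once, by position in the input."""
    return combinations(strings, 2)


strings = [f"str{i}" for i in range(1, 1_000_001)]

# Only the pairs actually consumed are ever created.
for first, second in islice(get_pairs(strings), 3):
    print(first, second)
```

This prints the first three pairs (str1 with str2, str3 and str4) immediately; walking through all of them is still on the order of 500 billion iterations.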
Albin Sunnanbo
2010-08-30 17:33:06