Let's say I have a list of 10 strings (call them "str1", "str2", ... "str10"). I want to generate all pairs from this list: ("str1", "str2"), ("str1", "str3"), and so on up to ("str9", "str10"). That is easy with two loops. How can I do the same thing with a million strings? Is there any way to put them in a table and run a query?

+4  A: 

Put them in a table, and use this join:

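-- T1.StringValue < T2.StringValue returns each pair once;
-- use <> instead to get both (a, b) and (b, a).
SELECT T1.StringValue, T2.StringValue
FROM StringsTable T1
    INNER JOIN StringsTable T2
        ON T1.StringValue < T2.StringValue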

Now, if you run this against a million strings in some sort of Query Analyzer / GUI, you're setting yourself up for some hurt: that's a huge amount of data to return.

Raj More
if it will ever be returned :)
devnull
+1  A: 

In C# (Java would be similar; C++ only a bit different):

 // Pair each string with every string after it, so each pair is added once.
 for (int i = 0; i < ArrayOfString.Length - 1; ++i)
     for (int j = i + 1; j < ArrayOfString.Length; ++j)
         ListOfPairs.Add(new Pair(ArrayOfString[i], ArrayOfString[j]));
James Curran
@James: The OP wants to know how to put in a table and run a query.
Raj More
@James: If you can run that code on your computer with 1M strings I'd like to buy your computer.
Albin Sunnanbo
A: 

If you create all those pairs you will get almost one trillion of them (1,000,000 × 999,999 ordered pairs, or about 500 billion if order doesn't matter).
Storing a trillion pairs takes approximately 20 TB, based on 20 bytes per string pair.
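
The estimate above can be checked with a few lines of arithmetic. This is only a rough sketch: the class name PairCount and its methods are made up for illustration, and the 20 bytes per pair is the same assumed average as in the estimate.

```csharp
using System;

public static class PairCount
{
    // Number of ordered pairs (a, b) with a != b among n distinct items.
    public static long Ordered(long n) => n * (n - 1);

    // Number of unordered pairs, i.e. each combination counted once.
    public static long Unordered(long n) => n * (n - 1) / 2;

    public static void Main()
    {
        const long n = 1000000;
        const long bytesPerPair = 20; // assumed average size of one stored pair

        Console.WriteLine(Ordered(n));    // 999999000000
        Console.WriteLine(Unordered(n));  // 499999500000

        // Storage for all ordered pairs in terabytes (1 TB = 1e12 bytes): just under 20.
        Console.WriteLine(Ordered(n) * bytesPerPair / 1e12);
    }
}
```

Even counting each combination only once, that is still roughly 10 TB under the same assumption.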

If you want to make all those pairs you should consider a generative approach that generates the pairs on the fly instead of storing them somewhere.

In C# it would look something like this:

private IEnumerable<Tuple<string, string>> GetPairs(IEnumerable<string> strings)
{
    foreach (string outer in strings)
    {
        foreach (string inner in strings)
        {
            if (outer != inner)
            {
                yield return Tuple.Create(outer, inner);
            }
        }
    }
}

The call

string[] strings = new string[] { "str1", "str2", "str3" };

foreach (var stringPairs in GetPairs(strings))
{
    Console.WriteLine("({0},{1})", stringPairs.Item1, stringPairs.Item2);
}

generates the expected result if you care about the order of the items in each pair, since every combination appears in both orders:

(str1,str2)
(str1,str3)
(str2,str1)
(str2,str3)
(str3,str1)
(str3,str2)

Expect it to take a while with 1M strings.
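
If the order within a pair doesn't matter, as in the question, a variant that yields each combination only once might look like the sketch below. It assumes the strings are available as an indexable list (an array works), and the names PairGenerator and GetUniquePairs are just placeholders. Pairs are formed by position, so duplicate strings in the input would not be filtered out.

```csharp
using System;
using System.Collections.Generic;

public static class PairGenerator
{
    // Lazily yields each unordered pair once by only pairing an item
    // with the items that come after it in the list.
    public static IEnumerable<Tuple<string, string>> GetUniquePairs(IReadOnlyList<string> strings)
    {
        for (int i = 0; i < strings.Count - 1; i++)
        {
            for (int j = i + 1; j < strings.Count; j++)
            {
                yield return Tuple.Create(strings[i], strings[j]);
            }
        }
    }

    public static void Main()
    {
        string[] strings = { "str1", "str2", "str3" };

        // Prints (str1,str2), (str1,str3) and (str2,str3), one per line.
        foreach (var pair in GetUniquePairs(strings))
        {
            Console.WriteLine("({0},{1})", pair.Item1, pair.Item2);
        }
    }
}
```

This halves the work compared to generating both orders, but with 1M strings it is still about 500 billion pairs, so the pairs should be consumed as they are produced rather than collected into a list.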

Albin Sunnanbo