views:

99

answers:

3

I have been developing an iPhone app which queries a server that relays data I store in Amazon SimpleDB. I have a database table of "Submissions" by various users. I am interfacing with Facebook to retrieve Facebook Friends and wish to make a query to "Submissions" to find posts by friends - like:

SELECT * FROM submissions WHERE userID = '00123' OR userID = '00124' OR .... 

(through complete list of friends)

I think this will run into an Amazon query limit with this kind of select statement -

[Maximum number of comparisons per Select expression: 20]

Can you think of a way to elegantly pull this off with SimpleDB? I'd rather not have to do a bunch of 20 person queries. Or, do I need to move to a different database package and then do cross-table queries?

Thanks!

+2  A: 

You're needing either an IN clause or a join to a temp table. Unfortunately AmazonSimpleDB has its limitations. We abandoned it on a promising project for this very reason. We went down the path of multithreading and using the NextToken functionality before we switched gears.

You could execute parallel (multithreaded) queries to SimpleDB to get submissions, each query looking for up to 20 user IDs and then merging the results into one list. Still, it's probably time to consider a switch to MySQL or SQL Server to be able to upload a list of IDs as a temp table and then do a simple join to get the results.

Tahbaza
Thanks, that's what I feared.However, I just found that SimpleDB does have an "IN" operator, so I will need to see if items in the list there count toward the 20 comparison limit or not.
Ryan Bavetta
nope, looks like the "IN" operator is not the solution:"Uncaught exception 'Amazon_SimpleDB_Exception' with message 'Too many value tests per predicate in the query expression.'"
Ryan Bavetta
SDB is best suited for lots of small queries; is 20 person queries really that bad? You can run them all at once if you multi-thread it.
The Alchemist
+2  A: 

There is a way to do it with SimpleDB but it isn't elegant, it's more of a hack since it requires you to artificially duplicate the userid attribute in your submission items.

It's based on the fact that although you can only have 20 comparisons per IN predicate, you can have 20 IN predicates if they each name different attributes. So add additional synthetic attributes to your submission items of the form:

userID='00123' userID_2='00123' userID_3='00123' userID_4='00123' ... userID_20='00123'

They all have the identical value for a given submission. Then you can fetch the submission of up to 400 friends with a single query:

SELECT * FROM submissions 
WHERE userID IN('00120','00121',...,'00139') OR
    `userID_2` IN('00140','00141',...,'00159') OR
    `userID_3` IN('00160','00161',...,'00179') OR
    `userID_4` IN('00180','00181',...,'00199') OR
    ...
    `userID_20` IN('00300','00301',...,'00319')

You can populate the 19 extra attributes at the time the submission is created (if you have the attributes to spare) and it doesn't sound like a submission's user would ever change. Also you may want to explicitly name the attributes to be returned (instead of using * ) since you would now have 19 of them that you don't care about in the return data set.

From the data model point of view, this is clearly a hack. But having said that, it gives you exactly what you would want, for users with 400 friends or less: a single query so you can restrict by date or other criteria, sort by most recent, page through results, etc. Unfortunately, a capacity of 400 won't accommodate the friend lists of all facebook users. So you may still need to implement a multi-query solution for large friend lists just the same.

My suggestion is that if SimpleDB suits the needs of your app with the exception of this issue, then consider using the hack. But if you need to do things like this repeatedly then SimpleDB is probably not the best choice.

Mocky
Very clever trick!
The Alchemist
+1  A: 

I created the Simple Savant .NET library for SimpleDB, and I happen to have some utility code lying around for splitting and running in parallel multiple select queries, while limiting the IN clause of each select to 20 values. I'll probably roll this code into the next Savant release, but here it is for anyone who finds it useful:

    /// <summary>
    /// Invokes select queries that use parameter lists (with IN clauses) by splitting the parameter list
    /// across multiple invocations that are invoked in parallel.
    /// </summary>
    /// <typeparam name="T">The item type</typeparam>
    /// <typeparam name="P">The select parameter type</typeparam>
    /// <param name="savant">The savant instance.</param>
    /// <param name="command">The command.</param>
    /// <param name="paramValues">The param values.</param>
    /// <param name="paramName">Name of the param.</param>
    /// <returns></returns>
    public static List<T> SelectWithList<T,P>(ISimpleSavantU savant, SelectCommand<T> command, List<P> paramValues, string paramName)
    {
        var allValues = SelectAttributesWithList(savant, command, paramValues, paramName);
        var typedValues = new List<T>();
        foreach (var values in allValues)
        {
            typedValues.Add((T)PropertyValues.CreateItem(typeof (T), values));
        }
        return typedValues;
    }

    /// <summary>
    /// Invokes select queries that use parameter lists (with IN clauses) by splitting the parameter list
    /// across multiple invocations that are invoked in parallel.
    /// </summary>
    /// <typeparam name="P">The select parameter type</typeparam>
    /// <param name="savant">The savant instance.</param>
    /// <param name="command">The command.</param>
    /// <param name="paramValues">The param values.</param>
    /// <param name="paramName">Name of the param.</param>
    /// <returns></returns>
    public static List<PropertyValues> SelectAttributesWithList<P>(ISimpleSavantU savant, SelectCommand command, List<P> paramValues, string paramName)
    {
        Arg.CheckNull("savant", savant);
        Arg.CheckNull("command", command);
        Arg.CheckNull("paramValues", paramValues);
        Arg.CheckNullOrEmpty("paramName", paramName);

        var allValues = new List<PropertyValues>();
        if (paramValues.Count == 0)
        {
            return allValues;
        }

        var results = new List<IAsyncResult>();
        do
        {
            var currentParams = paramValues.Skip(results.Count * MaxValueTestsPerSimpleDbQuery).Take(MaxValueTestsPerSimpleDbQuery).ToList();
            if (!currentParams.Any())
            {
                break;
            }
            var currentCommand = Clone(command);
            currentCommand.Reset();
            var parameter = currentCommand.GetParameter(paramName);
            parameter.Values.Clear();
            parameter.Values.AddRange(currentParams.Select(e => (object)e));
            var result = savant.BeginSelectAttributes(currentCommand, null, null);
            results.Add(result);
        } while (true);

        foreach (var result in results)
        {
            var values = ((ISimpleSavant2)savant).EndSelectAttributes(result);
            allValues.AddRange(values);
        }

        return allValues;
    }

    private static SelectCommand Clone(SelectCommand command)
    {
        var newParameters = new List<CommandParameter>();
        foreach (var parameter in command.Parameters)
        {
            var newParameter = new CommandParameter(parameter.Name, parameter.PropertyName, null);
            newParameter.Values.Clear();
            newParameters.Add(newParameter);
        }
        var newCommand = new SelectCommand(command.Mapping, command.CommandText, newParameters.ToArray())
            {
                MaxResultPages = command.MaxResultPages
            };
        return newCommand;
    }
Ashley Tate