tags:

views:

683

answers:

4

I've stored 30,000 SimpleObjects in my database:

class SimpleObject
{
    public int Id { get; set; }
}

I want to run a query on DB4O that finds all SimpleObjects with any of the specified IDs:

public IEnumerable<SimpleObject> GetMatches(int[] matchingIds)
{
     // OH NOOOOOOES! This activates all 30,000 SimpleObjects. TOO SLOW!
     var query = from SimpleObject simple in db
                 join id in matchingIds on simple.Id equals id
                 select simple;
     return query.ToArray();
}

How do I write this query so that DB4O doesn't activate all 30,000 objects?

+1  A: 

I haven't done much with db4o LINQ. But you can use the DiagnosticToConsole (or ToTrace) and add it to the IConfiguration.Diagnostic().AddListener. This will show you if the query is optimized.

You don't give a lot of details, but is the Id property on SimpleObject indexed?

Once diagnostics are turned on, you might try the query like so...

from SimpleObject simple in db
where matchingIds.Contains(simple.Id)
select simple

See if that gives you a different query plan.

Justin Rudd
Id property is indexed, yeah. I'll try turning on diagnostics. Thanks for the help.
Judah Himango
+2  A: 

I am not an expert on this, and it might be good to post on the DB4O forums about it, but I think I have a solution. It involves not using LINQ and using SODA.

This is what I did. I created a quick project that populates the database with 30000 SimpleObject based on your post's definition. I then wrote a query to grab all the SimpleObjects from the database:

var simpleObjects = db.Query<SimpleObject>(typeof(SimpleObject));

When I wrapped a StopWatch around it, that run takes about 740 milliseconds. I then used your code to search for a 100 random numbers between 0 and 2999. The response was 772 ms, so based on that number I am assuming that it is pulling all the objects out of the database. I am not sure how to verify that, but later I think I proved it with performance.

I then went lower. From my understanding the LINQ provider from the DB4O team is just doing a translation into SODA. Therefore I figured that I would write a SODA query to test, and what I found was that using SODA against a property is bad for performance because it took 19902 ms to execute. Here is the code:

private SimpleObject[] GetSimpleObjectUsingSodaAgainstAProperty(int[] matchingIds, IObjectContainer db)
{
    SimpleObject[] returnValue = new SimpleObject[matchingIds.Length];

    for (int counter = 0; counter < matchingIds.Length; counter++)
    {
        var query = db.Query();
        query.Constrain(typeof(SimpleObject));
        query.Descend("Id").Constrain(matchingIds[counter]);
        IObjectSet queryResult = query.Execute();
        if (queryResult.Count == 1)
            returnValue[counter] = (SimpleObject)queryResult[0];
    }

    return returnValue;
}

So thinking about why this would be so bad, I decided to not use an auto-implemented property and define it my self because Properties are actually methods and not values:

public class SimpleObject
{
    private int _id;

    public int Id { 
        get
        { return _id; }
        set
        { _id = value; }
    }
}

I then rewrote the query to use the _id private field instead of the property. The performance was much better at about 91 ms. Here is that code:

private SimpleObject[] GetSimpleObjectUsingSodaAgainstAField(int[] matchingIds, IObjectContainer db)
{
    SimpleObject[] returnValue = new SimpleObject[matchingIds.Length];

    for (int counter = 0; counter < matchingIds.Length; counter++)
    {
        var query = db.Query();
        query.Constrain(typeof(SimpleObject));
        query.Descend("_id").Constrain(matchingIds[counter]);
        IObjectSet queryResult = query.Execute();
        if (queryResult.Count == 1)
            returnValue[counter] = (SimpleObject)queryResult[0];
    }

    return returnValue;
}

Just to make sure that it is was not a fluke, I ran the test run several times and recieved similar results. I then added another 60,000 records for a total of 90,000, and this was the performance differences:

GetAll: 2450 ms
GetWithOriginalCode: 2694 ms
GetWithSODAandProperty: 75373 ms
GetWithSODAandField: 77 ms

Hope that helps. I know that it does not really explain why, but this might help with the how. Also the code for the SODA field query would not be hard to wrap to be more generic.

Mark Hall
+2  A: 

Hi

If you try to run this query using LINQ it'll run unoptimized (that means that db4o are going to retrieve all objects of type SimpleObject and delegate the rest to LINQ to objects)

The best approach would be to run n queries (since the id field is indexed, each query should run fast) and aggregate the results as suggested by "Mark Hall".

You can even use LINQ for this (something like)

IList<SimpleObject> objs = new List<SimpleObject>();
foreach(var tbf in ids)
{
     var result = from SimpleObject o in db()
               where o.Id = tbf
                  select o;

     if (result.Count == 1)
     {
        objs.Add(result[0]);
     }
}

Best

Vagaus
+1  A: 

You could 'build' a dynamic linq query. For example the API could look like this:

First parameter: a expression which tells on which property you search The other parameters: the id's or whatever you're searching for.

 var result1 = db.ObjectByID((SimpleObject t) => t.Id, 42, 77);
 var result2 = db.ObjectByID((SimpleObject t) => t.Id, myIDList);
 var result3 = db.ObjectByID((OtherObject t) => t.Name, "gamlerhart","db4o");

The implementation builds a dynamic query like this:

var result = from SimpleObject t 
  where t.Id = 42 || t.Id==77 ... t.Id == N
  select t

Since everything is combined with OR the can be evaluated directly on the indexes. It doesn't need activation. Example-Implementations:

public static class ContainerExtensions{

public static IDb4oLinqQuery<TObjectType> ObjectByID<TObjectType, TIdType>(this IObjectContainer db,
Expression<Func<TObjectType, TIdType>> idPath,
params TIdType[] ids)
{
  if(0==ids.Length)
  {
       return db.Cast<TObjectType>().Where(o=>false);
  }
  var orCondition = BuildOrChain(ids, idPath);
  var whereClause = Expression.Lambda(orCondition, idPath.Parameters.ToArray());
  return db.Cast<TObjectType>().Where((Expression<Func<TObjectType, bool>>) whereClause);
}

private static BinaryExpression BuildOrChain<TIdType, TObjectType>(TIdType[] ids,     Expression<Func<TObjectType, TIdType>> idPath)
{
  var body = idPath.Body;
  var currentExpression = Expression.Equal(body, Expression.Constant(ids.First()));
  foreach (var id in ids.Skip(1))
  {
    currentExpression = Expression.OrElse(currentExpression, Expression.Equal(body,     Expression.Constant(id)));
  }
return currentExpression;
    }

}
Gamlor