views:

213

answers:

7

I'm doing some coding at work in C++, and a lot of the things that I work on involve analyzing sets of data. Very often I need to select some elements from a STL container, and very frequently I wrote code like this:

using std::vector;
vector< int > numbers;
for ( int i = -10; i <= 10; ++i ) {
    numbers.push_back( i );
}

vector< int > positive_numbers;
for ( vector< int >::const_iterator it = numbers.begin(), end = numbers.end();
     it != end; ++it 
) {
    if ( number > 0 ) {
     positive_numbers.push_back( *it );
    }
}

Over time this for loop and the logic contained within it gets a lot more complicated and unreadable. Code like this is less satisfying than the analogous SELECT statement in SQL, assuming that I have a table called numbers with a column named "num" rather than a std::vector< int > :

SELECT * INTO positive_numbers FROM numbers WHERE num > 0

That's a lot more readable to me, and also scales better, over time a lot of the if-statement logic that's in our codebase has become complicated, order-dependent and unmaintainable. If we could do SQL-like statements in C++ without having to go to a database I think that the state of the code might be better.

Is there a simpler way that I can implement something like a SELECT statement in C++ where I can create a new container of objects by only describing the characteristics of the objects that I want? I'm still relatively new to C++, so I'm hoping that there's something magic with either template metaprogramming or clever iterators that would solve this. Thanks!

Edit based on first two answers. Thanks, I had no idea that's what LINQ actually was. I program on Linux and OSX systems primarily, and am interested in something cross-platform across OSX, Linux and Windows. So a more educated version of this question would be - is there a cross-platform implementation of something like LINQ for C++?

+3  A: 

I think you have described LINQ (a C# and .NET 3.5 feature). Have you looked into that?

Greg Hewgill
It's a .NET feature, so it can be used from C++ I assume.
Matt Bridges
+4  A: 

You've almost exactly described LINQ. It's a .NET 3.5 feature so you should be able to use it from C++.

Matt Bridges
+4  A: 

The functionality you're describing is commonly found in functional languages that support concepts such as closures, predicates, functors, etc.

The problem with the code above is that it combines:

  1. Logic for iterating over collection (the for loop)
  2. Condition that must be satisfied for an element to be copied to another collection
  3. Logic for copying an element from one collection to another

In reality (1) and (3) are boilerplate, insofar as every time you need to iterate over a collection copying some elements to another collection, it's probably only the conditional code that will change each time. Languages with support for functional programming eliminate this boilerplate. For example, in Groovy you can replace your for loop above with just

def positive_numbers = numbers.findAll{it > 0}

Even though C++ is not a functional language there may be libraries which provide support for functional-style programming with STL collections. For example, the Apache commons collection (and also possibly Google's collection library) provides support for functional style programming with Java collections, even though Java itself is not a functional language.

Don
A: 

Check out Mono if you want try out LINQ on Linux / OS X. It's a port of the .NET Framework and LINQ is included now i believe.

Colin
+2  A: 

LINQ is the obvious answer for .NET (or Mono on non-Windows platforms, but in C++, it shouldn't be that difficult to write something like it yourself in STL.

Use the Boost.Iterator library to write a "select" iterator, for example, one which skips all elements that do not satisfy a given predicate.

Boost already has a few relevant examples in their documentation I believe. Or http://www.boost.org/doc/libs/1_39_0/libs/iterator/doc/filter_iterator.html might actually do what you need out of the box.

In any case, in C++, you could achieve the same effect basically by layering iterators.

If you have a regular iterator, which visits every element in the sequence, you can wrap that in a filter iterator, which increments the underlying iterator until it finds a value satisfying the condition. Then you could even wrap that in a "select" iterator transforming the value to the desired format.

It seems like a fairly obvious idea, but I'm not aware of any complete implementations of it.

jalf
Thanks jalf. I was looking for something magical that was already implemented for this, but I'll likely try your suggestion with iterators and filters.
James Thompson
+1  A: 

You're using STL containers. I would recommend using STL algorithms, which are largely straight out of set theory. A SQL select is translated to repeated applications of std::find_if, or a combination of std::lower_bound and std::upper_bound (on sorted containers). The performance will be about the same as looping, but the syntax is a little more declarative.

LINQ will give you similar syntax and operations, but unless used over IQueryables (i.e., data in a database) you're not going to get any performance gains either.

Your best bet after that is putting things into files for this sort of thing. Whether that's BerkelyDB, NetCDF, HDF5, STXXL, etc. File access is slow, but doing this allows you to work on more data than fits in memory.

Max Lybbert
+1  A: 

For what you're describing, std::vector isn't a terribly good choice. This is an SQL equivalent to a table with no indexes. On top of that, filling one container with the contents of another container is possibly a reasonable performance optimization, but not very readable, and not quite idiomatic, either. There are a number of ways of solving this portably (IE, without relying on managed code .net).

First choice is to choose a better container. If you don't need to have stable iteration, then you should use std::set or std::multi_set. these containers use a balanced search tree to store the values in order. This is equivalent to a simple SQL index of all columns.

std::set< int > numbers;
for ( int i = -10; i <= 10; ++i ) {
    numbers.insert( i );
}

std::set::iterator first = numbers.find(1);
std::set::iterator end = numbers.end();

Now you can iterate from first until end without wasting any extra effort, over the O(n log(n)) fill and O(log(n) ) seek. Iterating is O(1) for std::set::iterator

If, for some reason you must use a vector, you can get more idiomatic C++ using std::find_if (see Max Lybbert's answer)

bool isPositive(int n) { return n > 0; }

std::vector< int > numbers;
for ( int i = -10; i <= 10; ++i ) {
    numbers.push_back( i );
}

for ( std::vector< int >::const_iterator end = numbers.end(), 
        iter = std::find_if(numbers.begin(), end, isPositive); // <- first positive value
      iter != end; 
      iter = std::find_if(iter, end, isPositive) // <- advance iter to the next positive
) {

    // iter is guaranteed to be positive here, do something with it!
}

If you want something even more evocative of SQL without actually connecting to a database, you should look at Boost, particularly the boost::multi_index container and boost iterators.

TokenMacGuy