I wonder how to get something like this:

1. Write

copy(a,b,2,3)

2. And then get:

a[2]=b[2]; 
a[3]=b[3];
a[4]=b[4];

I know that C #defines can't be used recursively to get that effect, so I suppose that template meta-programming is the way to go.

I know there is a boost library for that, but I only want that "simple" trick, and boost is too "messy".

Thank you

+2  A: 

From http://www.sgi.com/tech/stl/Vector.html:

template <class InputIterator>
vector(InputIterator, InputIterator)

Creates a vector with a copy of a range. 

Does this do the job?
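For reference, here is a minimal sketch of what that range constructor does. The helper name `copy_of_range` and its parameters are made up for illustration. Note that it builds a new vector rather than assigning into an existing `a`:

```cpp
#include <vector>

// Hypothetical helper: builds a NEW vector holding src[first] .. src[last]
// (inclusive) using the iterator-range constructor quoted above.
template <class T>
std::vector<T> copy_of_range(const T* src, int first, int last) {
    // The constructor takes a half-open range, hence last + 1.
    return std::vector<T>(src + first, src + last + 1);
}
```

Called as `copy_of_range(b, 2, 4)`, this returns a three-element vector with copies of `b[2]`, `b[3]` and `b[4]`, stored at indices 0 to 2 of the result.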

Anthony Labarre
I'm sorry, but no. I'm looking for something as close as possible to a search -copy(a,b,2,3)- and replace by a[2]=b[2]; a[3]=b[3];a[4]=b[4];
cibercitizen1
+2  A: 

check this http://stackoverflow.com/questions/2380143/c-metaprograming-with-templates-versus-inlining

Andrey
Linking to other questions/answers is usually done via comments not answers, especially if you don't add any information yourself
Georg Fritzsche
I will keep that in mind. Actually, I added a variant of the solution to that question that can be applied here.
Andrey
Andrey: I wrote the program in the question you're forwarding me to get Repeat<2,4>::copy(a,b); but there I was told that it is not the best way to do it, so I guess there is a better way.
cibercitizen1
http://stackoverflow.com/questions/2382137/how-to-unroll-a-short-loop-in-c-using-templates/2382399#2382399 just do it like this :)
Andrey
@gf, @Andrey, linking to other questions/answers is best served as a "community wiki" content.
Pavel Shved
+1  A: 

Andrey's link has all the details. You should be asking yourself, "Why do I want to do this? Do I think I'm smarter than the compiler when it comes to optimization?" (The answer is no.)

mos
In my experience with loop unrolling, the answer is frequently yes. Or actually not smarter than the compiler, but better able to experiment with different possibilities and pick the best. If your compiler can use profiler results to make unrolling decisions, that helps a bit. You do have to be careful not to pick the best result for a toy benchmark program, and then use it in a real program, though, or you'll unroll too far.
Steve Jessop
@Steve: Why? I just want to learn if that is possible, and how, if so. Even if the compiler is just smarter than me (I know it is!)
cibercitizen1
Unrolling is a complex trade-off. Reducing branches and jumps normally speeds up the code, even on modern processors. Increasing the size of the code means it occupies more cache, which hurts performance in unpredictable ways. Compilers tend to be fairly conservative about unrolling, especially when the number of iterations is variable. I'd guess there are usually better performance gains for the same amount of code bloat by inlining. So especially without profiler data, the compiler misses opportunities that you can find by hand, unrolling the loops that you know run longest.
Steve Jessop
Of course whether it's worth the programmer effort is beyond the scope of the question. That's why I wouldn't necessarily call manual unrolling "smart", but it certainly can make the code run faster.
Steve Jessop
+5  A: 

C++ meta-programming is recursive. Think of your problem in terms of recursion: implement a terminal case and the non-terminal cases. Your terminal case can be either zero or one remaining element. Pass the limits as template parameters. Use a struct/class, because they allow partial specialization and other neat things.

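template<int from, int to>
struct copy {
  // copies elements from..to (inclusive) of source into destination
  template<class Src, class Dst>
  static void apply(const Src& source, Dst& destination) {
    destination[from] = source[from];
    copy<from+1,to>::apply(source, destination);
  }
};

// terminal case: from == to, copy the last element and stop
template<int from>
struct copy<from, from> {
  template<class Src, class Dst>
  static void apply(const Src& source, Dst& destination) {
    destination[from] = source[from];
  }
};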
aaa
`source` and `destination` need to be declared; this doesn't compile as it stands.
Charles Bailey
Hey, that's the same code I gave in my question: http://stackoverflow.com/questions/2380143/c-metaprograming-with-templates-versus-inlining Does it mean this is the only way to do it *using templates*?
cibercitizen1
+2  A: 

You can probably do something like the following. Depending on your compiler and the optimization settings that you use, you may get the effect that you are looking for.

Be aware that for small objects like char it may well be slower than a std::copy or a memcpy and that for larger objects the cost of a loop is likely to be insignificant compared to the copies going on in any case.

#include <cstddef>

template<std::size_t base, std::size_t count, class T, class U>
struct copy_helper
{
    static void copy(T dst, U src)
    {
        dst[base] = src[base];
        copy_helper<base + 1, count - 1, T, U>::copy(dst, src);
    }
};

template<std::size_t base, class T, class U>
struct copy_helper<base, 0, T, U>
{
    static void copy(T, U)
    {
    }
};

template<std::size_t base, std::size_t count, class T, class U>
void copy(T dst, U src)
{
    copy_helper<base, count, T, U>::copy(dst, src);
}

template void copy<5, 9, char*, const char*>(char*, const char*);

#include <iostream>
#include <ostream>

int main()
{
    const char test2[] = "     , World\n";
    char test[14] = "Hello";

    copy<5, 9>(test, test2);

    std::cout << test;

    return 0;
}
Charles Bailey
Hey, that's the same code I gave in my question: stackoverflow.com/questions/2380143/… Does it mean this is the only way to do it using templates? I'm starting to believe so.
cibercitizen1
@cibercitizen1: I wrote this independently, but to a well known pattern.
Charles Bailey
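On whether recursion is the only template-based route: not necessarily. As a sketch only, and assuming a C++17 compiler (nobody in this thread uses or proposes this), a fold expression over a `std::index_sequence` expands into the individual assignments without any recursive instantiation. The names `copy_unrolled` and `copy_unrolled_impl` are hypothetical:

```cpp
#include <cstddef>
#include <utility>

// Expands to (dst[Base + 0] = src[Base + 0]), (dst[Base + 1] = ...), ...
// via a comma fold over the index pack; no recursion is involved.
template <std::size_t Base, class Dst, class Src, std::size_t... Is>
void copy_unrolled_impl(Dst& dst, const Src& src, std::index_sequence<Is...>) {
    ((dst[Base + Is] = src[Base + Is]), ...);
}

// Copies Count elements starting at index Base.
template <std::size_t Base, std::size_t Count, class Dst, class Src>
void copy_unrolled(Dst& dst, const Src& src) {
    copy_unrolled_impl<Base>(dst, src, std::make_index_sequence<Count>{});
}
```

With this, `copy_unrolled<2, 3>(a, b)` performs `a[2]=b[2]; a[3]=b[3]; a[4]=b[4];`. Whether the generated code differs from what the optimizer produces for a plain loop is something to check in the disassembly.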
+14  A: 

The most straightforward solution to this is to write a loop where the start and end values are known:

for(int i = 2; i <= 4; i++) {
  a[i]=b[i]; 
}

I think this is better than any sort of template/runtime-call mixture: the loop as written is completely clear to the compiler's optimizer, and there are no levels of function calls to dig through just to see what's going on.
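If you still want a call-site syntax close to the one in the question, one possible sketch is to wrap that same loop in a small function template that takes the bounds as template arguments. The name `copy_range` is made up here, and this is only one way to package the loop:

```cpp
// Hypothetical wrapper: the bounds are template arguments, so the trip
// count is a compile-time constant wherever the call is instantiated.
// Copies b[First] .. b[Last] (inclusive) into the same indices of a.
template <int First, int Last, class Dst, class Src>
inline void copy_range(Dst& a, const Src& b) {
    for (int i = First; i <= Last; ++i) {
        a[i] = b[i];
    }
}
```

A call then reads `copy_range<2, 4>(a, b)`. The body is still the plain loop, so the optimizer sees exactly the same thing as in the hand-written version.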

Johannes Schaub - litb
Plus, many compilers will unroll this loop when optimization is turned on.
John Dibling
Yes, I agree! And it is trivial to write a define to be used like copy(a,b,2,4)
cibercitizen1
+1  A: 

It's important to realize that the compiler is very smart, and that tricking it into unrolling loops using template metaprogramming will probably set you back further than it gets you forward.

To get the most out of your optimizations, keep an eye on the disassembly. This will hopefully teach you more than throwing templates at the problem.

And note, as Johannes said: if the compiler can see that you are running a loop a fixed number of times (or a fixed multiple of a variable, such as 4x), it can generate code very close to optimal.

Jan
+1  A: 

It doesn't use templates and it's not a "complete" unrolling, but you can partially unroll the loop with something like this:

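// SomeType is a placeholder for whatever element type you are copying.
void copy (SomeType* a, const SomeType* b, int start_index, int num_items) {
    int i = start_index;

    // Main loop: four copies per iteration.
    while (num_items >= 4) {
            a[i+0] = b[i+0];
            a[i+1] = b[i+1];
            a[i+2] = b[i+2];
            a[i+3] = b[i+3];
            i += 4;
            num_items -= 4;
    }
    // Remainder loop: the 0 to 3 items left over.
    while (num_items > 0) {
            a[i] = b[i];
            ++i;
            --num_items;
    }
}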

Now in this particular example, the extra computations involved will probably outweigh the benefits from only unrolling four elements at a time. You should get an increasing benefit from an increasing number of elements inside the top loop (throughout the function, replace 4 with however many elements you are copying inside each manually-unrolled iteration).

bta
+1  A: 

Obligatory reference to Duff's Device
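For anyone who hasn't seen it, here is a sketch of the idea adapted to an array-to-array copy (the classic version wrote to a single output register instead). The function name `duff_copy` and the unroll factor of 8 are choices made for this illustration:

```cpp
#include <cstddef>

// Sketch of Duff's Device adapted to copying `count` elements.
// The switch jumps into the middle of the unrolled do-while so that
// the first pass handles the remainder (count % 8).
template <class T>
void duff_copy(T* to, const T* from, std::size_t count) {
    if (count == 0) return;           // the device itself assumes count > 0
    std::size_t n = (count + 7) / 8;  // number of passes through the loop
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

For the example in the question, the call would be `duff_copy(a + 2, b + 2, 3)`. The count is a run-time value here, so this is partial unrolling rather than the compile-time expansion being asked about.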

jmucchiello
+1  A: 
Corwin
Hey, that's the same code I gave in my question: stackoverflow.com/questions/2380143/… Does it mean this is the only way to do it using templates? This is the 3rd code in that direction, so it seems clear.
cibercitizen1
+1  A: 

Incidentally, the boost preprocessor library has something very similar to what you want: http://www.boost.org/doc/libs/1_37_0/libs/preprocessor/doc/examples/duffs_device.c

MSN