ansaurus

Question

Answer 1

+2 A:

Resize vector to large enough size before you start populating it. Or, use pointers to strings instead of strings.

The thing is that the strings are being copied each time the vector is being auto-resized. For small objects such as pointers this cost nearly nothing, but for strings the whole string is copied in full.

And id and name should be string instead of char*, and be initialized like this (assuming that you still use string instead of string*):

id = string(lineBuffer, lineBuffer + l);
...
name = string(lineBuffer + s, lineBuffer + s + l);
...
ids_.push_back(id);
names_.push_back(name);

Dialecticus 2010-10-21 15:31:45

Vector was there prior to adding string. He asks why strings are slower. using pointer to strings is not that boost either, because strings use shallow copy.

Andrey 2010-10-21 15:33:04

well, not sure about is string copy deep or shallow

Andrey 2010-10-21 15:38:40

My bad. Now that OP added code I see plenty of room for improvement and reducing memory leaks.

Dialecticus 2010-10-21 15:42:40

@Dialectius and how would I get these improvements?

Elpezmuerto 2010-10-21 16:32:28

I added some code. Instead of new char[], switch and sprintf just create a string. And when you omit new char[] you will also omit memory leak that current code exhibits. And it might still be true that using pointer to string in vector instead of string would speed things up.

Dialecticus 2010-10-21 17:07:52

Answer 2

+1 A:

Before fixing something make sure that it is bottleneck. Otherwise you are wasting your time. Plus this sort of optimization is microoptimization. If you are doing microoptimization in C++ then consider using bare C.

Andrey 2010-10-21 15:31:59

and in case you decide to use bare C, http://stackoverflow.com/questions/2371292/buffered-reading-from-stdin-using-fread-in-c might help.

N 1.1 2010-10-21 15:40:55

No need to go "all C". Having a `std::vector<char *>` is a good optimization, if this is really the bottleneck.

paercebal 2010-10-21 15:58:38

@paercebal i really doubt that it can be bottleneck

Andrey 2010-10-21 16:02:27

@Andrey: My point was more about the "all C" way. If the suspensions of your shiny car is not efficient enough, no need to change the car. Just change the suspension. In C++, just find the code that is the bottleneck, wrap it with a safe interface, and hide inside the less safe but more efficient code. As an e.g., I coded a component whose one struct was a bottleneck. The struct had a std::string and an int, and it was copied a lot. I made it a class to hide inside a char[32] and an int, and the copy method used a memcpy. But for the rest of the component, it was still a full, safe C++ object.

paercebal 2010-10-21 18:49:51

Answer 3

A:

You can use a profiler to find out where your code consumes most time. If you are for example using gcc, you can compile your program with -pg. When you run it, it saves profiling results in a file. You can the run gprof on the binary to get human readable results. Once you know where most time is consumed you can post that piece of code for further questions.

Helmut 2010-10-21 15:32:58

Answer 4

+1 A:

Except for std::string, this is a C program.

Try using fstream, and use the profiler to detect the bottle neck.

VJo 2010-10-21 15:42:24

Answer 5

+5 A:

My guess it that it should be tied to the vector<string>'s performance

About the vector

A std::vector works with an internal contiguous array, meaning that once the array is full, it needs to create another, larger array, and copy the strings one by one, which means a copy-construction and a destruction of string which had the same contents, which is counter-productive...

To confirm this easily, then use a std::vector<std::string *> and see if there is a difference in performance.

If this is the case, they you can do one of those four things:

if you know (or have a good idea) of the final size of the vector, use its method reserve() to reserve enough space in the internal array, to avoid useless reallocations.
use a std::deque, which works almost like a vector
use a std::list (which doesn't give you random access to its items)
use the std::vector<char *>

About the string

Note: I'm assuming that your strings\char * are created once, and not modified (through a realloc, an append, etc.).

If the ideas above are not enough, then...

The allocation of the string object's internal buffer is similar to a malloc of a char *, so you should see little or no differences between the two.

Now, if your char * are in truth char[SOME_CONSTANT_SIZE], then you avoid the malloc (and thus, will go faster than a std::string).

Edit

After reading the updated code, I see the following problems.

if ids_ and names_ are vectors, and if you have the slightest idea of the number of lines, then you should use reserve() on ids_ and and names_
consider making ids_ and names_ deque, or lists.
faaNames_ should be a std::map, or even a std::unordered_map (or whatever hash_map you have on your compiler). Your search currently is two for loops, which is quite costly and inneficient.
Consider comparing the length of the strings before comparing its contents. In C++, the length of a string (i.e. std::string::length()) is a zero cost operation)
Now, I don't know what you're doing with the isFound variable, but if you need to find only ONE true equality, then I guess you should work on the algorithm (I don't know if there is already one, see http://www.cplusplus.com/reference/algorithm/), but I believe this search could be made a lot more efficient just by thinking on it.

Other comments:

Forget the use of int for sizes and lengths in STL. At very least, use size_t. In 64-bit, size_t will become 64-bit, while int will remain 32-bits, so your code is not 64-bit ready (in the other hand, I see few cases of incoming 8 Go strings... but still, better be correct...)

Edit 2

The two (so called C and C++) codes are different. The "C code" expects ids and names of length lesser than 5, or the program exists with an error. The "C++ code" has no such limitation. Still, this limitation is ground for massive optimization, if you confirm names and ids are always less then 5 characters.

paercebal 2010-10-21 15:45:29

That is assuming that this issue is with the old C++ standard, not the C++0x and a compiler that already implemented move semantics.

Let_Me_Be 2010-10-21 16:01:06

+1 for answering the question. And yes, the extra malloc/free are probably the source of his slow down. Toss in the memcpy required to initialize/copy the strings for good measure.

Torlack 2010-10-21 16:02:39

Let_Me_Be: There is still the cost of creating the initial string to be placed into the vector.

Torlack 2010-10-21 16:03:47

@Torlack: The initial string does not need to cost anything: In VC++ 2008 STL implementation, any string less than 16 (or 15?) chars is directly embedded in the string class, not allocated with `new`. Meaning that such string (and most "trivial" strings are within this size) cost as much as a stack char[15].

paercebal 2010-10-21 18:37:17

@Torlack: Fact is a "C initial `char *`" would be as costly to create and put in a vector than a "C++0x initial `std::string`". Still, I fail to see how a memcpy would help here, as C++ object are usually not memcpy-able (this is the province of the copy constructor, which, truly, can be less effective), but as written by Let_Me_Be, the C++0x' `std::move` is the C++ equivalent of C's `memmove`, but working on C++ objects, and used in the code, can produce miracles.

paercebal 2010-10-21 18:39:17

@Paercebal, you can assume that the name and ids are always less than 5 characters

Elpezmuerto 2010-10-21 20:17:46

I am trying to use `std::vector<std::string *>` but failing. Why doesn't declaring `std::vector<std::string*> names_;` and then `names_.push_back(line.substr(0,l))` or `names_.push_back(*line.substr(0,l))` not work?

Elpezmuerto 2010-10-21 21:47:49

@Elpezmuerto : Instead of `push_back(line(0, l))`, you should try instead `push_back(new std::string(line, 0, l))`. Note that after use, you'll have to free those strings.

paercebal 2010-10-22 08:23:13

@Elpezmuerto : Anyway, my late tests demonstrated the C++ code I did retrieve was FASTER on a gcc 4.4.5 than the C counterpart, and that reserving the vectors decreased the execution time by ~20%. But the allocation time (new/delete) of `char *` in the "C code" seemed to have little or no consequences. But I have still to work on a real test implementation (your "C++ code" was changed while I still worked on the version it replaced).

paercebal 2010-10-22 08:25:14

@Paercebal, thanks a bunch! Ya I changed to using a `map` over `vector` because a the significant performance increase in a different function: Bottleneck from comparing strings (stackoverflow.com/q/3992548/363829)

Elpezmuerto 2010-10-22 13:55:18

@Elpezmuerto: You're welcome... :-) ...

paercebal 2010-10-22 15:58:37

Answer 6

+1 A:

You can try to reserve a number of vector values in order to reduce the number of allocations (which are costly), as said Dialecticus (probably from the ancient Roma?).

But there is something that may deserve some observation: how do you store the strings from the file, do you perform concatenations etc...

In C, strings (which do not exist per say - they don't have a container from a library like the STL) need more work to deal with, but at least we know what happens clearly when dealing with them. In the STL, each convenient operation (meaning requiring less work from the programmer) may actually require a lot of operations behind the scene, within the string class, depending on how you use it.

So, while the allocations / freeings are a costly process, the rest of the logic, especially the strings process, may / should probably be looked at as well.

ring0 2010-10-21 15:50:18

Answer 7

+1 A:

I believe the main issue here is that your string version is copying things twice -- first into dynamically allocated char[] called name and id, and then into std::strings, while your vector<char *> version probably does not do that. To make the string version faster, you need to read directly into the strings and get rid of all the redundant copies

Chris Dodd 2010-10-21 17:43:45

If I want to read directly into the string, I must use `ifstream`. I don't think I can use `fget` to do that?

Elpezmuerto 2010-10-21 17:47:52

You can start with a blank string, use `.resize()` to set the size/capacity, and then write directly into the string with the overloaded `operator[]` if you want something that is as much like a `char` buffer as possible. Or you can create an output iterator that writes into a string and essentially treat the string as a bufffer you write into.

Chris Dodd 2010-10-22 17:27:08

Answer 8

+1 A:

streams take care of a lot of the heavy lifting for you. Stop doing it all yourself, and let the library help you:

void converter::updateNewList(){
    std::ifstream NewList ("NewList.txt");

    while(NewList.good()){
        int id (0);
        std::string name;

        // get the ID and name
        NewList >> id >> name;

        // ignore the rest of the line
        NewList.ignore( numeric_limits<streamsize>::max(), '\n');

        ids_.push_back (id);
        names_.push_back(name);
    }

    NewList.close();
}

There's no need to do the whitespace-tokenizing manually.

Also, you may find this site a helpful reference: http://www.cplusplus.com/reference/iostream/ifstream/

Tim 2010-10-21 21:56:51

Edit: `NewList.ignore( std::numeric_limits<std::streamsize>::max(), '\n');`

Elpezmuerto 2010-10-21 22:01:12

+1 For showing me the light, is there anyway to have ids_ and names_ as string pointers? I access them alot of that is the bottleneck is my code

Elpezmuerto 2010-10-21 22:03:45

Note that I changed your IDS to ints. It's somewhat wasteful to store numbers as strings. As for the bottleneck of accessing the strings... can you post some of the code you use to do that? If you're doing unnecessary copying, that could be the problem.

Tim 2010-10-21 22:19:45

@Tim: Bottleneck from comparing strings (http://stackoverflow.com/q/3992548/363829)

Elpezmuerto 2010-10-21 22:32:35

From that other site about the search, yes a map is definitely the way to go. That would change this loading method, but not too much. Instead of `push_back` for both vectors, load the map once with `your_map[name] = id;` The rest of the logic in this loading method could stay the same.

Tim 2010-10-22 00:09:16

ansaurus

tags:

views:

answers:

Char* vs String Speed in C++

Example NewList.Txt

OLD Code:

C++ CODE

About the vector

About the string

Edit

Edit 2

related questions