views:

336

answers:

8

I have a program that reads data from a file line-by-line. I would like to copy some substring of that line in to a map as below:

std::map< DWORD, std::string > my_map;
DWORD index;         // populated with some data
char buffer[ 1024 ]; // populated with some data
char* element_begin; // points to some location in buffer
char* element_end;   // points to some location in buffer > element_begin

my_map.insert( std::make_pair( index, std::string( element_begin, element_end ) ) );

This std::map<>::insert() operation takes a long time (It doubles the file parsing time). Is there a way to make this a less expensive operation?

Thanks, PaulH

Edit: to be more specific, I want to know that I'm doing the minimum number of copy operations to get the data from the file in to the map.

+2  A: 

Do you really need a map here? As far as I can see in your example you only want to store an index as key value that is, as I suppose, simply incremented for each insertion. You could accomplish this with an std::vectorwhich is know to be the fastest container. Just use push_backand access the value with at(index).

Simon Linder
Yes, I need the data to be sorted and indexed.
PaulH
So in a vector it is indexed automatically by the position of the element and there is also an insert-method so you can sort it.
Simon Linder
@Simon The vector is not "the fastest container". It is faster than some other containers at doing some things.
anon
+2  A: 

There's a few things you could try. There's overhead involved both in the data structure and the creation of the string itself.

  1. Does it need to be a map? You could try std::tr1::unordered_map instead and see if that helps.

  2. How fast do lookups need to be? You could try std::vector if you can live with O(n) lookup time.

  3. Do you need to store a copy of each substring? Could you just store a pointer instead?

Kristo
PaulH
@PaulH you could create a small struct that contains a char* into the buffer and a length. Store that struct in the map instead of the string itself. That would require holding the entire file in memory though.
Kristo
@PaulH, also ask yourself *when* it needs to be sorted and indexed. As long as you store enough information in the container to look up and sort the data when you need it, it doesn't necessarily need to *always* be sorted.
Kristo
A: 

you are storing strings but I gues you already have read them and them add them to the map. This will result in a copy. If you store pointer to string in it (string* instead of string) will probably be faster.

PoweRoy
The OP still would need a string object to point to. That implies a copy.
Kristo
Ofcourse. I tried to suggest to optimize the flow from reading the writing. No useless copies. So if he was reading e.g: line for line. Storing the line as string on heap. Then storing the string in the map (copy). It better to store the string on heap and store a pointer to that string in the map
PoweRoy
+2  A: 

Maybe you could try another version of the string constructor:

string ( const char * s, size_t n );

If your implementation of string does not have a specialization for char *, it will be forced to traverse the range given and copy each character individually. In that case the constructor above might be faster (just a guess though).

Space_C0wb0y
It is always nice to get a reason for a downvote.
Space_C0wb0y
+1  A: 

To answer your supplementary question slightly. Try changing the map temporarily to a vector of strings, and then time it inserting a fixed string value into the vector For example:

vector <string> v;
string s( "foobar" );

your insert loop:
   v.push_back( s );

That should give you a lower bound of what is possible regarding speed.

Also, you should time things with all optimisations turned on (if you are not already doing so). This can make a suprising difference to many Standard Library operations.

anon
Don't forget to use `reserve` with an approximate size before the `insert` loop.
Matthieu M.
A: 

If your compiler isn't able to optimize away redundant copies in the insert, you can use the bracket operator to assign directly into the map:

my_map[index].assign(element_begin, element_end)

Edit: As Neil points out this won't be helpful if there can be duplicate keys inserted.

Mark B
This creates a redundant copy of the string before the assignment takes place.
anon
I agree that it creates a redundant default-constructed string, but that should be cheaper than one populated with all the data. Does the two iterator assign not directly assign into the string data?
Mark B
@Mark The issue here is the insert() - if there is no pre-existing key entry, the entry is created using copy construction of the copy created using the iterators (the last is inescapble, the former might be optimised). If there is a pre-existing key entry, insert does nothing.
anon
A: 

Given that you need to put the data into a std::map<DWORD, std::string> then, yes, you are doing the minimum number of copy operations to get the data into the map.

Magnus Hoff
A: 

I believe most of your execution time with the map is copying strings. The std::map likes to have its own copy of everything. So when you insert, the std::map makes a copy of the key and the value.

Long time ago, when processors were slow and memory was small, programmers would use pointers to "large" data items and pass the pointer around rather than copying the data each time. A pointer is a much smaller entity than a string and requires less execution time to copy. Perhaps you should store pointers to strings in the map:

#include <map>
#include <string>
#include "boost/shared_ptr.hpp"

typedef boost::shared_ptr<string>    Shared_Str_Ptr;

typedef std::map< DWORD, Shared_Str_Ptr> Map_Container;

//...
Map_Container my_map;
Shared_Str_Ptr p_str(new std::string("Hello"));
my_map[5] = p_str;

The *shared_ptr* will take care of memory management for you so there are no worries when deleting the map or its contents.

See also Boost Smart Pointers.

Thomas Matthews
Please, not another container of smart_pointer. The overhead for creating the `std::string`s on the heap and reference counting them when only one instance is needed is definitely not what you want.
Matthieu M.