views:

310

answers:

4

Update 2:

Well I’ve refactored the work-around that I have into a separate function. This way, while it’s still not ideal (especially since I have to free outside the function the memory that is allocated inside the function), it does afford the ability to use it a little more generally. I’m still hoping for a more optimal and elegant solution…


Update:

Okay, so the reason for the problem has been established, but I’m still at a loss for a solution.

I am trying to figure out an (easy/effective) way to modify a few bytes of an array in a struct. My current work-around of dynamically allocating a buffer of equal size, copying the array, making the changes to the buffer, using the buffer in place of the array, then releasing the buffer seems excessive and less-than optimal. If I have to do it this way, I may as well just put two arrays in the struct and initialize them both to the same data, making the changes in the second. My goal is to reduce both the memory footprint (store just the differences between the original and modified arrays), and the amount of manual work (automatically patch the array).


Original post:

I wrote a program last night that worked just fine but when I refactored it today to make it more extensible, I ended up with a problem.

The original version had a hard-coded array of bytes. After some processing, some bytes were written into the array and then some more processing was done.

To avoid hard-coding the pattern, I put the array in a structure so that I could add some related data and create an array of them. However now, I cannot write to the array in the structure. Here’s a pseudo-code example:

main() {
  char pattern[]="\x32\x33\x12\x13\xba\xbb";
  PrintData(pattern);
  pattern[2]='\x65';
  PrintData(pattern);
}

That one works but this one does not:

struct ENTRY {
  char* pattern;
  int   somenum;
};

main() {
  ENTRY Entries[] = {
      {"\x32\x33\x12\x13\xba\xbb\x9a\xbc", 44}
    , {"\x12\x34\x56\x78", 555}
  };
  PrintData(Entries[0].pattern);
  Entries[0].pattern[2]='\x65';   //0xC0000005 exception!!! :(
  PrintData(Entries[0].pattern);
}

The second version causes an access violation exception on the assignment. I’m sure it’s because the second version allocates memory differently, but I’m starting to get a headache trying to figure out what’s what or how to get fix this. (I’m currently working around it by dynamically allocating a buffer of the same size as the pattern array, copying the pattern to the new buffer, making the changes to the buffer, using the buffer in the place of the pattern array, and then trying to remember to free the—temporary—buffer.)

(Specifically, the original version cast the pattern array—+offset—to a DWORD* and assigned a DWORD constant to it to overwrite the four target bytes. The new version cannot do that since the length of the source is unknown—may not be four bytes—so it uses memcpy instead. I’ve checked and re-checked and have made sure that the pointers to memcpy are correct, but I still get an access violation. I use memcpy instead of str(n)cpy because I am using plain chars (as an array of bytes), not Unicode chars and ignoring the null-terminator. Using an assignment as above causes the same problem.)

Any ideas?

+6  A: 

It is illegal to attempt to modify string literals. Your

Entries[0].pattern[2]='\x65'; 

line attempts exactly that. In your second example you are not allocating any memory for the strings. Instead, you are making your pointers (in the struct objects) to point directly at string literals. And string literals are not modifiable.

This question gets asked several times every day. Read http://stackoverflow.com/questions/1614723/why-is-this-c-code-causing-a-segmentation-fault/1614739#1614739 for more details.

AndreyT
Argh, actually I had thought of that too because it occured to me that the literal is of type *const char**. But I forgot about it because the first version worked. (The literal is stored somewhere in memory—ie is allocated—and the pointer points to that memory, so it seemed logical to use that pointer to write to those bytes.) I guess the difference in where and how the memory is allocated determines whether or not writing to that memory is legal.
Synetech inc.
@Synetech inc.: String literals are *not* of type `const char*`. String literals in C++ have type `const char[N]`, where `N` is string length + 1.
AndreyT
Sorry, I meant const char[]. I have seen compiler errors nagging about being unable to convert some string literal parameter because it is const. :|
Synetech inc.
+2  A: 

The problem boils down to the fact that a char[] is not a char*, even if the char[] acts a lot like a char* in expressions.

Michael Burr
Hmm, array names are supposed to be pointers to the first element. Is an array of chars special?
Synetech inc.
@Synetech inc.: Arrays names *act as* (*pretend to be*) pointers in so called *value contexts*. That does not mean that arrays ("array names") *are* pointers or in some way equivalent to them. Arrays are arrays, completely different from pointers.
AndreyT
I had already tried to use a char[] in the struct instead of a char*, but it doesn’t seem to be possible even if the struct is initialized.
Synetech inc.
@Synetech: You cannot use a variable sized array inside an array. How would the compiler know how big each array element was if each one could be a different length?
Zan Lynx
@Synetech: Except in certain circumstances (when you're using C99 'variable length arrays', which in my experience is not common, particularly on Windows), an array in a structure needs to have a definite size. In your first example, that size is provided implicitly by the initialization string. If you're moving the char[] into a struct, you'll need to specify its size (such as `char pattern[7]` which will make it as large as in your 1st example).
Michael Burr
@Zan Lynx, the size of the array *is* definite and constant when you initialize the struct, it just varies across instances which shouldn’t be a problem.
Synetech inc.
@Michael Burr, right, but then all objects of the struct type would have to have the same size for the array, which would mean setting it to the largest size.
Synetech inc.
@Synetech: that's right - you'd need to size the struct to hold the largest string you'd support. If that's not flexible enough, then you'll need to use a pointer (like in your second example) but have the pointer refer to a dynamically allocated array. A somewhat more advanced technique is to have the array at the end of the struct, and dynamically allocate the struct with a custom size for the array needed for each particular instance: http://www.drdobbs.com/cpp/184403480
Michael Burr
But why? The size of each array is defined and constant at compile time if I initialize the struct(s).
Synetech inc.
@Synetech: I'm not sure what you're asking in your last question. If you put the array inside the struct, its size is defined and there with each struct instance (even if the size is implicitly defined by initializing it). The contents of the array can be changed, but you can't store more characters in it than how the array was sized at compile time. If you put a `char*` into the struct instead, then there's no storage allocated in the struct for the char data - only a pointer. In that case you have to manage the buffer that the pointer points to. More flexibility, but also more complexity.
Michael Burr
What I mean is why can you use `char[] foo="bar";` but you cannot use `struct item {char[] foo}; item baz={"bar"}; item test={"something"};` In both cases, the size of the array is known at compile time because it is being initialized. Sure, `item uninit;` is not initialized, so it’s fine if the compiler complains about that, but it shouldn’t if it *is* initialized.
Synetech inc.
A: 

Based on your updates, your real problem is this: You want to know how to initialize the strings in your array of structs in such a way that they're editable. (The problem has nothing to do with what happens after the array of structs is created -- as you show with your example code, editing the strings is easy enough if they're initialized correctly.)

The following code sample shows how to do this:

// Allocate the memory for the strings, on the stack so they'll be editable, and
// initialize them:
char ptn1[] = "\x32\x33\x12\x13\xba\xbb\x9a\xbc";
char ptn2[] = "\x12\x34\x56\x78";

// Now, initialize the structs with their char* pointers pointing at the editable
// strings:
ENTRY Entries[] = {
  {ptn1, 44}
, {ptn2, 555}
};

That should work fine. However, note that the memory for the strings is on the stack, and thus will go away if you leave the current scope. That's not a problem if Entries is on the stack too (as it is in this example), of course, since it will go away at the same time.

Some Q/A on this:

Q: Why can't we initialize the strings in the array-of-structs initialization? A: Because the strings themselves are not in the structs, and initializing the array only allocates the memory for the array itself, not for things it points to.

Q: Can we include the strings in the structs, then? A: No; the structs have to have a constant size, and the strings don't have constant size.

Q: This does save memory over having a string literal and then malloc'ing storage and copying the string literal into it, thus resulting in two copies of the string, right? A: Probably not. When you write

char pattern[] = "\x12\x34\x56\x78";

what happens is that that literal value gets embedded in your compiled code (just like a string literal, basically), and then when that line is executed, the memory is allocated on the stack and the value from the code is copied into that memory. So you end up with two copies regardless -- the non-editable version in the source code (which has to be there because it's where the initial value comes from), and the editable version elsewhere in memory. This is really mostly about what's simple in the source code, and a little bit about helping the compiler optimize the instructions it uses to do the copying.

Brooks Moses
Well I like the idea, but of course it is not ideal since the arrays are just loose data as opposed to part of the structs. That method, while smart, looks more like my original non-extensible, hard-coded version.Plus, I would have to make a whole bunch of individual, named arrays (eg *ptn1a, ptn1b, ptn2a, ptn2b, … ptn7a, ptn7b*). :(
Synetech inc.
Agreed on not looking ideal -- but the character arrays can't be part of the structs, because in order to be part of a larger array the structs have to all be the same size; you can't have an array with varying-sized elements. So you're going to have to allocate the writable memory for those strings somehow, outside of creating your array of structs.
Brooks Moses
I thought about your last comment and it occurred to me that I could store the arrays in a resource. That way, I could store pointers in the arrays and initialize them by loading the (binary) resource. That should both work and be clean, easy to use, and extensible right?
Synetech inc.
That depends on what exactly you mean by "resource" and how you're loading it -- and, in particular, whether your loading process loads it into writable memory or not. If you mean a resource that's linked into the compiled program, then it's almost certainly going to be treated like the string literals and not be editable. If you mean something that your program directly copies into memory, then that might well work.
Brooks Moses
+1  A: 

Other answers have addressed the reason for the error: you're modifying a string literal which is not allowed.

This question is tagged C++ so the easy way to solve your problem is to use std::string.

struct ENTRY {
  std::string pattern;
  int         somenum;
};
Mark B