I need to slightly change a string in an exe which I don't have the source code for anymore. It was written in C.

I notice that C string literals do not seem to appear in the machine code listing at all: not in raw ASCII, and not in UTF-8/16/32 or anything like that. They seem to be encoded, I am guessing as part of 32-bit opcodes. For example, I know the code had the C line print("My string"); so I searched for the bytes 'M' 'y' ' ' 's' etc., but the string doesn't seem to appear at all, not even with 1/2/3/4/5/6/7/8 bytes between each character (I have checked all those combinations).

From what I understand the literals could be in immediate opcodes, and these could have the 8 bits for the ASCII value at a non-byte-aligned position. Does anyone know what opcodes I should be looking for? At the moment I still can't find anything that looks like the strings, even if I shift the whole file bit by bit.

+3  A: 

Doubtful that a simple print statement would get encoded like that by the compiler.

My guess is any of the following:

  1. The EXE is getting the string from elsewhere. (Another file, another dll, etc...)

  2. You aren't looking hard enough to find the string. I'm assuming you used a hex editor that shows ascii as well as octal?

  3. The author of the code went out of his way to prevent you from doing what you want.

What are you really trying to do anyway?

selbie
Octal? Oh, yeah like he works in a museum and he's got a CDC 6xxx box or an ICL 1900 series where the word-length is^H^Hwas a multiple of 6 bits and it used a 6-bit character set :-)
John Machin
hey... it could happen.
Carson Myers
I meant to say "hex editor that shows ascii in addition to hex".
selbie
+1  A: 

Use a tool such as Dumpbin (provided with Visual Studio) or objdump (a GNU tool, available on any platform).

Dump the content of the sections called .rodata and .text; the string is probably there. If you can't find it, search in the other data and code sections.

If you really can't find it, then the executable may be encrypted. But since you wrote it, it's not very likely. =)

[Edit]

In my opinion the most probable possibility is that the string was put in the code section (probably called .text). You should dump it as data, and use a tool such as grep or a hexadecimal editor to search for the string.
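If grep or a hex editor is awkward to use on the dump, the same search can be scripted. The following is only a minimal sketch, assuming the whole file fits in memory; `find_string` is a hypothetical helper name, and the list of encodings is a guess at what a compiler of that era might have emitted.

```python
def find_string(data: bytes, text: str) -> dict[str, list[int]]:
    """Return the byte offsets of `text` in `data`, per candidate encoding."""
    # Assumed candidates; UTF-8 is identical to ASCII for plain ASCII text.
    encodings = ["utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"]
    hits: dict[str, list[int]] = {}
    for enc in encodings:
        needle = text.encode(enc)  # the explicit -le/-be codecs add no BOM
        offsets = []
        pos = data.find(needle)
        while pos != -1:
            offsets.append(pos)
            pos = data.find(needle, pos + 1)
        if offsets:
            hits[enc] = offsets
    return hits
```

To use it, read the executable with `open(path, "rb").read()` (where `path` is whatever file you are inspecting) and pass the bytes in. An empty result means the string is not stored in any of these plain encodings.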

Bastien Léonard
It's not encrypted. I know that some strings (the ones used more than once) are easy to find. The ones that are used once I can't find at all. I guessed they are in load-immediate opcodes with the operand at a non-byte offset. I will try objdump. I think that when I was checking with bit shifts over the whole file I was shifting wrong: for 16/32-bit values the endianness is the opposite of what I assumed, so I might fix that and see if I can find where these pesky strings are.
myforwik
+2  A: 

Instead of shifting the whole file bit by bit or looking at a bunch of different encodings, why don't you just disassemble the executable? The program can't just do mysterious things without the code, and you can read the code by disassembling it. If the data is stored in opcodes it will be hard to change, but I can't imagine why the compiler would store a string that way.

Carson Myers
+1  A: 

The probable answer to the "used once versus used more often" question is that the strings used more often are stored in a separate section, but the strings used once are stored interspersed with the code (e.g. after an unconditional jump/branch instruction). Why you can't see the strings with a hex editor is a mystery; a "load immediate string" opcode would be rather unusual (it's the ADDRESS of the start of the string that has to be passed as a function argument) and in any case the string should be visible. A string not being stored on a byte boundary would be extremely unusual.

Suggestion: Create a small test program with a few strings used once and a few strings used more than once and look at it with (a) objdump (b) a hex editor. If your compiler has an option of displaying the assembly code generated for each source line, turn it on. Repeat all the above for each optimisation level the compiler offers. Then use the knowledge gained on the real file.

Please consider that divulging what machine architecture is involved and what compiler (it's not a state secret, is it?) could give you a better solution sooner and avoid possible downvoting of your question ;-)

John Machin
A: 

I just compiled hello world in C on gcc, and then read the exe in SciTE and I can see the string in the gibberish. Try looking at the exe in something other than a hex editor.

EDIT: I just tried changing the string I found (adding letters to the string in the middle of a word) but it broke the exe. So, I don't know how you're going to change the string.

akway
SciTE was probably insufficiently careful about CRLF vs newline for this edit to be correct. I've patched text strings in EXE files successfully in the past.
RBerteig
You should make sure that the string has the same length as before (pad it with \0), so that references to other data are valid.
Bastien Léonard
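The same-length rule from the comment above can be sketched in a few lines. This is an illustration only, not a tested patching tool: `patch_string` is a hypothetical helper, it assumes the string is stored uncompressed in a single-byte encoding, and it refuses to guess when the string occurs more than once.

```python
def patch_string(data: bytes, old: str, new: str, encoding: str = "ascii") -> bytes:
    """Overwrite `old` with `new` in place, NUL-padding so the size is unchanged."""
    old_b = old.encode(encoding)
    new_b = new.encode(encoding)
    if len(new_b) > len(old_b):
        # A longer string would shift everything after it and break references.
        raise ValueError("replacement is longer than the original string")
    offset = data.find(old_b)
    if offset == -1:
        raise ValueError("original string not found")
    if data.find(old_b, offset + 1) != -1:
        raise ValueError("original string occurs more than once")
    # Pad with NUL bytes so every later byte keeps its original offset.
    padded = new_b + b"\x00" * (len(old_b) - len(new_b))
    return data[:offset] + padded + data[offset + len(old_b):]
```

Read and write the file in binary mode (`"rb"` / `"wb"`) and write the result to a copy, so that no newline translation touches the bytes and the original stays intact.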
+2  A: 

Though I am not sure why you are not able to find the string, I am sure that just updating the string will be a dangerous and very difficult job.

Alphaneo
+1 for pointing out the horror of it
Carson Myers
A: 

I traced the program and found that it stores the strings in a section that it runs DEFLATE on when initialising. Nothing's ever easy :-)

I don't know what compiler I used; I think it was a Watcom compiler. The code is over 10 years old.
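For anyone in the same situation, a brute-force way to confirm a DEFLATE-compressed section is to try inflating from every offset and check whether the output contains the string. This is only a rough sketch under stated assumptions: the data is a standard raw DEFLATE or zlib-wrapped stream (a custom packer may use something else), `find_deflated_string` is a hypothetical name, and the scan is slow on large files.

```python
import zlib


def find_deflated_string(data: bytes, needle: bytes, max_out: int = 1 << 20) -> list[int]:
    """Return offsets where inflating `data` yields output containing `needle`."""
    hits = []
    view = memoryview(data)  # slicing a memoryview avoids copying the tail
    for offset in range(len(data)):
        for wbits in (-15, 15):  # -15: raw DEFLATE, 15: zlib header + DEFLATE
            inflater = zlib.decompressobj(wbits)
            try:
                # Cap the output so a bogus stream cannot exhaust memory.
                out = inflater.decompress(view[offset:], max_out)
            except zlib.error:
                continue  # not a valid stream at this offset
            if needle in out:
                hits.append(offset)
                break
    return hits
```

Finding the offset only shows where the compressed data starts. Changing the string would still mean inflating the section, editing it, recompressing it, and making the result fit wherever the program expects it, none of which this sketch attempts.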

myforwik
Now there's a crazy idea. This was probably an early EXE packer, which made sense when disk space (and bandwidth on and off the platter) was more valuable than the CPU cycles needed to uncompress the data. Patching it won't be trivial.
RBerteig
Or maybe someone used UPX on it.
Christopher