I am pulling information from a binary file in C, and one of my strings is coming out as \\b\\3777\\375\\v\\177 in GDB. I want to be able to filter this sort of useless data out of my output in a non-specific way, i.e. anything that doesn't start with a number/character should be kicked out. How can this be achieved?

The data is being buffered into a struct n bytes at a time, and I am sure that this information is correct based on how data later in the file is being read correctly.

+3  A: 
if( isalnum( (unsigned char)buf[ 0 ] ) ) {
    printf( "%s", buf );
}
William Pursell
You don't believe in negative numbers then?
anon
isalnum() answers the question as stated. It's trivial to add checks for '.' and '-'.
William Pursell
A: 

Iterate over your bytes, and check the value of each one to see if it is one of the characters that you consider to be valid. I don't know what you consider to be "an integer or char" (i.e. valid values), but you can compare each character against a condition such as:

(c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')

The above condition ensures that the character is either an ASCII digit (0 through 9) or a capital or lowercase English letter. Then you have to decide what to do when you encounter a character that you don't want. You can either replace the "bad" character with something "safe" (like a space) or you can build up a new string in a separate buffer, containing only the "good" characters.

Note that the above condition only works for English: accented characters are rejected, and so are all punctuation and whitespace. Another possible test would be to see if the character is a printable ASCII character: (c >= 0x20 && c <= 0x7e) || c == 0xa || c == 0xd, which also accepts punctuation, space and CR/LF. And this doesn't even get started trying to deal with encodings that aren't ASCII-compatible.
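
The "build up a new string in a separate buffer" option could look roughly like the sketch below. The names is_wanted and filter_wanted are made up for illustration, and the sketch assumes you know how many bytes were read (src_len), since data from a binary file may not be NUL-terminated.

```c
#include <assert.h>
#include <stddef.h>

/* The condition from this answer: ASCII digits and English letters only. */
static int is_wanted(unsigned char c)
{
    return (c >= '0' && c <= '9') ||
           (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z');
}

/*
 * Copy the wanted bytes of src[0..src_len) into dst and drop the rest.
 * dst is always NUL-terminated; the return value is the number of
 * bytes kept. Copying stops early if dst runs out of room.
 */
static size_t filter_wanted(const char *src, size_t src_len,
                            char *dst, size_t dst_size)
{
    size_t kept = 0;

    assert(dst_size > 0); /* need room for the terminator */
    for (size_t i = 0; i < src_len && kept + 1 < dst_size; i++) {
        unsigned char c = (unsigned char)src[i];
        if (is_wanted(c))
            dst[kept++] = (char)c;
    }
    dst[kept] = '\0';
    return kept;
}
```

To replace bad characters instead of dropping them, write a space to dst in the else branch rather than skipping the byte.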

Adam Batkin
Not a good way to do stuff - that's what `<ctype.h>` and its contents are for - and they deal with the issues you mention (especially if you call `setlocale()` from `<locale.h>`).
Jonathan Leffler
A: 

It sounds a bit like you're reimplementing the Linux utility `strings`.

For each file given, GNU strings prints the printable character sequences that are at least 4 characters long (or the number given with the options below) and are followed by an unprintable character. By default, it only prints the strings from the initialized and loaded sections of object files; for other types of files, it prints the strings from the whole file.
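
If you would rather do the same thing inside your own program, the core idea fits in a few lines. This is only a rough sketch of the behaviour described above, not the actual GNU implementation: print_runs is a made-up name, isprint() in the default "C" locale is assumed as the definition of "printable", the end of the buffer is treated as ending a run, and min_len is assumed to be at least 1.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write every run of at least min_len printable bytes in buf[0..len)
 * to out, one run per line. Returns the number of runs written.
 */
static size_t print_runs(const unsigned char *buf, size_t len,
                         size_t min_len, FILE *out)
{
    size_t start = 0; /* index where the current run began */
    size_t found = 0;

    for (size_t i = 0; i <= len; i++) {
        /* A run ends at an unprintable byte or at the end of the buffer. */
        if (i == len || !isprint(buf[i])) {
            if (i - start >= min_len) {
                fwrite(buf + start, 1, i - start, out);
                fputc('\n', out);
                found++;
            }
            start = i + 1;
        }
    }
    return found;
}
```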

therefromhere
A: 

As the printable ASCII characters lie in the range 0x20 (' ', space) to 0x7E ('~', tilde), you can use this test:

if( (buf[0] >= 0x20) && ( buf[0] <= 0x7E ) )
{
    printf( "%s", buf );
}

This accepts any string whose first character is a printable ASCII character.
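
Since only buf[0] is tested, a string whose first byte happens to be printable but whose remaining bytes are garbage still gets through. A stricter variant, as a sketch: is_printable_field is a made-up name, and size is assumed to be the number of bytes in the field, so the loop never reads past a field that lacks a terminating NUL.

```c
#include <stddef.h>

/*
 * Returns 1 if buf holds a non-empty string made only of printable
 * ASCII (0x20..0x7E), looking at no more than size bytes; 0 otherwise.
 */
static int is_printable_field(const char *buf, size_t size)
{
    size_t i;

    for (i = 0; i < size && buf[i] != '\0'; i++) {
        unsigned char c = (unsigned char)buf[i];
        if (c < 0x20 || c > 0x7E)
            return 0;
    }
    return i > 0; /* reject empty strings as well */
}
```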

Vargas