I need to extract a description from a file, which looks like this: "TES4!\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0f\x00\x00\x00HEDR\x0c\x00\xd7\xa3p?h\x03\x00\x00\x00\x08\x00\xffCNAM\t\x00Martigen\x00SNAM\xaf\x00Mart's Mutant Mod - RC4\n\nDiverse creatures & NPCs, new creatures & NPCs, dynamic size and stat scaling, increased spawns, improved AI, improved factions, and much more.\n\n\x00MAST\r\x00Fallout3.esm\x00DATA\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00MAST\x16\x00Mart's Mutant Mod.esm\x00DATA\x08"

I've already figured out how to get the part I need, but there's still some unwanted data in there that I don't know how to get rid of: \xaf\x00Mart's Mutant Mod - RC4\n\nDiverse creatures & NPCs, new creatures & NPCs, dynamic size and stat scaling, increased spawns, improved AI, improved factions, and much more.\n\n\x00

should become: Mart's Mutant Mod - RC4\n\nDiverse creatures & NPCs, new creatures & NPCs, dynamic size and stat scaling, increased spawns, improved AI, improved factions, and much more.\n\n

Basically, I need a way to get rid of the \x## stuff (which, if left in, will end up as weird characters when displayed in the GUI), but I haven't managed to remove it successfully.

[In case you were wondering, it's .esp files for FO3 I'm messing around with.]

+4  A: 

You could try:

import string

cleaneddata = ''.join(c for c in data if c in string.printable)

This assumes that you already have data in a string.

Here's how it works for me:

>>> s = """TES4!\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0f\x00\x00\x00HEDR\x0c\x00\xd7\xa3p?h\x03\x00\x00\x00\x08\x00\xffCNAM\t\x00Martigen\x00SNAM\xaf\x00Mart's Mutant Mod - RC4\n\nDiverse creatures & NPCs, new creatures & NPCs, dynamic size and stat scaling, increased spawns, improved AI, improved factions, and much more.\n\n\x00MAST\r\x00Fallout3.esm\x00DATA\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00MAST\x16\x00Mart's Mutant Mod.esm\x00DATA\x08"""
>>> print ''.join(c for c in s if c in string.printable)
TES4!HEDR
         p?hCNAM    MartigenSNAMMart's Mutant Mod - RC4

Diverse creatures & NPCs, new creatures & NPCs, dynamic size and stat scaling, increased spawns, improved AI, improved factions, and much more.

Fallout3.esmDATAMASTMart's Mutant Mod.esmDATA
>>> 

Not ideal, as you can see, but that might at least be a good first step.
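For anyone reading the file in binary mode under Python 3, the same filter needs a small change, because iterating over bytes yields integers rather than one-character strings. The following is a minimal sketch of that variant; the name keep_printable and the shortened sample fragment are made up for illustration and are not from the question.

```python
import string

# Byte values of every character in string.printable
PRINTABLE = frozenset(string.printable.encode("ascii"))

def keep_printable(raw):
    """Drop every byte that is not in string.printable and return text."""
    return bytes(b for b in raw if b in PRINTABLE).decode("ascii")

# Shortened, hypothetical stand-in for the fragment in the question
fragment = b"\xaf\x00Mart's Mutant Mod - RC4\n\nDiverse creatures.\n\n\x00"
print(repr(keep_printable(fragment)))
```

Note that this only works by luck of the values involved: a length byte that happens to fall in the printable range (a 65-byte description gives 0x41, which is 'A') would survive the filter and end up in the text.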

aaronasterling
+1 so your 6666 rating isn't as satanic :-)
Gary
+4  A: 

First thing we do is pull up some docs. The bottom of the page shows how the SNAM subrecord should be handled. So we use struct to read the length, then grab that many bytes from the string and cut at the null terminator. (I'm guessing that you forgot to open the file in binary mode, since the count is off in your example.) And then there's nothing left to do, since we have what we came for.
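The steps described above can be sketched roughly as follows (Python 3). The layout is an assumption inferred from the sample in the question: a 4-byte tag, a 2-byte little-endian length, then a null-terminated payload. The function name read_snam, the cp1252 encoding, and the shortened sample data are also assumptions for illustration. Searching for the tag with find() is a shortcut; a real parser would walk the subrecords in order.

```python
import struct

def read_snam(raw):
    """Return the SNAM description as text, or None if no SNAM tag is found."""
    pos = raw.find(b"SNAM")
    if pos == -1:
        return None
    # Assumed layout: 4-byte tag, then an unsigned 16-bit little-endian length
    (length,) = struct.unpack_from("<H", raw, pos + 4)
    payload = raw[pos + 6 : pos + 6 + length]
    # The payload is null-terminated; keep only what precedes the first null
    text = payload.split(b"\x00", 1)[0]
    # cp1252 is a guess at the encoding; undecodable bytes are replaced
    return text.decode("cp1252", errors="replace")

# Hypothetical, shortened sample built so that the length field is correct
desc = b"Mart's Mutant Mod - RC4\r\n\r\nDiverse creatures.\r\n\r\n\x00"
sample = (b"CNAM\t\x00Martigen\x00"
          + b"SNAM" + struct.pack("<H", len(desc)) + desc
          + b"MAST\r\x00Fallout3.esm\x00")
print(repr(read_snam(sample)))
```

The bytes would come from a file opened with mode "rb". In text mode on Windows, each \r\n is collapsed to \n, which would explain why the description in the question is four bytes shorter than the 0xaf (175) its length field announces.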

Ignacio Vazquez-Abrams
aka RTFM, LOL ;-) Of course that assumes you know where it is and what kind of data you're looking at...
martineau
+1. Way better than my answer.
aaronasterling
A: 

If you are up to the point of

\xaf\x00Mart's Mutant Mod - RC4\n\nDiverse creatures & NPCs, new creatures & NPCs, dynamic size and stat scaling, increased spawns, improved AI, improved factions, and much more.\n\n\x00

you can get rid of the remaining unwanted \x## sequences like this:

import re

# data holds the escaped text shown above (literal backslash-x sequences, not raw bytes)
exp = re.compile(r"\\x\w")
newStr = [s for s in data.split("\\x00") if not exp.search(s)]
newStr = "".join(newStr)
Rod