We're in the final stages of shipping our console game. On the Wii we're having the most problems with memory of course, so we're busy hunting down sloppy coding, packing bits, and so on.

I've done a dump of memory and used strings.exe (from Sysinternals) to analyze it, but it's coming up with a lot of gunk like this:

''''$$$$    %%%%
''''$$$$%%%%####&&&&
''''$$$$((((!!!!$$$$''''((((####%%%%$$$$####((((
''))++.-$$%&''))
'')*>BZf8<S]^kgu[faniwkzgukzkzkz
'',,..EDCCEEONNL

I'm more interested in strings like this:

wood_wide_end.bmp
restroom_stonewall.bmp

...which means we're still embedding some kinds of strings that need to be converted to IDs.

So my question is: what are some good ways of finding the stuff that's likely our debug data that we can eliminate?

I can write some regexes to strip out symbols or just search for certain kinds of strings. But what I'd really like to do is get hold of a standard dictionary file and search my strings file against that. It seems like it would be slow if I built one big regex like aardvark|alimony|archetype and so on. Or would that work well enough if I compiled it into a .NET regex assembly?

Looking for other ideas about how to find stuff we want to eliminate as well. Quick and dirty solutions, don't need elegant. Thanks!

A: 

This sounds like an ideal task for a quick-and-dirty script in something that supports regexes. I'd probably knock something together in Python if it were me.

Here's how I would proceed: every time you encounter a string (from the strings.exe output), prompt the user as to whether they'd like to remember it in the dictionary or permanently ignore it. If the user chooses to permanently ignore the string, then the next time it's encountered, throw it away without prompting. You can optionally keep an anti-dictionary file around to remember this for future runs of your script. Build up the dictionary file, and for each string keep a count or any other info you'd like. Optionally sort by the number of times each string occurs, so you can focus on the most egregious offenders.
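A minimal sketch of the counting and ignoring part of this approach, leaving out the interactive prompt. The file name ignore.txt and the function names load_ignored and count_strings are hypothetical, chosen only for illustration; it assumes the strings output has been saved to a text file with one string per line.

```python
import sys
from collections import Counter


def load_ignored(path):
    """Read the anti-dictionary: one ignored string per line (assumed format)."""
    try:
        with open(path, encoding="utf-8", errors="replace") as f:
            return {line.rstrip("\r\n") for line in f}
    except FileNotFoundError:
        return set()  # first run: nothing has been ignored yet


def count_strings(lines, ignored):
    """Count how often each string occurs, skipping ignored and empty ones."""
    counts = Counter()
    for line in lines:
        s = line.rstrip("\r\n")
        if s and s not in ignored:
            counts[s] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) == 2:
    ignored = load_ignored("ignore.txt")  # hypothetical file name
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        counts = count_strings(f, ignored)
    # Most frequent strings first, so the worst offenders are at the top.
    for s, n in counts.most_common():
        print(f"{n}\t{s}")
```

Saved as, say, count_strings.py, it could be run as: python count_strings.py strings.txt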

This sounds like an ideal task for learning a scripting language. I wouldn't bother messing with C#/C++ or anything real fancy to implement this.

Doug T.
I should have mentioned that the uniq'd string output is multi-meg. Too much for string-by-string approvals.
Scott Bilas
A: 

First, I'd get a good word list. This NPL page has a good collection of word lists of varying sizes and sources. What I would do is build a hash table of all the words in the word list, and then test each word that strings outputs against it. This is pretty easy to do in Python:

import sys

# Load the word list into a set so each lookup is a constant-time hash test.
with open('your-word-list') as dictfile:
    wordlist = frozenset(word.strip() for word in dictfile)

for line in sys.stdin:
    # if any word in the line is in our list, print out the whole line
    for word in line.split():
        if word in wordlist:
            print(line, end='')  # line already ends with a newline
            break

Then use it like this:

strings myexecutable.elf | python myscript.py
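One caveat: strings such as wood_wide_end.bmp contain no whitespace, so line.split() hands them to the lookup as a single token that no word list will contain. A possible variation, sketched below, pulls out runs of letters instead and ignores very short words, which otherwise tend to match the binary noise. The function names, the 4-letter minimum, and passing the word list path on the command line are assumptions for illustration, not part of the script above.

```python
import re
import sys

# A "word" here is any run of ASCII letters; digits, '_' and '.' act as separators.
WORD_RE = re.compile(r"[A-Za-z]+")


def load_words(path, min_len=4):
    """Load a word list (one word per line) as a lowercase set."""
    with open(path, encoding="utf-8", errors="replace") as f:
        words = (line.strip().lower() for line in f)
        return frozenset(w for w in words if len(w) >= min_len)


def has_dictionary_word(line, wordlist):
    """Return True if any run of letters in the line is a known word."""
    return any(tok.lower() in wordlist for tok in WORD_RE.findall(line))


if __name__ == "__main__" and len(sys.argv) == 2:
    wordlist = load_words(sys.argv[1])
    for line in sys.stdin:
        if has_dictionary_word(line, wordlist):
            print(line, end="")  # line already ends with a newline
```

With this variation the word list is given as an argument: strings myexecutable.elf | python myscript.py your-word-list. Note that it does not split CamelCase names, so those would need an extra step.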

However, I think you're focusing your attention in the wrong place. Eliminating debug strings has rapidly diminishing returns. Although removing debugging data is one of Nintendo's Technical Certification Requirements, I don't think they'll bounce you for having a couple of extra strings in your ELF.

Use a profiler and try to identify where you're using the most memory. Chances are, there will be a way to save huge amounts of memory with little effort if you focus your energy in the right place.

Adam Rosenfield
I think this is what I'm looking for, thanks. Note that we're looking in many areas for memory optimizations. This is just one route I'm exploring because it's easy. We eliminated game object names last week and it saved hundreds of KB of memory. Looking for more easy wins, but all the noise in the file is making it difficult.
Scott Bilas
Without knowing your system, the expression 'because it's easy' reminds me of an old joke where a drunkard is searching for his lost keys under a street light. When someone offers to help and asks where the keys fell, the drunkard points to the other side of the street, but he is looking under the light because it is much easier. Just a joke, don't take it hard. Profiling will point to where the keys fell.
David Rodríguez - dribeas
Good joke, but not applicable here. For the purposes of this question, let's just assume that I know what I'm doing and have done this kind of thing on many games. I seriously am really just interested in doing a scan of the output of strings, as a small part of a large set of potential optimizations that we're looking into.
Scott Bilas