views: 119
answers: 5
Hi all,

I have a file which I use to centralize all the strings used in my application. Let's call it Strings.txt:

TITLE="Title"
T_AND_C="Accept my terms and conditions please"
START_BUTTON="Start"
BACK_BUTTON="Back"
...

This helps me with I18n. The issue is that my application is now a lot larger and has evolved, so many of these strings are probably no longer used. I want to eliminate the dead ones and tidy up the file.

I want to write a Python script. Using regular expressions I can get all of the string aliases, but how can I search all files in a Java package hierarchy for an occurrence of a string? If there is a reason I should use Perl or bash instead, let me know, but I'd prefer to stick to one scripting language.

Please ask for clarification if this doesn't make sense, hopefully this is straightforward, I just haven't used python much.

Thanks in advance,

Gav

A: 

You might consider using ack.

% ack --java 'search_string'

This will search under the current directory.

Jason Baker
+4  A: 

Assuming the files are of reasonable size (as source files will be), so you can easily read them into memory, and that you're looking for the parts in quotes to the right of the = signs:

import collections
import os

files_by_str = collections.defaultdict(list)

thestrings = []
with open('Strings.txt') as f:
  for line in f:
    if '=' not in line:
      continue  # skip blank or malformed lines
    text = line.split('=', 1)[1]
    text = text.strip().replace('"', '')
    thestrings.append(text)

for root, dirs, files in os.walk('/top/dir/of/interest'):
  for name in files:
    path = os.path.join(root, name)
    with open(path) as f:
      data = f.read()
    for text in thestrings:
      if text in data:
        files_by_str[text].append(path)

This gives you a dict whose keys are the texts that are present in at least one file, and whose values are lists of the paths of the files containing them. If you care only about a yes/no answer to the question "is this text present somewhere?", and don't care where, you can save some memory by keeping only a set instead of the defaultdict; but I think that knowing which files contained each text will often be useful, so I suggest this more complete version.
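The set-based variant mentioned above could look like this (a sketch; the directory path and sample strings are placeholders, not from a real run):

```python
import os

# The texts already parsed from Strings.txt (sample values from the question)
thestrings = ['Title', 'Accept my terms and conditions please', 'Start', 'Back']

found = set()  # texts seen in at least one file
for root, dirs, files in os.walk('/top/dir/of/interest'):  # placeholder path
  for name in files:
    with open(os.path.join(root, name)) as f:
      data = f.read()
    for text in thestrings:
      if text in data:
        found.add(text)

unused = [text for text in thestrings if text not in found]
```

Anything left in unused is a candidate for removal from Strings.txt.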

Alex Martelli
Fantastic answer, greatly appreciated.
gav
@gav, you're welcome!
Alex Martelli
A: 

To parse your Strings.txt you don't need regular expressions:

all_strings = [line.partition('=')[0].strip() for line in open('Strings.txt') if '=' in line]

To parse your source you could use the dumbest regex:

re.search(r'\bTITLE\b', source)        # for each string in all_strings

(note the raw string: without the r prefix, '\b' is a backspace character, not a word boundary)

To walk the source directory you could use os.walk.

A successful re.search means you can remove that string from all_strings: you'll be left with the strings that need to be removed from Strings.txt.
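Putting the pieces together, a sketch of the whole script (the 'src' root and the alias list are placeholders):

```python
import os
import re

# Sample alias names as parsed from Strings.txt
all_strings = ['TITLE', 'T_AND_C', 'START_BUTTON', 'BACK_BUTTON']

def unused_aliases(aliases, source_root):
    """Return the aliases that appear in no .java file under source_root."""
    remaining = list(aliases)
    for root, dirs, files in os.walk(source_root):
        for name in files:
            if not name.endswith('.java'):
                continue
            with open(os.path.join(root, name)) as f:
                source = f.read()
            # raw string: '\b' without the r prefix is a backspace character
            remaining = [a for a in remaining
                         if not re.search(r'\b%s\b' % re.escape(a), source)]
    return remaining

print(unused_aliases(all_strings, 'src'))
```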

SilentGhost
A: 

You should consider using YAML: it's easy to use and human-readable.
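In YAML form the strings file might look like this (a sketch using the keys from the question):

```yaml
TITLE: Title
T_AND_C: Accept my terms and conditions please
START_BUTTON: Start
BACK_BUTTON: Back
```

It can then be loaded into a plain dict with, for example, the third-party PyYAML package's yaml.safe_load.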

jldupont
A: 

You are re-inventing gettext, the standard for translating programs in the Free Software sphere (even outside Python).

Gettext is designed to work with large files of strings like these :-). Helper programs exist to merge newly marked strings from the source into all translated versions, to flag unused strings, and so on. Perhaps you should take a look at it.
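A minimal sketch of the marking side in Python, assuming no compiled .mo catalogs exist yet ('myapp' is a placeholder domain name):

```python
import gettext

# Install _() into builtins; with no compiled .mo catalog found for the
# 'myapp' domain, a NullTranslations fallback is used and marked strings
# pass through unchanged.
gettext.install('myapp')

print(_("Accept my terms and conditions please"))
```

Once strings are marked with _(), GNU gettext's helper tools (xgettext to extract, msgmerge to merge changes, msgattrib to flag obsolete entries) do exactly the bookkeeping described in the question.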

kaizer.se