ansaurus

Question

Answer 1

+10 A:

You could use token_get_all() to get all the tokens from a PHP file, e.g.:

<?php

$fileStr = file_get_contents('file.php');

foreach (token_get_all($fileStr) as $token) {
    //single-character tokens such as ; come back as plain strings, not arrays
    if (is_array($token) && $token[0] == T_CONSTANT_ENCAPSED_STRING) {
        echo "found string {$token[1]}\r\n";
        //$token[2] is line number of the string
    }
}

You could do a really dirty check that it isn't being used as an array index by something like:

$fileLines = file('file.php');

//inside the loop and if
$line = $fileLines[$token[2] - 1];
if (false === strpos($line, "[{$token[1]}]")) {
    //not an array index
}

but you will really struggle to do this properly because someone might have written something you might not be expecting e.g.:

$str = 'string that is not immediately an array index';
doSomething($array[$str]);

Edit: As Ant P. says, you would probably be better off looking for [ and ] in the surrounding tokens for the second part of this answer rather than using my strpos hack, something like this:

$tokens = token_get_all(file_get_contents('file.php'));
$num = count($tokens);
for ($i = 0; $i < $num; $i++) {
    $token = $tokens[$i];

    if (!is_array($token) || $token[0] !== T_CONSTANT_ENCAPSED_STRING) {
        //not a string, ignore
        continue;
    }

    //neighbouring tokens; null at either end of the file
    $prev = $tokens[$i - 1] ?? null;
    $next = $tokens[$i + 1] ?? null;
    if ($prev === '[' && $next === ']') {
        //immediately used as an array index, ignore
        continue;
    }

    echo "found string {$token[1]}\r\n";
    //$token[2] is line number of the string
}
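
A minimal sketch of the same surrounding-token check wrapped in a function, assuming PHP 7 or later. It drops whitespace tokens first, so $a[ 'key' ] is treated the same as $a['key']. findStringLiterals() is a made-up name, not part of PHP, and a one-element short array literal such as ['hello'] would still be mistaken for an index:

```php
<?php

/**
 * Return [line, literal] pairs for quoted strings that are not
 * directly wrapped in [ ] (whitespace around the string is ignored).
 * findStringLiterals() is a hypothetical helper name.
 */
function findStringLiterals(string $code): array
{
    // Drop whitespace tokens so $a[ 'key' ] is treated like $a['key'].
    $tokens = array_values(array_filter(
        token_get_all($code),
        function ($t) {
            return !(is_array($t) && $t[0] === T_WHITESPACE);
        }
    ));

    $found = [];
    foreach ($tokens as $i => $token) {
        if (!is_array($token) || $token[0] !== T_CONSTANT_ENCAPSED_STRING) {
            continue; // not a plain quoted string
        }
        // Neighbouring tokens; null at either end of the token list.
        $prev = $tokens[$i - 1] ?? null;
        $next = $tokens[$i + 1] ?? null;
        if ($prev === '[' && $next === ']') {
            continue; // used as an array index
        }
        $found[] = [$token[2], $token[1]]; // [line number, literal with quotes]
    }
    return $found;
}
```

Calling findStringLiterals(file_get_contents('file.php')) should give one [line, string] pair per candidate, with the quotes still attached to the string.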

Tom Haigh 2009-02-21 00:23:03

+1 never knew about this function. That's awesome.

cletus 2009-02-21 00:26:12

Only thing is that for $_SESSION['logsession'] it actually gives me found string 'logsession', which is of course not what I want for localization.

Ray 2009-02-21 00:33:14

Ah you have since edited.

Ray 2009-02-21 00:36:22

@tomhaigh: I would do a second up-vote, if I could. Hats off.

Tomalak 2009-02-21 00:40:11

@ray: You can probably figure out whether a string's being used as a string or an array ID by looking at it in context of surrounding tokens. I haven't tried it myself though. YMMV.

Ant P. 2009-02-21 00:56:32

Answer 2

A:

Instead of trying to solve this with an overly-clever command line hack using perl or grep, you should write a program to do this :)

Write a perl/python/ruby/whatever script to search through each file for a pair of single or double quotes. Each time it finds a match, it should prompt you to replace it with your underscore function, and you can either tell it to do it or to skip to the next one.

In a perfect world, you'd write something that would do it all for you, but this would probably take less time in the end, and you'd be faced with fewer errors.

Pseudo:

for fname in yourBigFileList:
    open a file handle for the actual source file
    open a temp file handle (like fname + ".tmp" or something)
    for fline in source file:
        get quoted strings
        for qstring in quoted_strings:
            show it in context, i.e. the entire line of code
            replace with _()?
                if Y, replace it in fline
                if N, leave fline as it is
        write fline to the tmp file
    close file handles
    rename the source file to fname + ".old"
    rename the ".tmp" file to the name of the original file
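
The pseudocode could be sketched in PHP roughly like this. It swaps the search for pairs of quotes for token_get_all(), so that comments and inline HTML are not mistaken for strings; that is a substitution, not what the pseudocode describes. wrapStrings() and localizeFile() are hypothetical names, and the prompt reads from STDIN, so it assumes the script is run from the command line:

```php
<?php

/**
 * Rebuild $code, wrapping each quoted string in _() when $decide says so.
 * $decide receives the literal and its line number and returns true/false;
 * both the function name and the callback shape are hypothetical.
 */
function wrapStrings(string $code, callable $decide): string
{
    $out = '';
    foreach (token_get_all($code) as $token) {
        if (!is_array($token)) {
            $out .= $token; // single-character token such as ; or [
            continue;
        }
        [$id, $text, $line] = $token;
        if ($id === T_CONSTANT_ENCAPSED_STRING && $decide($text, $line)) {
            $text = '_(' . $text . ')';
        }
        $out .= $text;
    }
    return $out;
}

/** Process one file: keep the original as .old and write the new source. */
function localizeFile(string $fname): void
{
    $source = file_get_contents($fname);
    $lines = explode("\n", $source);
    $result = wrapStrings($source, function ($text, $line) use ($lines) {
        echo $lines[$line - 1], PHP_EOL; // show the line of code for context
        echo "replace $text with _()? [y/N] ";
        return strtolower(trim((string) fgets(STDIN))) === 'y';
    });
    file_put_contents($fname . '.tmp', $result);
    rename($fname, $fname . '.old');
    rename($fname . '.tmp', $fname);
}
```

Keeping the decision in a callback means the same wrapStrings() could be driven by a prompt, a saved list of answers, or a test.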

I'm sure there's a more *nix-fu way of doing this, but this method would let you look at each instance yourself and decide. If it's a million lines, each one contains a string, and each one takes you 1 second to evaluate, then it'll take you about 280 hours to do the whole thing... Perhaps you should ignore this post :)

inkedmn 2009-02-21 00:26:32

Sorry, but the only relevant part of this answer is the "get quoted strings" step in your pseudocode, which you don't address, so I'm not sure why you've given this answer.

cletus 2009-02-21 01:08:12

Answer 3

+4 A:

Besides associative array keys, there are other situations likely to exist in the code base that an automatic search and replace will utterly break.

SQL queries:

$myname = "steve";
$sql = "SELECT foo FROM bar WHERE name = '" . $myname . "'";

Indirect variable references:

$bar = "Hello, World"; // a string that needs localization
$foo = "bar"; // a string that should not be localized
echo($$foo);

SQL string manipulation:

$sql = "SELECT CONCAT('Greetings, ', firstname) as greeting from users where id = ?";

There is no automatic way to filter for all possibilities. Perhaps the solution would be to write an application that creates a "moderation" queue of possible strings and displays each one highlighted and in context of several lines of code. You could then glance at the code to determine if it is a string that needs localization or not and hit a single key to localize or ignore the string.
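
The data behind such a queue could look something like the sketch below: one entry per quoted string, carrying a few lines of surrounding code for display. buildReviewQueue() and the entry keys are hypothetical, and the display and single-key handling are left out:

```php
<?php

/**
 * Build a review queue: one entry per quoted string, with the literal,
 * its line number, and up to $context lines of code on either side.
 * buildReviewQueue() and the entry keys are hypothetical.
 */
function buildReviewQueue(string $code, int $context = 2): array
{
    $lines = explode("\n", $code);
    $queue = [];
    foreach (token_get_all($code) as $token) {
        if (!is_array($token) || $token[0] !== T_CONSTANT_ENCAPSED_STRING) {
            continue; // only plain quoted strings are candidates
        }
        $line = $token[2];                      // 1-based line number
        $start = max(0, $line - 1 - $context);  // 0-based index of first context line
        $length = ($line - 1 - $start) + 1 + $context;
        $queue[] = [
            'line'    => $line,
            'string'  => $token[1],
            'context' => array_slice($lines, $start, $length),
        ];
    }
    return $queue;
}
```

Each entry can then be shown to a reviewer, who decides whether the string gets localized or ignored.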

postfuturist 2009-02-21 00:53:34

Finding all strings in a PHP code base
