I have a PHP codebase of a few million lines with no real separation of display and logic, and I am trying to extract every string in the code for localization. Separating display from logic is a long-term goal, but for now I just want to be able to localize.
Strings appear in every format PHP supports, so I need a theoretical (or practical) way to parse our entire source and, at the very least, LOCATE where each string lives. Ideally, I'd replace every string with a function call. For example,
"this is a string"
would be replaced with
_("this is a string")
I'd need to support both single- and double-quoted strings. The other formats (heredoc and nowdoc) appear so rarely that I can change them by hand.
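As a rough starting point, here is a minimal sketch of a scanner that locates single- and double-quoted literals, reports their line numbers, and can wrap each one in a call. It is written in Python as a standalone tool and assumes the input is pure PHP code. Inline HTML outside `<?php ... ?>` tags, heredoc/nowdoc, and `?>` inside comments are not handled. The names `find_php_strings` and `wrap_strings` are hypothetical. For real use, PHP's own `token_get_all()` (tokenizer extension) is more robust, since it reports quoted strings without interpolation as `T_CONSTANT_ENCAPSED_STRING` tokens.

```python
from typing import Iterator, Tuple


def find_php_strings(src: str) -> Iterator[Tuple[int, int, int, str]]:
    """Yield (line, start, end, literal) for every '...' or "..." literal.

    Sketch only (assumption): `src` is pure PHP code with no inline HTML,
    and heredoc/nowdoc syntax is ignored.
    """
    i, n, line = 0, len(src), 1
    while i < n:
        c = src[i]
        if c == "\n":
            line += 1
            i += 1
        elif c == "#" or src.startswith("//", i):
            # Line comment: jump to the newline, which the branch above counts.
            j = src.find("\n", i)
            i = n if j == -1 else j
        elif src.startswith("/*", i):
            # Block comment: skip it, counting the newlines it spans.
            j = src.find("*/", i + 2)
            end = n if j == -1 else j + 2
            line += src.count("\n", i, end)
            i = end
        elif c in "'\"":
            # Quoted literal: a backslash escapes the following character.
            j = i + 1
            while j < n and src[j] != c:
                j += 2 if src[j] == "\\" else 1
            end = min(j + 1, n)
            yield line, i, end, src[i:end]
            line += src.count("\n", i, end)
            i = end
        else:
            i += 1


def wrap_strings(src: str, func: str = "_") -> str:
    """Return `src` with every quoted literal rewritten as func(literal)."""
    parts, last = [], 0
    for _line, start, end, literal in find_php_strings(src):
        parts.append(src[last:start])
        parts.append(f"{func}({literal})")
        last = end
    parts.append(src[last:])
    return "".join(parts)


if __name__ == "__main__":
    php = "<?php\n$a = 'one';\n// 'not this'\necho \"two\";\n"
    for line, _start, _end, literal in find_php_strings(php):
        print(line, literal)
    # 2 'one'
    # 4 "two"
    print(wrap_strings(php))
```

Note that wrapping an interpolated string such as `"Hello $name"` passes the already-substituted text to `_()`, so a gettext lookup will not match; those strings still need manual rewriting.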
I also don't want to localize array indexes, so a string like
$arr["value"]
should not become
$arr[_("value")]
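To leave array indexes alone, one heuristic is to look at the nearest non-whitespace characters around each literal. An index sits between `[` and `]`, and that `[` follows something subscriptable (a variable name, `]`, `)` or `}`), whereas `= [` or `( [` starts a short-syntax array literal. The hypothetical helper `is_array_index` below sketches that check, given the literal's start and end offsets in the source. It is only a heuristic: `return ["x"];` is misread as an index, and array-literal keys such as `array("key" => 1)` are not detected.

```python
def _prev_non_space(src: str, i: int) -> int:
    """Index of the last non-whitespace char at or before i, or -1."""
    while i >= 0 and src[i].isspace():
        i -= 1
    return i


def _next_non_space(src: str, i: int) -> int:
    """Index of the first non-whitespace char at or after i, or len(src)."""
    while i < len(src) and src[i].isspace():
        i += 1
    return i


def is_array_index(src: str, start: int, end: int) -> bool:
    """True if src[start:end] is used as an index, e.g. $arr["value"]."""
    i = _prev_non_space(src, start - 1)
    if i < 0 or src[i] != "[":
        return False
    j = _next_non_space(src, end)
    if j >= len(src) or src[j] != "]":
        return False
    # An index follows something subscriptable ($arr, $a[0], f(), ${...});
    # "= [" or "( [" instead opens a short-syntax array literal.
    k = _prev_non_space(src, i - 1)
    return k >= 0 and (src[k].isalnum() or src[k] in "_])}")
```

A rewriting pass would leave any literal unwrapped when this returns True.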
Can anyone help me get started on this?