Hi,

In my program, I'm grep-ing via NSTask. For some reason, sometimes I would get no results (even though the code was apparently the same as the command run from the CLI which worked just fine), so I checked through my code and found, in Apple's documentation, that when adding arguments to an NSTask object, "the NSTask object converts both path and the strings in arguments to appropriate C-style strings (using fileSystemRepresentation) before passing them to the task via argv[]" (snip).

The problem is that I might grep terms like "Río Gallegos". Sadly (as I checked with fileSystemRepresentation), that undergoes the conversion and turns out to be "RiÃÅo Gallegos".

How can I solve this?

-- Ry

+1  A: 

The problem is that I might grep terms like "Río Gallegos". Sadly (as I checked with fileSystemRepresentation), that undergoes the conversion and turns out to be "RiÃÅo Gallegos".

That's one possible interpretation. What you mean is that “Río Gallegos” gets converted to “Ri\xcc\x81o Gallegos”—the UTF-8 bytes to represent the decomposed i + combining acute accent.
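To make the two byte sequences concrete, here is a minimal C sketch (not code from the question). The constant and function names are made up for illustration. It spells out the precomposed and decomposed UTF-8 encodings of the same text with explicit byte escapes, so the result does not depend on how the source file itself is encoded.

```c
#include <stddef.h>
#include <string.h>

/* "Río Gallegos" with a precomposed í (U+00ED), which is C3 AD in UTF-8.
   The literal is split after the escape so the following 'o' cannot be
   read as part of the hex escape. */
static const char RIO_COMPOSED[] = "R\xc3\xad" "o Gallegos";

/* The same text with a plain 'i' followed by U+0301 COMBINING ACUTE
   ACCENT, which is CC 81 in UTF-8. This is the decomposed form that
   fileSystemRepresentation is described as producing. */
static const char RIO_DECOMPOSED[] = "Ri\xcc\x81" "o Gallegos";

/* Hypothetical helper: number of bytes before the terminating NUL. */
size_t utf8_byte_length(const char *s)
{
    return strlen(s);
}
```

Under these assumptions, utf8_byte_length(RIO_COMPOSED) is 13 and utf8_byte_length(RIO_DECOMPOSED) is 14; the decomposed form is one byte longer because the accent becomes its own two-byte code point. Reading the bytes CC 81 as MacRoman instead of UTF-8 is what yields the "ÃÅ" seen in the question.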

Your problem is that grep is not interpreting these bytes as UTF-8. grep is using some other encoding—apparently, MacRoman.

The solution is to tell grep to use UTF-8. That requires setting the LC_ALL variable in your grep task's environment.

The quick and dirty value to use would be “en_US.UTF-8”; a more proper way would be to get the language code for the user's primary preferred language, replace the hyphen, if any, with an underscore, and stick “.UTF-8” on the end of that.
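The "more proper" recipe can be sketched in plain C as follows. The function name is hypothetical, and the language code is assumed to arrive as a C string such as "en-US". The sketch only builds the value; handing it to the task (for example through NSTask's environment dictionary under the key LC_ALL) is left out, and nothing here checks that the resulting locale is actually installed on the system.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn a language code such as "en-US" into a
   locale name such as "en_US.UTF-8". Only the first hyphen of the
   language code is replaced, and the hyphen in "UTF-8" is left alone.
   Returns 0 on success, -1 if the output buffer is too small. */
int utf8_locale_for_language(const char *language, char *out, size_t out_size)
{
    int written = snprintf(out, out_size, "%s.UTF-8", language);
    if (written < 0 || (size_t)written >= out_size) {
        return -1;
    }

    /* Search only within the language part of the result. */
    char *hyphen = memchr(out, '-', strlen(language));
    if (hyphen != NULL) {
        *hyphen = '_';
    }
    return 0;
}
```

For example, "en-US" becomes "en_US.UTF-8", while a bare "es" becomes "es.UTF-8".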

Peter Hosey
Thanks for the answer, but it doesn't work. I also tried setting the LC_CTYPE and LANG variables in the grep task's environment, but still no luck.
How did you determine that grep is interpreting the bytes the way you showed in your question?
Peter Hosey
Via NSString's fileSystemRepresentation method and NSLog() statements. Experimenting showed that only strings without "non-standard" characters such as 'í' work. I see that this is no proof, but it's strong evidence.
And how are you viewing the NSLog output?
Peter Hosey
With the debugger console in Xcode.
OK, then. Try this: `NSLog(@"%@ = %lu bytes", myString, (unsigned long)strlen([myString fileSystemRepresentation]));` What does that log?
Peter Hosey
If myString is "Río Gallegos" (without quotes), the output is: Río Gallegos = 14 bytes
By the way, if I try: `NSLog(@"%s = %lu bytes", "Río Gallegos", (unsigned long)strlen("Río Gallegos"));` it logs: R√≠o Gallegos = 13 bytes
ryyst: Interesting. That output seems right, so grep is indeed misinterpreting it. Either that, or the target text really doesn't contain the pattern. (Perhaps the target text is not UTF-8, or uses a different normalization form? I don't think grep really understands encodings or Unicode.)
Peter Hosey
Well, running `grep "Río Gallegos" <filename.txt>` does show results, so I guess it really is an encoding problem. This little snippet (http://pastebin.org/128441) shows that strings encoded with fileSystemRepresentation are actually extremely limited. I'm thinking about just using system() calls; NSTask is really annoying me.
ryyst: Um. Well, that code explains it. First, neither NSTask nor grep is what's interpreting the bytes as MacRoman; NSString is. And it's doing that because *you told it to*. So, don't do that. The bytes are UTF-8, so interpret them as such. (Also, how are you getting “Río Gallegos” from a pointer to a `char` variable into which you've assigned an `int`?)
Peter Hosey
Okay, but even if I messed up with encodings in the snippet, that doesn't really explain the problem I'm experiencing with NSTask, does it? This (http://lists.apple.com/archives/Cocoa-dev/2007/Apr/msg01324.html) might also be an interesting read, as it covers exactly my problem. However, they all just say that it should work out of the box, which it obviously doesn't in my case. By the way, I can post all the code related to NSTask, if that simplifies matters for you. And thanks for all your efforts!
I think we need to see where this mystical “s” is really coming from. Assigning an `int` to a `char` variable does not make a valid string in any encoding.
Peter Hosey
Hm, I always thought that was the C way of listing characters - apparently I'm wrong? Anyway, how are these two problems related to each other?
Well, the code you're showing is fantastically unlikely to produce the result you've claimed. You've declared a variable holding *a* `char`—just one, not an array of them. Then, in a loop, you assigned an `int` into this variable; the first time through the loop, it is zero. Then you take the address of this `char`, and treat it as a C string; the character at the pointer being zero, this is an empty C string. The second time through the loop, the character at the pointer is 1 and any characters thereafter could be anything—you'll get random garbage, almost certainly not a person's name.
Peter Hosey
Wild theory: Did you mean to represent the integer as decimal digits? Assigning to a `char` variable (or an array of `char`) won't do that; the only “conversion” there is lopping off the bits that won't fit. Assigning 1, say, to a `char` variable will put a 1 (as in, 0x01, not `'1'`, which is 0x31) byte in it. If converting the number to a decimal representation in a string is what you meant to do, then use NSString's `stringWithFormat:`.
Peter Hosey
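The difference between storing an integer in a `char` and formatting it as decimal digits can be shown with a small C sketch. Both function names are hypothetical; snprintf stands in here for what NSString's `stringWithFormat:` would do on the Cocoa side.

```c
#include <stdio.h>

/* Store the value as one raw byte. Bits that do not fit are dropped,
   so 1 becomes the byte 0x01, not the digit character '1' (0x31). */
unsigned char raw_byte_of(int value)
{
    return (unsigned char)value;
}

/* Write the decimal representation of value into out as a
   NUL-terminated C string. Returns the number of characters written,
   or -1 if the buffer is too small. */
int decimal_digits_of(int value, char *out, size_t out_size)
{
    int written = snprintf(out, out_size, "%d", value);
    if (written < 0 || (size_t)written >= out_size) {
        return -1;
    }
    return written;
}
```

Note that a lone `char` holding a nonzero byte is not a C string, because nothing guarantees a terminating NUL after it; the second function writes into a caller-supplied array for that reason.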
No, I didn't try to create useful strings or anything, I just wanted to list some characters and see what fileSystemRepresentation would do with them. What I now tried is putting a line with "RiÃÅo Gallegos" inside my text file. When passing "Río Gallegos" as an argument to my NSTask object, it now indeed finds the line I added, so grep is apparently really misinterpreting the argument. I still don't know how to keep either NSTask or grep from doing what they do now.
What do you mean “grep is… misinterpreting the argument”? If it finds the string, doesn't that mean it interpreted it correctly? And why would you put “RiÃÅo Gallegos” into the text file? You should put in “Río Gallegos” as UTF-8, then load the data from the file, decode the data as UTF-8 to get a string ( http://developer.apple.com/mac/library/documentation/Cocoa/Reference/Foundation/Classes/NSString_Class/Reference/NSString.html#//apple_ref/occ/instm/NSString/initWithData:encoding: ), and pass that string to the task.
Peter Hosey
Oh, and don't use TextEdit to edit the file; it's dumb about UTF-8. Use TextWrangler instead (barebones.com/products/textwrangler). Another way would be to put the string in your Info.plist. However, assuming that you're not going to get the real string that you'll use in your shipping app from either of these sources, you'll have to fix wherever you're really creating the string.
Peter Hosey
It's not a problem with the text file, nor with me messing up NSStrings. It's NSTask doing a conversion it shouldn't, and grep no longer understanding the arguments correctly. I'm not sure if the problem can be solved at all.