I have a string that is Base64 encoded. How can I check whether it contains a specific substring that is not encoded? I don't want to decode the string and then search it.

Can I just encode the substring and search the encoded string for the encoded substring?

Thanks,

A: 

Base64 can take on several different forms or meanings, with differing algorithms or implementations. Even in the examples on Wikipedia, one can see that the encoded form of the same characters changes depending on their position. Short answer: no, you can't just encode the substring and search for it in the larger encoded text.
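To make the position dependence concrete, here is a minimal Python sketch, assuming the standard Base64 alphabet as implemented by the standard library's `base64` module. The helper name `naive_encoded_search` and the sample strings are made up for illustration.

```python
import base64

def naive_encoded_search(encoded: str, needle: bytes) -> bool:
    """Encode the needle on its own and look for it in the encoded text."""
    return base64.b64encode(needle).decode("ascii") in encoded

needle = b"needle"
# Shift the needle by 0, 1 and 2 bytes and encode the whole text each time.
for prefix in (b"", b"x", b"xx"):
    encoded = base64.b64encode(prefix + needle + b" in a haystack").decode("ascii")
    print(len(prefix), encoded, naive_encoded_search(encoded, needle))
```

Only the unshifted text reports a match. With one or two bytes in front, the needle's bits fall across different 6-bit boundaries, so the naive search misses it even though the decoded text still contains it.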

Kris Kumler
Good xref; not sure about 'different meanings'. The same text can be encoded in 3 different forms, depending on whether it starts at the first, second or third byte of a 3-byte group. And that most certainly complicates the search.
Jonathan Leffler
Yes, that's what I was trying to get across and simplify.
Kris Kumler
A: 

You can't just search for an encoded substring. Your search string will be encoded differently depending on where in the original string it appears. I think you will need to decode the entire string and then search for your substring.

Ken Paul
+7  A: 

The best way is probably just to decode the string. However, if really necessary, it is possible to do this on the fly instead of a full decode followed by a search. You'll have to implement your own search and decode only the part that you are currently inspecting. This is most likely only useful if you have very big strings that you really do not want to (or cannot) store twice in memory.
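One way the on-the-fly approach might look in Python is sketched below. It assumes standard Base64 with no embedded whitespace or line breaks and padding only at the very end. The function name `contains_streaming` and the `chunk_chars` parameter are illustrative, not part of any library.

```python
import base64

def contains_streaming(encoded: str, needle: bytes, chunk_chars: int = 4096) -> bool:
    """Search Base64 text for needle, decoding one chunk at a time."""
    if chunk_chars <= 0 or chunk_chars % 4:
        raise ValueError("chunk_chars must be a positive multiple of 4")
    if not needle:
        return True
    tail = b""  # last len(needle) - 1 decoded bytes carried over from earlier chunks
    for start in range(0, len(encoded), chunk_chars):
        # Each chunk is a whole number of 4-character groups, so it decodes on its own.
        window = tail + base64.b64decode(encoded[start:start + chunk_chars])
        if needle in window:
            return True
        # Keep just enough bytes to catch a match that straddles the chunk boundary.
        tail = window[-(len(needle) - 1):] if len(needle) > 1 else b""
    return False
```

Only one decoded chunk plus a short overlap is held in memory at a time, so the full decoded string is never stored.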

If the string you search for is long enough, you can also encode that string three times with different padding (e.g. '', 'x' and 'xx') and search for those without the first 4 and last 4 characters (you don't want to match the padding). When you find a match, you have to make sure the alignment corresponds with the padding and verify that the parts that you didn't match yet (due to the padding) are also in place. The latter does require some decoding, of course.
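A rough Python sketch of this idea follows, assuming standard Base64 with no embedded whitespace and a search string of at least 5 bytes. Instead of encoding with dummy padding and trimming the ends, it encodes only the whole 3-byte groups inside the search string for each of the three alignments, which produces the same stable interior characters. The name `find_in_base64` is hypothetical.

```python
import base64

def find_in_base64(encoded: str, needle: bytes) -> int:
    """Return the decoded byte offset of the first match, or -1 if absent."""
    n = len(needle)
    if n < 5:
        raise ValueError("needle must be at least 5 bytes for this approach")
    best = -1
    for skip in range(3):  # needle bytes that precede the first whole 3-byte group
        core_len = (n - skip) // 3 * 3
        core = base64.b64encode(needle[skip:skip + core_len]).decode("ascii")
        pos = encoded.find(core)
        while pos != -1:
            byte_start = pos // 4 * 3 - skip  # where the needle would begin
            if pos % 4 == 0 and byte_start >= 0:  # alignment check
                first_group = byte_start // 3
                end_group = (byte_start + n + 2) // 3  # rounded up
                # Decode only the groups covering the candidate, then verify the edges.
                decoded = base64.b64decode(encoded[4 * first_group:4 * end_group])
                offset = byte_start - 3 * first_group
                if decoded[offset:offset + n] == needle:
                    if best == -1 or byte_start < best:
                        best = byte_start
                    break
            pos = encoded.find(core, pos + 1)
    return best
```

The plain substring search does the bulk of the work on the encoded text. Decoding happens only for the few groups around each aligned candidate, which is where the leading and trailing bytes of the search string get verified.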

mweerden
Nice and thorough. I would definitely go for on-the-fly decoding, where you decode the characters but don't store them. Anything else will be nightmarish. But if you must suffer, this answer tells you how :-)
Norman Ramsey
+1  A: 

Assuming you know the exact form of base64 encoding involved, you could encode your string as if it occurred at each of the three offsets (start%3 == 0, start%3 == 1, start%3 == 2). You'd have to be cunning around the start and end of the string, as those characters will be affected by the surrounding data. You could then just use a normal IndexOf or whatever to check the middle part of the string, and then check the start and end more smartly.

Personally I wouldn't go to all of this trouble though - as the other suggestions recommend, just decode and then search. It's going to be much easier to get right.

Jon Skeet