I get an input string containing base64-encoded data. Unfortunately, random hexadecimal data (all lowercase) gets mixed in. It's fairly straightforward to sort out by hand because the hexadecimal data always seems to come in segments of 32 characters. For example, I can format a sample string like this:

    6dd11d15c419ac219901f14bdd999f38
    0ad94e978ad624d15189f5230e5435a9
    2dc19fe95e583e7d593dd52ae7e68a6e
    465ffa6074a371a8958dad3ad271181a
    23310939b981b4e56f2ecee26f82ec60
    fe04bef49be47603d1278cc80673b226

    VGhpcyBpcyBzb

    6dd11d15c419ac219901f14bdd999f38
    0ad94e978ad624d15189f5230e5435a9
    2dc19fe95e583e7d593dd52ae7e68a6e
    465ffa6074a371a8958dad3ad271181a
    23310939b981b4e56f2ecee26f82ec60
    fe04bef49be47603d1278cc80673b226
    6dd11d15c419ac219901f14bdd999f38
    0ad94e978ad624d15189f5230e5435a9
    2dc19fe95e583e7d593dd52ae7e68a6e
    465ffa6074a371a8958dad3ad271181a
    23310939b981b4e56f2ecee26f82ec60
    fe04bef49be47603d1278cc80673b226

    21lIGJhc2UtNjQ

    bb4af7e61760735ba17c29e8f542a668
    75da91e90863f1ddb7e149297fc59afc
    f5de951fb65d06d2927aab7b9b54830e
    2d935616a54c381c2f38db3731d5a378

    gZW5jb2RlZCB

    6dd11d15c419ac219901f14bdd999f38
    0ad94e978ad624d15189f5230e5435a9
    2dc19fe95e583e7d593dd52ae7e68a6e
    465ffa6074a371a8958dad3ad271181a
    23310939b981b4e56f2ecee26f82ec60
    fe04bef49be47603d1278cc80673b226

    kYXRhIGhvb3JheSE=

Basically, I need to extract the base64 parts and decode them (in PHP). The catch is that I get it all as one long string, and it's not always obvious where to put the line breaks. For example, the first base64 segment ends in 'b', which is easily mistaken for part of the hex data. I'm at something of a loss for how to do this... Any ideas?

Thanks!
-mala

+5  A: 

I think this is an unanswerable problem: it is entirely possible to have 32 characters of base64-encoded data that cannot be distinguished from 32 characters of random hex. Without more information about the stream, it is impossible to decide which bucket such data belongs in.
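To make the ambiguity concrete, here is a minimal sketch (in Python for brevity; the same logic carries over to PHP's `preg_replace` and `base64_decode`). It takes one hex segment from the question and shows that it is also well-formed base64, then shows that naive left-to-right stripping of 32-character hex runs eats the trailing `b` of the first base64 segment. The names `HEX_BLOCK`, `segment`, `wanted`, `mixed`, and `naive` are made up for this illustration.

```python
import base64
import re

# Assumed shape of an injected segment: 32 lowercase hex characters.
HEX_BLOCK = re.compile(r"[0-9a-f]{32}")

# One hex segment copied from the question.
segment = "6dd11d15c419ac219901f14bdd999f38"

# Every hex digit is also a base64 character and 32 is a multiple of 4,
# so the segment decodes cleanly as base64 (to 24 bytes).
as_base64 = base64.b64decode(segment, validate=True)

# Naive stripping: the first base64 piece ends in "b", a valid hex digit,
# so the regex starts its 32-character match one character too early.
wanted = "VGhpcyBpcyBzb" + "21lIGJhc2UtNjQ"
mixed = "VGhpcyBpcyBzb" + segment + "21lIGJhc2UtNjQ"
naive = HEX_BLOCK.sub("", mixed)

print(len(as_base64))   # 24
print(naive == wanted)  # False: the "b" was consumed and an "8" left behind
```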

fbrereto
A: 

One possibility is that base64-decoding the data up to each decision point (is the next 32-character segment base64 or hex?) might provide a clue.

There's also a slim chance that interpreting one of those hex strings as base64 always yields easily detected garbage for whatever is being decoded.

Otherwise you're out of luck.
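As a rough sketch of the "decode and look for garbage" idea (Python here, but it ports directly to PHP), the code below tries every way of cutting whole 32-character blocks out of each hex-looking run and keeps only the candidates that decode to printable ASCII. It rests on assumptions that are not given in the question: the payload is printable ASCII text, each hex-looking run hides a single contiguous stretch of junk, and that stretch contains as many 32-character blocks as fit. The names `HEX_RUN`, `BLOCK`, `run_alternatives`, and `recover` are hypothetical.

```python
import base64
import binascii
import itertools
import re

HEX_RUN = re.compile(r"[0-9a-f]+")
BLOCK = 32  # assumed length of each injected hex segment


def run_alternatives(run):
    """List the ways to strip as many whole blocks as fit from one hex run."""
    if len(run) < BLOCK:
        return [run]              # too short to hold a block: keep it as is
    spare = len(run) % BLOCK      # hex-looking characters that must be base64
    # Keep i characters on the left edge and (spare - i) on the right edge.
    return [run[:i] + run[len(run) - (spare - i):] for i in range(spare + 1)]


def recover(mixed):
    """Return every printable-ASCII decoding consistent with the assumptions."""
    parts, pos = [], 0
    for m in HEX_RUN.finditer(mixed):
        parts.append([mixed[pos:m.start()]])       # text that cannot be hex
        parts.append(run_alternatives(m.group()))  # ambiguous region
        pos = m.end()
    parts.append([mixed[pos:]])

    results = []
    for combo in itertools.product(*parts):
        try:
            raw = base64.b64decode("".join(combo), validate=True)
        except (binascii.Error, ValueError):
            continue                                # not well-formed base64
        if raw and all(32 <= byte < 127 for byte in raw):
            results.append(raw.decode("ascii"))
    return results


# The sample from the question: A is the six-segment group, B the four-segment one.
A = ("6dd11d15c419ac219901f14bdd999f38" "0ad94e978ad624d15189f5230e5435a9"
     "2dc19fe95e583e7d593dd52ae7e68a6e" "465ffa6074a371a8958dad3ad271181a"
     "23310939b981b4e56f2ecee26f82ec60" "fe04bef49be47603d1278cc80673b226")
B = ("bb4af7e61760735ba17c29e8f542a668" "75da91e90863f1ddb7e149297fc59afc"
     "f5de951fb65d06d2927aab7b9b54830e" "2d935616a54c381c2f38db3731d5a378")
sample = (A + "VGhpcyBpcyBzb" + A + A + "21lIGJhc2UtNjQ"
          + B + "gZW5jb2RlZCB" + A + "kYXRhIGhvb3JheSE=")

print(recover(sample))  # ['This is some base-64 encoded data hooray!']
```

For this sample, only one of the four candidate splits survives the printable-ASCII check. With binary payloads, or with base64 that happens to contain 32 or more consecutive hex-looking characters, the check cannot help and the problem remains ambiguous.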

Joshua