Hi

I am wondering what approaches you would take to do some simple string splitting in PHP. I am receiving a response from an SMS gateway where two of the interesting values are the code used and the user's text message.

The code could be something like: Freetrip (lowercase, uppercase, mixed case)

In the best case, the user's message would look like: Freetrip 12345 ($code "space" XXXXX).

Each X should be a digit between 1 and 5. Any other value/character should return an error. So the regex would be simplified as: chars=5 where each digit >=1 and <=5.

What I need to store at the end is each of the 5 digit values.

My simplest approach would be to lowercase the entire message string and strip the lowercased code (plus the space) from the start of it. That would leave me with the 5 digits, which I would then split into 5 separate variables to store in the DB.
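The approach described above can be sketched without any regex, using only core string functions. This is a minimal illustration, not a definitive implementation: the function name extractDigits and the choice to return null on any failure are assumptions made for the example.

```php
<?php
// Hypothetical helper: strips the code prefix and returns the five digits
// as integers, or null when the message does not have the expected shape.
function extractDigits(string $sms, string $code): ?array
{
    $prefix  = strtolower($code) . ' ';
    $message = strtolower(trim($sms));

    // The message must start with the code followed by a single space.
    if (strncmp($message, $prefix, strlen($prefix)) !== 0) {
        return null;
    }

    $rest = substr($message, strlen($prefix));

    // Exactly five characters, all of them taken from "12345".
    if (strlen($rest) !== 5 || strspn($rest, '12345') !== 5) {
        return null;
    }

    // One array element per digit, ready to be stored separately.
    return array_map('intval', str_split($rest));
}

print_r(extractDigits('FreeTrip 12345', 'Freetrip'));
```

Returning a single null keeps the sketch short; it does not yet say which of the error cases occurred.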

Now the tricky part is that the best case scenario described above may be hard to achieve. Typing an SMS is fiddly and typing errors occur easily. Some errors that may occur are the following:

  • Too few or too many digits.
  • Non-digit characters.
  • More characters after the XXXXX combination.
  • Probably some other cases.

Any of those should return an individual error message which I can return to the sender.

+1  A: 
if (!preg_match('/^freetrip\s+([1-5]{5})$/i', $sms, $matches)) exit("error");
print_r($matches);

I have some experience with SMS platforms, and AFAIK one error message is enough. We tried to detect look-alike characters such as lowercase L and uppercase I, or zero and the letter O. For example, in your case you could write something like this:

preg_match('/^freetr[il1|]p\s+([1-5]{5})$/i', $sms, $matches);

You can do the same anywhere else in the message pattern (if you want).

I did something like this (not sure - it was 5 years ago):

if (!preg_match('/^(\w+)\s+(.*)/i', $sms, $matches)) exit('bad message format');
$value = $matches[2];

// some letters look like digits
$value = str_replace(array('o', 'O'), '0', $value);
$value = str_replace(array('i', 'I', 'l'), '1', $value);
if (!preg_match('/^[12345]{5}/', $value)) exit("invalid code");
// do something here... message is OK.

Of course, in this case you can check whether the code is "freetrip" or not, whether the value is [1-5]{5} or not, and so on, and return as many different error messages as your imagination allows :). Good luck.
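As a rough sketch of how the checks mentioned above could map to individual error messages: the function name validateSms, the order of the checks and the message texts are all assumptions made for illustration, not something taken from a real gateway.

```php
<?php
// Hypothetical helper: returns null when the message is valid,
// otherwise a short description of the first problem found.
function validateSms(string $sms, string $code = 'freetrip'): ?string
{
    // Split into the leading word and whatever follows it.
    if (!preg_match('/^(\w+)\s+(.*)$/s', trim($sms), $m)) {
        return 'bad message format';
    }
    if (strcasecmp($m[1], $code) !== 0) {
        return 'unknown code';
    }

    $value = $m[2];
    if (preg_match('/^[1-5]{5}$/D', $value)) {
        return null; // exactly five digits, each between 1 and 5
    }
    if (ctype_digit($value)) {
        // Only digits were sent, so the problem is length or range.
        if (strlen($value) < 5) {
            return 'too few digits';
        }
        if (strlen($value) > 5) {
            return 'too many digits';
        }
        return 'digits must be between 1 and 5';
    }
    if (preg_match('/^[1-5]{5}/', $value)) {
        return 'extra characters after the five digits';
    }
    return 'non-digit characters';
}

var_dump(validateSms('Freetrip 12345'));
var_dump(validateSms('Freetrip 1234'));
```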

EDIT: The last one is updated and should fit your case. It's better because it will be very simple to create another service from its example if you need one.

Jet
Thanks Evgeniy, I am not the strongest in regex so I do not fully understand each of the details in the expressions, although the last one would seem to take care of all short codes. The second one is not needed, as I would not receive the message from the SMS gateway unless the code matches exactly. Would the first expression be valid if any characters were (wrongly) written by the sender after the XXXXX?
mr-euro
No. In this case you can use this regex: '/^freetrip\s+([1-5]{5}).*/i'. It means "the case-insensitive word freetrip at the beginning (^ anchors to the string start), 1 or more whitespace characters (\s+), exactly 5 characters from 1 to 5 ([1-5]{5}) and then zero or more of any character (.*)".
Jet
Had a look again. This solution still doesn't cover non-digit characters in XXXXX. Why not let the end user use them? It's easy to replace look-alike characters beforehand, e.g. $sms = str_replace('O', '0', $sms); // letter O to zero. Just find all the characters that look similar to digits and replace them. IMHO that's much better than tediously telling the user "Write it again and be careful".
Jet
Well, in the case of non-digits in XXXXX your expression would correctly catch it as an error. I am not as concerned about the erroneous cases as I am about getting the single correct case right, although it is definitely nicer if I can tell the sender more precisely what he is doing wrong. Regarding the substitution of characters, do you think users would send O instead of 0 when on the mobile phone keypad they are completely apart? I am still wondering if this can be done without regex, using only PHP core functions.
mr-euro
It can even be done in assembler, but why reinvent the wheel? And about 0 vs. O: I don't think, I'm sure. My gateway handled about 100K messages per day and about 10-20 of them had this kind of error. It's not a typo, it's misreading the code.
Jet
True, the correct interpretation of the character by the sender I did not think of. Thanks.
mr-euro
A: 

You could do something like that:

$code = 'Freetrip';
if (strlen($input) <= strlen($code)) {
    // too short
} elseif (!preg_match('/^'.preg_quote($code, '/').'(.*)/i', $input, $match)) {
    // wrong code
} else {
    // $match[1] is the text after the code; $match[0] would include the code itself
    $x = (int)trim($match[1]);
    if ($x < 11111) {
        // too small
    } elseif ($x > 55555) {
        // too large
    } else {
        // valid
    }
}
Gumbo
Nice way for a single case. But if you design a platform that provides different SMS services, you'll end up "fubar" sooner or later. This solution is terrible to support and moderate. Just believe me: if you say "service" as a programmer, you also have to say "admin" and "settings", or you'll be supporting your solution forever.
Jet
Thanks for the suggestion Gumbo. I think the too small/too large check would break when a single digit is higher than 5, e.g. in this case 12349.
mr-euro
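The objection about 12349 can be illustrated with a small comparison. This is only a sketch: the variable names are made up for the example, and the per-character check via strspn is one possible alternative, not the method proposed in the answer.

```php
<?php
// Hypothetical comparison of the two checks on the same input.
$value = '12349';

// Numeric range check: 12349 lies between 11111 and 55555, so it passes.
$inRange = (int)$value >= 11111 && (int)$value <= 55555;

// Per-character check: each of the five characters must be one of 1..5.
$allValid = strlen($value) === 5 && strspn($value, '12345') === 5;

var_dump($inRange);
var_dump($allValid);
```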