ansaurus

Question

Unicode Regular Expressions - Fails at 343 characters

Answer 1

If you're "weeding out" non-Latin characters, why not just do this:

preg_replace('/[^\p{Latin}]+/u', '', $s)
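Here is a minimal sketch of that replacement wrapped in a small function. The name `stripNonLatin` is made up for illustration. The sketch assumes PHP 7 or later (for the `"\u{...}"` string escapes) and UTF-8 input. It also removes spaces and digits, because `\p{Latin}` only covers letters of the Latin script.

```php
<?php
// Minimal sketch: delete every run of characters that is not in the Latin script.
// stripNonLatin() is a hypothetical helper name used only for illustration.
function stripNonLatin(string $s): ?string
{
    // /u makes PCRE treat both the pattern and the subject as UTF-8.
    // preg_replace() returns null on failure (e.g. malformed UTF-8 input).
    return preg_replace('/[^\p{Latin}]+/u', '', $s);
}

// \p{Latin} matches Latin-script letters only, so the spaces and digits go too.
echo stripNonLatin("Caf\u{00E9} au lait 123"), "\n"; // prints "Caféaulait"
```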

EDIT: Okay, so you're trying to validate the input. I was going to say, use this:

preg_match('/^[\p{Latin}]+$/u', $s)

...but it turns out that only matches Latin-script letters, not spaces, digits or punctuation. I was thinking of Java's undocumented shorthand, \p{L1}, which matches everything in the Latin-1 (ISO-8859-1) character set, but in PHP you have to spell it out as a range:

preg_match('/^[\x00-\xFF]+$/u', $s)
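The sketch below compares the two validation patterns side by side. The function names are hypothetical and PHP 7+ is assumed. With the `/u` modifier, `\x00-\xFF` refers to code points U+0000 to U+00FF rather than raw bytes, which is what makes it equivalent to the Latin-1 range.

```php
<?php
// Sketch comparing the two validation patterns (function names are hypothetical).
// With /u, \x00-\xFF means code points U+0000..U+00FF (Latin-1), not bytes.
function isLatinLettersOnly(string $s): bool
{
    // Latin-script letters only: no spaces, digits or punctuation.
    return preg_match('/^[\p{Latin}]+$/u', $s) === 1;
}

function isLatin1Only(string $s): bool
{
    // Any code point from U+0000 to U+00FF, including spaces and digits.
    return preg_match('/^[\x00-\xFF]+$/u', $s) === 1;
}

var_dump(isLatinLettersOnly("Hello world")); // bool(false): the space is not \p{Latin}
var_dump(isLatin1Only("Hello world"));       // bool(true)
var_dump(isLatin1Only("Hello \u{03B1}"));    // bool(false): Greek alpha is outside Latin-1
```

One trade-off of the Latin-1 range is that it rejects Latin letters beyond U+00FF, such as ő (U+0151), which `\p{Latin}` would accept.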

Alan Moore 2010-07-05 02:55:17

@Alan, thank you. However, I want to notify the user of the error, and for that the validation has to fail. So the validation rule (the regular expression) needs to describe what correct input looks like.

KcYxA 2010-07-05 04:29:26
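A possible shape for a validator that describes valid input and turns a non-match into a user-facing error is sketched below. `validateLatin1` and its messages are hypothetical placeholders. The strict `=== 1` / `=== 0` checks matter because `preg_match()` returns `false` when the regex engine itself fails, and a loose `if (!preg_match(...))` would lump that in with ordinary invalid input. That distinction may be relevant to a failure that only shows up on longer input, though the question's code is not shown on this page.

```php
<?php
// Hypothetical helper: returns null for valid input, otherwise a message
// to show the user. The messages are placeholders.
function validateLatin1(string $input): ?string
{
    // Describe what valid input looks like, so a non-match means "invalid".
    $result = preg_match('/^[\x00-\xFF]+$/u', $input);

    if ($result === 1) {
        return null; // valid
    }
    if ($result === 0) {
        return 'Please use only Latin-1 (ISO-8859-1) characters.';
    }

    // preg_match() returned false: the engine itself failed (for example on
    // malformed UTF-8, or when a backtracking/recursion limit is hit), so do
    // not silently treat this as an ordinary non-match.
    return 'Validation error (PCRE error code ' . preg_last_error() . ').';
}

var_dump(validateLatin1("Hello world"));    // NULL
var_dump(validateLatin1("Hello \u{03B1}")); // the "Please use only Latin-1" message
```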

KcYxA 2010-07-06 03:55:52

Oh yeah, I meant to suggest that. I knew it was gratuitously inefficient to put the `\s` in its own alternative and wrap the whole thing in a capturing group, but I wouldn't have expected it to go pear-shaped on such a small input.

Alan Moore 2010-07-06 09:52:46
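To illustrate the difference Alan describes, here is a sketch that contrasts a repeated capturing group containing `\s` as its own alternative with a single character class. The `$grouped` pattern is a reconstruction from his description, not the question's actual pattern (which does not appear on this page). Whether the grouped form actually fails depends on the PHP/PCRE build and settings, so the sketch only reports its result instead of asserting one.

```php
<?php
// $grouped is reconstructed from the comment above, for illustration only.
$grouped = '/^(\p{Latin}|\s)+$/u'; // \s as its own alternative, in a capturing group
$class   = '/^[\p{Latin}\s]+$/u';  // same set of characters, one character class

$input = str_repeat('abc ', 500);  // 2,000 characters, well past 343

// The character-class version matches without entering a group per character.
var_dump(preg_match($class, $input)); // int(1)

// The grouped version iterates the capturing group once per character.
// Depending on the build and settings (JIT, pcre.recursion_limit), it may
// return false instead of 1, so check preg_last_error() as well.
$r = preg_match($grouped, $input);
var_dump($r, preg_last_error());
```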
