ansaurus

Question

Regex - If contains '%', can only contain '%20'

Answer 1

+5 A:

Doesn't require a %:

/^[^%]*(%20[^%]*)*$/
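A quick way to try this pattern against a few inputs, sketched in Python. The thread never says which language the asker uses, so Python is an assumption here, and the helper name `only_percent20` and the sample strings are made up for illustration:

```python
import re

# Pattern from this answer: runs of non-% characters, separated by literal %20.
ONLY_PERCENT20 = re.compile(r"^[^%]*(%20[^%]*)*$")

def only_percent20(s):
    """Return True if every '%' in s is the start of the sequence '%20'."""
    # fullmatch avoids the case where `$` would accept a trailing newline.
    return ONLY_PERCENT20.fullmatch(s) is not None

for sample in ["no percent", "a%20b", "a%20%20b", "a%25b", "100%", "a%%20b"]:
    print(repr(sample), only_percent20(sample))
```

Because the pattern does not require a `%`, a string with no percent sign at all (including the empty string) is accepted. A doubled `%%` is rejected by this pattern.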

Mark Byers 2009-12-02 09:39:43

Thanks Mark, this suits my needs.

Kyle Rozendo 2009-12-04 05:39:38

Answer 2

+1 A:

I think this should find what you need:

/^([^%]|%%|%20)+$/

Edit: Added case where %% is valid string inside URI
Edit2: And fixed it for case where it should fail :-)
Edit3:

In case you need to use it in an editor (which would explain why you can't use a more programmatic approach), you have to escape all special characters correctly. For example, in Vim the regex should look like this:

/^\([^%]\|%%\|%20\)\+$/
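To see how the first (non-Vim) pattern in this answer behaves on the test strings raised elsewhere in the thread, here is a small Python sketch. Python is an assumption (the asker's language is never stated), and `is_valid` is a hypothetical helper name:

```python
import re

# First pattern from this answer: each step consumes one non-% character,
# an escaped percent (%%), or an encoded space (%20).
PATTERN = re.compile(r"^([^%]|%%|%20)+$")

def is_valid(s):
    """Return True if s consists only of non-% characters, '%%' and '%20'."""
    return PATTERN.fullmatch(s) is not None

for sample in ["a%20b", "pass=%%250", "example.com/?pass=%%%25", "a%2", ""]:
    print(repr(sample), is_valid(sample))
```

Note that the `+` quantifier means the empty string does not match; swapping it for `*` would accept empty input as well.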

MBO 2009-12-02 09:43:52

Joey 2009-12-02 10:49:27

You're right, it stopped before invalid sequence and matched empty strings in loop. Fixed now

MBO 2009-12-02 10:54:46

Seems to work now. Nice.

Joey 2009-12-02 11:01:01

I don't seem to be getting any matches with this.

Kyle Rozendo 2009-12-02 11:01:55

@Kyle I do get matches for the provided strings. You didn't mention which language you use, or whether you only want to test for a match or extract something with the regex. I tested mine in Ruby and with <http://www.rubular.com/regexes/12107>

MBO 2009-12-02 11:07:29

Answer 3

+1 A:

Another solution if look-arounds are not available:

^([^%]|%([013-9a-fA-F][0-9a-fA-F]|2[1-9a-fA-F]))*$
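It may help to run this pattern before relying on it. In the Python sketch below (Python and the helper name `matches` are assumptions, not part of the answer), the pattern as written appears to accept any two-digit hex escape except `%20`, which is worth comparing against what the question asks for:

```python
import re

# Pattern from this answer, unchanged.
PATTERN = re.compile(r"^([^%]|%([013-9a-fA-F][0-9a-fA-F]|2[1-9a-fA-F]))*$")

def matches(s):
    """Return True if the whole string matches the pattern."""
    return PATTERN.fullmatch(s) is not None

for sample in ["plain", "a%41b", "a%2Fb", "a%20b", "a%zzb"]:
    print(repr(sample), matches(sample))
```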

Gumbo 2009-12-02 09:47:32

Answer 4

A:

Maybe a better approach is to deal with that validation after you decode that string:

string name = HttpUtility.UrlDecode(Request.QueryString["Name"]);

Rubens Farias 2009-12-02 09:48:16

Yep, tried this approach before and it unfortunately did not suit the scenario, thanks for the answer though!

Kyle Rozendo 2009-12-02 09:59:22

Answer 5

+2 A:

Which language are you using?

Most languages have a URI encoder/decoder function or class. I would suggest you decode the string first and then check for valid (or invalid) characters.

For example, something like /[\w ]/ (the blank inside the brackets is a space).

If you apply a regex to the raw string instead, you need to account for the fact that www.example.com/index.html?user=admin&pass=%%250 means that the pass really is "%250".
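The decode-first suggestion at the top of this answer could be sketched like this in Python, using the standard library's `urllib.parse.unquote`. Python is an assumption, as are the helper name `decoded_is_clean` and the sample values. The allowed character set follows the /[\w ]/ idea above:

```python
import re
from urllib.parse import unquote

# Characters allowed after decoding: word characters and spaces.
ALLOWED = re.compile(r"[\w ]*")

def decoded_is_clean(raw):
    """Decode percent-escapes, then check that only allowed characters remain."""
    decoded = unquote(raw)
    return ALLOWED.fullmatch(decoded) is not None

print(decoded_is_clean("John%20Smith"))   # True: decodes to "John Smith"
print(decoded_is_clean("John%2FSmith"))   # False: decodes to "John/Smith"
```

`unquote` does not treat `%%` as an escape and leaves malformed sequences untouched, so the `%%250` case mentioned above would still need its own handling.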

SchlaWiener 2009-12-02 09:49:54

Eep, very valid point. Thanks.

Joey 2009-12-02 09:54:47

Ok, couldn't get the regex to work with that constraint. Another test string to consider (which should fail again): "example.com/?pass=%%%25"

Joey 2009-12-02 10:02:16

Bugger, you're right. I need to update the question with that. Unfortunately I don't have the option of URL decoding here, so the regex is my last hope ;)

Kyle Rozendo 2009-12-02 10:06:05

Answer 6

A:

I agree with dominic's comment on the question. Don't use Regex.

If you want to avoid scanning the string twice, you can just iteratively search for % and then check that it is followed by 20 and nothing else. (Update: also allow a second % directly after it, so that %%nnn is read as a literal %nnn sequence.)

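// pseudo code
// find(start, ch) returns the index of the next ch at or after start, or -1 if none
pos = mystring.find(0, '%')
while (pos != -1)
{
     if mystring.substring(pos+1, 1) = "%" then
         pos = pos + 2 // ok, this is a literal %, skip past both characters
     else if mystring.substring(pos+1, 2) = "20" then
         pos = pos + 3 // ok, this is an encoded space, skip past it
     else
          return false; // any other character after % makes the string invalid
     end if
     pos = mystring.find(pos, '%') // continue the search after the accepted sequence
}
return true;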

Isak Savo 2009-12-02 09:50:40

Why should `%200` be disallowed? If I want to type a `0` after a space this should be totally possible, actually.

Joey 2009-12-02 09:51:55

As I said, it is not possible to do this in the current scenario, however I appreciate the answer, thanks.

Kyle Rozendo 2009-12-02 09:58:42

Your approach suffers from the same problem as the regex one, though. See SchlaWiener's answer.

Joey 2009-12-02 10:02:51

Johannes: good point

Isak Savo 2009-12-02 11:18:47

About %200: I was under the impression that multi-octet characters (e.g. UTF-8 encoded characters) would be URL-encoded with a single '%' sign, but I may be wrong here. If so, then there is no need to check for subsequent digits.

Isak Savo 2009-12-02 11:36:25

Answer 7

A:

/^([^%]|%20)*$/

Dave Hinton 2009-12-02 10:25:47

Would someone care to explain the downvote? If my answer is wrong I would like to know why.

Dave Hinton 2009-12-02 13:02:40

This means "Either starts with something that is "%20" or not "%" ...

Mez 2009-12-02 13:45:23

...and continues with something that is either `%20` or not `%`, until we hit end of string. Which is what the OP asked for.

Dave Hinton 2009-12-03 11:45:06

Answer 8

+1 A:

Reject the string if it matches %[^2][^0]
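The reject pattern only fires when the character after `%` is not `2` and the one after that is not `0`, so sequences such as `%25` or `%30` are not flagged. The Python sketch below shows this next to a negative-lookahead variant, `%(?!20)`. That variant is not from the thread and is included only for comparison. Python and the helper name `flagged` are also assumptions:

```python
import re

REJECT_FROM_ANSWER = re.compile(r"%[^2][^0]")  # pattern quoted in this answer
REJECT_LOOKAHEAD = re.compile(r"%(?!20)")      # comparison variant, not from the thread

def flagged(pattern, s):
    """Return True if the reject pattern finds an offending sequence in s."""
    return pattern.search(s) is not None

for sample in ["a%20b", "a%41b", "a%25b", "a%30b", "100%"]:
    print(repr(sample),
          flagged(REJECT_FROM_ANSWER, sample),
          flagged(REJECT_LOOKAHEAD, sample))
```

The lookahead variant treats every `%` not followed by `20` as invalid, so it would also flag `%%`. It needs adjusting if a doubled percent should be allowed.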

Amarghosh 2009-12-02 10:36:38

-1 This wouldn’t allow *any* string that contains `%2x` or `%x0` where `x` can be any arbitrary character.

Gumbo 2009-12-02 14:41:24

@Gumbo **And that's exactly what OP wants**. Quoting from the question "If a string contains the percentage character (%) then it can only contain the following: %20, and cannot be preceded by another '%'"

Amarghosh 2009-12-02 15:32:22

Answer 9

A:

This requires a test against the "bad" patterns. If we're allowing %20 - we don't need to make sure it exists.

As others have said before, %% is valid too... and %%25 would be %25

The regex below matches anything that breaks the above rules

/(?<![^%]%)%(?!(20|%))/

The first group checks whether there is a % before the character (meaning that it's %%) and also checks that it's not %%%. It then checks for a %, and checks that what follows is neither 20 nor another %.

This means that if anything is identified by the regex, then you should probably reject it.
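As a rough check of the "reject if anything matches" idea, here is how the pattern behaves in Python's `re` module, which supports the fixed-width look-behind used here. Python and the helper name `should_reject` are assumptions, and other regex engines may differ:

```python
import re

# Pattern from this answer: a % that is not the second half of an escaped %%
# and is not followed by 20 or by another %.
BAD_PERCENT = re.compile(r"(?<![^%]%)%(?!(20|%))")

def should_reject(s):
    """Return True if the pattern finds an offending % in s."""
    return BAD_PERCENT.search(s) is not None

for sample in ["a%20b", "a%25b", "x%%25", "x%%%25", "%%25"]:
    print(repr(sample), should_reject(sample))
```

One edge case to be aware of: when `%%` sits at the very start of the string, the look-behind has no preceding character to inspect, so `%%25` on its own is still flagged even though `x%%25` is not.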

Mez 2009-12-02 10:48:34

It doesn't work correctly for `%%25` here.

Joey 2009-12-02 10:52:55

Apologies... fixed.

Mez 2009-12-02 13:47:50
