regex

Why does this regular expression not match adjacent occurrences of newline?

I was trying to write a regexp to replace all occurrences of \n with \r\n unless the \n is already preceded immediately by a \r. I'm doing this in Ruby 1.8.6, which doesn't support lookbehind in regexps, so I tried: # try to replace \n preceded by anything other than \r with \r\n str.gsub(/([^\r])\n/, "\\1\r\n") # \\1 is the captured ch...
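The behaviour in the title can be reproduced with a small sketch. It is written in Python rather than Ruby 1.8.6, purely for illustration, because Python's `re` module does support lookbehind. The function names are hypothetical. The capturing version consumes the character before each `\n`, so the second of two adjacent newlines has no unconsumed character left to pair with; a negative lookbehind checks that character without consuming it.

```python
import re

def crlf_with_capture(text):
    # Same idea as the gsub in the question: capture the non-\r character
    # before \n and put it back in the replacement.
    return re.sub(r'([^\r])\n', '\\1\r\n', text)

def crlf_with_lookbehind(text):
    # Negative lookbehind: match \n only when the previous character is
    # not \r, without consuming that previous character.
    return re.sub(r'(?<!\r)\n', '\r\n', text)
```

With the input `"a\n\nb"`, the capturing version converts only the first newline, while the lookbehind version converts both.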

regex to trap img tag, both versions

I need to remove image tags from text, so both versions of the tag: <img src="" ... ></img> <img src="" ... /> ...
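A minimal sketch of one possible pattern, in Python for illustration. The names `IMG_TAG` and `strip_img_tags` are hypothetical, and the pattern assumes no `>` character appears inside an attribute value; markup that breaks that assumption is better handled with an HTML parser.

```python
import re

# Matches <img ...> or <img ... />, optionally followed by a closing </img>.
IMG_TAG = re.compile(r'<img\b[^>]*>(?:\s*</img>)?', re.IGNORECASE)

def strip_img_tags(html):
    # Remove every image tag and leave the surrounding text untouched.
    return IMG_TAG.sub('', html)
```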

Uncommon regular expressions

Recently I discovered two amazing regular expression features: ?: and ?!. I am curious about other neat regex features, so maybe you would like to share some tricky regular expressions. ...

Regular expression HTML tags in an array

I've split a large body of XHTML into individual array elements, and I now need to iterate through them and split it at regular intervals. That's not a problem, but I want to ensure I don't split it in the middle of an XHTML tag. So the array looks like: [41] => <p> [42] => materials [43] => and [44] => dosage [45] => forms:</p> [46] =>...

Replacing hyphens in querystring via Regular Expression

Hi all, Due to a restriction with a URLRewrite module, I am replacing all whitespace in a querystring value with hyphens. Server side I want to replace the hyphens back to whitespace, which is fine. However, if there is a hyphen in the querystring (before I encode the value), when I decode the querystring, it removes ALL hyphens, inclu...

Regular Expression with Groups and Values in C#

I am trying to write a simple regex to convert some two-digit years to four-digit years in a pipe-delimited file. I am using: Regex dateFormat = new Regex(@"\|(\d\d)/(\d\d)/([\d\d)\|"); string convertedString = dateFormat.Replace(contents, @"|$1$220$3|'"); What I want is |10/31/09| to be replaced with |10312009|. What I am getting i...
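The excerpt is cut off before the actual output, so the following is only a sketch of the intended transformation. It is in Python rather than C#, and it assumes every two-digit year belongs to the 2000s. One likely trouble spot is a group reference followed directly by literal digits (`$2` followed by `20`), which is ambiguous. Python separates the two with `\g<2>`; .NET offers the `${2}` form for the same purpose. `DATE_FIELD` and `expand_years` are hypothetical names.

```python
import re

# |MM/DD/YY| with each part captured in its own group.
DATE_FIELD = re.compile(r'\|(\d\d)/(\d\d)/(\d\d)\|')

def expand_years(contents):
    # \g<2> keeps the reference to group 2 separate from the literal "20"
    # (assumed century) that follows it.
    return DATE_FIELD.sub(r'|\1\g<2>20\3|', contents)
```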

httplib in Python to get the status code...but it is too tricky?

>>> import httplib >>> conn = httplib.HTTPConnection("www.google.com") >>> conn.request("HEAD", "/index.html") >>> res = conn.getresponse() >>> print res.status, res.reason 200 OK This code will get the HTTP status code. However, notice that I split up "google.com" and "/index.html" on 2 lines. And it's confusing. What if I want to ...

Regular expression with pipe

It seems to me that | has a special meaning in the regular expression world. I am using Ruby and could not find much documentation on it. http://rubular.com/regexes/11724 works. http://rubular.com/regexes/11725 does not work. Why, and what is the correct regex? ...

php regex: lookbehind and lookahead and greediness problem

This should be simple but I'm a noob and I can't for the life of me figure it out. I'm trying to use regex to match text inside of special open/close tags: [p2][/p2] So in this text: apple [p2]banana[/p2] grape [p2]lemon[/p2] it should match "banana" and "lemon". The regex I've worked up so far is: (?<=\[p2\]).+(?=\[\/p2\]) But ...
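Judging from the title, the problem is the greedy `.+`, which runs from the first `[p2]` to the last `[/p2]`. A minimal sketch in Python (not PHP, for illustration only) shows the lazy quantifier `.+?` stopping at the nearest closing tag; `P2_CONTENT` and `p2_contents` are hypothetical names.

```python
import re

# Lazy .+? stops at the first position where the lookahead succeeds.
P2_CONTENT = re.compile(r'(?<=\[p2\]).+?(?=\[/p2\])')

def p2_contents(text):
    # Return the text found between each [p2] ... [/p2] pair.
    return P2_CONTENT.findall(text)
```

For the sample text in the question this returns `['banana', 'lemon']`.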

Regexp for chroot-like path building in a Linux environment

Consider the following security problem: I have a static base path (/home/username/) to which I append a user-controlled sub-path (say foo/bar.txt). The content of this file is then read and presented to the user. In the case described the full path would be: /home/username/foo/bar.txt Now to the problem. I want to control so that the...
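The excerpt is truncated, so the following is only a sketch of one common alternative to a regex check, written in Python for illustration: normalise the combined path first, then verify that it is still inside the base directory. The function name `resolve_inside` is hypothetical. `os.path.realpath` also resolves symbolic links, which a pattern match on the raw string cannot do.

```python
import os.path

def resolve_inside(base_path, user_path):
    # Normalise both paths (collapsing ".." and resolving symlinks).
    base = os.path.realpath(base_path)
    full = os.path.realpath(os.path.join(base, user_path))
    # The resolved path must still have the base directory as its prefix.
    if os.path.commonpath([base, full]) != base:
        raise ValueError("path escapes base directory")
    return full
```

With the base `/home/username/`, the sub-path `foo/bar.txt` is accepted, while `../other/secret.txt` and the absolute path `/etc/passwd` both raise `ValueError`.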

RegEx match open tags except XHTML self-contained tags

I need to match all of these opening tags: <p> <a href="foo"> But not these: <br /> <hr class="foo" /> I came up with this and wanted to make sure I've got it right. I am only capturing the a-z. <([a-z]+) *[^/]*?> I believe it says: Find a less-than, then Find (and capture) a-z one or more times, then Find zero or more spaces,...
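As an alternative to a regex, here is a sketch that uses Python's standard `html.parser` module. This is a swapped-in technique, not the pattern from the question. The parser reports tags written in the self-closing form (`<br />`) through a separate callback, so they can be skipped. `OpeningTagCollector` and `opening_tags` are hypothetical names. Note that a void tag written without the slash, such as `<br>`, would still be collected.

```python
from html.parser import HTMLParser

class OpeningTagCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Called for ordinary opening tags such as <p> or <a href="foo">.
        self.tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Called for self-closing forms such as <br />; ignore them.
        pass

def opening_tags(markup):
    collector = OpeningTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags
```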

regex to not match if a specified string exists

I'm trying to do an Apache rewrite where, if the term "admin" is contained in the request_uri (mydomain.com/admin/anything_else), the host is rewritten to use a subdomain: admin.mydomain.com/admin/anything_else. Likewise, if I click a link while on admin.mydomain.com and it is a URL WITHOUT "admin" in it, then I would like to rewrite the ...

how to use sed, awk, or gawk to print only what is matched?

I see lots of examples and man pages on how to do things like search-and-replace using sed, awk, or gawk. But in my case, I have a regular expression that I want to run against a text file to extract a specific value. I don't want to do search-and-replace. This is being called from bash. Let's use an example: Example regular express...

Help needed for writing a regular expression for these complex conditions

Hi, What would be a regular expression for the following type of string? E.g. 1, 2-3, 4..5, <6, <=7, >8, >=9 Here I am using equals, range (-), sequence (..) and greater than/equal to operators for numbers less than 100. These numbers are separated by commas. Please help me write a regular expression for this. Thanks in advance. Atul ...
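One possible shape for such a pattern, sketched in Python. It assumes "less than 100" means one or two digits (so `07` is accepted), that whitespace around commas is optional, and that an operator cannot be combined with a range. `NUM`, `ITEM`, `LIST` and `is_valid_list` are hypothetical names.

```python
import re

NUM = r'\d{1,2}'  # one or two digits, i.e. 0 to 99
# An item is either a range/sequence (2-3, 4..5) or a single number
# with an optional comparison operator (<6, <=7, >8, >=9, 1).
ITEM = rf'(?:{NUM}(?:-|\.\.){NUM}|(?:<=?|>=?)?{NUM})'
LIST = re.compile(rf'\s*{ITEM}(?:\s*,\s*{ITEM})*\s*')

def is_valid_list(text):
    # fullmatch requires the whole string to be a comma-separated list.
    return LIST.fullmatch(text) is not None
```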

Newbie regex question - detect spam

Here are my regex newbie questions: How can I check if a string has 3 spam words (for example: viagra, pills and shop)? How can I also detect variations of those spam words like "v-iagra" or "v.iagra" (one additional character)? ...
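A rough sketch in Python of one way to approach both questions. It builds a pattern per word that tolerates an optional non-alphanumeric character between adjacent letters, then requires all three patterns to be found. This is slightly broader than "one additional character", since it allows one separator per gap, and it also matches inside longer words such as "shopping". All names here are hypothetical.

```python
import re

SPAM_WORDS = ("viagra", "pills", "shop")

def word_pattern(word):
    # Allow at most one non-alphanumeric character between adjacent letters,
    # so "v-iagra" and "v.iagra" both match "viagra".
    return re.compile(r'[\W_]?'.join(re.escape(ch) for ch in word), re.IGNORECASE)

PATTERNS = [word_pattern(w) for w in SPAM_WORDS]

def looks_like_spam(text):
    # True only when every spam word (or a variation) occurs in the text.
    return all(p.search(text) for p in PATTERNS)
```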

Preg_match Alpha numeric -_' ", and white space

I have had a few cracks at this but can't seem to get it right. Anyone have a regex to allow alphanumerics and -_",' as well as white spaces. Thx ...

Is this specific path concatenation in Perl code exploitable?

Assume that an attacker controls the variable $untrusted_user_supplied_path . Is the following Perl code exploitable? my $untrusted_user_supplied_path = ... if ($untrusted_user_supplied_path =~ /\.\./) { die("Tries to escape homedir."); } my $base_path = "/home/username/"; my $full_path = "${base_path}${untrusted_user_supplied_path}";...

Why is this regex being greedy?

I am trying to extract all links that have /thumb/ in them within double quotes. Actually I only need to use the images' src. I don't know if images will end with jpg or if there will be case sensitivity problems, etc. I really only care about the full link. m = Regex.Match(page, @"""(.+?/thumbs/.+?)"""); //... var thumbUrl = m.Groups[1].Value; My...
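The excerpt is truncated, so this is only a guess at the symptom. A lazy `.+?` still starts at the first double quote it can and is free to cross later quotes until `/thumbs/` appears, so the match can begin far too early. A sketch in Python (not C#, for illustration) that uses `[^"]` to keep the match inside a single quoted string; `THUMB_URL` and `thumb_urls` are hypothetical names.

```python
import re

# [^"]* cannot cross a double quote, so the match stays within one
# quoted attribute value.
THUMB_URL = re.compile(r'"([^"]*/thumbs/[^"]*)"')

def thumb_urls(page):
    # Return every double-quoted string that contains /thumbs/.
    return THUMB_URL.findall(page)
```

On `<a href="/about"><img src="/img/thumbs/a.jpg"></a>` this returns only `/img/thumbs/a.jpg`, whereas the lazy pattern starts matching at the quote before `/about`.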

URL regex with regex.h in C

SO: I'm having a bit of difficulty setting up a regex to match a URL using the regex.h library in C. I have a working IP regex that I was hoping to convert to match a simple string such as www.alphanumerictext12.com|edu|org. Something is wrong with my syntax in the regex definition itself. Below is the working IPREGEX code and my a...

HTML anchor replace with RegEx

I have HTML data which I'll be using in a client app. I need to Regex.Replace the <a> tags from <a href="Bahai.aspx">Bahai</a> to <a href="#" onclick="process('Bahai.aspx');return false;">Bahai</a> In C# using RegExReplace with a regex similar to <a[^>]*? href=\"(?<url>[^\"]+)\"[^>]*?>(?<text>.*?)</a> Ideas? ...