regex

What is the most brilliant regex you've ever used?

I'm constantly amazed by the power of the regex. What I'm looking for here is: regexes that are more cleverly badass than ridiculously badass. Regex replacements are acceptable as well, if you've had some cool usage of them. Code refactored to use a regex and made more efficient. A large regex replaced with a smaller one. Humorous rege...

Complexity of Regex substitution

I couldn't find the answer to this anywhere. What is the runtime complexity of a regex match and substitution? Edit: I work in Python, but I would like to know about the most popular languages/tools in general (Java, Perl, sed). ...

Using Regex to generate Strings rather than match them

I am writing a Java utility that helps me generate loads of data for performance testing. It would be really cool to be able to specify a regex for Strings so that my generator spits out things that match it. Is there something already out there that I can use to do this? Or is there a library which gets me most of the w...

My regex is matching too much. How do I make it stop?

J0000000: Transaction A0001401 started on 8/22/2008 9:49:29 AM J0000010: Project name: E:\foo.pf J0000011: Job name: MBiek Direct Mail Test J0000100: Machine name: DEV J0000100: Project file: E:\mbiek\foo.pf J0000100: Template file: E:\mbiek\foot.xdt J0000100: Job name: MBiek J0000100: Output folder: E:\foo\A0001401 J0000100: Tem...
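The excerpt stops before the pattern itself, but "matching too much" on log lines like these is often a greedy quantifier at work. A minimal Python sketch of greedy `.*` versus lazy `.*?`, assuming the goal is to capture the value after a label up to the next `J` + seven-digit code; the sample line is a shortened, hypothetical version of the log above:

```python
import re

# Shortened, hypothetical line in the style of the log excerpt above.
line = "J0000010: Project name: E:\\foo.pf J0000011: Job name: MBiek J0000100: Machine name: DEV"

# Greedy: .* first runs to the end of the line, then backtracks only until the
# rest of the pattern fits, so it stops before the LAST "J" + 7 digits.
greedy = re.search(r"Project name:\s+(.*)\s+J[0-9]{7}", line).group(1)

# Lazy: .*? grows one character at a time, so it stops before the FIRST
# "J" + 7 digits that lets the rest of the pattern match.
lazy = re.search(r"Project name:\s+(.*?)\s+J[0-9]{7}", line).group(1)

print(greedy)  # E:\foo.pf J0000011: Job name: MBiek
print(lazy)    # E:\foo.pf
```

When the captured value can never contain a space, a negated class such as `(\S+)` is another way to stop early without relying on laziness.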

Passing a commented, multi-line (freespace) regex to preg_match

I have a regex that is going to end up being a bit long, and it would be much easier to read if it spanned multiple lines. I tried this, but it just barfs. preg_match(' ^J[0-9]{7}:\s+ (.*?) #Extract the Transaction Start Date msg \s+J[0-9]{7}:\s+Project\sname:\s+ ...
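PCRE, which PHP's preg_* functions use, has a free-spacing mode (the `x` modifier) in which unescaped whitespace is ignored and `#` starts a comment. A sketch of the same idea in Python with `re.VERBOSE`; because the original pattern is truncated, only its first pieces are reproduced, and the trailing `(\S+)` capture and the sample text are assumptions:

```python
import re

# Free-spacing pattern: whitespace is ignored and "#" starts a comment,
# so literal spaces in the text must be written as \s (or an escaped space).
pattern = re.compile(
    r"""
    ^J[0-9]{7}:\s+          # line code, e.g. J0000000:
    (.*?)                   # extract the transaction start message
    \s+J[0-9]{7}:\s+        # next line code
    Project\sname:\s+       # Project name: with \s standing in for the space
    (\S+)                   # assumed: project file path, up to the next space
    """,
    re.VERBOSE,
)

text = (
    "J0000000: Transaction A0001401 started on 8/22/2008 9:49:29 AM "
    "J0000010: Project name: E:\\foo.pf J0000011: Job name: MBiek"
)
m = pattern.search(text)
print(m.group(1))  # Transaction A0001401 started on 8/22/2008 9:49:29 AM
print(m.group(2))  # E:\foo.pf
```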

How do I perform a Perl substitution on a string while keeping the original?

In Perl, what is a good way to perform a replacement on a string using a regular expression and store the value in a different variable, without changing the original? I usually just copy the string to a new variable and then bind it to the s/// regex that does the replacement on the new string, but I was wondering if there is a better way ...
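In Perl the usual idioms are `(my $new = $old) =~ s/foo/bar/;` or, from Perl 5.14 on, the non-destructive `/r` modifier: `my $new = $old =~ s/foo/bar/r;`. For comparison, a minimal Python sketch, where substitution is always non-destructive; the sample string is hypothetical:

```python
import re

original = "The quick brown fox"
# re.sub returns a new string; Python strings are immutable, so `original`
# is left untouched, much like Perl's s///r.
changed = re.sub(r"quick", "slow", original)

print(original)  # The quick brown fox
print(changed)   # The slow brown fox
```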

Summary of differences in Regular Expression syntax for various tools and languages?

I can never remember the differences in regular expression syntax used by tools like grep and awk, or languages like Python and PHP. Generally, Perl has the most expansive syntax, but I'm often hamstrung by the limitations of even egrep ("extended" grep). Does anyone know of a site that lists the differences in a concise and easy-to-read fas...

Best regex to catch XSS (Cross-site Scripting) attack (in Java)?

Jeff actually posted about this in Sanitize HTML, but his example is in C#, and I'm more interested in a Java version. Does anyone have a better version for Java? Is his example good enough that I could just convert it directly from C# to Java? [Update] I have put a bounty on this question because SO wasn't as popular as to...

Combining values from different files into one CSV file

I have a couple of files containing a value in each line. EDIT: Oops... I figured out the answer to this question while in the midst of writing the post and didn't realize I had posted it by mistake in its incomplete state. I was trying to do: paste -d ',' file1 file2 file 3 file 4 > file5.csv and was getting weird output. I later ...

How do you include a webpage title as part of a webpage URL?

What is a good, complete regex or some other process that would take "How do you change a title to be part of the url like Stackoverflow?" and turn it into "how-do-you-change-a-title-to-be-part-of-the-url-like-stackoverflow", as is done for the smart URLs? The dev environment I am using is Rails, but if there are some other platform sp...
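A minimal Python sketch of the slug transformation, assuming ASCII titles: lowercase the title, collapse every run of non-alphanumeric characters into a single hyphen, and trim hyphens from the ends. `slugify` is a hypothetical helper name; non-ASCII letters are simply treated as separators here.

```python
import re


def slugify(title: str) -> str:
    """Hypothetical helper: build a URL slug from a title (ASCII-only sketch)."""
    # Every run of characters outside a-z/0-9 (spaces, punctuation, "?")
    # becomes one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    # Drop leading/trailing hyphens produced by punctuation at the edges.
    return slug.strip("-")


print(slugify("How do you change a title to be part of the url like Stackoverflow?"))
# how-do-you-change-a-title-to-be-part-of-the-url-like-stackoverflow
```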

Capturing a repeated group

I am attempting to parse a string like the following using a .NET regular expression: H3Y5NC8E-TGA5B6SB-2NVAQ4E0 and return the following using Split: H3Y5NC8E TGA5B6SB 2NVAQ4E0 I validate each character against a specific character set (note that the letters 'I', 'O', 'U' & 'W' are absent), so using string.Split is not a...
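.NET keeps every capture of a repeated group in `Group.Captures`. Python's built-in `re` keeps only the last one (the third-party `regex` module offers `captures()`), so this sketch validates the whole key first and then collects the chunks with `findall`. The alphabet (digits plus A-Z without I, O, U and W) and the eight-character chunk length are assumptions read off the example key; `split_key` is a hypothetical name.

```python
import re

# Assumed alphabet: digits plus A-Z without I, O, U and W.
CHUNK = r"[0-9A-HJ-NP-TVX-Z]{8}"
# Whole key: one chunk, then any number of "-chunk" repetitions.
KEY = re.compile(rf"{CHUNK}(?:-{CHUNK})*")


def split_key(key: str) -> list[str]:
    """Hypothetical helper: validate the whole key, then return its chunks."""
    if KEY.fullmatch(key) is None:
        raise ValueError(f"invalid key: {key!r}")
    # CHUNK has no capturing groups, so findall returns the full matches.
    return re.findall(CHUNK, key)


# A repeated capturing group only remembers its last repetition in `re`:
repeated = re.fullmatch(rf"({CHUNK})(?:-({CHUNK}))*", "H3Y5NC8E-TGA5B6SB-2NVAQ4E0")
print(repeated.groups())  # ('H3Y5NC8E', '2NVAQ4E0')

print(split_key("H3Y5NC8E-TGA5B6SB-2NVAQ4E0"))  # ['H3Y5NC8E', 'TGA5B6SB', '2NVAQ4E0']
```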

Regex to Parse Hyperlinks and Descriptions

C#: What is a good regex to parse hyperlinks and their descriptions? Please consider case insensitivity, whitespace, and the use of single quotes (instead of double quotes) around the HREF attribute value. Please also consider obtaining hyperlinks that have other tags, such as <b> and <i>, within the <a> element ...
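A Python sketch of a case-insensitive pattern that tolerates whitespace around `=`, accepts single or double quotes (the backreference `\1` requires the closing quote to match the opening one), and strips nested tags such as `<b>` and `<i>` from the description. The same constructs exist in .NET (`RegexOptions.IgnoreCase | RegexOptions.Singleline`). It is a heuristic for reasonably well-formed markup, not an HTML parser; `links` and the sample markup are hypothetical.

```python
import re

# Case-insensitive; DOTALL lets the link text span lines.
# (["']) captures the opening quote and \1 requires the same closing quote.
LINK = re.compile(
    r"""<a\s[^>]*?\bhref\s*=\s*(["'])(.*?)\1[^>]*>(.*?)</a\s*>""",
    re.IGNORECASE | re.DOTALL,
)
# Used to strip nested tags such as <b> and <i> from the description.
TAG = re.compile(r"<[^>]+>")


def links(html: str) -> list[tuple[str, str]]:
    """Hypothetical helper: return (href, description) pairs."""
    result = []
    for m in LINK.finditer(html):
        description = TAG.sub("", m.group(3)).strip()
        result.append((m.group(2), description))
    return result


sample = """<A HREF='/a.html'>First <b>bold</b></A> <a class="x" href = "/b.html"><i>Second</i></a>"""
print(links(sample))  # [('/a.html', 'First bold'), ('/b.html', 'Second')]
```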

What's the fastest way to determine a full URL from a relative URL (given a base URL)?

I'm currently using URI::URL to generate this; however, it isn't as fast as I'd like. Does anyone know another way to do this that may be faster? ...
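Resolving a relative reference against a base URL is defined by RFC 3986 (section 5), so it needs no regex. Python's standard library exposes it as `urllib.parse.urljoin`; this sketch only illustrates the resolution rules and makes no claim about its speed relative to URI::URL. The base and relative URLs are hypothetical.

```python
from urllib.parse import urljoin

base = "http://test.example.com/dir/subdir/file.html"

# Relative path: resolved against the base's directory.
print(urljoin(base, "other.html"))    # http://test.example.com/dir/subdir/other.html
# Dot segments are removed during resolution.
print(urljoin(base, "../up.html"))    # http://test.example.com/dir/up.html
# Absolute path: keeps only the scheme and host of the base.
print(urljoin(base, "/root.html"))    # http://test.example.com/root.html
# Network-path reference: keeps only the scheme.
print(urljoin(base, "//cdn.example.org/x.js"))  # http://cdn.example.org/x.js
```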

Getting parts of a URL (Regex)

Given the URL (single line) http://test.example.com/dir/subdir/file.html, how can I extract the following parts using regular expressions: the subdomain (test); the domain (example.com); the path without the file (/dir/subdir/); the file (file.html); the path with the file (/dir/subdir/file.html); the URL without the path (http://test.exam...
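A Python sketch with named groups, assuming a `scheme://host` URL with an optional port, path, query and fragment. Splitting the host into subdomain and domain cannot be done reliably by regex alone (think `co.uk`), so the code simply assumes the domain is the last two labels. `URL`, `url_parts` and the dictionary keys are hypothetical names.

```python
import re

# Hypothetical pattern for scheme://host[:port][/path][?query][#fragment].
URL = re.compile(
    r"^(?P<scheme>[a-z][a-z0-9+.-]*)://"
    r"(?P<host>[^/?#:]+)"
    r"(?::(?P<port>[0-9]+))?"
    r"(?P<path>(?P<dir>/(?:[^/?#]*/)*)(?P<file>[^/?#]*))?"
    r"(?:\?(?P<query>[^#]*))?"
    r"(?:#(?P<fragment>.*))?$",
    re.IGNORECASE,
)


def url_parts(url: str) -> dict[str, str]:
    """Hypothetical helper: split a URL into the parts listed in the question."""
    m = URL.match(url)
    if m is None:
        raise ValueError(f"not a URL this sketch understands: {url!r}")
    host = m.group("host")
    labels = host.split(".")
    port = f":{m.group('port')}" if m.group("port") else ""
    return {
        # Assumption: the registered domain is the last two labels.
        "subdomain": ".".join(labels[:-2]),
        "domain": ".".join(labels[-2:]),
        "dir": m.group("dir") or "",
        "file": m.group("file") or "",
        "path": m.group("path") or "",
        "origin": f"{m.group('scheme')}://{host}{port}",
    }


print(url_parts("http://test.example.com/dir/subdir/file.html"))
```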

Finding a DOI in a document or page

The DOI system places basically no useful limitations on what constitutes a reasonable identifier. However, being able to pull DOIs out of PDFs, web pages, etc. is quite useful for citation information, etc. Is there a reliable way to identify a DOI in a block of text without assuming the 'doi:' prefix? (any language acceptable, regexes...
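There is no exact pattern, but a widely used heuristic is the prefix `10.`, a registrant code of four to nine digits, a slash, and then a run of characters from a conservative set. A Python sketch under that assumption; it misses identifiers containing characters outside the set (for example `<`, `>` or `#`), and trimming trailing sentence punctuation can occasionally cut a character that really belongs to the DOI. `find_dois` and the sample text are hypothetical.

```python
import re

# Heuristic: "10." + 4-9 digit registrant code + "/" + conservative suffix set.
DOI = re.compile(r"\b10\.[0-9]{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def find_dois(text: str) -> list[str]:
    """Hypothetical helper: pull DOI-like strings out of free text."""
    # Trailing sentence punctuation is usually not part of the DOI.
    return [m.rstrip(".,;:") for m in DOI.findall(text)]


sample = "Cited as doi:10.1000/182 and at https://doi.org/10.1234/example-2008.01."
print(find_dois(sample))  # ['10.1000/182', '10.1234/example-2008.01']
```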

Bash reg-exp substitution

Is there a way to run a regexp string replacement on the current line in bash? I often find myself in the situation where I have typed a long command line and then realize that I would like to change a word somewhere in the line. My current approach is to finish the line, press Ctrl-A (to get to the start of the line), insert a...

What is the difference between a group and match in .NET's RegEx?

What is the difference between a group and match in .NET's RegEx? ...
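In .NET, a `Match` is one successful match of the whole pattern, and `Match.Groups` holds the parenthesised captures inside it, with `Groups[0]` being the entire match. Python's `re` draws the same distinction, which this sketch shows; the pattern and sample text are hypothetical.

```python
import re

pattern = re.compile(r"(\w+)=(\d+)")

# Each item from finditer is one Match; within it, group(0) is the whole
# matched text and group(1), group(2) are the parenthesised captures.
rows = [(m.group(0), m.group(1), m.group(2)) for m in pattern.finditer("x=1, y=22")]
print(rows)  # [('x=1', 'x', '1'), ('y=22', 'y', '22')]
```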

Regex to match all HTML tags except <p> and </p>

I need to match and remove all tags using a regular expression in Perl. I have the following: <\\??(?!p).+?> But this still matches with the closing </p> tag. Any hint on how to match with the closing tag as well? Note, this is being performed on xhtml. ...
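In the question's pattern, the lookahead `(?!p)` runs right after `<`, and nothing before it consumes a `/`, so for `</p>` it sees `/`, succeeds, and the closing tag is matched. Moving the optional slash inside the lookahead fixes that, and a word boundary keeps tags such as `<pre>` from being spared. A sketch in Python; the pattern uses only constructs Perl also supports (e.g. `s/<(?!\/?p\b)[^>]*>//gi`), and the sample XHTML is hypothetical.

```python
import re

# Any tag except <p ...> and </p>: the optional slash sits INSIDE the
# lookahead, and \b stops <pre> or <param> from being treated as <p>.
NOT_P_TAG = re.compile(r"<(?!/?p\b)[^>]*>", re.IGNORECASE)

xhtml = '<div><p class="a">Hi <b>there</b></p><pre>x</pre><br/></div>'
print(NOT_P_TAG.sub("", xhtml))  # <p class="a">Hi there</p>x
```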

Features common to all regex flavors?

I've seen a lot of commonality in the regex capabilities of different regex-enabled tools/languages (e.g., Perl, sed, Java, Vim), but I've also seen many differences. Is there a standard subset of regex capabilities that all regex-enabled tools/languages will support? How do regex capabilities vary between tools/languages? ...

PHP and JavaScript regex

After my webform is submitted, regex will be applied to user input on the server side (via PHP). I'd like to have the identical regex running in real-time on the client side to show the user what the real input will be. This will be pretty much the same as the Preview section on the Ask Question pages on StackOverflow except with PHP on ...