regex

What does this Regular Expression do?

$pee = preg_replace( '|<p>|', "$1<p>", $pee ); This regular expression is from the WordPress source code (formatting.php, wpautop function); I'm not sure what it does. Can anyone help? I'm actually trying to port this function to Python, so if anyone knows of an existing port, that would be much better, as I'm really bad with reg...

RegEx for extracting HTML Image properties

I need a RegEx pattern for extracting all the properties of an image tag. As we all know, there is a lot of malformed HTML out there, so the pattern has to cover those possibilities. I was looking at this solution http://stackoverflow.com/questions/138313/how-to-extract-img-src-title-and-alt-from-html-using-php but it didn't quite get ...

How do I replace a set of words in a file with another set in Perl?

My requirement is to replace a set of words in a given text file with a second set of words, which might be given from the command line or another file. I want to use Perl to do this, as the rest of my code is also in Perl. So, if I have the following: server name="${server1}" host="abc.com" server name="${server2}" host="webcs.com" s...
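The question asks for Perl, but the general idea (build one alternation from the old words and look each match up in a table) can be sketched in Python. The mapping values `alpha` and `beta` below are hypothetical; in practice they would come from the command line or a second file.

```python
import re

def replace_words(text, mapping):
    """Replace every key of `mapping` found in `text` with its value."""
    if not mapping:
        return text
    # Longest keys first, so a key that is a prefix of another cannot shadow it.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)

# Hypothetical replacement table (assumption, not from the question).
mapping = {"${server1}": "alpha", "${server2}": "beta"}
line = 'server name="${server1}" host="abc.com" server name="${server2}" host="webcs.com"'
replaced = replace_words(line, mapping)
```

Escaping each key with `re.escape` matters here because `$`, `{` and `}` are regex metacharacters.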

Using Lookahead to match a string using a regular expression

I need to match a string holding HTML using a regex to pull out all the nested spans. I assume there is a way to do this using a regex, but I have had no success all morning. So for a sample input string of <DIV id=c445c9c2-a02e-4cec-b254-c134adfa4192 style="BORDER-RIGHT: #000000 1px solid; BORDER-TOP: #000000 1px solid; BORDE...

using regex to find files with certain extensions

What's the regular expression I could use with find -regex to find all files that have a .xls or .csv extension? ...
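As a rough illustration only: `find -regex` tests the pattern against the whole path, so the pattern needs a leading `.*`. The Python sketch below mirrors that with `re.fullmatch`; the sample paths are made up, and the exact escaping that `find` itself requires depends on its regex flavour.

```python
import re

# Whole-path match: anything, then a literal dot, then xls or csv.
EXT_PATTERN = re.compile(r".*\.(xls|csv)")

def has_wanted_extension(path):
    return EXT_PATTERN.fullmatch(path) is not None

# Hypothetical paths for demonstration.
paths = ["./a/report.xls", "./data.csv", "./notes.txt", "./xls"]
matches = [p for p in paths if has_wanted_extension(p)]
```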

RegEx to get text within tags

I need a regular expression to get the text within 2 tags. Let's say I want an array returned containing any text within <data> and </data> tags, or any text within "(" and ")" tags. How can I do that with RegEx's in C#? An advanced question would be: The input string is "color=rgb(50,20,30)" How can I get the 3 numbers in 3 sepe...
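The question targets C#, but the patterns carry over more or less unchanged. A minimal Python sketch, assuming the delimiters are not nested: a lazy `(.*?)` between the escaped delimiters, and three `\d+` groups for the `rgb(...)` case. The helper name `text_between` is made up for this example.

```python
import re

def text_between(text, open_tag, close_tag):
    """Return every shortest run of text found between the two delimiters."""
    pattern = re.escape(open_tag) + r"(.*?)" + re.escape(close_tag)
    return re.findall(pattern, text, flags=re.DOTALL)

sample = "<data>first</data> noise <data>second</data>"
data_items = text_between(sample, "<data>", "</data>")
paren_items = text_between("f(a) and g(b)", "(", ")")

# One capture group per number inside rgb(...).
rgb_match = re.search(r"rgb\((\d+),(\d+),(\d+)\)", "color=rgb(50,20,30)")
rgb_values = [int(g) for g in rgb_match.groups()] if rgb_match else []
```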

What is "The Best" U.S. Currency RegEx?

A quick search for currency regex brings up a lot of results. MSDN uses ^-?\d+(\.\d{2})?$ The problem I have in choosing one of these is that regex is difficult to verify without testing all the edge cases. I could spend a lot of time on this as I am sure hundreds of other developers have already done. So ... Does anyone have a regex...
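One way to make the edge-case testing less painful is a small table of inputs and expected outcomes. The sketch below checks the MSDN pattern quoted above; the sample inputs are my own choices, not an exhaustive list.

```python
import re

# The pattern quoted from MSDN in the question.
MSDN_CURRENCY = re.compile(r"^-?\d+(\.\d{2})?$")

# Input string -> whether we expect the pattern to accept it.
cases = {
    "10": True,
    "10.00": True,
    "-3.50": True,
    "10.5": False,      # exactly two decimals are required when a dot is present
    "$10.00": False,    # no currency symbol allowed
    "1,000.00": False,  # no thousands separators allowed
    "": False,
}

results = {s: bool(MSDN_CURRENCY.match(s)) for s in cases}
mismatches = {s for s in cases if results[s] != cases[s]}
```

The table shows that this particular pattern rejects a dollar sign and thousands separators, which may or may not be what is wanted.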

Negating literal strings in a Java regular expression

So regular expressions seem to match on the longest possible match. For instance: public static void main(String[] args) { String s = "ClarkRalphKentGuyGreenGardnerClarkSupermanKent"; Pattern p = Pattern.compile("Clark.*Kent", Pattern.CASE_INSENSITIVE); Matcher myMatcher = p.matcher(s); int i = 1; while (myMatcher.find()) { Syst...
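The greedy behaviour described above comes from `.*`. A short Python sketch of the same input (Python rather than Java, for illustration) shows the difference a lazy `.*?` makes:

```python
import re

s = "ClarkRalphKentGuyGreenGardnerClarkSupermanKent"

# Greedy: .* runs to the last "Kent" in the string.
greedy = re.findall(r"Clark.*Kent", s, flags=re.IGNORECASE)

# Lazy: .*? stops at the first "Kent" after each "Clark".
lazy = re.findall(r"Clark.*?Kent", s, flags=re.IGNORECASE)
```

Java's `Pattern` accepts the same `.*?` quantifier.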

Regex: replace all characters after 15th with '...'

I am trying to do some simple formatting with 'sed' on Linux, and I need to use a regex to trim a string after the 15th character and append a '...' to the end. Something like this: before: this is a long string that needs to be shortened after: this is a long ... Can anyone please show me how I could write this as a regex, and...
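The underlying regex idea is to capture the first 15 characters, require at least one more, and replace the whole line with the captured group plus '...'. A sketch in Python (not sed; the function name `truncate` is made up for this example):

```python
import re

def truncate(text, limit=15):
    """Keep the first `limit` characters and append '...' if anything follows."""
    # Group 1 is the part to keep; .+ demands at least one extra character,
    # so strings of `limit` characters or fewer are left untouched.
    pattern = r"^(.{%d}).+$" % limit
    return re.sub(pattern, r"\1...", text)

shortened = truncate("this is a long string that needs to be shortened")
untouched = truncate("short")
```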

Need to test for a "\\" (backslash) in this Reg Ex

Currently I use this reg ex: "\bI([ ]{1,2})([a-zA-Z]|\d){2,13}\b" It was just brought to my attention that the text that I use this against could contain a "\" (backslash). How do I add this to the expression? Thanks ...

Regex to Find Second Char is Alpha Followed by 1 numeral

Regex to Find Second Char is Alpha up to 5 Alpha Followed by 1 numeral. Thanks ...

Regular Expression to match an ssh connection string

I'm trying in vain to write a regular expression to match valid ssh connection strings. I really only need to recognise strings of the format: user@hostname:/some/path but it would be nice to also match an implicit home directory: user@hostname: I've so far come up with this regex: /^[:alnum:]+\@\:(\/[:alnum:]+)*$/ which does...
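Two things stand out in the pattern above: a POSIX class such as `[:alnum:]` only works inside a bracket expression (`[[:alnum:]]`), and nothing matches the hostname between `@` and `:`. A Python sketch of a corrected shape follows; Python's `re` has no POSIX classes, so explicit ranges are used, and the allowed character sets are assumptions rather than a full definition of valid user or host names.

```python
import re

# user @ host : optional /segment/segment path
# Character sets are deliberately simple assumptions.
SSH_TARGET = re.compile(r"[A-Za-z0-9]+@[A-Za-z0-9.-]+:(/[A-Za-z0-9]+)*")

def is_ssh_target(s):
    return SSH_TARGET.fullmatch(s) is not None

with_path = is_ssh_target("user@hostname:/some/path")
home_only = is_ssh_target("user@hostname:")
no_host = is_ssh_target("user@:/some/path")
```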

Regular Expression to Extract HTML Body Content

I've been playing around with RegExBuddy for over an hour trying to figure out what I thought would be a trivial RegEx. I am looking for a RegEx statement that will let me extract the HTML content from just between the body tags of an XHTML document. The XHTML that I need to parse will be very simple files; I do not have to worry about...
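For simple, well-formed files like those described, a lazy match between the opening and closing body tags is usually enough. A sketch in Python, assuming a single body element and no `</body>` text inside comments or scripts:

```python
import re

# [^>]* tolerates attributes on <body>; DOTALL lets . span newlines.
BODY = re.compile(r"<body[^>]*>(.*?)</body>", re.DOTALL | re.IGNORECASE)

def extract_body(xhtml):
    """Return the markup between the body tags, or None if there is no body."""
    m = BODY.search(xhtml)
    return m.group(1) if m else None

# Made-up sample document.
doc = '<html><head><title>t</title></head><body class="x">\n<p>Hello</p>\n</body></html>'
body = extract_body(doc)
```

For anything less tidy than this, an XML or HTML parser is the safer tool.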

C# Extracting a name from a string

I want to extract 'James\, Brown' from the string below, but I don't always know what the name will be. The comma is causing me some difficulty, so what would you suggest to extract James\, Brown? OU=James\, Brown,OU=Test,DC=Internal,DC=Net Thanks ...
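The trick is to treat a backslash followed by any character as a single unit, so an escaped comma does not end the value. A sketch in Python (the question is about C#, whose regex engine accepts the same pattern), assuming the name is always the first OU component:

```python
import re

dn = r"OU=James\, Brown,OU=Test,DC=Internal,DC=Net"

# Either an escaped character (backslash + anything) or any character
# that is neither a comma nor a backslash, repeated.
FIRST_OU = re.compile(r"OU=((?:\\.|[^,\\])+)")

m = FIRST_OU.match(dn)
name = m.group(1) if m else None
```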

Python regex findall numbers and dots

Hi, I'm using re.findall() to extract some version numbers from an HTML file: >>> import re >>> text = "<table><td><a href=\"url\">Test0.2.1.zip</a></td><td>Test0.2.1</td></table> Test0.2.1" >>> re.findall("Test([\.0-9]*)", text) ['0.2.1.', '0.2.1', '0.2.1'] but I would like to only get the ones that do not end in a dot. The filename ...
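One possible fix is to require that every dot be followed by at least one digit, so a trailing dot can never be part of the match:

```python
import re

text = "<table><td><a href=\"url\">Test0.2.1.zip</a></td><td>Test0.2.1</td></table> Test0.2.1"

# Digits, then zero or more ".digits" groups; the dot before "zip"
# is not followed by a digit, so it stays out of the capture.
versions = re.findall(r"Test(\d+(?:\.\d+)*)", text)
```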

Replacing a word with another word zero or more times in javascript using regular expressions

I have a word : [lesserthen] , that I need to replace with < zero or more times in a string. I'm using the String.replace method to do so. But the word is only replaced one time in the string, and I need it to be replaced more than once. I'm very weak with regular expressions and I'm interested to find a solution for this problem. Here i...

When is it best to use Regular Expressions over basic string splitting / substring'ing?

It seems that the choice to use string parsing vs. regular expressions comes up on a regular basis for me anytime a situation arises that I need part of a string, information about said string, etc. The reason that this comes up is that we're evaluating a soap header's action, after it has been parsed into something manageable via the O...

How do I do a regexp search against a Mandarin string?

How do I do a regexp search against a Mandarin string? ...
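The question does not name a language, so here is a sketch in Python 3, where strings are Unicode and Chinese characters can be matched literally or by code-point range. The range below covers only the basic CJK Unified Ideographs block, which is an assumption about what "Mandarin string" should include; the sample text is made up.

```python
import re

# Basic CJK Unified Ideographs block only (U+4E00 to U+9FFF).
HAN = re.compile(r"[\u4e00-\u9fff]+")

text = "Hello 你好, regex 正则表达式!"
runs = HAN.findall(text)

# Literal characters work in patterns as well.
has_greeting = re.search("你好", text) is not None
```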

Using a Regex to find specifically formatted fields

In an application I'm developing, someone thought it would be an OK idea to include commas in the values of a csv file. So what I'm trying to do is select these values and then strip the commas out of them. But the Regex I've built won't match anything. The Regex pattern is: .*,\"<money>(.+,.+)\",?.* And the sort of values I'm ...

How would I dynamically add a new XML node based on the values of other nodes?

Background: I have an old web CMS that stored content in XML files, one XML file per page. I am in the process of importing content from that CMS into a new one, and I know I'm going to need to massage the existing XML in order for the import process to work properly. Existing XML: <page> <audience1>true</audience> <audience2>f...