regex

How to find a repeated string and the value between them using regexes?

How would you find the value of a string that is repeated, and the data between its occurrences, using regexes? For example, take this piece of XML: <tagName>Data between the tag</tagName>. What would be the correct regex to find these values? (Note that tagName could be anything.) I have found a way that works that involves finding all the tagNames ...
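A minimal Python sketch of the usual approach: capture the opening tag name in a group, then require the closing tag to repeat it through a backreference (`\1`). It assumes simple tags with no attributes and no nesting of the same tag name. For anything more complex, a real XML parser is the safer tool. `find_tag_pairs` is a hypothetical helper name.

```python
import re

# Group 1 captures the tag name; \1 forces the closing tag to use the same name.
# Group 2 lazily captures everything in between (DOTALL lets it span lines).
TAG_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)

def find_tag_pairs(text):
    """Return (tag_name, inner_text) tuples for simple, attribute-free tags."""
    return TAG_RE.findall(text)

print(find_tag_pairs("<tagName>Data between the tag</tagName>"))
# [('tagName', 'Data between the tag')]
```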

Using map() to get number of times list elements exist in a string in Python

I'm trying to get the number of times each item in a list appears in a string in Python: paragraph = "I eat bananas and a banana" def tester(x): return len(re.findall(x,paragraph)) map(tester, ['banana', 'loganberry', 'passion fruit']) returns [2, 0, 0]. What I'd like to do, however, is extend this so I can feed the paragraph value into t...
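One way to pass the paragraph along, sketched in Python 3, is to bind it with `functools.partial` so that `map()` only supplies each word. In Python 3 `map()` returns an iterator, which is why the result is wrapped in `list()`. The helper name `count_occurrences` is hypothetical, and `re.escape` is an addition so that search terms are matched literally.

```python
import re
from functools import partial

def count_occurrences(word, paragraph):
    # re.escape makes characters such as "." or "+" in a search term literal.
    return len(re.findall(re.escape(word), paragraph))

paragraph = "I eat bananas and a banana"
words = ['banana', 'loganberry', 'passion fruit']

# partial() fixes paragraph, so map() calls count_occurrences(word, paragraph=...).
print(list(map(partial(count_occurrences, paragraph=paragraph), words)))
# [2, 0, 0]
```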

JavaScript regex for whitespace or &nbsp;

I am looking for a JavaScript regex for whitespace. I am checking several different strings in a loop, and I need to locate the strings that have the big white space in them. The white space string is built in a loop, like this... please read this code as var whitespace = "&nbsp;" then the loop just concatenates more non-breaking spaces onto it...
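A sketch of the idea in Python: one alternation that accepts either a whitespace character or a literal `&nbsp;` entity, repeated at least a minimum number of times. The pattern `(?:\s|&nbsp;){2,}` should also work as a JavaScript regex literal. `has_big_whitespace` and its `min_run` threshold are hypothetical, since the excerpt does not say how long "big" is.

```python
import re

def has_big_whitespace(text, min_run=2):
    # Matches min_run or more consecutive whitespace characters and/or
    # literal "&nbsp;" entities. In Python 3, \s on str patterns also
    # matches the decoded non-breaking space U+00A0.
    pattern = re.compile(r"(?:\s|&nbsp;){%d,}" % min_run)
    return pattern.search(text) is not None

print(has_big_whitespace("a&nbsp;&nbsp;&nbsp;b"))  # True
```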

Regular Expression for nested tags (Wikimedia content)

Haven't done regex in a while, and am a bit rusty. I'm trying to parse the categories out of a Wikipedia entry. What I need are the individual strings contained in a pattern that starts with two open brackets and ends with two closing brackets. This query works most of the time: (\[\[)(?<category>.*[^\]#])([\]]), but has issues wh...
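The greedy `.*` in the excerpt's pattern can run past one `]]` to a later one. A common fix is to forbid brackets inside the captured part. Here is a Python sketch under that assumption; the `Category:` filter and the helper name `bracketed_names` are illustrative. Nested links such as `[[File:x|[[inner]]]]` are not parsed as a whole, and only the innermost pair matches.

```python
import re

# [[ ... ]] where the captured name stops at "|" (sort key / label) or "#"
# (section anchor). Excluding "[" and "]" keeps each match inside one
# pair of brackets instead of running greedily to the last "]]".
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:[|#][^\[\]]*)?\]\]")

def bracketed_names(wikitext, prefix="Category:"):
    """Return the names inside [[...]] that start with prefix."""
    return [name.strip() for name in LINK_RE.findall(wikitext)
            if name.strip().startswith(prefix)]
```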

Python Unicode Regular Expression Question

Hello, I am using Python 2.4 and I am having some problems with Unicode regular expressions. I have tried to put together a very clear and concise example of my problem. It looks as though there is some problem with how Python is recognizing the different character encodings, or a problem with my understanding. Thank you very much for ...

using sed and grep to search and replace

I am using egrep -R followed by a regular expression containing about 10 unions, like: .jpg | .png | .gif etc. This works well. I would like to then replace all strings found with .bmp. I was thinking of something like egrep -lR "\.jpg|\.png|\.gif" . | sed "s/some_expression/.jpg/" file_it_came_from so the issue here is, how do I...
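This is not the sed/egrep pipeline the question asks about. It is a Python stand-in that finds matching files and rewrites them in one pass, so the file name never has to be handed from one tool to the next. `rewrite_tree` and `replace_extensions` are hypothetical names. The tree walk edits files in place and skips anything that is not valid UTF-8, so it is worth trying on a copy first.

```python
import re
from pathlib import Path

# Same alternation as the egrep command, plus \b so ".gifts" is left alone.
EXT_RE = re.compile(r"\.(?:jpg|png|gif)\b")

def replace_extensions(text, new_ext=".bmp"):
    return EXT_RE.sub(new_ext, text)

def rewrite_tree(root="."):
    """Rewrite every text file under root in place when it contains a match."""
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            original = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # binary or non-UTF-8 file: leave it untouched
        updated = replace_extensions(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
```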

Non greedy LookAhead

Hi, I have strings like the following: val:key. I can capture 'val' with /^\w*/. How can I now get 'key' without the ':' sign? Thanks ...
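What this needs is a lookbehind rather than a lookahead: it checks for the `:` without including it in the match. A plain capture group is the other common option. Both are shown in a short Python sketch:

```python
import re

s = "val:key"

# Lookbehind: match word characters only when preceded by ":".
key = re.search(r"(?<=:)\w*", s).group()

# Alternative: capture both halves at once with two groups.
val, key2 = re.match(r"^(\w*):(\w*)$", s).groups()

print(key, val, key2)  # key val key
```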

regex to match EOF

I have some data that looks like this: john, dave, chris rick, sam, bob joe, milt, paul. I'm using this regex to match the names: /(\w.+?)(\r\n|\n|,)/, which works for the most part, but the file ends abruptly after the last word, meaning the last value doesn't end in \r\n, \n or ','; it ends with EOF. Is there a way to match EOF in regex so...
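One way to do it in Python is to add `\Z` (end of string) as one more allowed terminator. The sketch assumes the sample's three rows are separated by newlines. The terminator group is non-capturing, so `findall` returns only the names.

```python
import re

# Assumes each row of the sample data ends with a newline except the last.
data = "john, dave, chris\nrick, sam, bob\njoe, milt, paul"

# \Z matches only at the very end of the string, acting as "EOF".
NAME_RE = re.compile(r"(\w.+?)(?:\r\n|\n|,|\Z)")

names = NAME_RE.findall(data)
print(names)
# ['john', 'dave', 'chris', 'rick', 'sam', 'bob', 'joe', 'milt', 'paul']
```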

Tokenizing blocks of code in Python

I have this string: [a [a b] [c e f] d] and I want a list like this: lst[0] = "a" lst[1] = "a b" lst[2] = "c e f" lst[3] = "d". My current implementation, which I don't think is elegant/pythonic, is two recursive functions (one splitting on '[' and the other on ']'), but I am sure it can be done using list comprehensions or regula...
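A regex-based sketch, assuming only one level of nesting inside the outer brackets as in the example. Each token is either a bracketed group or a bare word, and the list comprehension keeps whichever alternative matched. `tokenize` is a hypothetical name. Deeper nesting would need a small recursive parser instead.

```python
import re

# Alternative 1: a [...] group with no inner brackets (captured without them).
# Alternative 2: a bare run of characters that are neither space nor bracket.
TOKEN_RE = re.compile(r"\[([^\[\]]*)\]|([^\s\[\]]+)")

def tokenize(block):
    inner = block.strip()
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1]  # drop the outermost pair
    return [bracketed or bare for bracketed, bare in TOKEN_RE.findall(inner)]

print(tokenize("[a [a b] [c e f] d]"))  # ['a', 'a b', 'c e f', 'd']
```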

Regex to match a pattern, but exclude a set of words

I have been looking through SO, and although this question has been answered in one scenario (Regex to match all words except a given list), it's not quite what I'm looking for. I am trying to write a regular expression which matches any string of the form [\w]+[(], but which doesn't match the three strings "cat(", "dog(" and "sheep(" spe...
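A negative lookahead placed at a word boundary is one common way to exclude specific words. Here it is as a Python sketch. The `\b` stops the engine from simply starting one character later, for example matching `at(` inside `cat(`. The lookahead also includes the `(`, so longer words such as `category(` are still allowed.

```python
import re

# \b anchors at the start of a word; (?!...) rejects the three excluded forms.
CALL_RE = re.compile(r"\b(?!(?:cat|dog|sheep)\()\w+\(")

print(CALL_RE.findall("cat( dog( sheep( foo( category( hotdog("))
# ['foo(', 'category(', 'hotdog(']
```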

Is there a way to get the variables that were used in a RegEx.Replace to use in .NET?

For example, I have a pattern that I am searching for using the \G option so it remembers its last search. I would like to be able to reuse these in .NET C# (i.e., save the matches into a collection). For example: string pattern = @"\G<test:Some\s.*"; string id = RegEx.Match(orig, pattern).Value; // The guy above has 3 matches and i wan...
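In .NET, `Regex.Matches` already returns all matches as a `MatchCollection`. Python's built-in `re` has no `\G` anchor, so the sketch below emulates it and is not .NET code. Each match must start exactly where the previous one ended, and collection stops at the first gap. `contiguous_matches` and the `<test:\w+>` pattern used with it are illustrative.

```python
import re

def contiguous_matches(pattern, text):
    """Collect matches that follow each other with no gap (like \\G)."""
    regex = re.compile(pattern)
    pos = 0
    found = []
    while pos <= len(text):
        m = regex.match(text, pos)  # match() anchors at pos, unlike search()
        if m is None:
            break
        found.append(m.group())
        if m.end() == pos:  # empty match: stop rather than loop forever
            break
        pos = m.end()
    return found

print(contiguous_matches(r"<test:\w+>", "<test:a><test:b>x<test:c>"))
# ['<test:a>', '<test:b>']
```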

Replacing <p>, <div> tags within <td> tags?

I'm working on a specialized HTML stripper. The current stripper replaces <td> tags with tabs, then <p> and <div> tags with double carriage returns. However, when stripping code like this: <td>First Text</td><td style="background:#330000"><p style="color:#660000;text-align:center">Some Text</p></td> It (obviously) produces First Tex...
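The desired output is cut off in the excerpt. This Python sketch assumes the goal is for `<p>`/`<div>` tags inside a table cell to add no line breaks. It processes each `<td>` cell first, dropping block tags inside it, and only then applies the line-break rule to the rest of the document. `strip_html` is hypothetical, and `"\n\n"` stands in for the double carriage returns. Nested tables are not handled.

```python
import re

CELL_RE = re.compile(r"<td\b[^>]*>(.*?)</td>", re.IGNORECASE | re.DOTALL)
BLOCK_TAG_RE = re.compile(r"</?(?:p|div)\b[^>]*>", re.IGNORECASE)

def strip_html(html):
    # 1. Inside each cell: drop <p>/<div> tags, then end the cell with a tab.
    html = CELL_RE.sub(lambda m: BLOCK_TAG_RE.sub("", m.group(1)) + "\t", html)
    # 2. Outside cells: each closing </p> or </div> becomes a blank line.
    html = re.sub(r"</(?:p|div)\s*>", "\n\n", html, flags=re.IGNORECASE)
    # 3. Remove every remaining tag.
    return re.sub(r"<[^>]+>", "", html)
```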

PHP split string using regex

Hi guys, I need to get the company name and its ticker symbol in different arrays. Here is my data, which is stored in a txt file: 3M Company MMM 99 Cents Only Stores NDN AO Smith Corporation AOS Aaron's, Inc. AAN and so on. How would I do this using regex or some other technique? Thanks ...
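This is a Python sketch, not PHP, and it assumes each company sits on its own line in the txt file, with the ticker as the last run of capital letters. A greedy name group backtracks until only the final upper-case token is left for the ticker. `split_companies` is a hypothetical name, and tickers with digits would need a wider character class.

```python
import re

# Name: everything up to the last space-separated token on the line.
# Ticker: capital letters (dots allowed, e.g. class shares). [ \t\r]* also
# tolerates trailing spaces and Windows line endings.
LINE_RE = re.compile(r"^(.*\S)[ \t]+([A-Z][A-Z.]*)[ \t\r]*$", re.MULTILINE)

def split_companies(text):
    pairs = LINE_RE.findall(text)
    names = [name for name, _ in pairs]
    tickers = [ticker for _, ticker in pairs]
    return names, tickers
```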

How do I remove all non-numbers in a string using a regular expression?

I'm trying to find all letters, dashes and dollar signs and remove them from a text box. function numbersOnly() { if ($('.sumit').val().indexOf([A-Za-z-$])) { $('.sumit').val().replace([A-Za-z-$], ""); } } That's what I've got, and I'm pretty sure it's wrong. I'm not too great with regular expressions, but I'm tryin...
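Rather than listing what to remove, it is simpler to remove everything that is not a digit. In JavaScript this needs a regex literal with the `g` flag, such as `/[^0-9]/g`, and the replaced string has to be written back with `.val(...)`, because `replace` does not modify the original. The same idea in Python, with a hypothetical helper name:

```python
import re

def only_digits(value):
    # [^0-9] matches any character that is not an ASCII digit; re.sub
    # replaces every occurrence by default.
    return re.sub(r"[^0-9]", "", value)

print(only_digits("$1,234-56abc"))  # 123456
```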

Manipulating huge CSV files with sed

I have a set of 4 massive CSV files that I need to modify. What I need to do is match this expression /^(.*),,/ copy the atom then prepend it to every subsequent line until the atom is matched again. Then I need to rinse and repeat until the end of the file (each file has approx 25k lines in it). Finally I need to go back through and r...
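The first part of this (remember the atom from each `^(.*),,` line and prepend it to the following lines) can be sketched as a streaming Python generator. It reads line by line, so file size is not a concern. Several details are assumptions because the excerpt does not settle them: the header line itself is passed through unchanged, the atom is joined with a comma, and lines before the first header are left alone. The final step of the question is cut off and is not covered. `prepend_atoms` is a hypothetical name.

```python
import re

HEADER_RE = re.compile(r"^(.*),,")  # same pattern as in the question

def prepend_atoms(lines):
    current = None
    for line in lines:
        m = HEADER_RE.match(line)
        if m:
            current = m.group(1)  # new atom; header line kept as-is (assumption)
            yield line
        elif current is not None:
            yield current + "," + line
        else:
            yield line  # before the first header: unchanged
```

Usage might look like `out.writelines(prepend_atoms(open("in.csv")))`, with `out` opened for writing.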

JavaScript regex to detect possible credit card numbers

I'm having no end of trouble coming up with an appropriate regex or set of regexes. What I want to do is detect: a continuous run of digits of length 13 through 19; a continuous run of digits interspersed with whitespace of length 13 through 19; a continuous run of digits interspersed with dashes of length 13 through 19 ...
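A single pattern can cover all three cases: one digit, followed by 12 to 18 more digits, each optionally preceded by a single space or dash. That gives 13 to 19 digits in total. It is shown here in Python, but `/\b\d(?:[ -]?\d){12,18}\b/g` should work as a JavaScript literal too. This is a loose sketch. It also accepts mixed separators, it does no checksum (Luhn) validation, and a longer separated run can still produce a 13 to 19 digit sub-match.

```python
import re

# \b at both ends stops a match from starting or ending inside a longer
# unbroken digit run.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

def find_card_candidates(text):
    return CARD_RE.findall(text)

print(find_card_candidates("card 4111 1111 1111 1111 ok"))
# ['4111 1111 1111 1111']
```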

Regex that matches a newline (\n) in C#

OK, this one is driving me nuts... I have a string that is formed thus: var newContent = string.Format("({0})\n{1}", stripped_content, reply) newContent will display like: (old text) new text. I need a regular expression that strips away the text between parentheses, with the parentheses included, AND the newline character. Th...
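A sketch in Python of one pattern that should do this: a literal `(`, the shortest text up to a `)` that is directly followed by the newline, and the newline itself. The same pattern string should carry over to .NET's `Regex.Replace`. `strip_quoted_prefix` is a hypothetical name.

```python
import re

def strip_quoted_prefix(content):
    # ^ anchors at the start of the string; \( and \) are literal parentheses;
    # .*? expands until a ")" followed by an optional \r and then \n.
    return re.sub(r"^\(.*?\)\r?\n", "", content, count=1)

print(strip_quoted_prefix("(old text)\nnew text"))  # new text
```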

Help with regex replace in php

I have a bunch of URLs in static HTML files which need to be changed. They look like this now: <img src="/foldera/folderb/folderc/images/imgxyz.jpg" /> They need to look like this: <img src="imgxyz.jpg" /> So, I just wrote a PHP script that opens each and does a preg_replace(). My regex (with the double escaped backslashes, yes...
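A Python version of the substitution, assuming double-quoted `src` values. Python raw strings (`r"..."`) avoid the double-escaping problem. The greedy `[^"]*/` consumes the directory part up to the last slash, and the two groups put the tag prefix and the bare file name back together. `flatten_img_src` is a hypothetical name.

```python
import re

# Group 1: "<img ... src=\""; then the directory path up to the last "/";
# group 2: the file name plus the closing quote.
IMG_SRC_RE = re.compile(r'(<img\b[^>]*\bsrc=")[^"]*/([^"/]+")')

def flatten_img_src(html):
    return IMG_SRC_RE.sub(r"\1\2", html)

print(flatten_img_src('<img src="/foldera/folderb/folderc/images/imgxyz.jpg" />'))
# <img src="imgxyz.jpg" />
```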

PHP: Find first img or object tag in string

function get_first_image(){ global $post, $posts; $first_img = ''; ob_start(); ob_end_clean(); $output = preg_match_all('/<img.+src=[\'"]([^\'"]+)[\'"].*>/i', $post->post_content, $matches) || preg_match_all('/<object[0-9 a-z_?*=\":\-\/\.#\,<>\\n\\r\\t]+<\/object>/smi', $post->post_content, $matches); ...
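In the excerpt, the two `preg_match_all` calls are joined with `||`. That means the object search only runs when no img matches at all, so an object that appears earlier in the post never wins. A single search with an alternation returns whichever tag comes first. Below is a Python sketch of that idea (not PHP). `first_media_tag` is a hypothetical name, and it returns the whole tag rather than just the `src`.

```python
import re

# re.search returns the leftmost match of either branch, so the earliest
# <img> or <object>...</object> in the text is found.
MEDIA_RE = re.compile(r"<img\b[^>]*>|<object\b.*?</object>",
                      re.IGNORECASE | re.DOTALL)

def first_media_tag(html):
    m = MEDIA_RE.search(html)
    return m.group() if m else None
```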