regex

Regex: Extracting readable (non-code) text and URLs from HTML documents

I am creating an application that will take a URL as input, retrieve the page's HTML content from the web, and extract everything that isn't contained in a tag: in other words, the textual content of the page as seen by a visitor. That includes 'masking' out everything encapsulated in <script></script>, <style></style> and <!...

Nested regex... I'm clueless!

Hi all, I'm pretty clueless when it comes to PHP and regex but I'm trying to fix a broken plugin for my forum. I'd like to replace the following: <blockquote rel="blah">foo</blockquote> With <blockquote class="a"><div class="b">blah</div><div class="c"><p>foo</p></div></blockquote> Actually, that part is easy and I've already part...

How to replace the first occurrence of a regular expression in Python?

I want to replace just the first occurrence of a regular expression in a string. Is there a convenient way to do this? Thanks, ...
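
A minimal sketch of one way to do this in Python, using the `count` argument of `re.sub`. The helper name `replace_first` and the sample string are illustrative assumptions, not part of the question.

```python
import re

def replace_first(pattern, replacement, text):
    """Replace only the leftmost match of `pattern` in `text`."""
    # count=1 stops re.sub after the first substitution.
    return re.sub(pattern, replacement, text, count=1)

print(replace_first(r"\d+", "#", "a1 b22 c333"))  # a# b22 c333
```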

How to RegEx replace regions into a collection

I have a string from which I want to extract the text between comment tags, manipulate it, and put it back. Here is what I am trying to work with: ... <!--RegionStart url="http://domain1.com"--> some text here <!--RegionFinish--> ... <!--RegionStart url="http://domain2.com"--> some text there <!--RegionFinish--> ... <!--RegionSt...

Transform URL into a link unless there already was a link

I know this has been discussed here, but no solution was offered for my exact problem. Please take a look. I'm using a function to transform plain-text URLs into clickable links. This is what I have: <script type='text/javascript' language='javascript'> window.onload = autolink; function autolink(text) { var exp = /(\b(https?|ftp):\...

Simple python regex groups can't parse date

I'm trying to parse dates with regex, using groups, but python is returning empty lists. I'm not doing anything fancy, just 12/25/10 sort of stuff. I want it to reject 12/25-10 though. date = re.compile("\d{1,2}([/.-])\d{1,2}\1\d{2}") I've tried online regex libraries, but their solutions don't seem to run either. Any ideas? Sampl...
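
The excerpt is cut off before the sample input, so the following is only a sketch of one likely cause: in a non-raw Python string literal, `\1` is read as the control character `chr(1)` rather than a backreference, so the pattern can never match. The helper name `is_date` is an assumption for illustration.

```python
import re

# Raw string: "\1" stays a backreference to the captured separator
# instead of being turned into the control character chr(1).
DATE = re.compile(r"\d{1,2}([/.-])\d{1,2}\1\d{2}")

def is_date(text):
    """True if the whole string is a date with one consistent separator."""
    return DATE.fullmatch(text) is not None

print(is_date("12/25/10"))  # True
print(is_date("12/25-10"))  # False
```

Note that `re.findall` with a single capturing group returns the captured separators rather than the whole dates, so `fullmatch` or `finditer` is usually the more convenient call here.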

C# Regex getting words that start with !

The title says it all. How can I use a regular expression to get words that start with ! ? For example !Test. I tried this but it doesn't give any matches: @"\B\!\d+\b" Although it did work when I replaced the ! with $. Any help would be nice. ...

How to perform web scraping to find specific linked pages in Java on Google App Engine?

I need to retrieve text from a remote web site that does not provide an RSS feed. What I know is that the data I need is always on pages linked to from the main page (http://www.example.com/) with a link that contains the text " Invoices Report ". For example: <a href="http://www.example.com/data/invoices/2010/10/invoices-report---tue...

Easiest way of working with a comma separated list

I'm about to build a solution where I receive a comma-separated list every night. It's a list of around 14000 rows, and I need to go through it and select some of the values. The document I receive is built up of around 50 semicolon-separated values for every "case". How the document is structured: "";"2010-10-1...

Python regular expression matching from right to left

Normally, regular expressions match from left to right. Is there a switch that makes matching go from right to left in Python, or does any other language have this feature built in? E.g. for abcd1_abcd2, the regular expression abcd will match abcd twice; what I want is to get the last match first, in a reverse direc...
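
A hedged sketch of one workaround with the standard `re` module: it still scans left to right, but collects the matches and returns them last-first. The helper name `matches_from_right` and the pattern `abcd\d` are assumptions chosen to make the ordering visible.

```python
import re

def matches_from_right(pattern, text):
    """Return non-overlapping matches of `pattern`, last match first."""
    # The scan itself is left to right; only the result order is reversed.
    return [m.group(0) for m in reversed(list(re.finditer(pattern, text)))]

print(matches_from_right(r"abcd\d", "abcd1_abcd2"))  # ['abcd2', 'abcd1']
```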

Regex to retrieve string

I wish to match the text between &amp;v= and "> . Is there a regex I could use? Example: <a accesskey="1" href="/watch?gl=GB&amp;client=mv-google&amp;hl=en-GB&amp;v=ubNF9QNEQLA">Test Your Awareness : Whodunnit?</a> I only need the ubNF9QNEQLA. Thanks ...
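
A minimal Python sketch of the capture-group approach, run against the example link from the question. The helper name `extract_video_id` is hypothetical, and the pattern assumes the `v` parameter is the last one before the closing `">`, as in the example.

```python
import re

html = ('<a accesskey="1" href="/watch?gl=GB&amp;client=mv-google'
        '&amp;hl=en-GB&amp;v=ubNF9QNEQLA">Test Your Awareness : Whodunnit?</a>')

def extract_video_id(markup):
    """Return the text between '&amp;v=' and '">', or None if absent."""
    # The group captures everything up to the closing quote or next '&'.
    match = re.search(r'&amp;v=([^"&]+)">', markup)
    return match.group(1) if match else None

print(extract_video_id(html))  # ubNF9QNEQLA
```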

Parsing a file of values in order to change into an SQL insert

Hey, I'm trying to figure out a way to use a file I have to generate an SQL insert into a database. The file has many entries of the form: 100090 100090 bill smith 1998 That is, an ID number, another ID (not always the same), a full name and a year. These are all separated by a space. Basically what I want to do is be able to get variables f...

Recursively downloading files from a webpage

http://examples.oreilly.com/9780735615366/ I want to have all these files on my disk. As you can see, there are many folders, each with different types of files, and you cannot download a folder directly, only individual files. Is there any way to automate the process? I will need regular expressions on URLs to arr...

PHP HTML \t and Tab Troubles

I have started making a syntax highlighter in PHP (only a quick one) and so far I have a code box generator (i.e. it creates a table with styles that looks good and can display source code and HTML code). At the moment, when writing code with it, I do this: $code = "def example_ruby_code(does_it_work) " . "(insert tab here) @does_i...

RegexKitLite match?

Hi, I would like to get everything between &amp;v= and "> using a regular expression: NSString *YouTubeRegex = @"/amp;v=([^(\">)]+)/"; But that regex is not returning any matches. I know the rest of the code is correct, just not the regex. Any help? Thanks ...

JS Regex, how to replace the captured groups only?

OK, the question is quite simple. I'm looking for a string like this one: name="some_text_0_some_text" I have HTML code before and after the string above. Now I would like to replace the 0 with something like: !NEW_ID! So I made a simple regex: .*name="\w+(\d+)\w+".* But I don't see how to replace exclusively the captured block....

AND operator in regular expression?

Given a string, how do I express that a pattern must match multiple regex using the AND operator? For example, I want a password to be a minimum of 5 characters AND maximum of 12 characters. Regex for minimum 5 characters: .{5,} Regex for maximum 12 characters: .{12} I know I can combine the above two to something like this: .{5,12},...
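
One common way to express AND in a single regex is to chain lookaheads, each of which checks one condition from the same starting position. The sketch below, in Python, keeps the two length conditions from the question as separate lookaheads; it writes the maximum as `.{0,12}`, since `.{12}` on its own matches exactly 12 characters. The helper name `is_valid` is an assumption.

```python
import re

# Each (?=...) is one condition; the match succeeds only if all of them hold.
MIN_AND_MAX = re.compile(r"(?=.{5,}\Z)(?=.{0,12}\Z).*")

def is_valid(password):
    """True if the password has at least 5 and at most 12 characters."""
    return MIN_AND_MAX.fullmatch(password) is not None

print(is_valid("abcd"))      # False (too short)
print(is_valid("abcde"))     # True
print(is_valid("a" * 13))    # False (too long)
```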

How to match a specific integer pattern in PHP using regular expressions?

Hello everyone, I am looking for a regular expression to match the pattern 1111111/11, where all the numbers are integers. I would be grateful if anyone could help; I am not that good at regular expressions. ...
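
A sketch of the pattern itself, shown in Python rather than PHP. It assumes "1111111/11" means exactly seven digits, a slash, and exactly two digits, which the question does not state explicitly; the helper name `matches_pattern` is hypothetical.

```python
import re

# Assumption: seven digits, a slash, then two digits, and nothing else.
PATTERN = re.compile(r"\d{7}/\d{2}")

def matches_pattern(text):
    """True if the whole string has the form 1111111/11."""
    return PATTERN.fullmatch(text) is not None

print(matches_pattern("1234567/89"))  # True
print(matches_pattern("123456/89"))   # False
```

The same expression should carry over to PHP's `preg_match` with delimiters and anchors added, since `\d` and `{n}` quantifiers behave the same way there.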

Help with regular expression

Hello, Apart from what they are, I don't know anything about regular expressions... :( I have this code in a JavaScript function: var foroFormatting = function (text) { var newText = text; var findreps = [ { find: /^([^\-]+) \- /g, rep: '<span class="ui-selectmenu-item-header">$1</span>' }, { find: /([^\|><]+) \...

regex: attach a newline to every sentence using vim

I was wondering how to turn a paragraph into bullet sentences in Vim. Before: sentence1. sentence2. sentence3. sentence4. sentence5. sentence6. sentence7. After: sentence1. sentence2. sentence3 sentence4. sentence5. ...