regex

What's wrong with this regex to validate URLs in Ruby?

I am passing an array of URLs to validate. The function is below. It works when I pass just one URL, but not more than one. The regular expression seems to be correct. Where am I going wrong? def check_urls (list) regexp =/(^$)|(^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$)/ix list.eac...

How to write a correct regex for URLs on the page without anchors?

Hi all, I want to cut all URLs like (http://....) and replace them with anchors <a></a>, but my requirement is: do not touch existing anchors or the page definition (doctype), like: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> So I need to find just plain text with URLs... I'm tryin...

Is it possible to change Emacs' regexp syntax?

I love Emacs. I love regexes. I hate Emacs' regex syntax: the need to escape grouping parens and braces, that you don't escape literal parens, the lack of predefined character classes, etc. Can I replace Emacs' regex engine, or tweak some setting, so that when I use the Query-replace-regexp (or one of many other) feature I can use the s...

Why is this legacy code using cat on a filename in a call to open()?

I ran across a very strange line of code in a legacy Perl application. The code here is part of a homegrown RSS reader that does some caching to prevent being blacklisted. open(CAT, "/usr/bin/cat -v /tmp/cat-cache 2>&1|"); Does it seem likely that the original author ran the results through cat -v to strip out non-printing characters...

Regexp to pull input tags out of form

I am trying to extract all the <input > tags out of a <form> tag. I have created a regexp which can identify the entire <form> tag and all the code up to the ending </form> but I cannot figure out how to match all the <input[^>]+> within that. EDIT: The data is a string. I cannot use DOM functions because it's not part of the document. ...

Java: Search in HashMap keys based on regex?

I'm building a thesaurus using a HashMap to store the synonyms. I'm trying to search through the words based on a regular expression: the method will have to take a string as parameter and return an array of results. Here's my first stab at it: public ArrayList<String> searchDefinition(String regex) { ArrayList<String> results = new ...
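The idea in this question (filtering map keys with a regular expression) can be sketched briefly. The snippet below is in Python, not Java, and the names `search_keys` and `thesaurus` are hypothetical. It assumes the whole key must match the pattern, which mirrors how Java's `String.matches` behaves; if a partial match is wanted, `search` would replace `fullmatch`.

```python
import re

def search_keys(thesaurus, pattern):
    """Return the keys of `thesaurus` that match `pattern` in full."""
    compiled = re.compile(pattern)  # compile once, reuse for every key
    return [word for word in thesaurus if compiled.fullmatch(word)]

# Hypothetical sample data: word -> list of synonyms
thesaurus = {"happy": ["glad"], "happen": ["occur"], "sad": ["unhappy"]}
matches = search_keys(thesaurus, r"hap.*")
```

Every key is tested, so the lookup is linear in the size of the map; a hash map gives no shortcut for pattern searches.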

How to make a regular expression for Dreamweaver Find and Replace?

I have the following code in about 300 HTML files, and I need to replace it with some other code. The problem is that the ID in click=12FA863 is different in each file, so I want a regular expression that will work in Dreamweaver's Find and Replace. <iframe src="http://example.net/?click=12FA863" width=1 heig...

Writing a REGEX to match the src, height and width attributes of an img tag

Hi all, I'm trying to write a regex to match the src, width and height attributes on an image tag. The width and height are optional. I have come up with the following: (?:<img.*)(?<=src=")(?<src>([\w\s://?=&.]*)?)?(?:.*)(?<height>(?<=height=")\d*)?(?:.*)(?<width>(?<=width=")(\d*)?)? Expresso shows this matching only the s...

Single perl regex for removing escaped ampersands from inside href attributes but not elsewhere

This is more a puzzle question for my curiosity than anything else. I'm looking for a single regular expression substitution that will convert entity-escaped ampersands to unescaped ampersands only within href attributes in an HTML file. For example: <a href="http://example.com/index.html?foo=bar&amp;baz=qux&amp;frotz=frobnit...

Should I implement the mixed use of BeautifulSoup and REGEXs or rely solely on BS

I have some data I need to extract from a collection of HTML files. I am not sure if the data resides in a div element, a table element or a combined element (where the div tag is an element of a table). I have seen all three cases. My files are large, as big as 2 MB, and I have tens of thousands of them. So far I have looked at the td ...

Regular expression works in VB but not C#

I have the following regular expression for validating a file name: ^(([a-zA-Z]:|\)\)?(((.)|(..)|(. %5D">^\/:*\?"\|<>. |([^\/:*\?"\|<>][^\/:*\?"\|<>. ]))?))\). %5D">^\/:*\?"\|<>. |([^\/:*\?"\|<>]*[^\/:*\?"\|<>. ]))?$ I can get it to work in VB.NET but not C#. I can't figure out why it works in one but not the other. VB code: Regex.Ma...

Parse a number with a regex using a non-capturing group.

Hi, I'm trying to parse a phone number with a regex. Specifically, I want to get a string with the phone number in it using a function like this: string phoneRegex = @"^([+]|00)(\d{2,12}(?:\s*-*)){1,5}$"; string formated = Regex.Match(e.Value.ToString(), phoneRegex).Value; As you can see, I'm trying to use a non-capturing group (?:\s*-*) but I'm doing ...

Excluding a directory with ISAPI-Rewrite

I am trying to exclude a directory with ISAPI-Rewrite (note: this is a Windows/IIS port of mod_rewrite). The directory I want to exclude is "api" when it is at the root of the site. Here is my rule: RewriteRule ^(/api/)(.+)$ $1$2 [NC, L] A request would look something like this: /api/v2/users?usernames=scottw Unfortunately, the quer...

glibc regexp performance

Does anyone have experience measuring the performance of the glibc regexp functions? Are there any generic tests I need to run to make such measurements (in addition to testing the exact patterns I intend to search)? Thanks. ...

Regex for finding date in Apache access log

I'm writing a Python script to extract data out of our 2GB Apache access log. Here's one line from the log. 81.52.143.15 - - [01/Apr/2008:00:07:20 -0600] "GET /robots.txt HTTP/1.1" 200 29 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 (http://www.voila.com/)" I'm trying to get the date portion from that...
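One possible way to pull the date out of a line like the one quoted above is sketched here. It assumes the timestamp always appears in square brackets in the `dd/Mon/yyyy:hh:mm:ss zone` shape shown in the sample; the names `TIMESTAMP_RE` and `extract_date` are made up for illustration.

```python
import re

# Matches the bracketed timestamp, e.g. [01/Apr/2008:00:07:20 -0600],
# and captures only the date part before the first colon.
TIMESTAMP_RE = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}):\d{2}:\d{2}:\d{2} [+-]\d{4}\]")

def extract_date(line):
    """Return the date portion (e.g. '01/Apr/2008') or None if absent."""
    match = TIMESTAMP_RE.search(line)
    return match.group(1) if match else None

sample = (
    '81.52.143.15 - - [01/Apr/2008:00:07:20 -0600] '
    '"GET /robots.txt HTTP/1.1" 200 29 "-" "Mozilla/5.0"'
)
date_part = extract_date(sample)
```

For a file of that size, reading it line by line (iterating over the open file object) keeps memory use flat, and compiling the pattern once outside the loop avoids repeated setup work.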

Why would match find a result while test returns false for a regular expression in JavaScript?

I was trying to debug a sorting issue with the jQuery plugin tablesorter, which uses the following code to check for digits: this.isDigit = function(s,config) { var DECIMAL = '\\' + config.decimal; var exp = '/(^[+]?0(' + DECIMAL +'0+)?$)|(^([-+]?[1-9][0-9]*)$)|(^([-+]?((0?|[1-9][0-9]*)' + DECIMAL +'(0*[1-9][0-9]*)))$)|(^[-...

Optional characters in a regex

The task is pretty simple, but I've not been able to come up with a good solution yet: a string can contain numbers, dashes and pluses, or only numbers. ^[0-9+-]+$ does most of what I need, except when a user enters garbage like "+-+--+". I've not had luck with regular lookahead, since the dashes and pluses could potentially be anywhe...
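If the only extra requirement is that the string contains at least one digit (an assumption, since the rest of the question is cut off), no lookahead is needed: allow any signs, then one mandatory digit, then anything from the allowed set. A minimal Python sketch, with the hypothetical helper name `is_valid`:

```python
import re

# Zero or more signs, one required digit, then any mix of digits and signs.
# fullmatch plays the role of the ^...$ anchors in the original pattern.
PATTERN = re.compile(r"[+-]*[0-9][0-9+-]*")

def is_valid(text):
    """True if text uses only digits, '+' and '-', with at least one digit."""
    return PATTERN.fullmatch(text) is not None
```

With this pattern, "+-+--+" is rejected because the required digit never appears, while "123" and "+1-800-555" are accepted. Stricter rules, such as forbidding consecutive signs, would need a different pattern.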

RegEx in PHP: Matching a pattern outside of non-escaped quotes

I'm writing a method to lift certain data out of an SQL query string, and I need to regex match any word inside of curly braces ONLY when it appears outside of single-quotes. I also need it to factor in the possibility of escaped (preceded by backslash) quotes, as well as escaped backslashes. In the following examples, I need the regex...

Can mod_rewrite convert any number of parameters with any names?

I'm a total n00b at mod_rewrite and what I'm trying to do sounds simple: instead of having domain.com/script.php?a=1&b=2&c=3 I would like to have: domain.com/script|a:1;b:2;c:3 The problem is that my script takes a large number of parameters in a variety of combinations, and order is unimportant, so coding each one in the expression a...

How can I replace a specific word in C#?

Consider the following example. string s = "The man is old. Them is not bad."; If I use s = s.Replace("The", "@@"); Then it returns "@@ man is old. @@m is not bad." But I want the output to be "@@ man is old. Them is not bad." How can I do this? ...
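A common way to replace only the whole word is a regex with word boundaries (`\b`) around it. The sketch below is in Python, not C#; .NET's `Regex.Replace` accepts the same `\b` syntax, so the pattern should carry over. The function name `replace_word` is hypothetical.

```python
import re

def replace_word(text, word, replacement):
    """Replace `word` only where it stands alone as a whole word."""
    # re.escape guards against regex metacharacters in `word`;
    # \b asserts a word boundary on each side.
    pattern = r"\b" + re.escape(word) + r"\b"
    # A callable replacement is inserted literally, so backslashes or
    # group references in `replacement` are not interpreted.
    return re.sub(pattern, lambda _match: replacement, text)

s = "The man is old. Them is not bad."
result = replace_word(s, "The", "@@")
# result == "@@ man is old. Them is not bad."
```

"Them" is left alone because there is no word boundary between "The" and "m".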