regex

Unicode Regex; Invalid XML characters

The list of valid XML characters is well known, as defined by the spec it's: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] My question is whether or not it's possible to make a PCRE regular expression for this (or its inverse) without actually hard-coding the codepoints, by using Unicode general categories. An...
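
As a baseline for comparison, the character ranges quoted above can be written directly as a negated character class. This is only a sketch in Python rather than PCRE, and it hard-codes the code points, which is exactly what the question hopes to avoid. The helper name `strip_invalid_xml_chars` is made up for illustration.

```python
import re

# Negation of the valid ranges quoted in the excerpt:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
INVALID_XML_CHARS = re.compile(
    r'[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]'
)


def strip_invalid_xml_chars(text):
    """Hypothetical helper: drop every character outside the valid ranges."""
    return INVALID_XML_CHARS.sub('', text)
```

Any character the class matches is outside the listed ranges, so the same pattern can also be used with `search` to test a string instead of cleaning it.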

Why does \w match only English words in JavaScript regex?

I'm trying to find URLs in some text, using JavaScript code. The problem is that the regular expression I'm using relies on \w to match letters and digits inside the URL, but it doesn't match non-English characters (in my case, Hebrew letters). So what can I use instead of \w to match all letters in all languages? ...

Split by \b when your regex engine doesn't support it

How can I split by word boundary in a regex engine that doesn't support it? python's re can match on \b but doesn't seem to support splitting on it. I seem to recall dealing with other regex engines that had the same limitation. example input: "hello, foo" expected output: ['hello', ', ', 'foo'] actual python output: >>> re.comp...
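
One possible workaround, sketched here under the assumption that "word boundary" means the transition between runs of `\w` and runs of `\W`, is to collect the alternating runs with `re.findall` instead of splitting. The function name `split_on_boundaries` is hypothetical.

```python
import re


def split_on_boundaries(text):
    """Return alternating runs of word and non-word characters.

    Assumption: the pieces between \\b positions are exactly the maximal
    runs of \\w or \\W characters, so matching the runs avoids the need
    to split on a zero-width pattern.
    """
    return re.findall(r'\w+|\W+', text)


print(split_on_boundaries("hello, foo"))  # ['hello', ', ', 'foo']
```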

Regular expressions and escaping special characters

I am tired of always trying to guess, if I should escape special characters like '()[]{}|' etc. when using many implementations of regexps. It is different with, for example, Python, sed, grep, awk, Perl, rename, Apache, find and so on. Is there any rule set which tells when I should, and when I should not, escape special characters? Do...

Can I improve this regex check for valid domain names?

So, I have been working on this domain name regular expression. So far, it seems to pick up domain names with SLDs and TLDs (with the optional ccTLD), but there is duplication of the TLD listing. Can this be refactored any further? params[:domain_name].downcase.strip.match(/^[a-z0-9\-]{2,63} \.((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdef...

RegEx Whitespace Vs. Eclipse

Hello, I'm trying to make a regular expression to match whitespace and so far I'm doing this: Powered[\s]*[bB]y.*MyBB I know it should work because I've tried it with RegexBuddy and it says it does, but when I try to run it with Eclipse it marks an error saying it's not a valid escape sequence and it automatically adds 2 '\' renderi...

Matching version number parts with regular expressions

I'm trying to match the parts of a version number (Major.Minor.Build.Revision) with C# regular expressions. However, I'm pretty new to writing Regex and even using Expresso is proving to be a little difficult. Right now, I have this: (?<Major>\d*)\.(?<Minor>\d*)\.(?<Build>\d*)\.(?<Revision>\d*) This works, but requires that every part...
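
The excerpt is cut off, so the following is only a guess at the goal: it assumes the trailing parts (Minor, Build, Revision) should be optional. It is a sketch in Python rather than C#, so named groups are written `(?P<name>...)` instead of `(?<name>...)`, and `parse_version` is a made-up name.

```python
import re

# Each part after Major sits in its own optional non-capturing group,
# so "1", "1.2", "1.2.3" and "1.2.3.4" are all accepted.
VERSION = re.compile(
    r'(?P<Major>\d+)'
    r'(?:\.(?P<Minor>\d+))?'
    r'(?:\.(?P<Build>\d+))?'
    r'(?:\.(?P<Revision>\d+))?'
)


def parse_version(text):
    """Hypothetical helper: return the named parts, or None if no match."""
    match = VERSION.fullmatch(text)
    return match.groupdict() if match else None
```

Parts that are absent come back as `None`, and `\d+` is used instead of `\d*` so that an empty part such as `1..3` is rejected.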

Textmate Regex Find Replace Help

Hi, I've got a project I'm working on converting some legacy Perl CGI forms to PHP. A lot of this requires finding / replacing information. In one such case, I have lines like this in the Perl script: <INPUT type="radio" name="trade" value="1" $checked{trade}->{1}> which needs to read: <INPUT type="radio" name="trade" value="1" ...

How can I get the correct text definition of a generic type using reflection?

I am working on code generation and ran into a snag with generics. Here is a "simplified" version of what is causing me issues. Dictionary<string, DateTime> dictionary = new Dictionary<string, DateTime>(); string text = dictionary.GetType().FullName; MessageBox.Show(text); With the above code snippet the value for "text" is as follows...

Regex that only matches text that's not part of HTML markup? (python)

How can I make a pattern match so long as it's not inside of an HTML tag? Here's my attempt below. Anyone have a better/different approach? import re inputstr = 'mary had a <b class="foo"> little loomb</b>' rx = re.compile('[aob]') repl = 'x' outputstr = '' i = 0 for astr in re.compile(r'(<[^>]*>)').split(inputstr): i = 1 - i ...
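
The attempt in the excerpt is truncated, so this is not a reconstruction of it, only a sketch of the same split-on-tags idea. It assumes tags can be recognised with the naive pattern `<[^>]*>`, which breaks on comments or on `>` inside attribute values. The function name `replace_outside_tags` is hypothetical.

```python
import re

TAG = re.compile(r'(<[^>]*>)')  # naive tag matcher, an assumption


def replace_outside_tags(pattern, repl, text):
    """Apply pattern -> repl only to the text between tags."""
    parts = TAG.split(text)
    # Because TAG has a capturing group, split() keeps the tags:
    # even indices hold plain text, odd indices hold the tags.
    for i in range(0, len(parts), 2):
        parts[i] = re.sub(pattern, repl, parts[i])
    return ''.join(parts)


inputstr = 'mary had a <b class="foo"> little loomb</b>'
print(replace_outside_tags('[aob]', 'x', inputstr))
# mxry hxd x <b class="foo"> little lxxmx</b>
```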

How to capture only part of an id?

I'm trying to capture the id of an element that will be randomly generated. I can successfully capture the value of my element id like this... | storeAttribute | //div[1]@id | variableName | Now my variable will be something like... divElement-12345 I want to remove 'divElement-' so that the variable I am left with is '12345' so th...

How can I read a custom defined pattern from a file in Perl?

Hi, Advance New Year Wishes to All. I have an error log file with the contents in a pattern parameter, result and stderr (stderr can be in multiple lines). $cat error_log <parameter>:test_tot_count <result>:1 <stderr>:Expected "test_tot_count=2" and the actual value is 3 test_tot_count = 3 <parameter>:test_one_count <result>:0 <stder...

How do regular expressions work in selenium?

I want to store part of an id, and throw out the rest. For example, I have an html element with an id of 'element-12345'. I want to throw out 'element-' and keep '12345'. How can I accomplish this? I can capture and echo the value, like this: | storeAttribute | //pathToMyElement@id | myId | | echo | ${!-myId-!} | | When I run the test...

Regular expressions question in .NET ...

In my ASP.NET application, I want to use regular expressions to change URLs into hyperlinks in user posts, for example: http://www.somesite.com/default.aspx to <a href="http://www.somesite.com/default.aspx">http://www.somesite.com/default.aspx</a> This is fairly easy using Regex.Replace(), but the problem I'm having is th...

What have you used Regular Expressions for?

I have heard of regular expressions and only seen use cases for a few things so I don't think of using them very often. In the past I have done a couple of things and it has taken me hours to do. Later I talk to someone and they say "here is how to do it using a regular expression". So what are things for which you have used Regular E...

Java: how to check if character belongs to a specific unicode block?

I need to identify what character set my input belongs to. The goal is to distinguish between Arabic and English words in a mixed input (the input is unicode and is extracted from XML text nodes). I have noticed class Character.UnicodeBlock : is it related to my problem? How can I get it to work? Edit: The Character.UnicodeBlock ...

Can I use a Regex in an XPath expression?

Something like ".//div[@id='foo\d+]" to capture div tags with id='foo123'. I'm using .NET, if that matters. ...

Regular expression to match string not containing a word?

I know it is possible to match the word and then invert the match using a tool option (e.g. grep -v). However, I want to know whether it is possible, using regular expressions alone, to match lines that do not contain a specific word, say hede? Input: Hoho Hihi Haha hede # grep "Regex for do not contain hede" Input Output: Hoho Hihi Haha ...
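
In engines that support lookahead, one common way to express this is a negative lookahead anchored at the start of the line. The sketch below uses Python rather than grep, and the function name `lines_without_hede` is made up for illustration.

```python
import re

# ^(?!.*hede) succeeds only if "hede" does not occur anywhere
# later on the line; .*$ then consumes the whole line.
NO_HEDE = re.compile(r'^(?!.*hede).*$')


def lines_without_hede(text):
    """Hypothetical helper: keep the lines that do not contain 'hede'."""
    return [line for line in text.splitlines() if NO_HEDE.match(line)]


print(lines_without_hede("Hoho\nHihi\nHaha\nhede"))  # ['Hoho', 'Hihi', 'Haha']
```

This relies on lookahead support, so it would not work in a purely POSIX basic or extended regex dialect.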

JavaScript equivalent of PHP's preg_replace

I am using a simple regex to replace break tags with newlines: br_regex = /<br>/; input_content = input_content.replace(br_regex, "\n"); This only replaces the first instance of a break tag, but I need to replace all. preg_match_all() would do the trick in PHP, but I'd like to know the JavaScript equivalent. Thanks! ...

Regex to add padding zeroes

I am working on this Yahoo Pipes regex and found a bug I'm unable to wrap my mind around. I have a URL from which I extract digits, concatenate them, build an img HTML tag, and embed it. The issue is that the URL is presented in a non-padded way, but the linked image has the zeroes. Therefore, when there is a day or a month with single di...
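
The excerpt is cut off and the actual URL format is not shown, so the following is only a sketch under stated assumptions: it is written in Python rather than Yahoo Pipes, and it assumes the date parts appear as standalone digit runs separated by non-word characters such as `/`. The name `pad_single_digits` and the sample string are hypothetical.

```python
import re


def pad_single_digits(text):
    """Prefix a zero to every digit that stands alone between word boundaries."""
    # \b(\d)\b matches a digit with no word character on either side,
    # so "2009" is left alone while "1" and "5" are padded.
    return re.sub(r'\b(\d)\b', r'0\1', text)


print(pad_single_digits("2009/1/5"))  # 2009/01/05
```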