questions about pcre | ansaurus

pcre

preg_match email and name from "To"

I want to extract the name and email from the following formats (also, if you know any other format that mail applications use when sending email, please mention it in a comment :)). How can I get the name and email from strings in the following formats (it's one string and can be in any of these formats): - [email protected] - james [email protected]...
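Here is a minimal sketch using Python's `re` module rather than PHP's preg_* functions. It assumes the two common forms are a bare address and `Name <address>`, with the name optionally in double quotes. The `ADDRESS` pattern and the `split_address` helper are hypothetical illustrations, not a full RFC 5322 parser.

```python
import re

# Hypothetical pattern covering two common forms:
#   Name <user@example.com>  or  "Name" <user@example.com>  -> name + email
#   user@example.com                                         -> email only
ADDRESS = re.compile(
    r'^\s*(?:"?(?P<name>[^"<]*?)"?\s*<(?P<email>[^<>\s]+)>'
    r'|(?P<bare>[^<>\s]+@[^<>\s]+))\s*$'
)


def split_address(value):
    """Return (name, email) for a recognised form, or None otherwise."""
    m = ADDRESS.match(value)
    if m is None:
        return None
    if m.group("bare") is not None:
        return ("", m.group("bare"))
    return (m.group("name").strip(), m.group("email"))


print(split_address('"James Smith" <js@example.com>'))  # ('James Smith', 'js@example.com')
```

The lazy `*?` on the name keeps the whitespace before `<` out of the captured name.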

Get position of all matches in group

Hi, consider the following example: $target = 'Xa,a,aX'; $pattern = '/X((a),?)*X/'; $matches = array(); preg_match_all($pattern,$target,$matches,PREG_OFFSET_CAPTURE|PREG_PATTERN_ORDER); var_dump($matches); It returns only the last 'a' in the series, but I need all of the 'a's. In particular, I need the position of ...
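A repeated capturing group such as `((a),?)*` stores only its last iteration. That is why only the final 'a' comes back, and Python's `re` behaves the same way as PCRE here. The minimal sketch below (Python, not PHP) shows one workaround. It matches the outer `X...X` span once, then scans for each `a` inside that span, so every offset is reported relative to the whole target.

```python
import re

target = "Xa,a,aX"

# The repeated group keeps only its final iteration: group 2 is the last 'a'.
last_only = re.search(r"X((a),?)*X", target)
print(last_only.span(2))  # (5, 6)

# Workaround: capture the whole run between the X's, then search inside it.
outer = re.search(r"X((?:a,?)*)X", target)
inner = re.compile(r"a")
positions = []
if outer is not None:
    # pos/endpos restrict the scan to group 1 while keeping offsets absolute.
    for m in inner.finditer(target, outer.start(1), outer.end(1)):
        positions.append(m.start())

print(positions)  # [1, 3, 5]
```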

grouping in front of positive lookbehind not matching

Hi, take the following code: $target = 'NAME FUNC LPAREN P COMMA P COMMA P RPAREN'; //$target = 'NAME FUNC LPAREN P RPAREN'; //$target = 'NAME FUNC LPAREN RPAREN'; $pattern = '/(?P<ruleName>NAME )?(?P<funcName>FUNC )?(?:(?<=LPAREN)(?: (?P<arg1>P))|(?P<args>P)(?=(?: RPAREN)|(?: COMMA)))/'; preg_match_all($pattern,$target,$matches,PREG_O...

preg_grep on a larger string

Hi! I need to use preg_grep on a string containing the content of an HTML file. $file = file_get_contents($path); $html_array = explode(' ', $file); The problem is that the array sometimes looks like this: [77]=> string(35) "<div> </div> <br> {{testto}} <br>" I have tried putting some whitespace in there, but it won't work. :/ L...
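This sketch assumes the goal is to find `{{...}}` placeholders such as `{{testto}}`. One option is to search the whole file content instead of splitting it on spaces, because the HTML around a placeholder rarely lines up with space-separated chunks. The sketch uses Python's `re` (the PHP counterpart would be preg_match_all on the full string). `PLACEHOLDER`, `find_placeholders` and the sample `html` are hypothetical.

```python
import re

# Placeholders are assumed to look like {{name}} with word characters inside.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def find_placeholders(html):
    """Return the names of all {{...}} placeholders in the whole document."""
    return PLACEHOLDER.findall(html)


html = "<div> </div> <br> {{testto}} <br>\n<p>{{other}}</p>"
print(find_placeholders(html))  # ['testto', 'other']
```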

Converting an eregi_replace to a preg_replace

I am trying to parse some HTML snippets and want to clean them up for various reasons (XSS et al). I am currently trying to remove all of the attributes on any tag, except for the href on an anchor. I am doing this using a sequence of eregi_replace calls, but I am sure there is a smarter way of doing this using preg_replace and just a c...
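Here is a minimal sketch of a single-pass approach in Python's `re` (not PHP's preg_replace). It matches every opening tag once and rebuilds it through a callback, keeping only a quoted `href` on `<a>` tags.

Some limitations:
- It assumes attribute values never contain `>`.
- It does not check what the `href` points to, so a `javascript:` URL would survive. For real XSS filtering, a dedicated HTML sanitizer is the safer choice.

The names `strip_attributes`, `TAG` and `HREF` are hypothetical.

```python
import re

# An opening tag: name in group 1, everything up to ">" (attributes) in group 2.
TAG = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)\b([^>]*)>")
# A quoted href attribute, not preceded by a word character or "-" (so not data-href).
HREF = re.compile(r"""(?<![\w-])href\s*=\s*("[^"]*"|'[^']*')""", re.IGNORECASE)


def strip_attributes(html):
    """Drop every attribute except a quoted href on <a> tags."""

    def rebuild(m):
        name, attrs = m.group(1), m.group(2)
        if name.lower() == "a":
            href = HREF.search(attrs)
            if href is not None:
                return "<" + name + " href=" + href.group(1) + ">"
        return "<" + name + ">"

    # Closing tags like </a> start with "</" and are left untouched.
    return TAG.sub(rebuild, html)


print(strip_attributes('<p class="x" onclick="evil()">hi <a href="/page" onclick="x()">link</a></p>'))
```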

Extract 2 sets of numbers from a string using PHP's preg?

Here's some PHP code: $myText = 'ABC #12345 (2009) XYZ'; $myNum1 = null; $myNum2 = null; How do I put the first set of numbers in $myText (after the #) into $myNum1, and the second set (between the parentheses) into $myNum2? ...
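Here is a minimal sketch in Python's `re`. The same pattern should also work in PHP's preg_match with `/` delimiters. It uses one pattern with two capturing groups: one for the digits after `#` and one for the digits inside the parentheses. It assumes the first number is followed, after optional whitespace, by the parenthesised one.

```python
import re

my_text = "ABC #12345 (2009) XYZ"

# Group 1: digits after "#"; group 2: digits between "(" and ")".
m = re.search(r"#(\d+)\s*\((\d+)\)", my_text)
my_num1, my_num2 = (m.group(1), m.group(2)) if m else (None, None)

print(my_num1, my_num2)  # 12345 2009
```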

How do I extract words from a comma-delimited string in Perl?

Hi. I have a line: $myline = 'ca,cb,cc,cd,ce'; I need to match ca into $1, cb into $2, etc. Unfortunately, $myline =~ /(?:(\w+),?)+/; doesn't work. With pcretest it only matches 'ce' into $1. How do I do it right? Do I need to put it in a while loop? Thanks! ...
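A repeated capturing group keeps only its final iteration, so `(?:(\w+),?)+` can never fill `$1` through `$5`. In Perl, the usual fixes are `split /,/, $myline` or a global match in list context, such as `my @parts = $myline =~ /(\w+)/g;`. The sketch below uses Python's `re` (not Perl) to show the same two ideas.

```python
import re

myline = "ca,cb,cc,cd,ce"

# A repeated group only remembers its last iteration.
print(re.match(r"(?:(\w+),?)+", myline).group(1))  # ce

# Option 1: split on the delimiter.
by_split = myline.split(",")

# Option 2: find every match (like Perl's list-context //g).
by_findall = re.findall(r"\w+", myline)

print(by_split)    # ['ca', 'cb', 'cc', 'cd', 'ce']
print(by_findall)  # ['ca', 'cb', 'cc', 'cd', 'ce']
```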

preg_replace to remove empty tags but keep the end of blockquotes

Hi, I wrote this expression to remove all empty tags (including tags with just whitespace) from the page. $content = preg_replace('/<[^\/>]*>([\s]?)*<\/[^>]*>/', '', $content); It worked a treat until it had to deal with content like this... <blockquote> <p >foo bar</p> </blockquote> <p ><a href="image.jpg" rel="lightbox" title=""><im...
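One way to stop the expression from pairing a `<p>` with a `</blockquote>` is to capture the opening tag name. A backreference (`\1`) then requires the same name in the closing tag. Here is a minimal sketch in Python's `re` (not PHP). It loops so that tags which become empty after an inner removal are also removed. Like any regex approach, it assumes fairly regular HTML and is not a parser.

```python
import re

# Group 1 is the tag name; </\1> forces the closing tag to match it.
EMPTY_TAG = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>\s*</\1\s*>")


def remove_empty_tags(html):
    """Remove tags that contain nothing but whitespace, repeating until stable."""
    previous = None
    while previous != html:
        previous = html
        html = EMPTY_TAG.sub("", html)
    return html


content = "<blockquote> <p >foo bar</p> </blockquote><p > </p><div><span></span></div>"
print(remove_empty_tags(content))  # <blockquote> <p >foo bar</p> </blockquote>
```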

Regular expression libraries for Mac OS X 10.6

Is there a library compatible with PCRE that can be used on Mac OS X 10.6 and is Unicode compatible? I was thinking of using predicates, but that is a little excessive when the application is not already using Spotlight predicates. ...

[A-Z]{2,4} not limiting to between 2 & 4 characters

PCRE: /\A[A-Z0-9_\.%\+\-]+@(?:[A-Z0-9\-]+\.)+(?:[a-z]{2,4}|museum|travel)\z/i POSIX: /^[A-Z0-9_\.%\+\-]+@(?:[A-Z0-9\-]+\.)+(?:[A-Z]{2,4}|museum|travel)$/i This regex is correct in every way for my needs except that it allows emails such as [email protected]. It says these are a match. If I'm not mistaken, doesn't the {2,4} after [A-Z] mean th...

Will [a-z] ever match accented characters in PREG/PCRE?

I'm already aware that \w in PCRE (particularly PHP's implementation) can sometimes match some non-ASCII characters depending on the locale of the system, but what about [a-z]? I wouldn't think so, but I noticed these lines in one of Drupal's core files (includes/theme.inc, simplified): // To avoid illegal characters in the class, // w...

PHP PREG Question

As hard as I try, PREG and I don't get along, so I am hoping one of you PHP gurus can help out. I have some HTML source code coming into a PHP script, and I need specific items stripped out of the source code. First, if this comes in as part of HTML (could be multiple instances): <SPAN class=placeholder title="" jQuery12...

put <em> and </em> at the beginning and end of each found keyword

I'm new to preg and want to find some strings I have in an array and emphasize each one, e.g. array[0] = "windows"; array[1] = "windows xp"; The text will be: <em>windows</em> is bla bla...<em>windows xp</em> is bla bla How could I do that? ...
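Here is a minimal sketch in Python's `re`. PHP's preg_replace together with preg_quote could follow the same plan:
- Escape each keyword.
- Sort longer keywords first, so `windows xp` is tried before its prefix `windows`.
- Join them into one alternation with word boundaries and wrap every match in a single pass.

Doing it in one pass avoids re-matching text that was already wrapped. The `\b` boundaries assume each keyword starts and ends with a word character. `emphasize` is a hypothetical name.

```python
import re


def emphasize(text, keywords):
    """Wrap every occurrence of any keyword in <em>...</em>."""
    if not keywords:
        return text
    # Longest first, so "windows xp" wins over its prefix "windows".
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in ordered) + r")\b")
    return pattern.sub(lambda m: "<em>" + m.group(0) + "</em>", text)


print(emphasize("windows is bla bla... windows xp is bla bla", ["windows", "windows xp"]))
# <em>windows</em> is bla bla... <em>windows xp</em> is bla bla
```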

PHP PREG Regex: What does "\W" mean when using the UTF-8 modifier?

I know that in normal PHP regex (ASCII mode) "\w" (word) means "letter, number, and _". But what does it mean when you are using multibyte regex with the "u" modifier? preg_replace('/\W/u', '', $string); ...

RegEx Backreferences

Having the following regular expression: ([a-z])([0-9])\1 It matches a5a, is there any way for it to also match a5b, a5c, a5d and so on? EDIT: Okay, I understand that I could just use ([a-z])([0-9])([a-z]) but I've a very long and complicated regular expression (matching sub-sub-sub-...-domains or matching an IPv4 address) that wou...
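A backreference such as `\1` repeats the text that group 1 captured, not its pattern, which is why `a5b` fails. PCRE also supports subroutine calls such as `(?1)`, which re-run group 1's pattern; for example, `([a-z])([0-9])(?1)` matches `a5b`. Python's built-in `re` does not support that syntax. The Python sketch below therefore uses a language-agnostic alternative: define each sub-pattern once as a string and build the full expression from those pieces. The IPv4 octet pattern is an illustrative assumption.

```python
import re

# Define each building block once...
LETTER = r"[a-z]"
DIGIT = r"[0-9]"
# 0-255 without leading zeros (illustrative).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"

# ...then reuse the pattern text wherever it is needed.
LETTER_DIGIT_LETTER = re.compile(f"({LETTER})({DIGIT})({LETTER})")
IPV4 = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")  # {{3}} becomes {3} in the f-string

print(bool(LETTER_DIGIT_LETTER.fullmatch("a5b")))  # True
print(bool(IPV4.fullmatch("192.168.0.1")))         # True
print(bool(IPV4.fullmatch("256.1.1.1")))           # False
```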

RegEx: \w - "_" + "-" in UTF-8

I need a regular expression that matches UTF-8 letters and digits, the dash sign (-) but doesn't match underscores (_), I tried these silly attempts without success: ([\w-^_])+ ([\w^_]-?)+ (\w[^_]-?)+ The \w is shorthand for [A-Za-z0-9_], but it also matches UTF-8 chars if I have the u modifier set. Can anyone help me out with this ...
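A common trick is the double negative `[^\W_]`, read as "not a non-word character and not an underscore", which is `\w` minus `_`. A negated class cannot also add `-`, so the dash goes in a separate alternative. The sketch below uses Python's `re`, where `\w` is Unicode-aware for str patterns by default. In PCRE, `\w` matches non-ASCII letters only when Unicode property support (UCP) is in effect, and `[\p{L}\p{N}-]` with the `u` modifier is an explicit alternative.

```python
import re

# [^\W_]  = word characters except "_" (Unicode letters and digits for str patterns)
# "-"     = added as a separate alternative, since it cannot join a negated class
LETTERS_DIGITS_DASH = re.compile(r"(?:[^\W_]|-)+")

for candidate in ["héllo-wörld-42", "snake_case", "naïve"]:
    print(candidate, bool(LETTERS_DIGITS_DASH.fullmatch(candidate)))
```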

Regex - Unicode Properties Reference and Examples

I feel lost with the regex Unicode properties presented by RegexBuddy: I cannot distinguish between any of the Number properties, and the Math symbol property only seems to match + but not -, *, /, ^ for instance. Is there any documentation or reference with examples on regular expression Unicode properties? ...
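The Unicode general categories behind those properties can be inspected directly. The small Python sketch below uses the standard `unicodedata` module to show why `\p{Sm}` (math symbol) matches `+` but not the other operators. It also shows how the three number categories `\p{Nd}`, `\p{Nl}` and `\p{No}` differ. The sample characters are arbitrary choices.

```python
import unicodedata

samples = {
    "+": "plus sign",
    "-": "hyphen-minus",
    "*": "asterisk",
    "/": "solidus",
    "^": "circumflex accent",
    "5": "digit five",
    "\u0663": "Arabic-Indic digit three",
    "\u216b": "Roman numeral twelve",
    "\u00bd": "vulgar fraction one half",
}

for ch, label in samples.items():
    # Two-letter general category, e.g. Sm = Symbol, math; Nd = Number, decimal digit.
    print(f"{label:28} {unicodedata.category(ch)}")
```

Only `+` is in Sm. The hyphen-minus is dash punctuation (Pd), `*` and `/` are other punctuation (Po), and `^` is a modifier symbol (Sk). Among the numbers, Nd covers decimal digits in any script, Nl covers letter-like numerals such as Roman numerals, and No covers everything else, such as fractions.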

PCRE regex to sed regex

First of all, sorry for my bad English; I'm German. The code given below works fine in PHP: $string = preg_replace('/href="(.*?)(\.|\,)"/i','href="$1"',$string); Now I need the same for sed. I thought it should be: sed 's/href="(.*?)(\.|\,)"/href="{$\1}"/g' test.htm But that gives me this error: sed: -e expression #1...

PHP PCRE (regex) doesn't support UTF-8?

I am attempting to run a regex on my site, and I am getting this response: Compilation failed: support for \P, \p, and \X has not been compiled at offset 1 After googling for a bit, I've found that the PCRE on my server is apparently not UTF-8 enabled, which is causing the problem. When I SSH in and run pcretest -C I get PCRE ver...

PHP: PREG: How to match special chars like a grave?

Hi, I'd like to give my users the option to fill in not only letters and numbers, but also "special" letters like "á", "é", etc. However, I do not want them to be able to use symbols like "!", "@", "%", etc. Is there a way to write a regex to accomplish this (preferably without specifying each special letter)? Now I have: $reg = '/^[...
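In PCRE, the usual answer is a property class such as `/^[\p{L}\p{N}]+$/u`, which accepts any Unicode letter or number. Python's built-in `re` has no `\p{...}` syntax, so the sketch below expresses the same rule with the standard `unicodedata` module: every character must fall in a Letter (`L*`) or Number (`N*`) general category. Input is NFC-normalised first, on the assumption that users may type accented letters in decomposed form (a base letter plus a combining mark, category `Mn`). `is_letters_and_digits` is a hypothetical name.

```python
import unicodedata


def is_letters_and_digits(value):
    """True if value is non-empty and every character is a Unicode letter or number."""
    value = unicodedata.normalize("NFC", value)
    return bool(value) and all(unicodedata.category(ch)[0] in "LN" for ch in value)


for text in ["Café123", "e\u0301te\u0301", "hi!", "user@home", ""]:
    print(repr(text), is_letters_and_digits(text))
```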
