pcre

Test/expand my email regex

I'm really not confident with regex; I know some basic syntax but not enough to keep me happy. I'm trying to build a regular expression to check if an email is valid. So far here's what I've got: [A-Za-z0-9._-]+@[A-Za-z0-9]+.[A-Za-z.]+ It needs to take account of periods in the username/domain and I think it works with multiple TLDs ...
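One thing worth noting about the pattern above is that the `.` before the TLD is unescaped, so it matches any character. A minimal sketch in Python's `re` (used here as a stand-in, assuming the syntax carries over to PCRE unchanged); `EMAIL_RE` and `looks_like_email` are hypothetical names, and the pattern is a loose sanity check rather than a full address validator:

```python
import re

# Hypothetical tightening of the pattern from the question:
# the dots are escaped, and the domain may contain several dot-separated labels.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)

def looks_like_email(text):
    """Return True if the whole string has the rough shape user@host.tld."""
    return EMAIL_RE.fullmatch(text) is not None
```

With the unescaped dot, a string such as `john@exampleXcom` would be accepted; with the escaped dot it is rejected.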

Storing PCRE compiled regexes in C/C++

Is there an efficient way to store the compiled regexes (compiled via regcomp(), PCRE) in a binary file, so that later I can just read from the file and call regexec()? Or is it just a matter of dumping the compiled regex_t structs to the file and reading them back when needed? ...

Efficiently querying one string against multiple regexes.

Let's say that I have 10,000 regexes and one string and I want to find out if the string matches any of them and get all the matches. The trivial way to do it would be to just query the string one by one against all regexes. Is there a faster, more efficient way to do it? EDIT: I have tried substituting it with DFAs (lex) The problem he...
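One possible shape for this, sketched in Python: join all patterns into a single alternation and use it as a cheap pre-filter, falling back to the one-by-one loop only when something can match. `build_matcher` is a hypothetical helper, and the sketch assumes the patterns contain no backreferences, named groups, or inline flags (those would clash once combined). On a backtracking engine this mainly helps when most strings match nothing; it is not a substitute for a true multi-pattern automaton.

```python
import re

def build_matcher(patterns):
    """Return a function listing every pattern that matches a given string."""
    compiled = [re.compile(p) for p in patterns]
    # One alternation of all patterns, each wrapped in a non-capturing group.
    combined = re.compile("|".join(f"(?:{p})" for p in patterns))

    def matching_patterns(text):
        # Cheap rejection: if the alternation fails, no single pattern can match.
        if combined.search(text) is None:
            return []
        # Otherwise fall back to checking each pattern individually.
        return [c.pattern for c in compiled if c.search(text)]

    return matching_patterns
```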

Matching a time string with a regular expression

I would like to match the time (10.00) from a string with the date and time ("21.01.08 10.00"). I'm using the following regular expression: new RegExp("\\b[0-9]{1,2}\\.[0-9]{1,2}\\b", "g"); But this matches 21.01 from 21.01.08 and 10.00. I'm using PCRE as my regular expression engine. Update: I'm sorry, I should have more been mor...
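The word boundary `\b` also sits between a digit and a dot, which is why part of the date matches. A minimal sketch using lookarounds instead, written in Python's `re` (the same lookaround syntax exists in PCRE). It assumes the time always has two-digit minutes and is never directly preceded or followed by a digit or a dot; `TIME_RE` is a hypothetical name:

```python
import re

# Reject candidates that touch another digit or dot on either side,
# so "21.01" (followed by ".08") is not treated as a time.
TIME_RE = re.compile(r"(?<![\d.])\d{1,2}\.\d{2}(?![\d.])")

times = TIME_RE.findall("21.01.08 10.00")
```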

regex (in PHP) to match & that aren't HTML entities

Here's the goal: to replace all standalone ampersands with &amp;amp; but NOT replace those that are already part of an HTML entity such as &amp;nbsp;. I think I need a regular expression for PHP (preferably for preg_ functions) that will match only standalone ampersands. I just don't know how to do that with preg_replace. ...
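A common way to express "an ampersand that does not start an entity" is a negative lookahead. The sketch below uses Python's `re`; the pattern itself uses only syntax that PCRE also supports, so it should carry over to `preg_replace` with delimiters added. `AMP_RE` and `escape_bare_ampersands` are hypothetical names, and the entity shape (named, decimal, or hex, terminated by `;`) is an assumption about what counts as an entity:

```python
import re

# "&" not followed by a named (&nbsp;), decimal (&#169;) or hex (&#xA9;) entity body.
AMP_RE = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")

def escape_bare_ampersands(html):
    """Replace only the ampersands that are not already part of an entity."""
    return AMP_RE.sub("&amp;", html)
```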

Regular expression : Match anything but full token

I have the following snippet where I would like to extract code between the {foreach} and {/foreach} using a regular expression: {foreach (...)} Some random HTML content <div class="">aklakdls</div> and some {$/r/template} markup inside. {/foreach} I already have: {foreach [^}]*} but I am unable to match anything after that. Is the...
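A minimal sketch of one way to capture the body, using a lazy `.*?` with dot-matches-newline enabled (Python's `re.DOTALL`, the `s` modifier in PCRE). The loop header `($items as $item)` in the sample is invented for illustration, and the sketch assumes `{foreach}` blocks are not nested:

```python
import re

# Lazy quantifier stops at the first {/foreach}; DOTALL lets "." cross newlines.
FOREACH_RE = re.compile(r"\{foreach [^}]*\}(.*?)\{/foreach\}", re.DOTALL)

template = (
    "{foreach ($items as $item)}\n"
    '<div class="">aklakdls</div> {$/r/template}\n'
    "{/foreach}"
)
bodies = FOREACH_RE.findall(template)
```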

Buffer overrun in 1 line! (uses the PCRE library)

Well, if someone can help that would be nice... because I'm almost going nuts! I only have 1 line of code, and this is: pcrecpp::RE re("abc"); inside a function OnBnClickedButtonGo(). This function fails in Release mode, while it works OK in Debug mode. (I use VS 8 on WinXP) The error message is: "A buffer overrun has oc...

preg_match_all() [function.preg-match-all]: Unknown modifier ']'

Using a few different patterns but they each come up with this error - so what's wrong? My shortest one to diagnose is: $pattern = "<img([^>]*[^/])>"; preg_match_all($pattern, $subject, $matches); Thanks ...

Unicode Regex; Invalid XML characters

The list of valid XML characters is well known, as defined by the spec it's: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] My question is whether or not it's possible to make a PCRE regular expression for this (or its inverse) without actually hard-coding the codepoints, by using Unicode general categories. An...
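For comparison, the hard-coded form is short. This sketch does not answer the general-category part of the question; it simply writes the ranges from the spec as a negated character class, using Python's `re` (which accepts `\x`, `\u` and `\U` escapes in patterns). `INVALID_XML_RE` and `strip_invalid_xml` are hypothetical names:

```python
import re

# Anything outside the character ranges listed in the XML spec.
INVALID_XML_RE = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def strip_invalid_xml(text):
    """Remove characters that are not allowed in an XML document."""
    return INVALID_XML_RE.sub("", text)
```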

preg_match works in regexbuddy, not in php

Ok so I have this regex that I created and it works fine in RegexBuddy but not when I load it into PHP. Below is an example of it. Using RegexBuddy I can get it to work with this: \[code\](.*)\[/code\] And checking the dot matches newline, I added the case insensitive, but it works that way as well. Here is the PHP: $q = "[code]<d...
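The RegexBuddy checkboxes correspond to flags that have to be passed explicitly in code. In PHP these are the `s` and `i` pattern modifiers; the sketch below shows the same idea with Python's `re.DOTALL` and `re.IGNORECASE`. The sample string is invented for illustration, since the one in the question is cut off:

```python
import re

# DOTALL: "." also matches newlines. IGNORECASE: [code] and [CODE] both match.
CODE_RE = re.compile(r"\[code\](.*)\[/code\]", re.DOTALL | re.IGNORECASE)

sample = "[CODE]<div>\nline two\n</div>[/code]"
match = CODE_RE.search(sample)
inner = match.group(1) if match else None
```

Without the dot-matches-newline flag, the same pattern finds nothing in a multi-line input.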

Need a regex to match a variable length string of numbers that can't be all zeros

I need to validate an input on a form. I'm expecting the input to be a number of 1 to 19 digits. The input can also start with zeros. However, I want to validate that they are not all zeros. I've got a regex that will ensure that the input is numeric and between 1 and 19 digits. ^\d{1,19}$ But I can't figure out how to include a...
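One way to add the "not all zeros" condition is a negative lookahead at the start of the pattern. A minimal sketch in Python's `re` (lookahead syntax is the same in PCRE); `NONZERO_RE` and `is_valid_number` are hypothetical names, and `[0-9]` is used instead of `\d` on the assumption that only ASCII digits should be accepted:

```python
import re

# (?!0+$) fails when everything up to the end of the string is zeros.
NONZERO_RE = re.compile(r"(?!0+$)[0-9]{1,19}")

def is_valid_number(value):
    """True for 1 to 19 digits, leading zeros allowed, but not all zeros."""
    return NONZERO_RE.fullmatch(value) is not None
```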

Find out subdomain using Regular Expression in PHP

Hi guys, Sorry if this is too little of a challenge to be suited as a Stack Overflow question, but I'm kind of new to Regular Expressions. My question is, what is the regular expression that returns the string "token" for all the examples below? token.domain.com token.domain.com/ token.domain.com/index.php token.domain.com/folder/...
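A minimal sketch for the examples listed, assuming the host is always `<something>.domain.com` and that an optional `http://` or `https://` prefix may be present (the prefix is an assumption, it does not appear in the visible examples). `SUBDOMAIN_RE` and `subdomain` are hypothetical names:

```python
import re

# Capture the first label, then require ".domain.com" followed by "/" or end of string.
SUBDOMAIN_RE = re.compile(r"^(?:https?://)?([^./]+)\.domain\.com(?:/|$)")

def subdomain(url):
    """Return the leading label of the host, or None if the shape does not fit."""
    m = SUBDOMAIN_RE.match(url)
    return m.group(1) if m else None
```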

Using a regular expression to match each individual character as its own group?

In PHP I'm trying to match each character as its own group. Which would mimic the str_split(). I tried: $string = '123abc456def'; preg_match_all('/(.)*/', $string, $array); // $array = array(2) { // [0]=> array(2) { // [0]=> string(12) "123abc456def" // [1]=> string(0) "" } // [1]=> array(2) { [0]=> string(1) "f" [...
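A repeated group such as `(.)*` keeps only its last iteration, which is why the capture above ends up holding just "f". Matching a single character per match, and collecting all matches, gives one entry per character. A sketch in Python, where `re.findall` plays the role of `preg_match_all`:

```python
import re

string = "123abc456def"
# One match per character; DOTALL so that newlines would be included too.
chars = re.findall(r".", string, re.DOTALL)
```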

Extracting functions arguments using RegExp (PREG)

Consider the following function arguments (they are already extracted from the function): Monkey,"Blue Monkey", "Red, blue and \"Green'", 'Red, blue and "Green\'' Is there a way to extract arguments to get the following array output using regexp and stripping white spaces: [Monkey, "Blue Monkey", "Red, blue and \"Green'", 'Red, blue an...
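One possible approach is to match the arguments themselves rather than split on commas: a double-quoted string, a single-quoted string, or a bare run of non-comma characters, with backslash escapes allowed inside quotes. The sketch below uses Python's `re`; `ARG_RE` and `split_args` are hypothetical names, and it assumes a quoted argument is never directly followed by more text before the next comma:

```python
import re

ARG_RE = re.compile(
    r'''"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|[^,\s"'][^,]*'''
)

def split_args(text):
    """Return each argument with surrounding whitespace stripped, quotes kept."""
    return [m.group(0).strip() for m in ARG_RE.finditer(text)]
```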

php regex : How do I match this registry pathing?

I'm not very good with regex and I've been kinda scratching my head on this one. I got the following PHP code using preg_match which is supposed to match all characters in the registry pathing except for the record number... which in this case is "record??]": <?php $reg_section = "[HKEY_LOCAL_MACHIN\SOFTWARE\INTERSTAR TECHNOLOGIES\XM...

Perl regex: How to grab the part that is the same

Hi! I'm creating a ladder system for some games and I've encountered a problem regarding the clan base system. You see, every player who joins is parsed and put into a players table. Like this: chelsea | gordon chelsea | jim chelsea | brad OR... CLANTAG|> jenna CLANTAG|> jackson CLANTAG|> irene So, what I want: I wanna grab the C...

Regular expression that matches between quotes, containing escaped quotes

This was originally a question I wanted to ask, but while researching the details for the question I found the solution and thought it may be of interest to others. In Apache, the full request is in double quotes and any quotes inside are always escaped with a backslash: 1.2.3.4 - - [15/Apr/2005:20:35:37 +0200] "GET /\" foo=bat\" HTTP/...
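The usual pattern for this is "a quote, then any run of non-quote, non-backslash characters or backslash-escaped characters, then a quote". A minimal sketch in Python's `re`; the tail of the sample log line (`HTTP/1.0" 200 512`) is invented for illustration because the line in the excerpt is cut off, and `QUOTED_RE` is a hypothetical name:

```python
import re

# Either an ordinary character or a backslash followed by any character.
QUOTED_RE = re.compile(r'"((?:[^"\\]|\\.)*)"')

log_line = (
    r'1.2.3.4 - - [15/Apr/2005:20:35:37 +0200] '
    r'"GET /\" foo=bat\" HTTP/1.0" 200 512'
)
request = QUOTED_RE.search(log_line).group(1)
```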

How can I remove all tokens with non-word characters in Perl?

I am trying to come up with a regex for removing all words that contain non-word characters. So if it contains a colon, comma, number, bracket etc then remove it from the line, not just the character but the word. I have this so far. $wordline =~ s/\s.*\W.*?\s//g; Does not have to be perfect so removing strings with dash and apostroph...
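A different technique from the substitution above, shown in Python rather than Perl: split the line on whitespace and keep only the tokens that consist entirely of letters. This swaps the single regex substitution for a split-and-filter, which avoids the greedy `.*` crossing word boundaries. `WORD_RE` and `keep_plain_words` are hypothetical names, and "word" is assumed to mean ASCII letters only:

```python
import re

WORD_RE = re.compile(r"[A-Za-z]+")

def keep_plain_words(line):
    """Drop every whitespace-separated token containing a non-letter."""
    return " ".join(tok for tok in line.split() if WORD_RE.fullmatch(tok))
```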

PHP validate youtube script

I have a site that allows users to copy and paste the embedded video script that YouTube provides and upload it to a database. I want to be able to check that this script is a valid YouTube script and not just random text that someone typed in. I believe this can be done with preg_match. Any ideas? ...

Removing redundant line breaks with regular expressions

Hello, I'm developing a single serving site in PHP that simply displays messages that are posted by visitors (ideally surrounding the topic of the website). Anyone can post up to three messages an hour. Since the website will only be one page, I'd like to control the vertical length of each message. However, I do want to at least par...
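A minimal sketch of collapsing runs of line breaks, assuming the goal is to allow at most one blank line between paragraphs (the excerpt is cut off before the exact rule is stated). It is written in Python; the same quantifier works in `preg_replace`. `collapse_blank_lines` and `max_breaks` are hypothetical names:

```python
import re

def collapse_blank_lines(text, max_breaks=2):
    """Limit consecutive line breaks to max_breaks (2 keeps one blank line)."""
    # Normalise Windows line endings first so the run counting is uniform.
    text = text.replace("\r\n", "\n")
    return re.sub(r"\n{%d,}" % (max_breaks + 1), "\n" * max_breaks, text)
```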