regex

Remove unwanted line feeds from an HTML file.

I have a lot of HTML files which have unwanted line feeds. These break things like inline JavaScript and formatting within the pages. I want to come up with a way to strip out all line feeds from the pages that do not appear directly after an HTML tag, e.g. </div>. Does anyone know of a regex and/or program that may be able to achieve this...

What is the proper way of inserting a pipe into a Java Pattern expression?

What is the proper way of inserting a pipe into a Java Pattern expression? I actually want to use a pipe as a delimiter and not the or operator. I.E: "hello|world".split("|"); --> {"hello", "world"} ...
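
The underlying issue is that an unescaped pipe is the alternation operator in most regex dialects, so it has to be escaped or quoted to be treated as a literal delimiter. The following is a minimal sketch in Python rather than Java, only to illustrate the idea; the variable names are made up for the example. In Java, the analogous options would be the string literal "\\|" or Pattern.quote("|").

```python
import re

text = "hello|world"

# Option 1: escape the pipe with a backslash so it is a literal character.
parts_escaped = re.split(r"\|", text)

# Option 2: put the pipe inside a character class, where it has no special meaning.
parts_class = re.split(r"[|]", text)

# Option 3: let the library quote the delimiter for you.
parts_quoted = re.split(re.escape("|"), text)

print(parts_escaped)
print(parts_class)
print(parts_quoted)
```

All three variants treat the pipe as plain text, so each call splits the sample string into its two words.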

Regular expression for carriage returns occurring at the beginning or end of a file.

I am looking for a way to remove 'stray' carriage returns occurring at the beginning or end of a file. ie: \r\n <-- remove this guy some stuff to say \r\n some more stuff to say \r\n \r\n <-- remove this guy How would you match \r\n followed by 'nothing' or preceded by 'nothing'? ...
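
One way to express "preceded by nothing" or "followed by nothing" is with the absolute start-of-input and end-of-input anchors. The sketch below uses Python and assumes that every leading and trailing \r\n should be dropped (the excerpt could also be read as removing only one); the function name strip_stray_crlf is hypothetical.

```python
import re

# \A anchors at the very start of the input, \Z at the very end.
# Each alternative matches one or more CRLF pairs touching that edge.
STRAY_CRLF = re.compile(r"\A(?:\r\n)+|(?:\r\n)+\Z")


def strip_stray_crlf(text):
    """Remove CRLF runs at the start and end of text, keeping interior ones."""
    return STRAY_CRLF.sub("", text)


sample = "\r\nsome stuff to say\r\nsome more stuff to say\r\n\r\n"
cleaned = strip_stray_crlf(sample)
print(repr(cleaned))
```

The line break between the two sentences is left alone because it touches neither edge of the input.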

preg_match_all question (how to limit scope without a separate preg_match call)

Hello, I have some data similar to this: aaa1|aaa2|ZZZ|aaa3|aaa4|aaa5|ZZZ|aaa6|aaa7 I want to match all "aaa[0-9]" BETWEEN "ZZZ" (not the ones outside). So I have some PHP code: $string = "aaa1aaa2zzzzaaa3aaa4aaa5zzzzaaa6aaa7"; preg_match_all("/zzzz.*(aaa[0-9]).*zzzz/", $string, $matches, PREG_SET_ORDER); print_r($m...

PHP: replace invalid characters in utf-8 string in

How do I replace invalid characters in a UTF-8 string with whitespace characters (using regex in PHP5)? ...

Is it possible to simplify this regular expression any further?

I'm working on some homework for my compiler class and I have the following problem: Write a regular expression for all strings of a's and b's that contain an odd number of a's or an odd number of b's (or both). After a lot of whiteboard work I came up with the following solution: (aa|bb)* (ab|ba|a|b) ((aa|bb)* (ab|ba) (aa|bb)* (ab|ba...

mb_ereg_* in PHP6

So ereg won't be present in PHP6. And I don't really care, because I'm using PCRE functions. But for multibyte strings, I'm using mb_ereg_* functions. The question is: will they be present in PHP6 in the mbstring extension, or will I have to switch to some kind of multibyte PCRE functions? ...

PHP regex extract/replace values from xml-like tags via named (sub)groups

Trying to create a simple text-translator in PHP. It should match something like: Bla bla {translator id="TEST" language="de"/} The language can be optional Blabla <translator id="TEST"/> Here is the code: $result = preg_replace_callback( '#{translator(\s+(?\'attribute\'\w+)="(?\'value\'\w+)")+/}#i', array($this, 'translateTe...

JavaScript Regular Expression problem

Hi everyone, I am trying to incorporate a regular expression I have used in the past in a different manner into some validation checking through JavaScript. The following is my script: var regOrderNo = new RegExp("\d{6}"); var order_no = $("input[name='txtordernumber']").val(); alert(regOrderNo.test(order_no)); Why woul...

What is the BNF for a regex (in order to write a full or partial parser)

I am interested in parsing regexes (not to be confused with using regexes for parsing). Is there a BNF for Java 1.6 regexes (or other languages?) [NOTE: There is a similar older question which did not lead to an answer for Java.] EDIT To explain why I need to do this. We are implementing a shallow parser for Natural language processing...

Another JavaScript Regular Expression problem

Hello again, I have a similar issue to my recent post, but with a zip code validator I am trying to convert over to a JavaScript validation process. My script looks like so: var regPostalCode = new RegExp("\\d{5}(-\d{4})?"); var postal_code = $("input[name='txtzipcode']").val(); if (regPostalCode.test(postal_code) == false)...

PHP simple regex

I want to validate a string only if it contains '0-9' chars with length between 7 and 9. What I have is [0-9]{7,9} but this matches a string of ten chars too, which I don't want. Thanks. ...
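
The behaviour described is what an unanchored pattern does: it is satisfied by any run of 7 to 9 digits inside a longer string. Below is a small Python sketch of the anchored alternative; the helper name is_valid is made up for the example. In PCRE-style syntax, the same idea is usually written with anchors, as in ^[0-9]{7,9}$.

```python
import re

DIGITS_7_TO_9 = re.compile(r"[0-9]{7,9}")


def is_valid(value):
    """True only when the whole string is 7 to 9 digits."""
    # fullmatch requires the pattern to cover the entire string,
    # which is what explicit start and end anchors achieve.
    return DIGITS_7_TO_9.fullmatch(value) is not None


# Unanchored search finds 9 digits inside the 10-character string.
unanchored_hit = DIGITS_7_TO_9.search("1234567890") is not None

print(unanchored_hit)
print(is_valid("1234567890"))
print(is_valid("12345678"))
```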

expected behavior of posix extended regex: (()|abc)xyz

On my OS X 10.5.8 machine, using the regcomp and regexec C functions to match the extended regex "(()|abc)xyz", I find a match for the string "abcxyz" but only from offset 3 to offset 6. My expectation was that the entire string would be matched and that I would see a submatch for the initial "abc" part of the string. When I try the sa...

php regular expression to capture number from table and load into variable

So here is the string that I'm scraping a page to read (using file_get_contents) <th>Kills (K)</th><td><strong>4,751</strong></td><td><strong>0</strong></td> How can I navigate to the above section of the page contents, and then isolate the 4,751 inside the above HTML and load it into $kills? Difficulty: the number will change and ha...

How to find if a Java String contains X or Y and contains Z

I'm pretty sure regular expressions are the way to go, but my head hurts whenever I try to work out the specific regular expression. What regular expression do I need to find if a Java String (contains the text "ERROR" or the text "WARNING") AND (contains the text "parsing"), where all matches are case-insensitive? EDIT: I've presented...
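
A condition of the form "(A or B) and C" can be written as two lookahead assertions that are both checked from the start of the string. The sketch below is in Python rather than Java and is only one possible approach; the function name has_error_and_parsing is hypothetical. Plain case-insensitive substring checks would work as well and may be easier to read.

```python
import re

# Each (?=...) looks ahead without consuming input, so both conditions
# are tested from the same starting position.
PATTERN = re.compile(
    r"(?=.*(?:ERROR|WARNING))(?=.*parsing)",
    re.IGNORECASE | re.DOTALL,  # DOTALL lets .* cross line breaks
)


def has_error_and_parsing(text):
    """True if text contains ERROR or WARNING, and also contains parsing."""
    return PATTERN.match(text) is not None


print(has_error_and_parsing("Error while parsing file"))
print(has_error_and_parsing("warning: PARSING failed"))
print(has_error_and_parsing("error in network layer"))
```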

Regular expression to validate numeric values

Regular expression to validate a text box where I can enter an integer / float value in ASP.NET ...

Anchor tags to plain text within content

I am trying to match <a> tags within my content and replace them with the link text followed by the URL in square brackets for a print version. The following example works if there is only the "href". If the <a> contains another attribute, it matches too much and doesn't return the desired result. How can I match the URL and the link ...

Django URL conf and view for retrieving multiple "tags" dynamically

In Django, I'm trying to write a URLconf and view that can take a theoretically unlimited number of "tags". The reason for this is to retrieve objects that have been tagged with different combinations of tags. For example, URLs like this are desirable: /topics/tag1/tag2/tag3 The above URL would retrieve "topics" that have been tagge...

What is a regular expression I can use in Vim to find CVS conflicts?

What is a regular expression I can use in Vim to find conflicts in CVS and possibly other version control systems? ...

Multiline Regular Expression search and replace!

I've hit a wall. Does anybody know a good text editor that has search and replace like Notepad++ but can also do multi-line regex search and replace? Basically, I am trying to find something that can match a regex like: search oldlog\(.*\n\s+([\r\n.]*)\);replace newlog\(\1\) Any ideas? ...