word-boundary

Regex: How to match the first word after an expression

For example, in this text: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc eu tellus vel nunc pretium lacinia. Proin sed lorem. Cras sed ipsum. Nunc a libero quis risus sollicitudin imperdiet. I want to match the word after 'ipsum'. ...

Finding words strictly starting with $, Regex C#

I need to find all words that strictly begin with "$" and otherwise contain only digits. I wrote [$]\d+, which gave me 4 matches for $10 $10 $20a a$20, so I tried adding a word boundary with \b: [$]\d+\b. But that still matched a$20. I then tried \b[$]\d+\b, which failed. I'm looking for a way to say, ACCEPT ONLY IF THE ...
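A possible direction for this kind of problem, sketched in Python rather than C# (the lookaround syntax used here is also accepted by .NET): \b only matches between a word character and a non-word character, and "$" is a non-word character, so whitespace lookarounds can stand in for the boundary. The name find_dollar_amounts is hypothetical, and the sketch assumes tokens are separated by whitespace.

```python
import re

# Assumption: a "word" here is a whitespace-delimited token.
# (?<!\S) = not preceded by a non-space character (start of string or a space)
# (?!\S)  = not followed by a non-space character (end of string or a space)
DOLLAR_TOKEN = re.compile(r"(?<!\S)\$\d+(?!\S)")

def find_dollar_amounts(text):
    """Return tokens that are exactly '$' followed by one or more digits."""
    return DOLLAR_TOKEN.findall(text)

matches = find_dollar_amounts("$10 $10 $20a a$20")
print(matches)  # ['$10', '$10']
```

With the sample input from the question, "$20a" is rejected by the trailing lookahead and "a$20" by the leading lookbehind.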

AS3 RegExp to match words with boundary type characters in them

I want to match a list of words, which is easy enough when those words are truly words. For example, /\b (pop|push) \b/gsx, when run against the string "pop gave the door a push but it popped back", will match the words pop and push but not popped. I need similar functionality for words that contain characters that would normally q...

What is a word boundary in regexes?

I am using Java regexes in Java 1.6 (inter alia to parse numeric output) and cannot find a precise definition of \b ("word boundary"). I had assumed that "-12" would be an "integer word" (matched by \b\-?\d+\b) but it appears that this does not work. I'd be grateful to know of ways of matching space-separated numbers. Example: Pat...
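To illustrate why "-12" does not behave as one "word", here is a small sketch in Python rather than Java; the \b semantics shown (a position with a word character on one side and a non-word character or string edge on the other) are the same for ASCII input in both. The sample string is made up for the illustration and is not the data from the question.

```python
import re

# Hypothetical sample input, chosen to show each case.
text = "x -12 7 y-3"

# There is no \b between a space and '-': both are non-word characters.
# So the leading \b can only match right before the digit, and the sign is
# dropped unless it happens to follow a word character (as in "y-3").
naive = re.findall(r"\b-?\d+\b", text)

# Whitespace lookarounds treat "space-separated" literally instead.
spaced = re.findall(r"(?<!\S)-?\d+(?!\S)", text)

print(naive)   # ['12', '7', '-3']
print(spaced)  # ['-12', '7']
```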

Regex word-break with unicode diacritics

I am working on an application that searches text using regular expressions based on input from a user. One option the user has is to include a "Match 0 or more characters" wildcard using the asterisk. I need this to only match between word boundaries. My first attempt was to convert all asterisks to (?:(?=\B).)*, which works fine for mo...

mysql: instr specify word boundaries

I want to check whether a string contains a field value as a substring. select * from mytable where instr("mystring", column_name); But this does not search on word boundaries. select * from mytable where instr("mystring", concat('[[:<:]]',column_name,'[[:>:]]')); does not work either. How can I correct this? ...

How to find a word within text using XSLT 2.0 and REGEX (which doesn't have \b word boundary)?

I am attempting to scan a string of words and look for the presence of a particular word (case-insensitive) in an XSLT 2.0 stylesheet using REGEX. I have a list of words that I wish to iterate over and determine whether or not they exist within a given string. I want to match on a word anywhere within the given text, but I do not want t...

php regex word boundary matching in utf-8

Hi, I have the following php code in a utf-8 php file: var_dump(setlocale(LC_CTYPE, 'de_DE.utf8', 'German_Germany.utf-8', 'de_DE', 'german')); var_dump(mb_internal_encoding()); var_dump(mb_internal_encoding('utf-8')); var_dump(mb_internal_encoding()); var_dump(mb_regex_encoding()); var_dump(mb_regex_encoding('utf-8')); var_dump(mb_regex...

Using \b in C# regular expressions doesn't work?

I am wondering why the following regex does not match. string query = "\"1 2\" 3"; string pattern = string.Format(@"\b{0}\b", Regex.Escape("\"1 2\"")); string repl = Regex.Replace(query, pattern, "", RegexOptions.CultureInvariant); Note that if I remove the word boundary characters (\b) from pattern, it matches fine. Is there somethi...

utf-8 word boundary regex in javascript

In JavaScript: "ab abc cab ab ab".replace(/\bab\b/g, "AB"); correctly gives me: "AB abc cab AB AB" When I use utf-8 characters though: "αβ αβγ γαβ αβ αβ".replace(/\bαβ\b/g, "AB"); the word boundary operator doesn't seem to work: "αβ αβγ γαβ αβ αβ" Is there a solution to this? ...

Javascript RegExp and boundaries

Hi Guys, A colleague asked me about a regular expression problem, and I can't seem to find an answer for him. We're using boundaries to highlight certain lengths of text in a text editor, but here's some sample code that shows the problem: <script type="text/javascript"> var str = "Alpha , Beta, Gamma Delta Epsilon, AAlphaa, Beta Alp...

Should I be able to quote a leading or trailing dollar sign ($) inside a word boundary in Java Regular Expression?

I'm having trouble getting regular expressions with leading / trailing $'s to match in Java (1.6.20). From this code: System.out.println( "$40".matches("\\b\\Q$40\\E\\b") ); System.out.println( "$40".matches(".*\\Q$40\\E.*") ); System.out.println( "$40".matches("\\Q$40\\E") ); System.out.println( " ------ " ); System.out.println( "40$"...

Perl regex replacing at word boundary. Detecting "/" as a word boundary

Hi everyone, I am running into a strange regex issue.... I have a document where I am doing a replace... as an example I want to replace "DEXX" with "DEXX/AREX" and then with the next substitution replace... "AREX" with "AREX/CUBE" DEXX and AREX are stored in a hash like so.... "DEXX" => "AREX", "AREX" => "CUBE" The regex I have is thi...

word boundary on non latin characters in php

This example works fine: echo preg_replace("/\bI\b/u", 'we', "I can"); // we can This one, where Russian letters are used, does not work even though I use the "u" modifier: echo preg_replace("/\bЯ\b/u", 'мы', 'Я могу'); // still "Я могу" So the question is: what should I do to fix this? Thanks. ...

Word boundary detection from text

Hi, I am having a problem with word boundary identification. I removed all the markup from a Wikipedia document, and now I want to get a list of entities (meaningful terms). I am planning to take bi-grams and tri-grams of the document and check whether they exist in a dictionary (WordNet). Is there a better way to achieve this? Below is the sample ...

How can I make a regular expression which takes accented characters into account?

I have a JavaScript regular expression which basically finds two-letter words. The problem seems to be that it interprets accented characters as word boundaries. Indeed, it seems that A word boundary ("\b") is a spot between two characters that has a "\w" on one side of it and a "\W" on the other side of it (in either order), ...
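The quoted definition can be made concrete with a short sketch. It is written in Python rather than JavaScript, and it uses the re.ASCII flag as a stand-in for an engine whose \w covers only [A-Za-z0-9_]; that correspondence is an assumption of the illustration, not a statement about any particular JavaScript engine. The sample word is made up.

```python
import re

# "préau" with a precomposed é (U+00E9); hypothetical sample word.
word = "pr\u00e9au"

# ASCII-only \w: 'é' counts as \W, so \b appears on both sides of it and
# the five-letter word falls apart into two bogus two-letter "words".
ascii_hits = re.findall(r"\b\w{2}\b", word, flags=re.ASCII)

# Unicode-aware \w (Python's default for str patterns): 'é' is a word
# character, the whole string is one word, and nothing matches.
unicode_hits = re.findall(r"\b\w{2}\b", word)

print(ascii_hits)    # ['pr', 'au']
print(unicode_hits)  # []
```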

PostgreSQL Regex Word Boundaries?

Does PostgreSQL support \b? I'm trying \bAB\b but it doesn't match anything, whereas (\W|^)AB(\W|$) does. These 2 expressions are essentially the same, aren't they? ...
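On the question of whether the two expressions are "essentially the same": even in an engine where \b is a word boundary, they differ in what they consume. The sketch below uses Python's re module, not PostgreSQL, and says nothing about which escapes PostgreSQL accepts; it only shows that (\W|^)AB(\W|$) swallows the neighbouring characters while \bAB\b is zero-width. The groups are written as non-capturing here purely to keep the output simple.

```python
import re

text = "AB AB"

# Zero-width boundaries: each match is exactly "AB".
with_b = [m.span() for m in re.finditer(r"\bAB\b", text)]

# Consuming alternative: the first match eats the space after "AB",
# so the second "AB" no longer has a \W (or string start) in front of it.
with_groups = [m.span() for m in re.finditer(r"(?:\W|^)AB(?:\W|$)", text)]

print(with_b)       # [(0, 2), (3, 5)]
print(with_groups)  # [(0, 3)]
```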

Find "word" index in paragraph with jQuery (Javascript)

I have a string that represents a paragraph of text. var paragraph = "It is important that the word cold is not partially selected where we search for the word old"; I want to be able to search this paragraph for the index of a "word" and have it do an exact match on a "word". For example, when searching for "old", I should only get...

C# word boundary regex instead of .Contains() needed

I have a list: var myList = new List<string> { "red", "blue", "green" }; I have a string: var myString = "Alfred has a red and blue tie"; I am trying to get a count of matches of words in myList within myString. Currently, I am using .Contains(), which gets me a count of 3 because it is picking up the "red" in "Alfred". I need to...
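A rough sketch of the word-boundary approach to this count, written in Python rather than C# (the \b and alternation syntax carries over). The variable names mirror the question; escaping each word with re.escape is an added precaution in case a list entry contains regex metacharacters.

```python
import re

my_list = ["red", "blue", "green"]
my_string = "Alfred has a red and blue tie"

# Build \b(?:red|blue|green)\b, escaping each entry so it is matched literally.
pattern = re.compile(r"\b(?:" + "|".join(re.escape(w) for w in my_list) + r")\b")

hits = pattern.findall(my_string)
count = len(hits)

print(hits)   # ['red', 'blue']
print(count)  # 2
```

The "red" inside "Alfred" is skipped because there is no boundary between "f" and "r".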

word boundary regex problem (overlap)

Given the following code: var myList = new List<string> { "red shirt", "blue", "green", "red" }; Regex r = new Regex("\\b(" + string.Join("|", myList.ToArray()) + ")\\b"); MatchCollection m = r.Matches("Alfred has a red shirt and blue tie"); I want the result of m to include "red shirt", "blue", "red" since all those are in the string...
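A single alternation cannot return both "red shirt" and "red" from the same position, because regex matches do not overlap. One possible workaround, sketched in Python rather than C# and assuming the goal is simply "which list entries occur as whole words or phrases", is to test each entry on its own. The helper name phrases_present is hypothetical.

```python
import re

def phrases_present(phrases, text):
    """Return the entries of `phrases` that occur in `text` on word boundaries.

    Each entry is searched independently, so overlapping entries such as
    "red shirt" and "red" can both be reported.
    """
    return [p for p in phrases
            if re.search(r"\b" + re.escape(p) + r"\b", text)]

my_list = ["red shirt", "blue", "green", "red"]
found = phrases_present(my_list, "Alfred has a red shirt and blue tie")

print(found)  # ['red shirt', 'blue', 'red']
```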