ansaurus

Question

Strip words beginning with a specific letter from a sentence using regex

Answer 1

A:

/\ba[a-z]*\b/i

will match any word starting with 'a'.

The \b indicates a word boundary - we want to only match starting from the beginning of a word, after all.

Then there's the character we want our word to start with.

Then we match as many letter characters as possible, followed by another word boundary.
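A minimal sketch of that pattern in Ruby (chosen only because later answers use Ruby; the question does not name a language). The sample sentence is an assumption made up for illustration.

```ruby
# Sample sentence is hypothetical, used only to exercise the pattern.
sentence = "An apple a day keeps all anxious aardvarks away"

# Boundary, the letter 'a', any further letters, boundary.
# The /i flag lets the leading letter match 'A' as well as 'a'.
pattern = /\ba[a-z]*\b/i

a_words = sentence.scan(pattern)
p a_words
# => ["An", "apple", "a", "all", "anxious", "aardvarks", "away"]
```

Note that "day" is skipped: its 'a' is not at a word boundary.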

Anon. 2010-02-03 00:30:27

Answer 2

A:

To match all words starting with t, use:

\bt\w+

That will match test but not footest; \b means "word boundary".
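A quick Ruby sketch of what this pattern does and does not match; the sample strings are assumptions for illustration. Two things worth noticing: \w also accepts digits and underscores, and \w+ needs at least one character after the t, so a lone "t" is skipped.

```ruby
pattern = /\bt\w+/

# 'footest' contains a 't', but not at a word boundary.
plain = "test footest".scan(pattern)
p plain      # => ["test"]

# \w covers letters, digits and the underscore.
mixed = "t_001 t42 tea".scan(pattern)
p mixed      # => ["t_001", "t42", "tea"]

# \w+ requires one or more word characters after 't'.
single = "t is for tea".scan(pattern)
p single     # => ["tea"]
```

If one-letter words should count too, \w* in place of \w+ would be the usual adjustment.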

Rubens Farias 2010-02-03 00:31:11

This depends on whether you class something like "t_001" as a 'word' or not.

Anon. 2010-02-03 00:32:22

Yeah, I was writing the same to you =) The OP must be more precise.

Rubens Farias 2010-02-03 00:33:35

Answer 3

A:

You can use \b. It matches word boundaries--the invisible spot just before and after a word. (You can't see them, but oh they're there!) Here's the regex:

/\b(a\w*)\b/

The \w matches a word character: a letter, a digit, or an underscore.

You can see me testing it here: http://rubular.com/regexes/13347

yjerem 2010-02-03 00:34:12

interesting site...

Rubens Farias 2010-02-03 00:35:17

Answer 4

+1 A:

Similar to Anon.'s answer:

/\b(a\w*)/g

and then read the results with (usually) $n, where n is the number of the capture group. Many libraries return /g results as an array for the n-th set of parentheses, so in this case $1 would hold an array of all the matching words. Double-check how the library you're using returns matches like this; sadly, there's a lot of variation in how global searches return results.
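As one concrete case of that variation, here is a sketch of how Ruby behaves. Ruby has no /g flag; String#scan is already global, and its return shape depends on whether the pattern contains capture groups. The sample text is an assumption.

```ruby
text = "an ant ate a ripe apple"

# Without a capture group, scan returns a flat array of whole matches.
flat = text.scan(/\ba\w*/)
p flat    # => ["an", "ant", "ate", "a", "apple"]

# With a capture group, scan returns one sub-array per match,
# each holding that match's groups.
nested = text.scan(/\b(a\w*)/)
p nested  # => [["an"], ["ant"], ["ate"], ["a"], ["apple"]]
```

Calling flatten on the nested result gives the same list as the group-free pattern.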

As for \w vs [a-zA-Z], you can sometimes get faster execution by using built-in classes like \w, as the regex engine may have an optimized path for its preset character classes. Keep in mind that the two are not equivalent: \w also matches digits and the underscore.

The /g at the end makes it a "global" search, so it'll find more than one match. It's still restricted to a single line in some languages and libraries, though, so if you wish to check an entire file you'll sometimes need /gm to make it multi-line.

If you want to remove results, like your title (but not question) suggests, try:

    /\ba\w*//g

which does a search-and-replace in many languages (/<search>/<replacement>/). Sometimes you need an "s" at the front, as in Perl (s/\ba\w*//g). It depends on the language or library. In Ruby's case, use:

string.gsub(/(\b)a\w*(\b)/, "\\1\\2")

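Since \b is zero-width, \1 and \2 are always empty here, and the non-word characters around each word are retained because the match never consumes them. You can optionally put replacement text between \1 and \2. Use gsub for a global replacement and sub for only the first result.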
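A short sketch of the gsub/sub difference using the expression above. The sample sentence and the final whitespace cleanup are assumptions added for illustration, not part of the answer.

```ruby
sentence = "an ant ate a ripe apple today"

# gsub removes every word starting with 'a'; sub removes only the first.
all_removed   = sentence.gsub(/(\b)a\w*(\b)/, "\\1\\2")
first_removed = sentence.sub(/(\b)a\w*(\b)/, "\\1\\2")
p first_removed  # => " ant ate a ripe apple today"

# Removed words leave their neighbouring spaces behind, so collapse
# runs of spaces and trim the ends (an extra step, not in the answer).
tidy = all_removed.squeeze(" ").strip
p tidy           # => "ripe today"
```

Whether the leftover spaces matter depends on what the stripped sentence is used for.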

Groxx 2010-02-03 00:37:25

Answer 5

+1 A:

Scan may be a good tool for this:

#!/usr/bin/ruby1.8

s = "I think Paris in the spring is a beautiful place"
p s.scan(/\b[it][[:alpha:]]*/i)
# => ["I", "think", "in", "the", "is"]

\b means "word boundary."
[:alpha:] means an uppercase or lowercase letter (a-z, A-Z).
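A small sketch of how [[:alpha:]] differs from \w on input that mixes letters with digits or underscores. The sample string is an assumption; the second pattern adds a closing \b and uses \w for comparison.

```ruby
text = "t_001 the t42"

# [[:alpha:]] stops at '_' and digits, and with no closing \b the
# pattern is happy to return just the leading letter.
alpha_only = text.scan(/\b[it][[:alpha:]]*/i)
p alpha_only  # => ["t", "the", "t"]

# \w accepts '_' and digits, so the whole token is matched.
word_chars = text.scan(/\b[it]\w*\b/i)
p word_chars  # => ["t_001", "the", "t42"]
```

For plain sentences made only of letters and spaces, both patterns return the same words.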

Wayne Conrad 2010-02-03 00:48:37

I think I'd prefer `s.scan(/\b[it]\w*\b/)`, but that's a minor difference. However, shouldn't the output array be `["I", "think", "in", "the", "is"]`?

kejadlen 2010-02-03 01:02:12

@kejadlen, Thanks. I had changed the code but then forgot to paste the new output into it. I think it may be better your way, too.

Wayne Conrad 2010-02-03 01:20:46

Answer 6

A:

Personally, I think that regex is overkill for this application. Simply running a select is more than capable of solving this particular problem.

"this is a test".split(' ').select{ |word| word[0,1] == 't' } 

result => ["this", "test"]

or if you are determined to use regex then go with grep

"this is a test".split(' ').grep(/^t/)

result => ["this", "test"]
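The snippets above collect the matching words. To strip them instead, as the title asks, one possible sketch swaps select for reject and joins the rest back together. It assumes words are separated by single spaces and that the match should be case-sensitive.

```ruby
sentence = "this is a test"

# reject is the inverse of select: keep the words that do NOT start with 't'.
kept = sentence.split(' ').reject { |word| word.start_with?('t') }
p kept      # => ["is", "a"]

# Rebuild the sentence from the remaining words.
stripped = kept.join(' ')
p stripped  # => "is a"
```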

Hope this helps.

roja 2010-02-03 17:00:43
