I want to use a regular expression to find strings which are exactly 10 characters long and begin with "7". Please could someone tell me how?

+10  A: 

This ought to work:

^7.{9}$

The 7 matches the character '7'. The . matches any character. The {9} tells it to match the previous symbol 9 times.

Edit: The ^ matches the beginning of the string and the $ matches the end of the string.
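
A minimal sketch of this check in Python (the question names no language, so Python and the helper name `is_ten_chars_starting_with_7` are assumptions). In Python, `$` also matches just before a trailing newline, so the sketch uses `re.fullmatch`, which anchors both ends without needing `^` and `$`.

```python
import re

# Same pattern as above without the anchors: fullmatch() requires the
# whole string to match, so ^ and $ are not needed.
PATTERN = re.compile(r"7.{9}")

def is_ten_chars_starting_with_7(s: str) -> bool:
    """Return True if s is exactly 10 characters long and starts with '7'."""
    return PATTERN.fullmatch(s) is not None

for candidate in ["7123456789", "7123 45678", "71234567890", "8123456789"]:
    print(candidate, is_ten_chars_starting_with_7(candidate))
```

Note that the second candidate is accepted because `.` also matches a space.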

recursive
that will match whitespace as well, which I don't think OP wanted to do (though, I understand he didn't specify).
inkedmn
I would think that should be pretty clear by the use of the word "string". I don't tend to think of "7123 45678" as a string starting with 7 and with 10 characters - it's more like two strings to me.
paxdiablo
string != word. Since when is space not a legal citizen in strings? The way I see it the answer is perfectly valid. The question could have been more specific - that I agree!
Miky Dinescu
@Pax: Most people have a different understanding of strings than you.
recursive
This regex will match strings of 10 or more characters and will also match strings that do not start with 7
Peter van der Heijden
I'm not sure why this is at +8. It is wrong in more ways than one.
Tomalak
7[^\s]{0} should address whitespace
Michael Stum
@recursive, I read the question as being able to pick out 10-character words where there may be more than one per line. Your interpretation seems less likely to me since it requires lines be ten characters each to match. Your RE, by the way, will match "7999999999999999999999999" as well, unless you specify anchors of some sort (either line start/end or word boundaries). Of course, I've been wrong before so that's not out of the bounds of possibility :-) Have requested clarification from @test.
paxdiablo
I think it's totally unfair to criticize this answer, **given the amount of information provided by the OP**. Doesn't match exactly X characters? Well, it depends, on Java's regex engine it does. Matches whitespace? Sure, but as far as I know, whitespace is perfectly legal in a string.
JG
The word "exactly" was added after I gave my answer. I've updated my answer to answer the new question, but it's not clear that's what was wanted by the original poster.
recursive
@Tomalak: It became wrong after you changed the question.
recursive
@JG: he said he wanted to match strings of 10 chars, not *substrings* of 10 chars. @Michael Stum: `\S` is made for that purpose, but that regex is like.. wtf. 7 followed by 0 non-whitespace chars. I don't see how that fits the bill.
nickf
@nickf, I think @Michael's RE was a fat-finger problem - it should have been 9.
paxdiablo
+4  A: 

This pattern should work: 7.{9}

The '7' will obviously match the digit '7'. The '.' will match any character, and the '{9}' will match the preceding character or group (in this case, '.') 9 times.

It's helpful to spend some time learning about Regular Expressions -- you might take a look at this article or this "cheat sheet" to help you get a feel for how they work. A tool like RegexBuddy or Regular Expression Builder (my personal favorite -- simple and powerful) can also be a huge help.

Donut
I think you meant "(in this case, '.')" - you have a comma.
paxdiablo
Yup, good catch.
Donut
A: 

7.{9} Yeah, probably not what you're looking for, but with what information you've provided, it's exactly what you asked for.

Brian
+2  A: 

"beginning with 7" - do you mean beginning with the number 7?

Anyway, to match a string that's exactly 10 characters long:

/[a-z0-9]{10}/i

That will match any string made up of letters and numbers. It's easy enough to add additional characters to match, but since you didn't specify...

Good luck

inkedmn
+5  A: 

What are all these answers with "7.{9}"? They will match things like "7abc efghi" which is clearly two strings (in my opinion, based on the tone of the question). They'll also match "7999999999999999999999999999" (which is clearly wrong based on the question) unless you make it clear you're using an RE function with implicit start/end line boundaries.

What you probably need is a whitelist of possible characters making up a string, something like:

7[A-Za-z0-9]{9}

with (potentially) word boundaries on either side, or (for older RE engines without that feature), all of these ones:

[^A-Za-z0-9]7[A-Za-z0-9]{9}[^A-Za-z0-9]
^7[A-Za-z0-9]{9}[^A-Za-z0-9]
[^A-Za-z0-9]7[A-Za-z0-9]{9}$
^7[A-Za-z0-9]{9}$

If there are more characters that you would consider part of a "string", simply add them to the "[A-Za-z0-9]" sections.
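
As a rough sketch, the four fallback patterns above can be folded into one by alternating between the string edge and a non-whitelisted character on each side. Python's `re` module and the names `TOKEN` and `find_first_token` are assumptions made for illustration; the capture group holds only the 10-character token.

```python
import re

# One delimiter (or the string edge) is consumed on each side; only the
# token itself is captured in group 1.
TOKEN = re.compile(r"(?:^|[^A-Za-z0-9])(7[A-Za-z0-9]{9})(?:$|[^A-Za-z0-9])")

def find_first_token(text):
    """Return the first standalone 10-character token starting with '7', or None."""
    m = TOKEN.search(text)
    return m.group(1) if m else None

print(find_first_token("id: 7abc123def, ok"))
print(find_first_token("7abc efghi"))
print(find_first_token("7999999999999999"))
```

Because the delimiters are consumed by the match, two tokens separated by a single delimiter would not both be found by a repeated search; on engines that support them, word boundaries or lookarounds avoid that.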

paxdiablo
most complete answer, well argued +
DrFalk3n
This doesn't work for unicode characters. But it's all good, since the OP will probably not return to clarify anything.
JG
"abc def" is clearly not two strings.
recursive
well then it should at least be `^7.{9}$` - i think using the standard word boundary sequence `\b` would be useful and clearer than the big backwards-compatible one there.
nickf
@nickf, there's a whole world out there where \b, \s and even \d mean nothing in REs. And, @recursive, "abc def" is one string only because it's got quotes around it which is not the case I was putting forward: the quotes were just to call out the text. Delimiting strings with spaces is as valid as delimiting them with \n, otherwise (with reductio ad absurdum) your string should be the entire file, not one string per line. There's not much point arguing - we just have different ideas as to what the question meant. Hopefully @test will clear it up. Otherwise there'll be multiple good answers.
paxdiablo
A: 

this will match any number that starts with 7 that is no longer than 10 digits.

\b7\d{0,9}\b
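
A small sketch of what this pattern picks out, assuming Python's `re` module (the helper name `find_numbers` is made up for the example). It accepts anything from 1 to 10 digits, so it answers "up to 10" rather than "exactly 10".

```python
import re

NUMBER = re.compile(r"\b7\d{0,9}\b")

def find_numbers(text):
    """Return every standalone run of 1 to 10 digits that starts with 7."""
    return NUMBER.findall(text)

# The 11-digit number and the one starting with 8 are skipped.
print(find_numbers("7 71 7123456789 71234567890 8123"))
```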
neoneye
+3  A: 

Something like this maybe:

(^|\n)(7\S{9})\s.*

Where (^|\n) is the start of the string (either bof, or new line), (7\S{9}) is the string we want, \S is any non-whitespace character, \s is any whitespace, and .* means anything else we don't want.
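
A quick sketch of how the second group could be pulled out, assuming Python's `re` module (the names `LINE_TOKEN` and `leading_tokens` are illustrative only). Since the pattern requires a whitespace character after the token, a token that ends the input with nothing after it is not matched.

```python
import re

LINE_TOKEN = re.compile(r"(^|\n)(7\S{9})\s.*")

def leading_tokens(text):
    """Return the 10-character tokens found at the start of a line."""
    # Group 2 is the token; group 1 is the line start, the rest is discarded.
    return [m.group(2) for m in LINE_TOKEN.finditer(text)]

sample = "7123456789 first\nabc\n7abcdefghi second\n7123456789"
print(leading_tokens(sample))
```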

wergeld
The trailing .* is unnecessary
William Pursell
True, but for completeness I left it in. Habit I suppose.
wergeld
That will find strings that are not necessarily exactly 10 characters long.
Kirill V. Lyadvinsky
How so, Kirill?
wergeld
+2  A: 

Like Pax said, if you want to match a string which doesn't contain any whitespace, that is:

// match this:
715kdgbp94

// not this
7 539 136a

then you could use this regex:

7\S{9}

It's also worth pointing out that this isn't a great use of regexes, and that some fairly standard and much more understandable methods would probably exist in whatever language you are using. For example, in PHP:

if (strlen($myString) == 10 && $myString[0] == "7") {

or Javascript:

if (myString.length == 10 && myString.charAt(0) == "7") {
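
For comparison, the same length-plus-first-character check sketched in Python (an assumption, since the answer only shows PHP and Javascript; the function name is made up):

```python
def is_ten_chars_starting_with_7(s: str) -> bool:
    """Plain string check: exactly 10 characters, the first being '7'."""
    return len(s) == 10 and s.startswith("7")

print(is_ten_chars_starting_with_7("715kdgbp94"))
print(is_ten_chars_starting_with_7("7 539 136a"))
```

Like the snippets above, this does not reject whitespace, so both sample strings pass.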
nickf
+3  A: 

To find strings (not words) that are exactly 10 characters long:

^7\S{9}$
^ - beginning of the string
7 - string must start with 7
\S{9} - 9 non-whitespace characters (use .{9} if you wish to allow any character in the string, including whitespace)
$ - end of the string
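
The breakdown above can be written out as a commented pattern. This is a sketch assuming Python's `re` module with the `re.VERBOSE` flag; it swaps `$` for `\Z`, because in Python `$` would also accept a trailing newline.

```python
import re

PATTERN = re.compile(
    r"""
    ^        # beginning of the string
    7        # string must start with 7
    \S{9}    # 9 non-whitespace characters
    \Z       # end of the string (stricter than $ in Python)
    """,
    re.VERBOSE,
)

for candidate in ["7123456789", "7123 45678", "71234567890"]:
    print(candidate, PATTERN.match(candidate) is not None)
```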

If this is homework, I'd recommend using Regex Builder to test your knowledge of regular expressions.

Kirill V. Lyadvinsky
+3  A: 

As far as I'm concerned, embedded whitespace is not an issue. Use anchoring to make sure the string is not too long:

^7.{9}$
Peter van der Heijden
A: 
\b7\w{9}\b

As in "continuous series of word characters that starts with '7' and stands on its own".

(?<=^|\s)7\w{9}(?=$|\s)

As in "continuous series of word characters that starts with '7' and is enclosed in white space".

The latter implies that look-behind and look-ahead are supported, obviously.
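
A sketch of both variants, assuming Python's `re` module. For the second one it swaps in negative lookarounds, `(?<!\S)` and `(?!\S)`, which express "start/end of string or whitespace" and stay within Python's fixed-width look-behind restriction. The names `WORD` and `SPACED` are illustrative.

```python
import re

# Stands on its own: delimited by word boundaries.
WORD = re.compile(r"\b7\w{9}\b")

# Enclosed in whitespace (or the string edges): no non-whitespace
# character directly before or after the token.
SPACED = re.compile(r"(?<!\S)7\w{9}(?!\S)")

text = "7abcdefghi (7123456789) 71234567890"
print(WORD.findall(text))
print(SPACED.findall(text))
```

The parenthesised token satisfies the word-boundary version but not the whitespace-enclosed one.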

Tomalak