I have a .csv file and I'm only interested in rows that consist of comma-delimited integers:

23,2,4,56,78,9,4,6

The number of comma delimited values in a row should be more than 5 (or whatever).

I'm doing this in Perl.

+11  A: 
/^(\d+,){4,}\d+$/

Match a set of digits, followed by a comma. The digit-comma pair is treated as a group, which itself has to be repeated at least four times. Then you match the final number in the sequence which doesn't need to be followed by a comma.

If you don't need to capture the digits, use non-capturing groups instead (will be marginally faster):

/^(?:\d+,){4,}\d+$/
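
As a rough sketch of how the non-capturing pattern could be used to filter rows, the snippet below builds the repeat count from a minimum number of values. The names `$min_values` and `is_integer_row` and the sample rows are made up for illustration; they are not from the question. Note that `{4,}` plus the final number accepts rows with 5 or more values, so if "more than 5" is meant literally, the minimum would need to be 6.

```perl
use strict;
use warnings;

# Hypothetical threshold: the minimum number of values a row must contain.
my $min_values = 5;

# N values means N-1 "digits followed by a comma" groups plus one final number.
my $repeat = $min_values - 1;
my $row_re = qr/^(?:\d+,){${repeat},}\d+$/;

# Returns 1 if the line is made up only of comma-separated integers
# and has at least $min_values of them, 0 otherwise.
sub is_integer_row {
    my ($line) = @_;
    return $line =~ $row_re ? 1 : 0;
}

# Illustrative rows only; in practice these would be read from the .csv file.
my @rows = ('23,2,4,56,78,9,4,6', '1,2,3', 'a,b,c,d,e,f', '1,2,3,4,5');

for my $row (@rows) {
    print "$row\n" if is_integer_row($row);
}
```

With these sample rows, only the first and the last are printed.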
Welbog
The best way...
Zanoni
A: 

For whatever:

/^([0-9]\+,?)\+$/

Edited to correct the error pointed out in the comment.

Zsolt Botykai
Will match ,,,, and will not match 10,10
GalacticCowboy
+1 you're right, I was too lazy. Edited.
Zsolt Botykai
Won't match: the backslash escapes the + quantifier, turning it into a literal plus sign, so the pattern requires one digit followed by a plus sign, an optional comma, and then another plus sign. Should be: /^([0-9]+,?)+$/
And besides, it will match strings with fewer than the desired number of values (5), so even when fixed as described it still won't work as required.
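
A quick way to check the two points in this comment is to run both patterns against a few strings. This is only a sketch; the variable names and test strings are chosen here for illustration.

```perl
use strict;
use warnings;

# The pattern as posted: in Perl, \+ is a literal plus sign.
my $escaped = qr/^([0-9]\+,?)\+$/;

# The pattern with the backslashes removed, as suggested in the comment.
my $fixed = qr/^([0-9]+,?)+$/;

# Illustrative inputs: a normal pair, a string with literal plus signs,
# a single value, and a run of bare commas.
for my $input ('10,10', '1+,+', '7', ',,,,') {
    printf "%-6s escaped=%s fixed=%s\n",
        $input,
        ($input =~ $escaped ? 'match' : 'no match'),
        ($input =~ $fixed   ? 'match' : 'no match');
}
```

The escaped pattern only accepts the string containing literal plus signs. The fixed pattern accepts `10,10` but also the single value `7`, which is why it does not enforce a minimum number of values.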
+1  A: 
/\d{1,3}(,\d{3}){0,4}/

This will only match properly formatted comma delimited numbers (100,000,000 for instance). It is still a terrible idea to have comma delimited numbers in a comma separated file, but I digress. That regex is the least likely to have problems in the context.
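
One thing to keep in mind when trying this pattern: as written it has no anchors, so it succeeds on any line that contains at least one digit. The sketch below compares it with an anchored variant, which is an assumption about the intent and not part of the answer; the variable names are illustrative.

```perl
use strict;
use warnings;

# The pattern as posted (unanchored).
my $unanchored = qr/\d{1,3}(,\d{3}){0,4}/;

# Assumed variant: anchored so that the whole line must be one formatted number.
my $thousands = qr/^\d{1,3}(?:,\d{3}){0,4}$/;

for my $input ('100,000,000', '23,2,4,56,78,9,4,6', '1000') {
    printf "%-20s unanchored=%s anchored=%s\n",
        $input,
        ($input =~ $unanchored ? 'match' : 'no match'),
        ($input =~ $thousands  ? 'match' : 'no match');
}
```

The unanchored pattern matches all three inputs because it only needs to find a digit somewhere. The anchored variant accepts `100,000,000` and rejects both the row from the question and `1000`.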

Ben Hughes
This will not match the sample given in the OP, since subsequent groups have to be exactly 3 digits.
GalacticCowboy
@GalacticCowboy: Somehow that seems to be what the OP wanted, I guess. Pity he didn't tell us that in his question.
Welbog
Yep... Oh well.
GalacticCowboy
A: 
/\d+(?:,\d+)*/

or including negative numbers

/-?\d+(?:,-?\d+)*/
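
These patterns are also unanchored, so to validate a whole row they would probably need `^` and `$` around them. A minimal sketch, assuming that is the goal; `parse_row` is a hypothetical helper that returns the values of a valid row and an empty list otherwise.

```perl
use strict;
use warnings;

# Anchored form of the signed pattern (the anchors are an addition, not part of the answer).
my $signed_row = qr/^-?\d+(?:,-?\d+)*$/;

# Hypothetical helper: returns the list of integers in the row,
# or an empty list if the row is not purely comma-separated integers.
sub parse_row {
    my ($line) = @_;
    return unless $line =~ $signed_row;
    return split /,/, $line;
}

my @values = parse_row('-3,14,-15');
print scalar(@values), " values: @values\n";

my @rejected = parse_row('1,,2');
print scalar(@rejected), " values from a row with an empty field\n";
```

A minimum count could then be enforced by checking `scalar(@values)` against whatever threshold is required.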
Hynek -Pichi- Vychodil
A: 

You may want to consider using [0-9] instead of \d, since \d can match characters that Unicode classifies as decimal digits but that aren't the ASCII digits 0 through 9.
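
To illustrate the difference, the sketch below tests a single non-ASCII digit against both forms. The choice of U+0663 (ARABIC-INDIC DIGIT THREE) is just an example, and the `/a` modifier shown in the last test assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# U+0663 ARABIC-INDIC DIGIT THREE: a Unicode decimal digit outside the ASCII range.
my $digit = "\x{0663}";

# \d follows Unicode rules for a string like this one.
my $matches_d = $digit =~ /^\d$/ ? 1 : 0;

# The explicit ASCII range never matches it.
my $matches_range = $digit =~ /^[0-9]$/ ? 1 : 0;

# With /a (Perl 5.14+), \d is restricted to the ASCII digits.
my $matches_d_ascii = $digit =~ /^\d$/a ? 1 : 0;

print "\\d matches:       $matches_d\n";
print "[0-9] matches:    $matches_range\n";
print "\\d with /a:       $matches_d_ascii\n";
```

Here `\d` matches the character while `[0-9]` and `\d` under `/a` do not.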

Chris Simmons