ansaurus

Question

How can I detect and split words like "Apple 1 & 2" into "Apple 1" and "Apple 2" from a list?

Answer 1

A:

The last part can be matched by (?:\d+, )*\d+ & \d+$. Though you may wanna replace the spaces with \s+. Once you have the matching string, splitting it by [,&\s]+ will give you each number.

Actually, if you use ^(\D+) ((?:\d+, )*\d+ & \d+)$, matching should return a list like ["the first part", "the numbers"]. So you get everything. Split the second string, and there ya go.
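For what it's worth, here is a minimal Java sketch of that match-then-split idea. The class name `MatchThenSplit` and the method `expand` are made up for illustration, and returning the input unchanged when the pattern does not match is my assumption, not something the question specifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchThenSplit {
    // Pattern from the answer: a non-digit prefix, one space, then the number list.
    private static final Pattern LIST =
        Pattern.compile("^(\\D+) ((?:\\d+, )*\\d+ & \\d+)$");

    // Returns one "prefix number" string per number in the list.
    // Assumption: input that does not match is returned as a single unchanged item.
    static List<String> expand(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = LIST.matcher(input);
        if (!m.matches()) {
            result.add(input);
            return result;
        }
        String prefix = m.group(1);
        // Split the number list on any run of commas, ampersands and whitespace.
        for (String number : m.group(2).split("[,&\\s]+")) {
            result.add(prefix + " " + number);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(expand("Apple 1 & 2"));      // [Apple 1, Apple 2]
        System.out.println(expand("Foo Bar 1, 2 & 3")); // [Foo Bar 1, Foo Bar 2, Foo Bar 3]
    }
}
```

Since the pattern requires the " & " part, a string like "Lucky 7" does not match and falls through unchanged.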

cHao 2010-05-25 16:14:15

Answer 2

A:

I'll write in Perl since you didn't specify which flavor of RegEx.

It sounds like what you want may be (assuming no numbers in Foo Bar):

/(\D+)(\d+)(, \d+)*( & \d+)/;

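Then $1 will be "Foo Bar " (including the trailing space), $2 the first number, $3 the last ", N" item and $4 the " & N" item, so you will need to strip the ", " or " & " from each. Note that a repeated group such as (, \d+)* keeps only its last repetition, so with more than one comma-separated item the earlier ones are not captured separately.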
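To see what the groups actually hold, here is a small Java translation of the pattern (the class `RepeatedGroupDemo` and its method names are hypothetical). A repeated capturing group retains only its last repetition, so this sketch falls back to scanning the matched tail with \d+ to collect every number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Java translation of the Perl pattern from the answer.
    private static final Pattern P =
        Pattern.compile("(\\D+)(\\d+)(, \\d+)*( & \\d+)");
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns what the repeated group 3 holds after a match (null if no match
    // or if the group matched zero times).
    static String repeatedGroup(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group(3) : null;
    }

    // Collects every number in the matched text that follows the prefix (group 1).
    static List<String> numbers(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        if (m.find()) {
            Matcher d = DIGITS.matcher(input.substring(m.end(1), m.end()));
            while (d.find()) {
                out.add(d.group());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(repeatedGroup("Foo Bar 1, 2, 3 & 4")); // ", 3"
        System.out.println(numbers("Foo Bar 1, 2, 3 & 4"));       // [1, 2, 3, 4]
    }
}
```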

DVK 2010-05-25 16:17:13

Answer 3

+1 A:

Note: this answer is based on an older revision of the question

In Java, I think something like this is what you want:

    String[] tests = {
        "One Two 1 & 2",
        "Boeing 737 2, 4 & 6",
        "Lucky 7",
        "MI6 agent 007, 006",
        "2010-05 26, 27 & 28"
    };
    for (String test : tests) {
        String[] parts = test.split("(?=\\d+(, \\d+)*( & \\d+)?$)", 2);
        for (String number : parts[1].split("\\D+")) {
            System.out.println(parts[0] + number);
        }
    }

This prints: (as seen on ideone.com)

One Two 1
One Two 2
Boeing 737 2
Boeing 737 4
Boeing 737 6
Lucky 7
MI6 agent 007
MI6 agent 006
2010-05 26
2010-05 27
2010-05 28

Essentially we use lookahead to split where the special number sequence begins, limiting the split into 2 parts. The special number sequence is then split on any sequence of non-digits \D+.

The pattern for the special number sequence, as shown in the lookahead, is:

\d+(, \d+)*( & \d+)?$
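If you want this as a reusable method, a sketch along the following lines might do. The names `LookaheadSplit` and `expand` are hypothetical, and the guard for input that has no trailing number sequence is my addition (the loop above would throw on `parts[1]` in that case).

```java
import java.util.ArrayList;
import java.util.List;

class LookaheadSplit {
    // Zero-width split point: the position where the trailing number sequence begins.
    private static final String SEQUENCE_START = "(?=\\d+(, \\d+)*( & \\d+)?$)";

    // Returns "prefix + number" for each number in the trailing sequence.
    // Assumption: input without a trailing sequence is returned as a single item.
    static List<String> expand(String input) {
        List<String> out = new ArrayList<>();
        String[] parts = input.split(SEQUENCE_START, 2);
        if (parts.length < 2) {
            out.add(input);
            return out;
        }
        // parts[0] keeps its trailing space, so plain concatenation is enough.
        for (String number : parts[1].split("\\D+")) {
            out.add(parts[0] + number);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand("Boeing 737 2, 4 & 6"));
        System.out.println(expand("Lucky"));
    }
}
```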

API references

String[] split(String regex, int limit)
- The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter.
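
A tiny illustration of the limit parameter, using a made-up helper `headAndRest` in a made-up class `SplitLimitDemo`: with a limit of 2 the pattern is applied at most once, so everything after the first delimiter stays in the last entry.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Splits at the first occurrence of the delimiter only (limit = 2).
    static String[] headAndRest(String input, String regex) {
        return input.split(regex, 2);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(headAndRest("a,b,c,d", ","))); // [a, b,c,d]
        System.out.println(Arrays.toString("a,b,c,d".split(",", 3)));     // [a, b, c,d]
    }
}
```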

A single `replaceAll` solution

If, for whatever reason, you insist on doing this in one swooping replaceAll, you can write something like this:

    String[] tests = {
        "One Two 1 & 2",
        "Boeing 737 2, 4 & 6",
        "Lucky 7",
        "MI6 agent 007, 006",
        "2010-05 26, 27 & 28",
    };
    String sequence = "\\d+(?:, \\d+)*(?: & \\d+)?$";
    for (String test : tests) {
        System.out.println(
            test.replaceAll(
                "^.*?(?=sequence)|(?<=(?=(.*?)(?=sequence))^.*)(\\d+)(\\D+)?"
                    .replace("sequence", sequence),
                "$1$2$3"
            )
        );
    }

The output (as seen on ideone.com):

One Two 1 & One Two 2
Boeing 737 2, Boeing 737 4 & Boeing 737 6
Lucky 7
MI6 agent 007, MI6 agent 006
2010-05 26, 2010-05 27 & 2010-05 28

This uses triple-nested assertions, including the infinite-length lookbehind feabug in Java. I wouldn't recommend using it, but there it is.
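
If the single-call requirement can be relaxed, a two-step version gives the same joined output without any lookbehind. This is a different technique from the one above (a split followed by an ordinary `replaceAll`), and the names `PrefixEachNumber` and `expandInline` are hypothetical.

```java
import java.util.regex.Matcher;

class PrefixEachNumber {
    // Zero-width split point where the trailing number sequence begins.
    private static final String SEQUENCE_START = "(?=\\d+(?:, \\d+)*(?: & \\d+)?$)";

    // Puts the prefix in front of every number while keeping the ", " and " & " separators.
    // Assumption: input without a trailing number sequence is returned unchanged.
    static String expandInline(String input) {
        String[] parts = input.split(SEQUENCE_START, 2);
        if (parts.length < 2) {
            return input;
        }
        // quoteReplacement guards against '$' or '\' appearing in the prefix;
        // "$0" stands for the matched number itself.
        String replacement = Matcher.quoteReplacement(parts[0]) + "$0";
        return parts[0].isEmpty() ? input : parts[1].replaceAll("\\d+", replacement);
    }

    public static void main(String[] args) {
        System.out.println(expandInline("One Two 1 & 2"));
        System.out.println(expandInline("2010-05 26, 27 & 28"));
    }
}
```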

polygenelubricants 2010-05-25 16:54:22

Answer 4

A:

Look at the design of Parse::Range on CPAN:

http://cpansearch.perl.org/src/PERLER/Parse-Range-0.96/lib/Parse/Range.pm

You may need to tweak the logic a little bit to support the ampersands.

David M 2010-05-25 17:35:16
