ansaurus

Question

Extracting some data items in a string using regular expression

Answer 1

+6 A:

The simplest way would be `<!\[.+?]!>`. Just don't care about what is matched between the two delimiters at all. Only make sure that it always matches the closing delimiter at the earliest possible opportunity. That is why the `?` is there: it makes the quantifier lazy.

(Also, there's no need to escape the `]`. In most regex flavors, a `]` outside a character class is matched literally.)

As for the requirement that the sequence `]!>` be "disallowed" within the fruit name: that's implicit, since it is the closing delimiter.
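A minimal sketch of why the lazy quantifier matters, using Python's `re` module as the illustration language (the question doesn't name one). The input string is hypothetical, because the question's exact text isn't reproduced here. It assumes fruit names wrapped in `<![ ... ]!>` with free text in between. The sketch compares the lazy `.+?` with a greedy `.+`:

```python
import re

# Hypothetical input (assumed shape): fruit names wrapped in <![ ... ]!>,
# with arbitrary text between the tags.
text = "<![Apple]!>crunchy<![Banana]!>soft<![Orange]!>juicy"

# Lazy .+? stops at the FIRST ]!> after each opening <![
lazy = re.findall(r"<!\[.+?]!>", text)

# Greedy .+ runs on to the LAST ]!> in the string, swallowing the text between tags
greedy = re.findall(r"<!\[.+]!>", text)

print(lazy)    # ['<![Apple]!>', '<![Banana]!>', '<![Orange]!>']
print(greedy)  # ['<![Apple]!>crunchy<![Banana]!>soft<![Orange]!>']
```

The unescaped `]` in the pattern is accepted by Python's `re`, as the answer says.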

Tim Pietzcker 2009-11-19 17:52:37

+1 Didn't know you could use `]` without escaping it :)

Andomar 2009-11-19 18:05:45

Answer 2

+1 A:

To match a fruit name, you could use:

<!\[(.*?)]!>

After the opening `<![`, this captures the shortest run of text that is followed by `]!>`. The lazy `.*?` is what keeps the match short. A greedy `.*` would instead run on to the last `]!>` in the string.

Here's a full regex to match each fruit with the following text:

<!\[(.*?)]!>(.*?)(?=(<!\[)|$)

This uses a positive lookahead, `(?=...)`, to match either the beginning of the next tag or the end of the string. A lookahead checks for a match without consuming any characters, so the next fruit's `<![` is still available when the same regex is applied again.
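A minimal sketch of the full regex in action, using Python's `re` as an assumed host language. The input string is hypothetical, since the question's exact text isn't shown. `re.finditer` applies the pattern repeatedly, and each new match starts at the `<![` that the previous lookahead left unconsumed:

```python
import re

# Hypothetical input (assumed shape), not the question's exact string.
text = "<![Apple]!>crunchy<![Banana]!>soft<![Orange]!>juicy"

# Group 1: fruit name. Group 2: following text, up to the next <![ or end of string.
# The lookahead (?=(<!\[)|$) only checks for the next tag, so it is not consumed.
pattern = re.compile(r"<!\[(.*?)]!>(.*?)(?=(<!\[)|$)")

# Collect (name, following text) pairs from successive matches
pairs = [(m.group(1), m.group(2)) for m in pattern.finditer(text)]
print(pairs)  # [('Apple', 'crunchy'), ('Banana', 'soft'), ('Orange', 'juicy')]
```

Reading groups 1 and 2 explicitly is deliberate. The lookahead contains its own capturing group `(<!\[)`, so `re.findall` would return 3-tuples here, with a third element of `'<!['` or `''`.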

Andomar 2009-11-19 18:00:37

Your information is very useful.

bobo 2009-11-25 11:06:45

Answer 3

+1 A:

Depending on what language you are using, you can use the string methods your language provides to do a simple split (plus a simpler, more understandable regex). Split your string using "!>" as the separator. Go through each field and check for "<!". If it is found, remove all characters from the start of the field up to and including "<!". This gives you all the fruits. I use gawk to demonstrate, but the algorithm can be implemented in your language.

e.g. gawk:

# set the field separator to !>
awk -F'!>' '
{
  # for each field on the line
  for (i = 1; i <= NF; i++) {
    # only fields containing <! hold a fruit name
    if ($i ~ /<!/) {
      # strip everything from the start of the field up to and including <!
      gsub(/.*<!/, "", $i)
      # print the remaining fruit name, e.g. [Apple]
      print $i
    }
  }
}
' file

Output:

# ./run.sh
[Apple]
[Banana]
[Orange]
[Pear]
[Pineapple]

No complicated regex needed.
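As the answer says, the split approach carries over to other languages. Here is a minimal Python sketch of the same algorithm, run on a single hypothetical string with the assumed `<![Name]!>text` shape (not the question's exact input). Like the gawk version, it keeps the square brackets around each name:

```python
# Hypothetical input (assumed shape), not the question's exact string.
text = "<![Apple]!>crunchy<![Banana]!>soft<![Orange]!>juicy"

fruits = []
# Split on the closing delimiter "!>"
for field in text.split("!>"):
    # Only fields containing "<!" end with a fruit name
    if "<!" in field:
        # Keep what follows the last "<!", mirroring the greedy gsub(/.*<!/, "")
        fruits.append(field.rsplit("<!", 1)[1])

print(fruits)  # ['[Apple]', '[Banana]', '[Orange]']
```

One difference from the gawk version: Python's `str.split` treats "!>" as a literal string, whereas awk interprets a multi-character `-F` value as a regex. For this separator the two behave the same.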

ghostdog74 2009-11-21 10:31:14

I just want to know what regex syntax should be used. I've never heard of Gawk before. Thanks a lot for introducing me to this text-manipulating language.

bobo 2009-11-25 11:05:33

Several people have posted regex solutions, so I guess you can look at them.

ghostdog74 2009-11-25 11:43:26

If you want to learn about gawk, go to http://www.gnu.org/software/gawk/manual/

ghostdog74 2009-11-25 11:44:06
