In my Groovy program, I have a list of lines and want to select a contiguous block of lines from the list. The first line of the desired block contains a particular string, and the last line (which I can include or not - it doesn't matter) contains a (different) marker string.

Working code is below but surely there is a "groovier" way to do this in a line or two - can anyone suggest how?

(Actually my list of lines is from an HTTP GET, but I just define a URL object and do url.openStream().readLines() to get the lines.)

lines = ["line1", "line2", "line3", "line4", "line5", "line6"]
println extract_block("2", "5", lines)

def extract_block(start, end, lines) {
    def block = null
    for (line in lines) {
        if (null == block && line.contains(start)) { block = [] }
        if (null != block && line.contains(end)) { break }
        if (null != block) { block << line }
    }
    return block
}

This prints

[line2, line3, line4]

which includes the first line (containing "2") and skips the final line (containing "5").

A: 

Regular expressions are designed for this kind of problem. Here's an example that shows it works both where the file contents include the end marker and where they don't.

(NB: the String.find method is available in Groovy 1.6.1 and above; prior to that you'd need to modify the syntax slightly.)

def BEGIN_MARKER = "2"
def END_MARKER = "5"
def BEGIN_LINE = /.*$BEGIN_MARKER.*\n/
def MIDDLE_LINES = /(.*\n)*?/
def LOOKAHEAD_FOR_END_LINE_OR_EOF = /(?=.*$END_MARKER.*|\Z)/

def FIND_LINES = "(?m)" +
                 BEGIN_LINE +
                 MIDDLE_LINES +
                 LOOKAHEAD_FOR_END_LINE_OR_EOF

def fileContents = """
line1 ipsum
line2 lorem
line3 ipsum
line4 lorem
line5 ipsum
line6 lorem
"""

// prints:
// line2 lorem
// line3 ipsum
// line4 lorem
println fileContents.find(FIND_LINES)

def noEndMarkerFileContents = """
line1 ipsum
line2 lorem
line3 ipsum
line4 lorem
line6 lorem
"""

// prints:
// line2 lorem
// line3 ipsum
// line4 lorem
// line6 lorem
println noEndMarkerFileContents.find(FIND_LINES)
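
Since the question starts from a list of lines rather than one string, the same pattern can probably be applied by joining the list, matching, and splitting the match back into lines. This is only a minimal sketch: the name extractBlockRegex is hypothetical, and Pattern.quote is added on the assumption that the markers should match literally even if they contain regex metacharacters.

```groovy
import java.util.regex.Pattern

// Hypothetical helper: begin line, lazily matched middle lines, then a
// lookahead for the end-marker line or the end of input.
def extractBlockRegex(String start, String end, List<String> lines) {
    // Pattern.quote makes each marker match as literal text (an assumption
    // about what the markers are meant to be).
    def beginLine = '.*' + Pattern.quote(start) + '.*\\n'
    def middleLines = '(?:.*\\n)*?'
    def lookahead = '(?=.*' + Pattern.quote(end) + '.*|\\Z)'

    // Terminate every line with \n so each one can match as a whole line.
    def text = lines.join('\n') + '\n'
    def match = text.find(beginLine + middleLines + lookahead)

    // find returns null when the start marker never appears.
    return match == null ? null : match.readLines()
}

def sample = ["line1", "line2", "line3", "line4", "line5", "line6"]
assert extractBlockRegex("2", "5", sample) == ["line2", "line3", "line4"]
```

As with the string version, a missing end marker means the block runs to the last line.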
Ted Naleid
I was trying to stay away from regexes since they are very hard to read (for me anyway). Some comments would make sense of the regexes at the cost of lots more lines - but I'd rather use something really short and digestible.
Douglas Squirrel
A: 

Alternatively, if you don't want to use regular expressions, you can make it a little more groovy using findAll:

lines = ["line1", "line2", "line3", "line4", "line5", "line6"]

def extract_block(start, end, lines) {
    def inBlock = false
    return lines.findAll { line ->
        if (line.contains(start)) inBlock = true
        if (line.contains(end)) inBlock = false
        return inBlock
    }
}

assert ["line2", "line3", "line4"] == extract_block("2", "5", lines)
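
One behavioral difference from the loop in the question is worth noting: findAll visits every line, so the flag can switch back on if the start marker shows up again after the end marker, whereas the loop stops at the first end marker. A small sketch with made-up input (extractWithLoop and extractWithFindAll are hypothetical labels for the two versions):

```groovy
// Loop version from the question: stops at the first end marker.
def extractWithLoop(start, end, lines) {
    def block = null
    for (line in lines) {
        if (block == null && line.contains(start)) { block = [] }
        if (block != null && line.contains(end)) { break }
        if (block != null) { block << line }
    }
    return block
}

// findAll version: keeps scanning after the end marker.
def extractWithFindAll(start, end, lines) {
    def inBlock = false
    return lines.findAll { line ->
        if (line.contains(start)) inBlock = true
        if (line.contains(end)) inBlock = false
        return inBlock
    }
}

// Made-up input in which the start marker "2" recurs after the end marker "5".
def repeated = ["a2", "b", "c5", "d2", "e"]
assert extractWithLoop("2", "5", repeated) == ["a2", "b"]
assert extractWithFindAll("2", "5", repeated) == ["a2", "b", "d2", "e"]
```

If the markers occur only once in the input, the two versions select the same lines. They still differ when the start marker is absent: the loop returns null, while findAll returns an empty list.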
Ted Naleid
Thanks Ted. I made this a little shorter below but it's more or less what I was after. If anyone can make it still more concise that would be welcome.
Douglas Squirrel
A: 

A shorter but somewhat less expressive version of Ted's "groovier" answer:

def extract_block(start, end, lines) {
    def inBlock = false
    return lines.findAll { inBlock = (it.contains(start) || inBlock) && !it.contains(end) }
}
Douglas Squirrel
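
For a version without the mutable flag, dropWhile and takeWhile may be worth a look. This is a sketch rather than a tested drop-in replacement: the name extractBlock is hypothetical, and it assumes a Groovy version that has both methods on lists (1.8.7 or later, as far as I know).

```groovy
// Hypothetical name. Skips lines until the start marker appears, then keeps
// lines until the first one containing the end marker (which is excluded).
def extractBlock(start, end, lines) {
    lines.dropWhile { !it.contains(start) }.takeWhile { !it.contains(end) }
}

def sample = ["line1", "line2", "line3", "line4", "line5", "line6"]
assert extractBlock("2", "5", sample) == ["line2", "line3", "line4"]
```

Like the loop in the question, this stops at the first end marker after the start line. Unlike the loop, it returns an empty list rather than null when the start marker is never found.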