ansaurus

Question

Answer 1

+3 A:

Your problem is that the regex keeps going and finds the BTO in the next group. As a quick workaround, you could just prohibit the "#" character in the interface id (assuming it isn't valid within records and only separates them).

re1 = '''^interface ([^#]*?$)[^#]*?BTO.*?^#$'''
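A minimal sketch of this workaround in use. The sample text is an assumption, reconstructed from the interface data quoted elsewhere in this thread rather than taken from the asker's file. The re.MULTILINE | re.DOTALL flags are also an assumption: the pattern needs ^ and $ to match at line boundaries and . to cross newlines.

```python
import re

# Assumed sample input, reconstructed for illustration only.
txt = """interface Ethernet0/22
 stp disable
 broadcast-suppression 5
 mac-address max-mac-count 1
 port access vlan 452
#
interface Ethernet0/23
 stp disable
 description BTO
 broadcast-suppression 5
 port access vlan 2421
#
interface Ethernet0/24
 stp disable
 description Avaya G700
 broadcast-suppression 5
 port access vlan 452
#
interface Ethernet0/25
 stp disable
 description BTO
 broadcast-suppression 5
 port access vlan 2421
#
"""

re1 = '''^interface ([^#]*?$)[^#]*?BTO.*?^#$'''

# MULTILINE: ^ and $ match at every line boundary.
# DOTALL: . also matches newlines, so .*? can reach the closing "#".
pattern = re.compile(re1, re.MULTILINE | re.DOTALL)

# findall returns the contents of the single capture group for each match.
bto_interfaces = pattern.findall(txt)
print(bto_interfaces)
```

Because [^#] cannot cross the "#" separator, a stanza without BTO simply fails to match instead of borrowing the BTO from the stanza after it.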

Brian 2009-09-18 09:41:11

Answer 2

A:

Rather than trying to make a pattern between the ^ and $ anchors, and relying on the #, you could use the newlines to break down the 'sublines' inside the single block match.

For example, identify each clause as a run of non-newline characters leading up to a newline.

Something like

 re1 = '''\ninterface ([^\n]+?)\n[^\n]+?\n[^\n]+BTO\n'''

will produce the result you are after, from the source text provided.
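A short sketch of that pattern applied to sample data. The text below is an assumption, reconstructed from the interface data quoted elsewhere in this thread; the asker's real file may differ.

```python
import re

# Assumed sample input, reconstructed for illustration only.
txt = """interface Ethernet0/22
 stp disable
 broadcast-suppression 5
 mac-address max-mac-count 1
 port access vlan 452
#
interface Ethernet0/23
 stp disable
 description BTO
 broadcast-suppression 5
 port access vlan 2421
#
interface Ethernet0/24
 stp disable
 description Avaya G700
 broadcast-suppression 5
 port access vlan 452
#
interface Ethernet0/25
 stp disable
 description BTO
 broadcast-suppression 5
 port access vlan 2421
#
"""

# Each [^\n]+ consumes exactly one line, so the match can never run past
# the line that is expected to end in "BTO".
re1 = '''\ninterface ([^\n]+?)\n[^\n]+?\n[^\n]+BTO\n'''

# No flags are needed: the pattern spells out the newlines itself.
names = re.findall(re1, txt)
print(names)
```

As written, the pattern is position dependent: it expects the line ending in BTO to be the second line after the interface line, and it requires a newline before "interface", so a matching stanza at the very start of the text would be missed.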

cms 2009-09-18 09:42:33

Answer 3

+1 A:

An example without regular expressions:

print([ stanza.split()[0]
        for stanza in txt.split("interface ")
        if stanza.lower().startswith( "ethernet" )
        and stanza.lower().find("bto") > -1 ])

Explanation:

I find compositions are best read "inside-out":

for stanza in txt.split("interface ")

Split the text on each occurrence of "interface " (including the following space). A resulting stanza will look like this:

Ethernet0/22
 stp disable
 broadcast-suppression 5
 mac-address max-mac-count 1
 port access vlan 452
#

Next, filter the stanzas:

if stanza.lower().startswith( "ethernet" ) and stanza.lower().find("bto") > -1

This should be self-explanatory.

stanza.split()[0]

Split the matching stanzas on whitespace, and take the first element into the resulting list. This, in tandem with the startswith filter, will prevent IndexErrors.
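The steps above can be wrapped in a small function, sketched here for Python 3. The function name interfaces_mentioning and its needle parameter are hypothetical, and the sample text is an assumption reconstructed from the interface data quoted elsewhere in this thread.

```python
# Assumed sample input, reconstructed for illustration only.
txt = """interface Ethernet0/22
 stp disable
 broadcast-suppression 5
 mac-address max-mac-count 1
 port access vlan 452
#
interface Ethernet0/23
 stp disable
 description BTO
 broadcast-suppression 5
 port access vlan 2421
#
interface Ethernet0/24
 stp disable
 description Avaya G700
 broadcast-suppression 5
 port access vlan 452
#
interface Ethernet0/25
 stp disable
 description BTO
 broadcast-suppression 5
 port access vlan 2421
#
"""


def interfaces_mentioning(text, needle="bto"):
    """Return names of Ethernet stanzas containing needle, ignoring case.

    Hypothetical helper: same split / filter / first-word steps as above.
    """
    needle = needle.lower()
    return [
        stanza.split()[0]                          # first word is the interface name
        for stanza in text.split("interface ")    # one stanza per "interface " marker
        if stanza.lower().startswith("ethernet")  # also skips the empty leading piece
        and needle in stanza.lower()
    ]


result = interfaces_mentioning(txt)
print(result)
```

Note that this is a plain substring test, so it would also pick up a stanza where "bto" appears anywhere in the text, not only in the description line.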

exhuma 2009-09-18 09:49:05

perhaps you meant 'not perfectly error proof' ?

cms 2009-09-18 09:51:43

Fixed. And it's OK nonetheless; I was overly pessimistic. Tested it now and it works just fine.

exhuma 2009-09-18 09:54:59

Added an explanation

exhuma 2009-09-18 10:03:40

Answer 4

+2 A:

Here is a little pyparsing parser for your file. Not only does this show a solution to your immediate problem, but the parser gives you a nice set of objects that you can use to easily access the data in each interface.

Here is the parser:

from pyparsing import *

# set up the parser
comment = "#" + Optional(restOfLine)
keyname = Word(alphas,alphanums+'-')
value = Combine(empty + SkipTo(LineEnd() | comment))
INTERFACE = Keyword("interface")
interfaceDef = Group(INTERFACE + value("name") + \
    Dict(OneOrMore(Group(~INTERFACE + keyname + value))))

# ignore comments (could be anywhere)
interfaceDef.ignore(comment)

# parse the source text
ifcdata = OneOrMore(interfaceDef).parseString(txt)

Now how to use it:

# use dump() to list all of the named fields created at parse time
for ifc in ifcdata:
    print(ifc.dump())

# first the answer to the OP's question
print([ifc.name for ifc in ifcdata if ifc.description == "BTO"])

# how to access fields that are not legal Python identifiers
print([(ifc.name,ifc['broadcast-suppression']) for ifc in ifcdata
    if 'broadcast-suppression' in ifc])

# using names to index into a mapping with string interpolation
print(', '.join(["(%(name)s, '%(port)s')" % ifc for ifc in ifcdata ]))

Prints out:

['interface', 'Ethernet0/22', ['stp', 'disable'], ['broadcast-suppression', '5'], ['mac-address', 'max-mac-count 1'], ['port', 'access vlan 452']]
- broadcast-suppression: 5
- mac-address: max-mac-count 1
- name: Ethernet0/22
- port: access vlan 452
- stp: disable
['interface', 'Ethernet0/23', ['stp', 'disable'], ['description', 'BTO'], ['broadcast-suppression', '5'], ['port', 'access vlan 2421']]
- broadcast-suppression: 5
- description: BTO
- name: Ethernet0/23
- port: access vlan 2421
- stp: disable
['interface', 'Ethernet0/24', ['stp', 'disable'], ['description', 'Avaya G700'], ['broadcast-suppression', '5'], ['port', 'access vlan 452']]
- broadcast-suppression: 5
- description: Avaya G700
- name: Ethernet0/24
- port: access vlan 452
- stp: disable
['interface', 'Ethernet0/25', ['stp', 'disable'], ['description', 'BTO'], ['broadcast-suppression', '5'], ['port', 'access vlan 2421']]
- broadcast-suppression: 5
- description: BTO
- name: Ethernet0/25
- port: access vlan 2421
- stp: disable
['Ethernet0/23', 'Ethernet0/25']
[('Ethernet0/22', '5'), ('Ethernet0/23', '5'), ('Ethernet0/24', '5'), ('Ethernet0/25', '5')]
(Ethernet0/22, 'access vlan 452'), (Ethernet0/23, 'access vlan 2421'), (Ethernet0/24, 'access vlan 452'), (Ethernet0/25, 'access vlan 2421')
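For comparison, a rough standard-library sketch that builds a similar name-to-value mapping per interface. This is not pyparsing and not the parser above; it is a swapped-in technique using plain dicts. The function name parse_interfaces is hypothetical, and the sample text is an assumption reconstructed from the output shown above.

```python
# Assumed sample input, reconstructed from the printed output for illustration.
txt = """interface Ethernet0/22
 stp disable
 broadcast-suppression 5
 mac-address max-mac-count 1
 port access vlan 452
#
interface Ethernet0/23
 stp disable
 description BTO
 broadcast-suppression 5
 port access vlan 2421
#
interface Ethernet0/24
 stp disable
 description Avaya G700
 broadcast-suppression 5
 port access vlan 452
#
interface Ethernet0/25
 stp disable
 description BTO
 broadcast-suppression 5
 port access vlan 2421
#
"""


def parse_interfaces(text):
    """Hypothetical helper: return one dict per interface stanza."""
    interfaces = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and "#" separator/comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        # The first word is the key; the rest of the line is the value.
        key, _, value = stripped.partition(" ")
        if key == "interface":
            current = {"name": value.strip()}
            interfaces.append(current)
        elif current is not None:
            current[key] = value.strip()
    return interfaces


ifcs = parse_interfaces(txt)
bto_names = [ifc["name"] for ifc in ifcs if ifc.get("description") == "BTO"]
print(bto_names)
```

Unlike the substring and regex approaches, this compares the whole description value, so only stanzas whose description is exactly BTO are selected.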

Paul McGuire 2009-09-18 11:58:40


help with python regular expression
