I am wondering whether the problem below can be solved with a single regular expression, or whether I should write a standard loop and evaluate the text line by line.

When I run the included code I get ['Ethernet0/22', 'Ethernet0/24'], but the result should be ['Ethernet0/23', 'Ethernet0/25'].

Any advice on this?

 import re

 txt='''#
 interface Ethernet0/22
  stp disable
  broadcast-suppression 5
  mac-address max-mac-count 1
  port access vlan 452
 #
 interface Ethernet0/23
  stp disable
  description BTO
  broadcast-suppression 5
  port access vlan 2421
 #
 interface Ethernet0/24
  stp disable
  description Avaya G700
  broadcast-suppression 5
  port access vlan 452
 #
 interface Ethernet0/25
  stp disable
  description BTO
  broadcast-suppression 5
  port access vlan 2421
 #
 '''

 re1 = '''^interface (.*?$).*?BTO.*?^#$'''

 rg = re.compile(re1,re.IGNORECASE|re.DOTALL|re.MULTILINE)
 m = rg.findall(txt)
 if m:
  print(m)
+3  A: 

Your problem is that the regex runs past the end of the current record and finds the BTO in the next group. As a quick workaround, you could prohibit the "#" character in the interface id and in the text leading up to BTO (assuming "#" isn't valid within records and only separates them).

re1 = '''^interface ([^#]*?$)[^#]*?BTO.*?^#$'''
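
To see the difference, both patterns can be run against a shortened copy of the sample. This is only a minimal sketch in Python 3 syntax (the snippets in this thread use Python 2); the names `SAMPLE`, `FLAGS`, `ORIGINAL`, and `FIXED` are made up for illustration.

```python
import re

# Shortened copy of the configuration from the question.
SAMPLE = """#
interface Ethernet0/22
 stp disable
 port access vlan 452
#
interface Ethernet0/23
 stp disable
 description BTO
 port access vlan 2421
#
interface Ethernet0/24
 stp disable
 description Avaya G700
#
interface Ethernet0/25
 stp disable
 description BTO
#
"""

# Same flags as in the question.
FLAGS = re.IGNORECASE | re.DOTALL | re.MULTILINE

# Pattern from the question: ".*?" is free to cross "#" into the next record.
ORIGINAL = re.compile(r"^interface (.*?$).*?BTO.*?^#$", FLAGS)

# Workaround: "[^#]" cannot cross a record separator before BTO is found.
FIXED = re.compile(r"^interface ([^#]*?$)[^#]*?BTO.*?^#$", FLAGS)

original_matches = ORIGINAL.findall(SAMPLE)
fixed_matches = FIXED.findall(SAMPLE)

print(original_matches)  # the wrong interfaces reported in the question
print(fixed_matches)     # only the interfaces whose own record contains BTO
```

The workaround relies on "#" never appearing inside a record, as noted above.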
Brian
A: 

Rather than trying to build a pattern between the ^ and $ anchors and relying on the #, you could use the newlines to break the block into 'sublines' inside a single match.

For example, describe each line as a run of non-newline characters leading up to a newline.

Something like

 re1 = '''\ninterface ([^\n]+?)\n[^\n]+?\n[^\n]+BTO\n'''

will produce the result you are after, from the source text provided.
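
One thing worth checking with this pattern is that it counts lines: it expects exactly one line between the interface line and the line ending in BTO, which happens to hold for the sample. The following is a minimal Python 3 sketch of that behaviour; the two test strings and the variable names are invented for illustration, and the flags from the question are left out because the pattern does not need them here.

```python
import re

PATTERN = re.compile(r"\ninterface ([^\n]+?)\n[^\n]+?\n[^\n]+BTO\n")

# Description on the second line after "interface": the pattern fits.
second_line = (
    "#\n"
    "interface Ethernet0/23\n"
    " stp disable\n"
    " description BTO\n"
    "#\n"
)

# Description on the third line: the fixed line count no longer fits.
third_line = (
    "#\n"
    "interface Ethernet0/23\n"
    " stp disable\n"
    " broadcast-suppression 5\n"
    " description BTO\n"
    "#\n"
)

matches_second = PATTERN.findall(second_line)
matches_third = PATTERN.findall(third_line)

print(matches_second)
print(matches_third)
```

If the position of the description line can vary in your files, the pattern would need to allow a variable number of lines.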

cms
+1  A: 

An example without regular expressions:

print([ stanza.split()[0]
        for stanza in txt.split("interface ")
        if stanza.lower().startswith( "ethernet" )
        and stanza.lower().find("bto") > -1 ])

Explanation:

I find compositions are best read "inside-out":

for stanza in txt.split("interface ")

Split the text on each occurrence of "interface " (including the following space). A resulting stanza will look like this:

Ethernet0/22
 stp disable
 broadcast-suppression 5
 mac-address max-mac-count 1
 port access vlan 452
#

Next, filter the stanzas:

if stanza.lower().startswith( "ethernet" ) and stanza.lower().find("bto") > -1

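Keep only the stanzas that start with "ethernet" (this discards the fragment before the first "interface ") and that contain "bto" anywhere, compared case-insensitively.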

stanza.split()[0]

Split the matching stanzas on whitespace, and take the first element into the resulting list. Because the startswith filter guarantees that each remaining stanza is non-empty, the [0] lookup cannot raise an IndexError.
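
The same steps can be wrapped in a small function. This is a sketch in Python 3 syntax, not part of the original answer; the function name `interfaces_matching`, its `needle` parameter, and the shortened `sample` text are assumptions made for illustration.

```python
def interfaces_matching(text, needle="bto"):
    """Return names of Ethernet interfaces whose stanza contains `needle`.

    The comparison is case-insensitive, mirroring the lower() calls above.
    """
    return [
        stanza.split()[0]                          # first token is the name
        for stanza in text.split("interface ")     # one stanza per interface
        if stanza.lower().startswith("ethernet")   # skips the leading "#" part
        and needle in stanza.lower()
    ]


# Shortened, hypothetical sample in the same layout as the question's text.
sample = (
    "#\n"
    "interface Ethernet0/22\n stp disable\n#\n"
    "interface Ethernet0/23\n stp disable\n description BTO\n#\n"
    "interface Ethernet0/24\n description Avaya G700\n#\n"
    "interface Ethernet0/25\n description BTO\n#\n"
)

result = interfaces_matching(sample)
print(result)
```

Note that the check looks for "bto" anywhere in the stanza, not only in the description line.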

exhuma
perhaps you meant 'not perfectly error proof' ?
cms
fixed. And it's OK nonetheless. I was overly pessimistic. Tested it now and it works just fine
exhuma
Added an explanation
exhuma
+2  A: 

Here is a little pyparsing parser for your file. Not only does this show a solution to your immediate problem, but the parser gives you a nice set of objects that you can use to easily access the data in each interface.

Here is the parser:

from pyparsing import *

# set up the parser
comment = "#" + Optional(restOfLine)
keyname = Word(alphas,alphanums+'-')
value = Combine(empty + SkipTo(LineEnd() | comment))
INTERFACE = Keyword("interface")
interfaceDef = Group(INTERFACE + value("name") + \
    Dict(OneOrMore(Group(~INTERFACE + keyname + value))))

# ignore comments (could be anywhere)
interfaceDef.ignore(comment)

# parse the source text
ifcdata = OneOrMore(interfaceDef).parseString(txt)

Now how to use it:

# use dump() to list all of the named fields created at parse time
for ifc in ifcdata:
    print(ifc.dump())

# first the answer to the OP's question
print([ifc.name for ifc in ifcdata if ifc.description == "BTO"])

# how to access fields that are not legal Python identifiers
print([(ifc.name,ifc['broadcast-suppression']) for ifc in ifcdata 
    if 'broadcast-suppression' in ifc])

# using names to index into a mapping with string interpolation
print(', '.join(["(%(name)s, '%(port)s')" % ifc for ifc in ifcdata ]))

Prints out:

['interface', 'Ethernet0/22', ['stp', 'disable'], ['broadcast-suppression', '5'], ['mac-address', 'max-mac-count 1'], ['port', 'access vlan 452']]
- broadcast-suppression: 5
- mac-address: max-mac-count 1
- name: Ethernet0/22
- port: access vlan 452
- stp: disable
['interface', 'Ethernet0/23', ['stp', 'disable'], ['description', 'BTO'], ['broadcast-suppression', '5'], ['port', 'access vlan 2421']]
- broadcast-suppression: 5
- description: BTO
- name: Ethernet0/23
- port: access vlan 2421
- stp: disable
['interface', 'Ethernet0/24', ['stp', 'disable'], ['description', 'Avaya G700'], ['broadcast-suppression', '5'], ['port', 'access vlan 452']]
- broadcast-suppression: 5
- description: Avaya G700
- name: Ethernet0/24
- port: access vlan 452
- stp: disable
['interface', 'Ethernet0/25', ['stp', 'disable'], ['description', 'BTO'], ['broadcast-suppression', '5'], ['port', 'access vlan 2421']]
- broadcast-suppression: 5
- description: BTO
- name: Ethernet0/25
- port: access vlan 2421
- stp: disable
['Ethernet0/23', 'Ethernet0/25']
[('Ethernet0/22', '5'), ('Ethernet0/23', '5'), ('Ethernet0/24', '5'), ('Ethernet0/25', '5')]
(Ethernet0/22, 'access vlan 452'), (Ethernet0/23, 'access vlan 2421'), (Ethernet0/24, 'access vlan 452'), (Ethernet0/25, 'access vlan 2421')
Paul McGuire