Hello,

I have data coming in to a Python server via a socket. Within this data is the string '<port>80</port>', or whichever port is being used.

I wish to extract the port number into a variable. The data coming in is not XML; I just used the tag approach to identify data, in case XML is needed in the future. I do not wish to use a Python XML library, but simply something like regular expressions and string operations.

What would you recommend is the best way to match and strip this data?

I am currently using this code with no luck:

p = re.compile('<port>\w</port>')
m = p.search(data)
print m

Thank you :)

+1  A: 

Regex can't parse XML and shouldn't be used to parse fake XML. You should do one of the following:

  • Use a serialization method that is nicer to work with to start with, such as JSON or an ini file with the ConfigParser module.
  • Really use XML and not something that just sort of looks like XML and really parse it with something like lxml.etree.
  • Just store the number in a file if this is the entirety of your configuration. This solution isn't really easier than just using JSON or something, but it's better than the current one.
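The first option might look something like the sketch below. It assumes the sender is changed to emit a JSON object with a "port" key; that payload shape and the parse_port helper are hypothetical, not something taken from the question.

```python
import json

def parse_port(data):
    """Return the port from a JSON payload such as '{"port": 80}' (hypothetical format)."""
    message = json.loads(data)   # raises ValueError on malformed input
    return int(message["port"])  # raises KeyError if the key is missing

port = parse_port('{"port": 80}')
```

The json module is in the standard library, so this adds no dependency and leaves room for more fields later.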

Implementing a bad solution now for future needs that you have no way of defining or accurately predicting is always a bad approach. You will be kept busy enough trying to write and maintain software now that there is no good reason to try to satisfy unknown future needs. I have never seen a case where "I'll put this in for later" has led to less headache later on, especially when I put it in by doing something completely wrong. YAGNI!

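As to what's wrong with your snippet, other than using an entirely wrong approach: \w matches exactly one character, so the pattern cannot match a multi-digit port such as 80. You would need something like (\d+) between the tags, and print m prints the match object rather than the port itself. (Angle brackets are ordinary characters in Python's re syntax outside of constructs such as (?P<name>...).)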
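For reference, here is a minimal sketch of a pattern that does match the sample string from the question. The data value is made up for illustration, standing in for whatever arrives on the socket.

```python
import re

data = 'xx<port>80</port>yy'  # hypothetical sample of the incoming data

# \d+ matches one or more digits; the parentheses capture just the number.
m = re.search(r'<port>(\d+)</port>', data)
port = int(m.group(1)) if m else None
```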

Mike Graham
A: 

Mike Graham is correct that using regex for XML is not 'recommended', but the following will work:

(I have defined searchType as 'd' for numerals)
import re

searchStr = 'port'
searchType = 'd'

if searchType == 'd':
    retPattern = r'(<%s>)(\d+)(</%s>)'
else:
    retPattern = r'(<%s>)(.+?)(</%s>)'

searchPattern = re.compile(retPattern % (searchStr, searchStr))
found = searchPattern.search(data)  # search the incoming data, not the tag name
retVal = found.group(2)

(Note the complete lack of error checking; that is left as an exercise for the user.)
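One possible way to add that error checking is sketched below. The extract_tag name, its numeric flag, and the choice to raise ValueError are assumptions made for illustration, not part of the answer above.

```python
import re

def extract_tag(data, tag, numeric=False):
    """Return the text between <tag> and </tag> in data.

    Raises ValueError if the marker is not present.
    """
    body = r'\d+' if numeric else r'.+?'
    # re.escape guards against tag names containing regex metacharacters.
    escaped = re.escape(tag)
    pattern = re.compile(r'<%s>(%s)</%s>' % (escaped, body, escaped))
    found = pattern.search(data)
    if found is None:
        raise ValueError('no <%s> marker in data' % tag)
    return found.group(1)
```

Calling extract_tag(data, 'port', numeric=True) returns the port as a string, so wrap it in int() if a number is needed.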

KevinDTimm
Whether this works depends on a gazillion things about the file.
Mike Graham
Not really; he owns all parts of this code, so he can make it do whatever he wants. I understand the horror of not using an XML parser to handle something that 'looks like' XML. I also understand the OP's point of not wanting the expense of an XML engine for this one little bit. Regex works just fine for his problem.
KevinDTimm