Hi,

I'm somewhat confused, but I'm trying to do a search/replace using wildcards.

If I have something like:

 <blah.... ssf  ff>
 <bl.... ssf     dfggg   ff>
 <b.... ssf      ghhjj fhf>

and I want to replace all of the above strings with, say,

 <hh  >t

Any thoughts or comments on how this can be accomplished?

Thanks

Update (thanks for the comments!):

I'm missing something...

My initial sample text is:

Soo Choi</span>LONGEDITBOX">Apryl Berney 
Soo Choi</span>LONGEDITBOX">Joel Franks 
Joel Franks</span>GEDITBOX">Alexander Yamato 

and I'm trying to get:

Soo Choi foo Apryl Berney 
Soo Choi foo Joel Franks 
Joel Franks foo Alexander Yamato 

I've tried variations of:

name=re.sub("</s[^>]*\">"," foo ",name) 

but I'm missing something...

Thoughts? Thanks.
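The attempt above most likely fails because `[^>]*` cannot get past the `>` that closes `</span>`, so the pattern never reaches the `">`. A minimal sketch of one possible fix, assuming every line has the form `name</span>...">name`: allow anything except a double quote between `</s` and `">`.

```python
import re

# Sample lines from the question (trailing spaces dropped)
lines = [
    'Soo Choi</span>LONGEDITBOX">Apryl Berney',
    'Soo Choi</span>LONGEDITBOX">Joel Franks',
    'Joel Franks</span>GEDITBOX">Alexander Yamato',
]

# [^"]* runs over the ">" inside "</span>LONGEDITBOX" and stops at the first
# double quote; the pattern then consumes that quote and the following ">".
pattern = re.compile(r'</s[^"]*">')

results = [pattern.sub(" foo ", name) for name in lines]
for name in results:
    print(name)
# Soo Choi foo Apryl Berney
# Soo Choi foo Joel Franks
# Joel Franks foo Alexander Yamato
```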

A:

See the rather usable Python regular expression (`re` module) manual, or, for a more hands-on approach, section 5.2, "Search and Replace", of the Regular Expression HOWTO.

Bandi-T
Regex is the easy way here. `s/<[^>]*>/<hh >t/g`
Anon.
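The substitution in the comment above uses sed/Perl syntax, where the trailing `g` means "replace every match". A rough Python equivalent is `re.sub`, which replaces all non-overlapping matches by default; `re.subn` does the same but also returns how many replacements were made. A minimal sketch on one of the sample lines:

```python
import re

def replace_tags(text):
    # <[^>]*> matches "<", then any run of non-">" characters, then ">",
    # so each tag is matched separately even when a line holds several tags.
    return re.sub(r"<[^>]*>", "<hh >t", text)

sample = "blah <bl.... ssf     dfggg   ff>  blah <bl.... ssf     dfggg   ff>"
print(replace_tags(sample))  # blah <hh >t  blah <hh >t

# re.subn returns the new string together with the number of replacements
new_text, count = re.subn(r"<[^>]*>", "<hh >t", sample)
print(count)  # 2
```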
A: 

You don't have to use a regex:

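with open("file") as f:
    for line in f:
        line = line.rstrip()
        if "<" in line and ">" in line:
            s = line.split(">")
            for n, i in enumerate(s):
                if "<" in i:
                    # keep the text before "<" and start the replacement tag
                    ind = i.find("<")
                    s[n] = i[:ind] + "<hh "
            # rejoin the pieces, putting ">t" where each ">" was
            line = ">t".join(s)
        # print every line, including lines without tags
        print(line)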

Output:

$ cat file
blah  <blah.... ssf  ff> blah
blah <bl.... ssf     dfggg   ff>  blah <bl.... ssf     dfggg   ff>
blah <b.... ssf      ghhjj fhf>

$ ./python.py
blah  <hh >t blah
blah <hh >t  blah <hh >t
blah <hh >t
ghostdog74
This is a good quick and dirty solution, but for that very reason it is not very extensible; it also does not check the `b` after the `<`, although it is not clear whether that was a requirement of the OP. With regex he will have a much more versatile tool in his hands.
Bandi-T
Yes, I agree, given how little other info the OP provided. Also, if the OP is really parsing complex HTML (or XML?), even regex is not advised :)
ghostdog74
A: 

Sounds like a job for the "re" module; a simple re.sub should do the trick. Here's a little sample function for you, although you could just use the one re.sub() line:

import re

def subit(msg):
    # Use the line below instead if a tag can span several lines
    # (re.DOTALL lets "." match newlines too)
    # subbed = re.compile("(<.*?>)", re.DOTALL).sub("<hh  >t", msg)
    subbed = re.sub("(<.*?>)", "<hh  >t", msg)
    return subbed

# Your messages bundled into a list
msgs = ["blah  <blah.... ssf  ff> blah",
        "blah <bl.... ssf     dfggg   ff>  blah <bl.... ssf     dfggg   ff>",
        "blah <b.... ssf      ghhjj fhf>"]

# Iterate the messages and print the substitution results
for msg in msgs:
    print(subit(msg))

I would suggest taking a look at the docs for the "re" module; it is well documented and might help you achieve more precise text manipulation and replacement.

AWainb
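The `?` in `<.*?>` in this answer matters when a line contains more than one tag. A short sketch contrasting the greedy `<.*>` with the non-greedy `<.*?>` on the second sample line:

```python
import re

line = "blah <bl.... ssf     dfggg   ff>  blah <bl.... ssf     dfggg   ff>"

# Greedy: .* runs as far as possible, so a single match spans from the first
# "<" to the last ">" and swallows the text between the two tags.
greedy = re.sub(r"<.*>", "<hh  >t", line)

# Non-greedy: .*? stops at the first ">", so each tag is replaced on its own.
lazy = re.sub(r"<.*?>", "<hh  >t", line)

print(greedy)  # blah <hh  >t
print(lazy)    # blah <hh  >t  blah <hh  >t
```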
A: 

How about this, with a regex:

import re

YOURTEXT=re.sub("<b[^>]*>","<hh >t",YOURTEXT)
S.Mark
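Unlike `<[^>]*>`, the pattern in this answer only matches tags whose first character after `<` is `b`. That covers all three sample strings from the question but leaves other tags alone. A small sketch illustrating the difference; the second input string is made up for illustration:

```python
import re

YOURTEXT = "blah  <blah.... ssf  ff> blah"
first = re.sub("<b[^>]*>", "<hh >t", YOURTEXT)
print(first)  # blah  <hh >t blah

# Tags that do not start with "b" (here <i> and </i>) are not matched,
# so they are left unchanged.
other = "x <i>y</i> <b.... ssf ff>"
second = re.sub("<b[^>]*>", "<hh >t", other)
print(second)  # x <i>y</i> <hh >t
```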