Hi,

I need some quick help. I have this line of HTML (it's ugly):

  <ul><li><a href="/web3/showProfile.do;jsessionid=812E1C87A4FB4184650C551F27ADADAB.6-1?clientId=ZGVfX05FWFQ-&amp;cid=6-1&amp;activity=userdata&amp;levelFirstItem=0">Zugangsdaten</a></li><li><a href="/web3/setBookingTemplate.do;jsessionid=812E1C87A4FB4184650C551F27ADADAB.6-1?clientId=ZGVfX05FWFQ-&amp;cid=6-1&amp;activity=template&amp;levelFirstItem=1">Buchungsvorlagen</a></li><li><a href="/web3/showProfile.do;jsessionid=812E1C87A4FB4184650C551F27ADADAB.6-1?clientId=ZGVfX05FWFQ-&amp;cid=6-1&amp;activity=showFavorites&amp;levelFirstItem=2">Hotelfavoriten</a></li><li><a href="/web3/showProfile.do;jsessionid=812E1C87A4FB4184650C551F27ADADAB.6-1?clientId=ZGVfX05FWFQ-&amp;cid=6-1&amp;activity=showLightHistory&amp;levelFirstItem=3">Buchungshistorie</a></li><li><a href="/web3/showProfile.do;jsessionid=812E1C87A4FB4184650C551F27ADADAB.6-1?clientId=ZGVfX05FWFQ-&amp;cid=6-1&amp;activity=showHotelRating&amp;levelFirstItem=4">Hotelbewertung</a></li></ul>

I want to extract this:

/web3/showProfile.do;jsessionid=812E1C87A4FB4184650C551F27ADADAB.6-1?clientId=ZGVfX05FWFQ-&amp;cid=6-1&amp;activity=showFavorites&amp;levelFirstItem=2

The session ID is variable. How can I do that?

+3  A: 

This captures everything within the quotes, but only for the href that ends with levelFirstItem=2:

/href="([^"]*levelFirstItem=2)"/
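
In Java, that pattern could be applied roughly like this. It is only a minimal sketch: the class name, method name and the shortened sample HTML are made up for illustration, and it assumes the markup always uses double-quoted href attributes.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FavoritesLink {
    // Captures the value of the href attribute that ends with levelFirstItem=2.
    // [^"]* cannot cross a quote, so the match stays inside a single attribute.
    private static final Pattern FAVORITES_HREF =
            Pattern.compile("href=\"([^\"]*levelFirstItem=2)\"");

    // Hypothetical helper: returns the first matching href, if there is one.
    static Optional<String> extractFavoritesHref(String html) {
        Matcher m = FAVORITES_HREF.matcher(html);
        // find() searches anywhere in the input; matches() would require
        // the whole string to match the pattern.
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened, made-up stand-in for the HTML in the question.
        String html = "<ul><li><a href=\"/web3/a.do;jsessionid=ABC?levelFirstItem=1\">One</a></li>"
                + "<li><a href=\"/web3/b.do;jsessionid=XYZ?activity=showFavorites&amp;levelFirstItem=2\">Two</a></li></ul>";
        System.out.println(extractFavoritesHref(html).orElse("not found"));
    }
}
```

The session ID never appears in the pattern, so it can change freely. The captured value still contains &amp; exactly as it is written in the HTML source.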
Gavin Miller
There is more than one href; I want a particular one.
n00ki3
How can you distinguish the one you want?
Draemon
levelFirstItem=2
n00ki3
...at the end of the href
n00ki3
@n00ki3 - In your question, you need to make clearer what you're looking for. As you can see, no one noticed that you wanted just the levelFirstItem=2 link.
Gavin Miller
Well, in my question I said that I want to extract only this line, and that the session is variable. It worked! Thanks. Next time I will be more explicit :)
n00ki3
+3  A: 

In general, it's better to find an HTML library that will allow you to grab information from HTML. Using regular expressions will get very messy quickly.

What language are you using? I'm sure people here can direct you to a good HTML parsing library for any popular language.
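
If it turns out to be Java, here is a rough sketch of the parser approach using only the HTML parser that ships with the JDK (javax.swing.text.html). That choice is my assumption, made to avoid a third-party dependency; a dedicated HTML library would likely be more pleasant to use. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class SwingHrefs {
    // Hypothetical helper: collects the href of every <a> tag in the HTML.
    static List<String> hrefs(String html) {
        List<String> out = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        out.add(href.toString());
                    }
                }
            }
        };
        try {
            // The third argument tells the parser to ignore charset declarations.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

Picking the wanted link is then a plain string check on each element of the returned list, for example endsWith("levelFirstItem=2").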

cgyDeveloper
Given the java tag, I assume java :)
Yacoby
What, so you'd parse an entire HTML file just to get at one attribute? I agree that often regexes are abused where real parsers should be used, but without more context, regexes may well be the right tool if the conditions are fairly restrained.
Draemon
D'oh. Not sure how I missed that with 'jsessionid' staring me in the face :) I could help for Ruby, C# or C++, but Java isn't my domain. Does anybody have a good recommendation?
cgyDeveloper
@Draemon - You are right, it probably needs more context. But, in my experience, if you're parsing one line of HTML you're probably going to be parsing many. A lot of people just default to RegEx when it's not actually ideal.
cgyDeveloper
A: 
/href="([^"]*)"/

and in Java:

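// Needs java.util.regex.Pattern and java.util.regex.Matcher.
Pattern p = Pattern.compile("href=\"([^\"]*)\"");
Matcher m = p.matcher(line);
// find() moves to the next match in the line; matches() would only
// succeed if the entire line matched the pattern.
while (m.find()) {
    String href = m.group(1);
}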
Draemon
A: 

Or possibly /href="(.*?)"/, assuming the regex engine you're using treats a ? after a quantifier as non-greedy.
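
Java's java.util.regex does support this (it calls such quantifiers "reluctant"). A small sketch to show the difference between the greedy and non-greedy forms; the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: returns group 1 of every match of regex in input.
    static List<String> captures(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }
}
```

With the input <a href="/a">A</a><a href="/b">B</a>, the pattern href="(.*?)" yields /a and /b separately, while the greedy href="(.*)" yields a single capture that runs from the first href to the last quote on the line.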

bic72