views:

454

answers:

6

Hello.

I am validating URLs from my code. I have a file with URLs in it and I want to check whether each one exists. If a URL exists, the web page contains XML in which there is an email address I want to extract. I go round a while loop and, on each iteration, if the URL exists, the XML is appended to a string, so I end up with one big string containing all the XML. I want to extract the email addresses from this string. I don't think I can use the methods in the String API, as they require you to specify the starting index, which I don't know because it varies each time.

What I was hoping to do was search the string for a substring starting with (e.g. "<email id>") and ending with (e.g. "</email id>"), and add the text between these two markers to a separate string.

Does anyone know if this is possible to do or if there is an easier/different way of doing what I want to do?

Thanks.

+3  A: 

To answer the question in your title: use .indexOf or regular expressions.

But after a brief review of your question, you should really be processing the XML document properly.
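
As a rough illustration of the .indexOf route: the starting index does not have to be known in advance, because indexOf finds it for you. The sketch below is only one way to do it; the class name TagTextExtractor, the method name extractBetween, and the plain <email> / </email> markers are assumptions made for the example, not something taken from the question's real data.

```java
import java.util.ArrayList;
import java.util.List;

public class TagTextExtractor {

    /** Returns every substring found between startTag and endTag, in document order. */
    public static List<String> extractBetween(String source, String startTag, String endTag) {
        List<String> results = new ArrayList<>();
        int from = 0;
        while (true) {
            // Locate the next opening marker at or after the current position.
            int start = source.indexOf(startTag, from);
            if (start < 0) {
                break; // no more opening markers
            }
            int contentStart = start + startTag.length();
            // Locate the closing marker that follows it.
            int end = source.indexOf(endTag, contentStart);
            if (end < 0) {
                break; // unterminated marker: stop rather than guess
            }
            results.add(source.substring(contentStart, end));
            // Continue searching after the closing marker.
            from = end + endTag.length();
        }
        return results;
    }
}
```

Calling TagTextExtractor.extractBetween(bigString, "<email>", "</email>") would then return the text of each match. This treats the XML as plain text, so it will not cope with attributes, comments, or CDATA sections; a real XML parser is more robust.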

Noon Silk
A: 

Have you tried using a regex? A sample document would be very useful for this kind of question.

nanda
+2  A: 

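A regular expression that will find and return the strings between two " characters:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteExtractor {
    // The reluctant (.*?) stops at the next closing quote rather than the last one.
    private static final Pattern PATTERN = Pattern.compile("\"(.*?)\"");

    private void doStuffWithStringsBetweenQuotes(String source) {
        Matcher matcher = PATTERN.matcher(source);
        while (matcher.find()) {
            // group(1) is the text between the quotes, without the quotes themselves.
            String match = matcher.group(1);
            // Process match here.
        }
    }
}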
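
The same idea can be adapted to the markers in the question. The following is a minimal sketch, assuming the addresses sit in plain <email>...</email> elements with no attributes; the class name EmailRegexExtractor and the element name are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailRegexExtractor {

    // DOTALL lets '.' match line breaks, in case an element spans several lines.
    // The reluctant (.*?) stops at the first closing tag after each opening tag.
    private static final Pattern EMAIL_ELEMENT =
            Pattern.compile("<email>(.*?)</email>", Pattern.DOTALL);

    /** Collects the trimmed text of every <email> element found in the input. */
    public static List<String> extractEmails(String xml) {
        List<String> emails = new ArrayList<>();
        Matcher matcher = EMAIL_ELEMENT.matcher(xml);
        while (matcher.find()) {
            emails.add(matcher.group(1).trim());
        }
        return emails;
    }
}
```

A regex like this is fragile against real XML (attributes, namespaces, CDATA), so it is best seen as a quick fix rather than a substitute for a parser.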
Avi
A: 

Check out the org.xml.sax API. It is very easy to use and lets you parse XML and do whatever you want with the contents whenever you come across anything of interest. So you could easily add some logic to look for <email> start elements and then save the contents (characters), which will contain your email address.
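
A handler along those lines might look like the sketch below. It assumes each document is parsed on its own (several documents concatenated into one string are not well-formed XML) and that the element is called email; the class name SaxEmailExtractor is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEmailExtractor {

    /** Parses one well-formed XML document and returns the text of each <email> element. */
    public static List<String> extractEmails(String xml) {
        final List<String> emails = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // Non-null only while the parser is inside an <email> element.
            private StringBuilder current;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("email".equals(qName)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver the text in several chunks, so accumulate them.
                if (current != null) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("email".equals(qName) && current != null) {
                    emails.add(current.toString().trim());
                    current = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML input", e);
        }
        return emails;
    }
}
```

In the loop from the question, this would be called once per downloaded page, adding each returned list to a running collection instead of building one big string.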

DaveJohnston
+4  A: 

If you know the structure of the XML document well, I'd recommend using XPath.

For example, with emails contained in <email>[email protected]</email>, there will be an XPath query like /root/email (depending on your XML structure).

By executing this XPath query on your XML file, you will get all the <email> elements (Node) returned as a node list. And once you have an XML element, you have its content (#getTextContent, or #getNodeValue on its child text node).
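
With the JDK's built-in javax.xml.xpath package, such a query could be run roughly as follows. This is a sketch under assumptions: the /root/email path is the illustrative one from above and must be adjusted to the real document, each document is evaluated separately, and the class name XPathEmailExtractor is invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathEmailExtractor {

    /** Evaluates /root/email against one XML document and returns each element's text. */
    public static List<String> extractEmails(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // NODESET asks for all matching nodes rather than just the first one.
            NodeList nodes = (NodeList) xpath.evaluate(
                    "/root/email",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> emails = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // getTextContent returns the text inside the element;
                // getNodeValue on the element node itself would return null.
                emails.add(nodes.item(i).getTextContent().trim());
            }
            return emails;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath on input", e);
        }
    }
}
```

If the emails can appear at any depth, an expression such as //email would match them wherever they occur, at the cost of being less specific.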

ipingu
A: 
TygerKrash