Hello,

I am trying to write a small Java program that will read a file (using the Scanner class), return its contents as a String, and then search that string for every substring that starts with "Email:" and ends with ".edu". There will be many instances of this substring, and I want to parse each of them out into an array or a new file.

I know how to find a substring, but I do not know how to A) search for all instances of the substring and B) specify the start AND finish of the substring.

Can someone help me with this logic?

Thanks!

+2  A: 

You could use indexOf(). It has an overload that takes the index to start searching from, so you can loop to find every instance of "Email:":

int index = 0;
while ((index = input.indexOf("Email:", index)) != -1) {
  // do something with the match that starts at index
  index += "Email:".length(); // step past this match so the loop advances
}
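
To cover part B of the question (a start AND an end marker), the same indexOf() idea can be applied twice: once for "Email:" and once for the next ".edu" after it. The following is only a sketch; the class name IndexOfExtractor, the method name extract, and the sample addresses are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfExtractor {
    // Collects the text between each "Email:" marker and the next ".edu",
    // keeping the ".edu" suffix and trimming surrounding whitespace.
    static List<String> extract(String input) {
        final String start = "Email:";
        final String end = ".edu";
        List<String> results = new ArrayList<>();
        int index = 0;
        while ((index = input.indexOf(start, index)) != -1) {
            int from = index + start.length();   // first character after "Email:"
            int to = input.indexOf(end, from);   // start of the next ".edu"
            if (to == -1) {
                break;                           // no closing ".edu" remains
            }
            results.add(input.substring(from, to + end.length()).trim());
            index = to + end.length();           // continue after this match
        }
        return results;
    }

    public static void main(String[] args) {
        // Placeholder addresses, invented for this example.
        System.out.println(extract("Email: alice@example.edu, Email:bob@example.edu."));
    }
}
```

Note that this takes everything up to the next ".edu", so an address that does not end in ".edu" would be merged into the following match.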
Brendan Long
+1  A: 

This sounds like a case for regular expressions to me:

import java.util.regex.*;

public class Test
{
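    // Group 1 captures everything after "Email:" up to and including
    // the first ".edu" (the .*? quantifier is lazy).
    private static final Pattern EMAIL_PATTERN =
        Pattern.compile("Email:(.*?\\.edu)");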

    public static void main(String[] args)
    {
        String testString = "FooEmail:[email protected] Bar Email:[email protected] Baz";

        printEmails(testString);
    }

    public static void printEmails(String input)
    {
        Matcher matcher = EMAIL_PATTERN.matcher(input);
        while (matcher.find())
        {
            System.out.println(matcher.group(1));
        }
    }
}

Note that you'll get strange results if you have any non .edu emails in there... for example, if you have "Email: [email protected] Email: [email protected]" you'd end up with a match of "[email protected] Email: [email protected]".
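
To make that caveat concrete, here is a small sketch that runs the same pattern over input where the first address is not a .edu address. The class name LazyMatchPitfall, the helper firstMatch, and both addresses are placeholders invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyMatchPitfall {
    private static final Pattern EMAIL_PATTERN = Pattern.compile("Email:(.*?\\.edu)");

    // Returns the first captured group, or null if nothing matches.
    static String firstMatch(String input) {
        Matcher matcher = EMAIL_PATTERN.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // The first address deliberately does not end in ".edu".
        String input = "Email:carol@example.com Email:dave@example.edu";
        // Prints: carol@example.com Email:dave@example.edu
        System.out.println(firstMatch(input));
    }
}
```

The lazy quantifier stops at the first ".edu" it can reach, but it has no reason to stop at the space, so the match runs across both addresses.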

Jon Skeet
This is very helpful to me, thank you! I'm going to have to learn more about regular expressions - they seem like they can do powerful things!
behrk2
@behrk2: Regular expressions are great in their place - which is pattern matching. They can easily be overused though. In this case they're a good fit, but don't try to use them for *all* string manipulation tasks... there are often simpler ways.
Jon Skeet
Stand back, I know regular exceptions !
Valentin Rocher
If there is always a space between two email addresses, you could prevent the problem noted in the last sentence by modifying the regular expression to `Email:(\\S+\\.edu)`.
Customizer
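
A quick sketch of that suggestion, collecting the matches into a list as the question asks. It assumes the address follows "Email:" directly, with no space in between; the class name NonSpaceEmailFinder, the method findAll, and the sample addresses are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonSpaceEmailFinder {
    // \S+ cannot cross whitespace, so a match can never span two addresses.
    private static final Pattern EMAIL_PATTERN = Pattern.compile("Email:(\\S+\\.edu)");

    // Returns every captured address, in the order it appears in the input.
    static List<String> findAll(String input) {
        List<String> emails = new ArrayList<>();
        Matcher matcher = EMAIL_PATTERN.matcher(input);
        while (matcher.find()) {
            emails.add(matcher.group(1));
        }
        return emails;
    }

    public static void main(String[] args) {
        // Placeholder addresses; the first one is not a .edu address and is skipped.
        String input = "Email:carol@example.com Email:dave@example.edu Email:erin@example.edu";
        // Prints: [dave@example.edu, erin@example.edu]
        System.out.println(findAll(input));
    }
}
```

If the input may contain a space after "Email:", putting `\\s*` before the group should handle that case too. Calling `toArray(new String[0])` on the returned list gives a plain array.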