
I have multiple strings that are in the following format:
12/18/2009 02:08:26 Admitted Doe, John (Card #111) at South Lobby [In]

From these strings I need to extract the date, the time, the first and last name of the person, and the card number. The word "Admitted" can be dropped, and anything following the final digit of the card number can be ignored.
I have a feeling I want to use StringTokenizer for this, but I'm not positive.
Any suggestions?

A: 
bhups
Thanks, but using StringTokenizer, how would I break the string up?
clang1234
I've edited the answer with the same :)
bhups
+3  A: 

StringTokenizer is great when you have a common delimiter, but in this case I'd opt for regular expressions.
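
For comparison, here is a minimal sketch of the StringTokenizer route (the class and method names are made up for illustration). Passing several delimiter characters at once handles the sample line, but every space is also a delimiter, so a multi-word name such as "Van Halen, Eddie" shifts every later token by one.

```java
import java.util.StringTokenizer;

public class TokenizerSketch {
    // Returns {date, time, lastName, firstName, cardNumber}.
    static String[] parse(String record) {
        // Every character in " ,()#" is a delimiter; consecutive delimiters are skipped.
        StringTokenizer st = new StringTokenizer(record, " ,()#");
        String date = st.nextToken();
        String time = st.nextToken();
        st.nextToken();                    // "Admitted", discarded
        String lastName = st.nextToken();  // misaligned for multi-word names like "Van Halen"
        String firstName = st.nextToken();
        st.nextToken();                    // "Card", discarded
        String card = st.nextToken();
        return new String[] {date, time, lastName, firstName, card};
    }

    public static void main(String[] args) {
        String[] f = parse("12/18/2009 02:08:26 Admitted Doe, John (Card #111) at South Lobby [In]");
        System.out.println(String.join(" | ", f)); // 12/18/2009 | 02:08:26 | Doe | John | 111
    }
}
```

For "Van Halen, Eddie" the same code returns "Van" as the last name, "Halen" as the first name and "Card" as the card number, which is why a regex (or a smarter Scanner) is the safer choice here.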

Jens Schauder
+1 for a regex.
Ross
So, as an example of pulling the date out of the string, I'm trying the following: Pattern datePattern = Pattern.compile( "[0-9]{2}/[0-9]{2}/[0-9]{4}" ); When I then use a Matcher with that pattern on the string, I get no result. How should I write this regular expression?
clang1234
Trial and error: http://www.regexplanet.com/simple/
Nick Veys
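
The pattern in that comment does match the date. Without the rest of the calling code this is only a guess, but a frequent cause of "no result" is calling Matcher.matches(), which succeeds only when the entire input matches, instead of find(), which searches for the pattern anywhere in the input. A small sketch (class and method names are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateFind {
    private static final Pattern DATE = Pattern.compile("[0-9]{2}/[0-9]{2}/[0-9]{4}");

    // Returns the first date-like substring, or null if there is none.
    static String findDate(String record) {
        Matcher m = DATE.matcher(record);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String record = "12/18/2009 02:08:26 Admitted Doe, John (Card #111) at South Lobby [In]";
        // matches() must consume the whole string, so it fails on this record.
        System.out.println(DATE.matcher(record).matches()); // false
        // find() scans forward for the pattern anywhere in the input.
        System.out.println(findDate(record));               // 12/18/2009
    }
}
```

Calling group() before a successful find() (or matches()) also fails: it throws IllegalStateException rather than returning the date.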
+2  A: 

I'd go for java.util.Scanner. This code will get you started, though you should really use the Pattern forms of the Scanner methods rather than the String forms that I used.

import java.util.Scanner;

public class Main
{
    public static void main(String[] args)
        throws Exception
    {
        final String  str;
        final Scanner scanner;
        final String  date;
        final String  time;
        final String  word;
        final String  lastName;
        final String  firstName;

        str       = "12/18/2009 02:08:26 Admitted Doe, John (Card #111) at South Lobby [In]";
        scanner   = new Scanner(str);
        date      = scanner.next("\\d+/\\d+/\\d+");
        time      = scanner.next("\\d+:\\d+:\\d+");
        word      = scanner.next();
        lastName  = scanner.next().replace(",", ""); // strip the trailing comma from "Doe,"
        firstName = scanner.next();
        System.out.println("date : " + date);
        System.out.println("time : " + time);
        System.out.println("word : " + word);
        System.out.println("last : " + lastName);
        System.out.println("first: " + firstName);
    }
}
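
Here is one way to follow that advice (a sketch; the constant names and the #(\\d+) card pattern are my own): compile the patterns once so they can be shared across all your records, pass the Pattern objects to Scanner.next(Pattern), and use findInLine() for the card number, since it ignores the delimiters.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

public class ScannerPatterns {
    // Compiled once and reused for every record.
    private static final Pattern DATE = Pattern.compile("\\d{2}/\\d{2}/\\d{4}");
    private static final Pattern TIME = Pattern.compile("\\d{2}:\\d{2}:\\d{2}");
    private static final Pattern CARD = Pattern.compile("#(\\d+)");

    // Returns {date, time, cardNumber}; cardNumber is null if no "#digits" follows.
    // next(Pattern) throws InputMismatchException if the next token does not match.
    static String[] parse(String record) {
        try (Scanner scanner = new Scanner(record)) {
            String date = scanner.next(DATE);
            String time = scanner.next(TIME);
            // findInLine ignores delimiters and searches the rest of the line.
            String card = scanner.findInLine(CARD) != null ? scanner.match().group(1) : null;
            return new String[] {date, time, card};
        }
    }

    public static void main(String[] args) {
        String[] f = parse("12/18/2009 02:08:26 Admitted Doe, John (Card #111) at South Lobby [In]");
        System.out.println(f[0] + " " + f[1] + " card " + f[2]); // 12/18/2009 02:08:26 card 111
    }
}
```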
TofuBeer
+2  A: 

Your record format is simple enough that I'd just use String's split method to get the date and time. As pointed out in the comments, having names that can contain spaces complicates things just enough that splitting the record by spaces won't work for every field. I used a regular expression to grab the other three pieces of information.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecordParser {
    public static void main(String[] args) {
        String record1 = "12/18/2009 02:08:26 Admitted Doe, John (Card #111) at South Lobby [In]";
        String record2 = "12/18/2009 02:08:26 Admitted Van Halen, Eddie (Card #222) at South Lobby [In]";
        String record3 = "12/18/2009 02:08:26 Admitted Thoreau, Henry David (Card #333) at South Lobby [In]";

        summary(record1);
        summary(record2);
        summary(record3);
    }

    public static void summary(String record) {
        // The date and time never contain spaces, so splitting on spaces is safe for them.
        String[] tokens = record.split(" ");

        String date = tokens[0];
        String time = tokens[1];

        // Names may contain spaces, so capture them (and the card number) with a regex.
        String regEx = "Admitted (.*), (.*) \\(Card #(.*)\\)";
        Pattern pattern = Pattern.compile(regEx);
        Matcher matcher = pattern.matcher(record);
        if (!matcher.find()) {
            // group() would throw IllegalStateException without a successful match.
            System.out.println("\nUnrecognized record: " + record);
            return;
        }

        String lastName = matcher.group(1);
        String firstName = matcher.group(2);
        String cardNumber = matcher.group(3);

        System.out.println("\nDate: " + date);
        System.out.println("Time: " + time);
        System.out.println("First Name: " + firstName);
        System.out.println("Last Name: " + lastName);
        System.out.println("Card Number: " + cardNumber);
    }
}

The regular expression "Admitted (.*), (.*) \\(Card #(.*)\\)" uses capturing groups (the unescaped parentheses) to hold the information you're trying to extract. The literal parentheses that appear in your record must be escaped.

Running the code above gives me the following output:

Date: 12/18/2009
Time: 02:08:26
First Name: John
Last Name: Doe
Card Number: 111

Date: 12/18/2009
Time: 02:08:26
First Name: Eddie
Last Name: Van Halen
Card Number: 222

Date: 12/18/2009
Time: 02:08:26
First Name: Henry David
Last Name: Thoreau
Card Number: 333
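
If you'd like the regex to be a bit stricter, here is a variant (a sketch, not part of the answer above; it needs Java 7+ for named groups, and the group names are my own). Character classes replace the .* groups so each field stops at its own delimiter, and the date and time come from the same match instead of split().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupRecord {
    // [^,]+ stops at the comma, the lazy [^(]+? stops before " (Card", and \d+ takes only digits.
    private static final Pattern RECORD = Pattern.compile(
            "(?<date>\\d{2}/\\d{2}/\\d{4}) (?<time>\\d{2}:\\d{2}:\\d{2}) Admitted "
            + "(?<last>[^,]+), (?<first>[^(]+?) \\(Card #(?<card>\\d+)\\)");

    // Returns {date, time, firstName, lastName, cardNumber}.
    static String[] parse(String record) {
        Matcher m = RECORD.matcher(record);
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognized record: " + record);
        }
        return new String[] {
            m.group("date"), m.group("time"), m.group("first"), m.group("last"), m.group("card")
        };
    }

    public static void main(String[] args) {
        String[] f = parse("12/18/2009 02:08:26 Admitted Van Halen, Eddie (Card #222) at South Lobby [In]");
        System.out.println(String.join(" | ", f)); // 12/18/2009 | 02:08:26 | Eddie | Van Halen | 222
    }
}
```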
Bill the Lizard
Nice, but this breaks for names with spaces in them. For example "Van Halen, Eddie"
Adriaan Koster
@Adriaan: Thanks for pointing that out. Real-world data is such a pain sometimes! :) I changed my code to use regular expressions to pull out those pieces of data that were affected by the spaces in names.
Bill the Lizard
Thanks Bill. This worked perfectly.
clang1234
Great answer. Might post a variant later on.
James P.
+1  A: 

A few things to keep in mind while you are parsing this line:

  • Last names can contain spaces, so you should look for the comma that ends the last name
  • First names can contain spaces too, so look for the "(" that follows the first name

Because of this, I would start from TofuBeer's answer and adjust the next() calls for the first and last name. Splitting the string on spaces is going to be messy because of the extra spaces.
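
One way to make that adjustment (a sketch; the delimiter regex and class name are my own, not from either answer) is to read the three space-separated fields first and then switch the Scanner's delimiter, so that ", " ends the last name and " (" ends the first name:

```java
import java.util.Scanner;

public class ScannerNames {
    // Returns {lastName, firstName, cardNumber}.
    static String[] parse(String record) {
        try (Scanner sc = new Scanner(record)) {
            sc.next(); // date
            sc.next(); // time
            sc.next(); // "Admitted"
            // From here on, a token ends at ", " or at " (".
            sc.useDelimiter(",\\s*|\\s*\\(");
            String lastName = sc.next().trim();  // trim drops the space left after "Admitted"
            String firstName = sc.next().trim();
            // findInLine ignores delimiters; the first run of digits left is the card number.
            String card = sc.findInLine("\\d+");
            return new String[] {lastName, firstName, card};
        }
    }

    public static void main(String[] args) {
        String[] f = parse("12/18/2009 02:08:26 Admitted Thoreau, Henry David (Card #333) at South Lobby [In]");
        System.out.println(String.join(" | ", f)); // Thoreau | Henry David | 333
    }
}
```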

Jeff Beck
A: 

Shortest regexp solution (with conversion of the card number to an int and the timestamp to a Date):

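import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShortRegex {
    public static void main(String[] args) throws Exception {
        String stringToParse = "12/18/2009 02:08:26 Admitted Doe, John (Card #111) at South Lobby [In] ";
        // Groups: 1 = date and time, 4 = "Admitted", 5 = last name, 6 = first name, 7 = card number
        Pattern pattern = Pattern.compile("((\\d{2}/){2}\\d{4}\\s(\\d{2}:){2}\\d{2})\\s(\\w+)\\s(.*),\\s(.*)\\s\\(Card\\s#(\\d+)");
        Matcher matcher = pattern.matcher(stringToParse);
        if (!matcher.find()) {
            throw new IllegalArgumentException("Unrecognized record: " + stringToParse);
        }
        String firstName = matcher.group(6);
        String lastName = matcher.group(5);
        int cardNumber = Integer.parseInt(matcher.group(7));
        DateFormat df = new SimpleDateFormat("MM/dd/yyyy HH:mm:ss");
        Date date = df.parse(matcher.group(1)); // throws ParseException on a malformed timestamp
    }
}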
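
On Java 8 or later, java.time is an alternative to SimpleDateFormat, which is not thread-safe. This sketch swaps in LocalDateTime and DateTimeFormatter for the Date conversion only (the class and method names are illustrative); the input string is what group 1 of the regex above captures.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class TimestampParse {
    // MM = month, dd = day, uuuu = year, HH = 24-hour clock. DateTimeFormatter is immutable.
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("MM/dd/uuuu HH:mm:ss");

    // Throws DateTimeParseException if the text does not fit the format.
    static LocalDateTime parseTimestamp(String dateAndTime) {
        return LocalDateTime.parse(dateAndTime, FORMAT);
    }

    public static void main(String[] args) {
        LocalDateTime ts = parseTimestamp("12/18/2009 02:08:26");
        System.out.println(ts); // 2009-12-18T02:08:26
    }
}
```

Because the formatter is immutable, a single static instance can be shared across threads, which is not safe with a shared SimpleDateFormat.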
Dmitry Nikolaev