I have multiple strings that are in the following format:
12/18/2009 02:08:26 Admitted Doe, John (Card #111) at South Lobby [In]
From these strings I need to extract the date, time, first and last name of the person, and the card number. The word "Admitted" can be omitted, and anything following the final digit of the card number can be ignored.
I have a feeling I want to use StringTokenizer for this, but I'm not positive.
Any suggestions?
StringTokenizer is great when you have a common delimiter, but in this case I'd opt for regular expressions.
I'd go for java.util.Scanner. This code will get you started, though you should really use the Pattern form of the scanner methods rather than the String form that I used.
import java.util.Scanner;

public class Main
{
    public static void main(String[] args)
        throws Exception
    {
        final String str;
        final Scanner scanner;
        final String date;
        final String time;
        final String word;
        final String lastName;
        final String firstName;

        str = "12/18/2009 02:08:26 Admitted Doe, John (Card #111) at South Lobby [In]";
        scanner = new Scanner(str);
        date = scanner.next("\\d+/\\d+/\\d+");
        time = scanner.next("\\d+:\\d+:\\d+");
        word = scanner.next();
        // The raw token is "Doe," so drop the trailing comma.
        // Note: next() reads one whitespace-delimited token, so this only handles single-word names.
        lastName = scanner.next().replace(",", "");
        firstName = scanner.next();

        System.out.println("date : " + date);
        System.out.println("time : " + time);
        System.out.println("word : " + word);
        System.out.println("last : " + lastName);
        System.out.println("first: " + firstName);
    }
}
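As a rough sketch of the "Pattern form" mentioned above: Scanner.next also accepts a precompiled java.util.regex.Pattern, so the expressions are compiled once instead of on every call. The class name ScannerPatternDemo and the helper dateAndTime are made up for this illustration and are not part of the original answer.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

public class ScannerPatternDemo {
    // Compiled once and reused for every record (hypothetical constant names).
    private static final Pattern DATE = Pattern.compile("\\d+/\\d+/\\d+");
    private static final Pattern TIME = Pattern.compile("\\d+:\\d+:\\d+");

    /** Returns {date, time} read from the start of a record. */
    public static String[] dateAndTime(String record) {
        try (Scanner scanner = new Scanner(record)) {
            // next(Pattern) throws if the next token does not match the pattern.
            String date = scanner.next(DATE);
            String time = scanner.next(TIME);
            return new String[] { date, time };
        }
    }

    public static void main(String[] args) {
        String[] parts = dateAndTime(
            "12/18/2009 02:08:26 Admitted Doe, John (Card #111) at South Lobby [In]");
        System.out.println("date : " + parts[0]);
        System.out.println("time : " + parts[1]);
    }
}
```

The behaviour is the same as the String form; the difference only matters when many records are parsed in a loop.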
Your record format is simple enough that I'd just use String's split method to get the date and time. As pointed out in the comments, having names that can contain spaces complicates things just enough that splitting the record by spaces won't work for every field. I used a regular expression to grab the other three pieces of information.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecordSummary {
    public static void main(String[] args) {
        String record1 = "12/18/2009 02:08:26 Admitted Doe, John (Card #111) at South Lobby [In]";
        String record2 = "12/18/2009 02:08:26 Admitted Van Halen, Eddie (Card #222) at South Lobby [In]";
        String record3 = "12/18/2009 02:08:26 Admitted Thoreau, Henry David (Card #333) at South Lobby [In]";
        summary(record1);
        summary(record2);
        summary(record3);
    }

    public static void summary(String record) {
        // The date and time never contain spaces, so a plain split is enough for them.
        String[] tokens = record.split(" ");
        String date = tokens[0];
        String time = tokens[1];

        // Names may contain spaces, so capture them with a regular expression instead.
        String regEx = "Admitted (.*), (.*) \\(Card #(.*)\\)";
        Pattern pattern = Pattern.compile(regEx);
        Matcher matcher = pattern.matcher(record);
        if (!matcher.find()) {
            System.out.println("\nUnrecognized record: " + record);
            return;
        }
        String lastName = matcher.group(1);
        String firstName = matcher.group(2);
        String cardNumber = matcher.group(3);

        System.out.println("\nDate: " + date);
        System.out.println("Time: " + time);
        System.out.println("First Name: " + firstName);
        System.out.println("Last Name: " + lastName);
        System.out.println("Card Number: " + cardNumber);
    }
}
The regular expression "Admitted (.*), (.*) \\(Card #(.*)\\)" uses grouping parentheses to capture the information you're trying to extract. The literal parentheses that appear in your record must be escaped.
Running the code above gives me the following output:
Date: 12/18/2009
Time: 02:08:26
First Name: John
Last Name: Doe
Card Number: 111
Date: 12/18/2009
Time: 02:08:26
First Name: Eddie
Last Name: Van Halen
Card Number: 222
Date: 12/18/2009
Time: 02:08:26
First Name: Henry David
Last Name: Thoreau
Card Number: 333
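If the numbered groups in the expression above are hard to keep track of, named groups (available since Java 7) are one option. This is only a sketch of the same idea; the class name NamedGroupDemo, the method describe, and the group names last, first and card are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupDemo {
    // Same structure as the expression above, but each capture has a name.
    // The card number is narrowed to digits here, which is an assumption about the data.
    private static final Pattern RECORD = Pattern.compile(
        "Admitted (?<last>.*), (?<first>.*) \\(Card #(?<card>\\d+)\\)");

    /** Builds a short description, or "no match" if the record is not recognized. */
    public static String describe(String record) {
        Matcher m = RECORD.matcher(record);
        if (!m.find()) {
            return "no match";
        }
        return m.group("first") + " " + m.group("last") + " / card " + m.group("card");
    }

    public static void main(String[] args) {
        System.out.println(describe(
            "12/18/2009 02:08:26 Admitted Thoreau, Henry David (Card #333) at South Lobby [In]"));
        // prints: Henry David Thoreau / card 333
    }
}
```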
A few things to keep in mind while you are parsing this line:
- Last names can contain spaces, so look for the comma that ends the last name.
- First names can contain spaces too, so look for the ( that starts the card number.
Because of this, I would work off of TofuBeer's answer and adjust the next calls for the first and last name. String.split is going to be messy because of the extra spaces.
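One possible way to make that adjustment, sketched under the assumption that every record contains ", " after the last name and " (Card #" followed by digits after the first name: keep Scanner for the date and time, then switch to findInLine, which ignores the whitespace delimiter, for the names and card number. The class name ScannerNameDemo and the method parse are hypothetical.

```java
import java.util.Scanner;
import java.util.regex.MatchResult;

public class ScannerNameDemo {
    /** Returns {date, time, lastName, firstName, cardNumber}. */
    public static String[] parse(String record) {
        try (Scanner scanner = new Scanner(record)) {
            String date = scanner.next("\\d+/\\d+/\\d+");
            String time = scanner.next("\\d+:\\d+:\\d+");
            scanner.next(); // skip the word "Admitted"

            // findInLine searches the rest of the line regardless of delimiters,
            // so names with spaces stay in one piece. \S keeps the leading
            // space out of the last name.
            String found = scanner.findInLine("(\\S.*?), (.+?) \\(Card #(\\d+)\\)");
            if (found == null) {
                throw new IllegalArgumentException("Unrecognized record: " + record);
            }
            MatchResult m = scanner.match(); // groups of the findInLine match
            return new String[] { date, time, m.group(1), m.group(2), m.group(3) };
        }
    }

    public static void main(String[] args) {
        String[] r = parse(
            "12/18/2009 02:08:26 Admitted Van Halen, Eddie (Card #222) at South Lobby [In]");
        System.out.println("last : " + r[2]);
        System.out.println("first: " + r[3]);
        System.out.println("card : " + r[4]);
    }
}
```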
Shortest regexp solution (with type conversion):
String stringToParse = "12/18/2009 02:08:26 Admitted Doe, John (Card #111) at South Lobby [In] ";
// Groups: 1 = date and time, 2 = last name, 3 = first name, 4 = card number
Pattern pattern = Pattern.compile("(\\d{2}/\\d{2}/\\d{4}\\s\\d{2}:\\d{2}:\\d{2})\\s\\w+\\s(.*?),\\s(.*?)\\s\\(Card #(\\d+)\\)");
Matcher matcher = pattern.matcher(stringToParse);
if (!matcher.find()) {
    throw new IllegalArgumentException("Unrecognized record: " + stringToParse);
}
String lastName = matcher.group(2);
String firstName = matcher.group(3);
int cardNumber = Integer.parseInt(matcher.group(4));
DateFormat df = new SimpleDateFormat("MM/dd/yyyy HH:mm:ss");
Date date = df.parse(matcher.group(1)); // parse throws the checked ParseException
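The snippet above converts the timestamp with SimpleDateFormat. On Java 8 or later, the java.time API is an alternative worth considering; this is a swapped-in technique, not what the answer itself used, and the class name TimestampDemo and method parseTimestamp are made up for the sketch.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class TimestampDemo {
    // Same layout as the SimpleDateFormat pattern above; DateTimeFormatter is immutable and thread-safe.
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm:ss");

    /** Parses the "date time" prefix of a record, e.g. "12/18/2009 02:08:26". */
    public static LocalDateTime parseTimestamp(String text) {
        // Throws the unchecked DateTimeParseException if the text does not match.
        return LocalDateTime.parse(text, FORMAT);
    }

    public static void main(String[] args) {
        System.out.println(parseTimestamp("12/18/2009 02:08:26"));
        // prints: 2009-12-18T02:08:26
    }
}
```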