Below is part of a text file. I want to split each line into these columns:

Column 1 = distribution
Column 2 = votes
Column 3 = rank 
Column 4 = title
Column 5 = year 
Column 6 = Subtitle (but only where there is a subtitle)

The regex I'm using is:

regexp = 
    "([0-9\\.]+)[ \\t]+([0-9]+)[ \\t]+([0-9\\.]+)[ \\t]+(.*?[ \\t]+\\([0-9]{4}\\).*)";

But it doesn't seem to work. Any ideas on how I might fix it?

1000000103      50   4.5  #1 Single (2006) {THis would be a subtitle example}
2...1.2.12       8   2.7  $1,000,000 Chance of a Lifetime (1986)
11..2.2..2       8   5.0  $100 Taxi Ride (2001)
....13.311       9   7.1  $100,000 Name That Tune (1984)
3..21...22      10   4.6  $2 Bill (2002)
30010....3      18   2.7  $25 Million Dollar Hoax (2004)
2000010002     111   5.6  $40 a Day (2002)
2000000..4      26   1.6  $5 Cover (2009)
.0..2.0122      15   7.8  $9.99 (2003)
..2...1113       8   7.5  $weepstake$ (1979)
0000000125    3238   8.7   Allo  Allo! (1982)
1....22.12       8   6.5   Allo  Allo! (1982) {A Barrel Full of Airmen (#7.7)

Code I'm using:

    try {
        FileInputStream file_stream = new FileInputStream("/Users/angadsoni/Desktop/ratings-1.txt");
        DataInputStream data_stream = new DataInputStream(file_stream);
        BufferedReader bf = new BufferedReader(new InputStreamReader(data_stream));
        ResultSet rs;
        Statement stmt;
        Connection con = null;
        Class.forName("org.gjt.mm.mysql.Driver").newInstance();
        String url = "jdbc:mysql://localhost/mynewdatabase";
        con = DriverManager.getConnection(url,"root","");
        stmt = con.createStatement();
  try{
    stmt.executeUpdate("DROP TABLE myTable");
  }catch(Exception e){
    System.out.print(e);
    System.out.println("No existing table to delete");
  }

  //Create a table in the database named mytable
  stmt.executeUpdate("CREATE TABLE mytable(distribution char(20)," + "votes integer," + "rank float," + "title char(250)," + "year integer," + "sub char(250));");
  String rege= "^([\\d.]+)\\s+(\\d+)\\s+([\\d.]+)\\s+(.+?)\\s+\\((\\d+)\\)(?:\\s+\\{([^{}]+))?";
  Pattern pattern = Pattern.compile(rege);
  String line;
  String data= "";
  while ((line = bf.readLine()) != null) {
    data = line.replaceAll("'", "");

    Matcher matcher = pattern.matcher(data);

    if (matcher.find()) {
        System.out.println("hello");
        String distribution = matcher.group(1);
        String votes = matcher.group(2);
        String rank = matcher.group(3);
        String title = matcher.group(4);
        String year = matcher.group(5);
        String sub = matcher.start(6) != -1 ? matcher.group(6) : "";
        System.out.printf("%s %8s %6s%n%s (%s) %s%n%n",
        matcher.group(1), matcher.group(2), matcher.group(3), matcher.group(4), matcher.group(5),
        matcher.start(6) != -1 ? matcher.group(6) : "");
        String todo = ("INSERT into mytable " +
            "(Distribution, Votes, Rank, Title, Year, Sub) "+
            "values ('"+distribution+"', '"+votes+"', '"+rank+"', '"+title+"', '"+year+", '"+sub+"');");
        int r = stmt.executeUpdate(todo);
    }//end if statement
  }//end while loop
}
A: 

My first thought is that it's perhaps easier to split the first few fields using whitespace and StringTokenizer, and then use a regexp for the remaining 3 fields. That way you're going to simplify the regexp required.

Brian Agnew
It's even easier with split(): `String[] parts = s.split("\\s+", 4);`
Alan Moore
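
A minimal sketch of the split-then-regex idea suggested above, tried against the sample lines from the question. The class name `SplitSketch`, the `parse` helper, and the `TAIL` pattern are illustrative assumptions, not code posted by either commenter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitSketch {
    // Hypothetical pattern for the remainder of the line:
    // title, "(year)", and an optional "{subtitle" part.
    private static final Pattern TAIL =
        Pattern.compile("^(.+?)\\s+\\((\\d{4})\\)(?:\\s+\\{([^{}]+))?");

    // Returns {distribution, votes, rank, title, year, subtitle},
    // or null if the line does not have the expected shape.
    static String[] parse(String line) {
        // Limit of 4: the first three fields are split off on whitespace,
        // and everything after them stays together in parts[3].
        String[] parts = line.trim().split("\\s+", 4);
        if (parts.length < 4) {
            return null;
        }
        Matcher m = TAIL.matcher(parts[3]);
        if (!m.find()) {
            return null;
        }
        String sub = m.group(3) != null ? m.group(3) : "";
        return new String[] { parts[0], parts[1], parts[2], m.group(1), m.group(2), sub };
    }

    public static void main(String[] args) {
        String[] f = parse("2000010002     111   5.6  $40 a Day (2002)");
        System.out.println(String.join(" | ", f));
    }
}
```

Because the first three fields never contain whitespace, only the title/year/subtitle part needs a regex, which keeps the pattern short.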
+1  A: 

There might be further problems, but the first hurdle is that the backslashes don't make it to the regex engine. You need to double them.

just somebody
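
To illustrate the point about doubling: the Java compiler consumes one level of backslashes in a string literal, so `\\d` in source reaches the regex engine as `\d` (a single `\d` or `\(` inside a string literal is an illegal escape and will not compile). A small sketch, with `EscapeSketch`, `YEAR`, and `isYear` as made-up names:

```java
import java.util.regex.Pattern;

class EscapeSketch {
    // Java source needs "\\" to produce one backslash in the resulting string.
    static final String YEAR = "\\((\\d{4})\\)";

    // True if s is exactly a four-digit year in parentheses.
    static boolean isYear(String s) {
        return Pattern.matches(YEAR, s);
    }

    public static void main(String[] args) {
        System.out.println(YEAR);             // prints \((\d{4})\)
        System.out.println(isYear("(1986)")); // prints true
    }
}
```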
A: 

I was trying to come up with a regex for the part starting from the title and, like you, came up with:

(.*)\\s+(\\([0-9]{4}\\))\\s+(.*$)

Maybe you could provide some more code to clarify what exactly you're doing with the regex? Also, was there a problem with this answer?

A: 

This regex works correctly with the data you provided:

^([\d.]+)\s+(\d+)\s+([\d.]+)\s+(.+?)\s+\((\d+)\)(?:\s+\{([^{}]+))?

If there's no subtitle, the final group (group #6) will be null.

EDIT: Here's a complete example:

import java.io.*;
import java.util.*;
import java.util.regex.*;

public class Test
{
  public static void main(String[] args) throws Exception
  {
    Pattern p = Pattern.compile(
      "^([\\d.]+)\\s+(\\d+)\\s+([\\d.]+)\\s+(.+?)\\s+\\((\\d+)\\)(?:\\s+\\{([^{}]+))?"
    );
    Matcher m = p.matcher("");
    Scanner sc = new Scanner(new File("test.txt"));
    while (sc.hasNextLine())
    {
      String s = sc.nextLine();
      if (m.reset(s).find())
      {
        System.out.printf("%s %8s %6s%n%s (%s) %s%n%n",
            m.group(1), m.group(2), m.group(3), m.group(4), m.group(5),
            m.start(6) != -1 ? m.group(6) : "");
      }
    }
  }
}

Partial output:

1000000103       50    4.5
#1 Single (2006) THis would be a subtitle example

2...1.2.12        8    2.7
$1,000,000 Chance of a Lifetime (1986)

...etc.

Alan Moore
Unfortunately, it doesn't seem to work for me.
angad Soni
@angad: It works for me; see my edit.
Alan Moore
@alan Yeah, it prints out just fine. But here is an issue I'm having trouble with; maybe you can help. I'm saving the value of each m.group() in a separate variable and then inserting it into a table in MySQL. I'll post my code too; maybe it will be useful, and then you can tell me where I'm going wrong.
angad Soni
But the *regex* does what you wanted it to, right? Each of the capture groups contains what it's supposed to? If so, then I've answered your original question. If you want to know how to transfer the data from those capture groups to database columns, that's a separate question (but @BalusC has already answered that one: http://stackoverflow.com/questions/2360418/would-a-regex-like-this-work-for-these-lines-of-text/2363260#2363260 ).
Alan Moore
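
For the database step raised in the comments, here is a hedged sketch that binds the capture groups to a `PreparedStatement` instead of concatenating them into the SQL string. It assumes the `mytable` columns from the CREATE TABLE statement in the question; the class name `InsertSketch` and the helpers `fields` and `insert` are made up for illustration. Note that the hand-built INSERT string in the question appears to be missing the closing quote after `year`, which is the kind of mistake parameter binding avoids.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InsertSketch {
    // Same regex as in the answer above.
    static final Pattern LINE = Pattern.compile(
        "^([\\d.]+)\\s+(\\d+)\\s+([\\d.]+)\\s+(.+?)\\s+\\((\\d+)\\)(?:\\s+\\{([^{}]+))?");

    // Backticks are MySQL identifier quoting, used here defensively around `rank`.
    static final String SQL =
        "INSERT INTO mytable (distribution, votes, `rank`, title, year, sub) "
        + "VALUES (?, ?, ?, ?, ?, ?)";

    // Returns the six capture groups (the last one may be null), or null if no match.
    static String[] fields(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        String[] f = new String[6];
        for (int i = 0; i < 6; i++) {
            f[i] = m.group(i + 1);
        }
        return f;
    }

    // Binds each field to a placeholder; the driver handles quoting.
    static void insert(Connection con, String[] f) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(SQL)) {
            ps.setString(1, f[0]);
            ps.setInt(2, Integer.parseInt(f[1]));
            ps.setFloat(3, Float.parseFloat(f[2]));
            ps.setString(4, f[3]);
            ps.setInt(5, Integer.parseInt(f[4]));
            ps.setString(6, f[5] != null ? f[5] : ""); // empty string when there is no subtitle
            ps.executeUpdate();
        }
    }
}
```

With bound parameters, apostrophes in titles no longer need to be stripped with `replaceAll("'", "")` before the insert.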