views: 111
answers: 4
Does anyone know where I could find an English-language word list in the form of a SQL dump?

I found a word list online, but it's a large plain text file with one word per line. I tried writing a PHP script to loop through the words and insert them into the database, but I quickly ran into memory issues just reading the large file. I've split the file into 4 smaller files, but I'm still getting memory errors. If anyone knows how to convert my current file into a more import-friendly format, please let me know.

A: 

http://corpora.uni-leipzig.de/download.html

A couple of corpora in different languages (including English) ...

The MYYN
+5  A: 

Use LOAD DATA INFILE. From the docs:

The LOAD DATA INFILE statement reads rows from a text file into a table at a very high speed.

Something like this should work:

LOAD DATA INFILE 'your/path/your_file.txt' INTO TABLE your_table (your_column_name);
Jordan
A: 

Your approach should work fine; you just need to change the way you're reading the file. I'm guessing you're using file_get_contents or something similar to read the whole file in at once, when you could read it line by line and avoid the memory issues. Try something like fscanf():

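$handle = fopen("yourfile.txt", "r");
if ($handle === false) {
    die("Could not open yourfile.txt");
}
// fscanf() reads one line per call, so the whole file is never held in memory.
// Match the format to your file; a single "%s" would read one word per line.
while ($info = fscanf($handle, "%s\t%s\t%s\n")) {
    list($field1, $field2, $field3) = $info;
    //... do something with the values
}

fclose($handle);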
zombat
A: 

If you're open to using some Python in the mix, here's a good how-to article:

Ways to process and use Wikipedia dumps

(It pulls Wikipedia data, which gives you your English text, and pushes it into a MySQL database.)
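
If you'd rather turn the word list itself into a SQL dump, here is a minimal Python sketch of the idea. It streams the file one line at a time, so memory use stays flat, and writes batched INSERT statements. The table name `words`, the column name `word`, the batch size, and the file names are all assumptions for illustration; adjust them to your schema. The escaping assumes MySQL's default SQL mode.

```python
def sql_quote(word):
    """Escape a word for a single-quoted MySQL string literal (default SQL mode)."""
    return word.replace("\\", "\\\\").replace("'", "''")


def write_sql_dump(lines, out, table="words", column="word", batch_size=1000):
    """Read words from `lines` one at a time and write batched INSERTs to `out`.

    `lines` can be an open file object, so only one batch is ever held in memory.
    Returns the number of words written.
    """
    batch = []
    total = 0

    def flush():
        # Emit one multi-row INSERT for the current batch, then empty it.
        if batch:
            values = ",\n".join("('%s')" % sql_quote(w) for w in batch)
            out.write("INSERT INTO `%s` (`%s`) VALUES\n%s;\n" % (table, column, values))
            batch.clear()

    for line in lines:
        word = line.strip()
        if not word:
            continue  # skip blank lines
        batch.append(word)
        total += 1
        if len(batch) >= batch_size:
            flush()
    flush()  # write whatever is left over
    return total


# Example usage (hypothetical file names):
#
#   with open("wordlist.txt", encoding="utf-8") as src, \
#        open("wordlist.sql", "w", encoding="utf-8") as dst:
#       write_sql_dump(src, dst)
```

The resulting .sql file can then be imported like any other dump, provided the target table already exists.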

micahwittman