I'm trying to pull data from Jim Breen's WWWJDIC. The raw data returned has a lot of information delimited in several different formats.

The data pulled in the example below is from here: http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwjdic.cgi?1ZUJ%E5%85%88%E7%94%9F

先生 [せんせい] /(n) (1) teacher/master/doctor/(suf) (2) with names of teachers, etc. as an honorific/(P)/

Should I use a regex?

A: 

Regex could work here; the data seems to be returned in a simple "headword [kana] /definition/" format, where the definition can itself contain slashes. Be aware that some entries omit "[kana]" (try searching for ハンバーグ, for example).
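As a rough sketch of that approach in Python, assuming the headword never contains whitespace and that each entry sits on a single line: the names `ENTRY_RE` and `parse_entry` are made up for illustration, and the pattern is based only on the sample line in the question, not on a formal spec of the format.

```python
import re

# Assumed layout: headword, optional [kana], then /gloss/gloss/.../
ENTRY_RE = re.compile(
    r"^(?P<headword>\S+)"            # headword: everything up to the first whitespace
    r"(?:\s+\[(?P<kana>[^\]]*)\])?"  # optional reading in square brackets
    r"\s+/(?P<body>.*)/\s*$"         # everything between the first and last slash
)

def parse_entry(line):
    """Return a dict for one entry line, or None if the line does not match."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    # Split the body on slashes and drop any empty pieces.
    glosses = [g for g in m.group("body").split("/") if g]
    return {
        "headword": m.group("headword"),
        "kana": m.group("kana"),  # None when the entry has no [kana] part
        "glosses": glosses,
    }

sample = ("先生 [せんせい] /(n) (1) teacher/master/doctor/"
          "(suf) (2) with names of teachers, etc. as an honorific/(P)/")
entry = parse_entry(sample)
print(entry["headword"], entry["kana"], entry["glosses"])
```

Markers such as "(n)", "(1)" and "(P)" are left inside the gloss strings here; separating them out would take a second pass.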

Also, the raw dictionary file that WWWJDIC uses is available for download here: http://www.csse.monash.edu.au/~jwb/edict.html. It may fit your needs better.