ansaurus

Question

Anyone know an example algorithm for word segmentation using dynamic programming?

Answer 1

A:

I'm going to assume we're not talking about the trivial case here (i.e. just splitting a string on spaces, which is a basic tokenizer problem). Instead, we're talking about input where there isn't a clear word delimiter character, so we have to "guess" what the best mapping from string to words would be. For instance, take a run of concatenated words without spaces, such as transforming this:

lotsofwordstogether

into this:

lots, of, words, together

In this case, the dynamic programming approach would probably be to build a table where one dimension corresponds to the Mth word in the sequence and the other corresponds to the Nth character of the input string. The value you fill in for each cell of the table is "the best match score we can get if we end (or, alternatively, begin) the Mth word at position N".
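To make the table idea concrete, here is a rough Python sketch. It is only one way to fill in such a table, not a definitive implementation. The `WORD_SCORES` dictionary and its values are made up for illustration (think of them as log-probability-like scores, higher is better), and the names `segment`, `best` and `back` are hypothetical. `best[m][i]` holds the best score for splitting the first `i` characters into exactly `m` words, i.e. ending the Mth word at position N.

```python
import math

# Hypothetical toy dictionary: word -> score (higher is better).
# A real system would derive these from corpus frequencies.
WORD_SCORES = {
    "lots": -2.0, "of": -1.0, "words": -2.0, "together": -2.5,
    "lot": -2.5, "so": -1.5, "to": -1.0, "get": -2.0, "her": -2.0,
    "word": -2.0,
}


def segment(text, word_scores=WORD_SCORES):
    """Return the best-scoring list of words covering text, or None."""
    n = len(text)
    max_words = n  # a string of n characters holds at most n words
    neg_inf = -math.inf

    # best[m][i]: best score for splitting text[:i] into exactly m words.
    # back[m][i]: start index of the m-th word in that best split.
    best = [[neg_inf] * (n + 1) for _ in range(max_words + 1)]
    back = [[None] * (n + 1) for _ in range(max_words + 1)]
    best[0][0] = 0.0  # zero words cover the empty prefix

    for m in range(1, max_words + 1):
        for end in range(1, n + 1):
            for start in range(end):
                prev = best[m - 1][start]
                word = text[start:end]
                if prev == neg_inf or word not in word_scores:
                    continue
                score = prev + word_scores[word]
                if score > best[m][end]:
                    best[m][end] = score
                    back[m][end] = start

    # Choose the word count whose table entry for the full string is best.
    m = max(range(max_words + 1), key=lambda k: best[k][n])
    if best[m][n] == neg_inf:
        return None  # no segmentation using only dictionary words

    # Walk the back-pointers from the end of the string to recover the words.
    words, end = [], n
    while m > 0:
        start = back[m][end]
        words.append(text[start:end])
        end, m = start, m - 1
    words.reverse()
    return words


print(segment("lotsofwordstogether"))  # ['lots', 'of', 'words', 'together']
```

With these toy scores, "together" beats "to" + "get" + "her" because the sum of three word scores is lower than the single-word score. If you don't care how many words are used, the word-count dimension can be dropped and the table collapses to a one-dimensional array indexed by character position.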

Amber 2009-11-23 07:52:55

Answer 2

A:

Introduction to Algorithms, Second Edition

Upul 2009-11-23 08:14:35

Answer 3

A:

Take a look at the Text Mining Handbook.

Tony Q 2010-10-19 09:49:54
