Sorry for the difficult question.
I have a large set of sequences that need to be corrected by adding digits, replacing digits, or both (never removing anything). They look like this:
- 1,2,,3 => 1,7,4,3
- 4,,5,6 => 4,4,5,6
- 4,7,8,9 => 4,7,8,9,1
- 4,7 => 4,8
- 4,7,1 => 4,7,2
Each example pairs a padded original sequence (an empty slot marks where a digit is inserted) with a sample correction.
I'd like to correct sequences automatically by counting how often each n-gram is corrected in a particular way. For example, the first sample would become:
- 1=>1
- 2=>7
- 3=>3
- 1,2=>1,7
- 2,3=>7,4,3
- 1,2,3=>1,7,4,3
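A minimal sketch of the counting step in Python. It assumes each example is an aligned pair in which empty slots in the original mark insertions, and that any extra tokens at the end of the correction (as in `4,7,8,9 => 4,7,8,9,1`) are insertions after the last original token; attaching them to the n-grams that end at that token is a hypothetical choice, and leading insertions are not handled. `parse`, `ngram_corrections`, and `count_corrections` are illustrative names.

```python
from collections import Counter, defaultdict


def parse(seq):
    # "1,2,,3" -> ["1", "2", "", "3"]; empty strings are padding slots (insertions)
    return seq.split(",")


def ngram_corrections(src, tgt, max_n=3):
    """Yield (source n-gram, corrected n-gram) pairs from one aligned example.

    Assumes src (padded original) and tgt (correction) are aligned position by
    position; if tgt is longer, src is right-padded with empty slots.
    """
    src = src + [""] * (len(tgt) - len(src))
    pos = [i for i, tok in enumerate(src) if tok != ""]  # indices of real source tokens
    for n in range(1, max_n + 1):
        for i in range(len(pos) - n + 1):
            start, end = pos[i], pos[i + n - 1]
            if i + n == len(pos):
                end = len(tgt) - 1  # hypothetical: trailing insertions join the last n-grams
            key = tuple(src[p] for p in pos[i:i + n])
            # Insertions strictly inside the span are included, e.g. (2, 3) -> (7, 4, 3)
            yield key, tuple(tgt[start:end + 1])


def count_corrections(examples, max_n=3):
    """Map each source n-gram to a Counter of the corrections seen for it."""
    counts = defaultdict(Counter)
    for src, tgt in examples:
        for key, val in ngram_corrections(parse(src), parse(tgt), max_n):
            counts[key][val] += 1
    return counts
```

Applied to the first sample, `ngram_corrections(parse("1,2,,3"), parse("1,7,4,3"))` yields the same six pairs as the list above, in that order.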
I'd collect the frequencies of these n-gram corrections, and I'm looking for a way to compute the best correction for a new input, which may or may not appear in the sample data.
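One possible way to pick a "best" correction, borrowing the phrase-based idea: segment the new input into n-grams seen in the sample data, take the most frequent correction for each segment, and use dynamic programming to choose the segmentation whose relative frequencies have the highest product. This is a minimal sketch, not a full SMT decoder: it assumes monotone alignment (no reordering), uses no language model, and copies unseen single tokens unchanged with an arbitrary penalty (`unseen_logp`, a hypothetical parameter). The table is assumed to map source n-gram tuples to `Counter`s of corrected tuples; `decode` and `toy_table` are illustrative names.

```python
import math
from collections import Counter


def decode(tokens, table, max_n=3, unseen_logp=math.log(1e-3)):
    """Return the correction with the highest product of relative frequencies
    over all segmentations of tokens into n-grams (at most max_n long)."""
    n_tok = len(tokens)
    best = [(-math.inf, [])] * (n_tok + 1)  # best[i] = (log-score, output) for tokens[:i]
    best[0] = (0.0, [])
    for end in range(1, n_tok + 1):
        for n in range(1, min(max_n, end) + 1):
            prev_score, prev_out = best[end - n]
            if prev_score == -math.inf:
                continue
            key = tuple(tokens[end - n:end])
            options = table.get(key)
            if options:
                # Most frequent correction for this n-gram and its relative frequency
                corr, count = options.most_common(1)[0]
                score = prev_score + math.log(count / sum(options.values()))
                out = prev_out + list(corr)
            elif n == 1:
                score = prev_score + unseen_logp  # unseen token: copy it unchanged
                out = prev_out + [key[0]]
            else:
                continue
            # ">=" lets ties go to the longer n-gram, since n grows within this loop
            if score >= best[end][0]:
                best[end] = (score, out)
    return best[n_tok][1]


# Toy table built from the question's first sample (each correction seen once).
toy_table = {
    ("1",): Counter({("1",): 1}),
    ("2",): Counter({("7",): 1}),
    ("3",): Counter({("3",): 1}),
    ("1", "2"): Counter({("1", "7"): 1}),
    ("2", "3"): Counter({("7", "4", "3"): 1}),
    ("1", "2", "3"): Counter({("1", "7", "4", "3"): 1}),
}
print(decode(["1", "2", "3"], toy_table))  # ['1', '7', '4', '3']
print(decode(["1", "2", "5"], toy_table))  # ['1', '7', '5']
```

The tie-break matters: without it, the unigram path `1,7,3` scores the same as `1,7,4,3` and the inserted digit would be lost. With real counts, longer matches also tend to win because they contribute fewer factors below 1.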
This seems similar to statistical machine translation (SMT).
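Roughly, the phrase-based SMT view would look like this (a sketch, treating the padded original $s$ as the source and the correction $c$ as the target, split into aligned n-gram pairs $(\bar{s}_k, \bar{c}_k)$), with phrase probabilities estimated as relative frequencies of the collected counts:

```latex
\hat{c} = \arg\max_{c,\ \text{segmentation}} \prod_{k} \phi(\bar{c}_k \mid \bar{s}_k),
\qquad
\phi(\bar{c} \mid \bar{s}) = \frac{\operatorname{count}(\bar{s}, \bar{c})}{\sum_{\bar{c}'} \operatorname{count}(\bar{s}, \bar{c}')}
```

Full SMT systems typically also multiply in a language model $P(c)$ over target sequences (here, corrected sequences) and allow reordering, which this setting may not need since nothing is ever removed or moved.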