This sounds similar to the problem of tokenising Chinese, where there are no spaces between words. The following paragraph is taken from 'Introduction to Information Retrieval' by Manning, Raghavan & Schütze, which is available online:
This phenomenon reaches its limit case with major East Asian Languages (e.g., Chinese, Japanese, Korean, and Thai), where text is written without any spaces between words. [...] One approach here is to perform word segmentation as prior linguistic processing. Methods of word segmentation vary from having a large vocabulary and taking the longest vocabulary match with some heuristics for unknown words to the use of machine learning sequence models, such as hidden Markov models or conditional random fields, trained over hand-segmented words.
I would suggest greedy dictionary matching (always taking the longest vocabulary match) as a first step, then adding heuristics to handle the most common failure cases, along the lines of the sketch below.
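A minimal sketch of greedy longest-match segmentation in Python; the vocabulary, the max_word_len cut-off, and the single-character fallback for unknown material are illustrative assumptions, not part of the quoted approach:

def greedy_segment(text, vocab, max_word_len=20):
    """Greedy longest-match segmentation: at each position take the
    longest dictionary word starting there; fall back to one character
    when nothing matches."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match is found.
        for j in range(min(len(text), i + max_word_len), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit a single character.
            # This is where heuristics for unknown words would go.
            tokens.append(text[i])
            i += 1
    return tokens

# Toy example with an illustrative vocabulary and space-free input.
vocab = {"the", "table", "down", "there", "he", "her"}
print(greedy_segment("thetabledownthere", vocab))
# -> ['the', 'table', 'down', 'there']

Greedy matching is fast and easy to implement, but it commits to the longest match even when a shorter word would give a better overall segmentation, which is exactly where the heuristics (or the sequence models mentioned in the quote) come in.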