I need to detect the bounding box(es) around portions of text in an image, and while there are quite a number of scholarly articles describing algorithms, I haven't found any implementations.
The specific problem I'm trying to solve is this:
Given an image that may or may not contain text, determine if the image does contain text, an...
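One possible starting point, offered only as a rough sketch and not as any of the published algorithms: if the image has already been binarized into a grid of 0 (background) and 1 (ink) values, the bounding boxes of connected ink regions can be collected with a flood fill. The function name `component_boxes` and the grid representation are assumptions made for illustration. Deciding whether a box actually holds text would need a further classification step that is not shown here.

```python
from collections import deque


def component_boxes(grid):
    """Return inclusive (top, left, bottom, right) boxes of 4-connected ink regions.

    `grid` is a list of rows holding 0 (background) or 1 (ink). This is an
    assumed, pre-binarized input, not a real image format.
    """
    height = len(grid)
    width = len(grid[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if grid[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

An empty result means the grid holds no ink at all; a non-empty result only says that something was drawn, not that it is text.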
Hi,
For a contract job, I need to digitize a large number of old plenary debate protocols from the Federal Parliament of Germany. The PDFs are scans and contain only images, no text layer.
The problem is that most of these files have a two-column format:
I would love to read your answers to the following questions:
How can I split the two columns before feeding them into...
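One common approach to this kind of split, sketched here under stated assumptions and not tied to any particular OCR tool, is a vertical projection profile: count the ink pixels in each pixel column and cut the page at the widest blank run between the margins. The page is assumed to be already binarized into rows of 0 (background) and 1 (ink) and reasonably deskewed; the names `find_gutter` and `split_columns` and the `min_width` parameter are hypothetical.

```python
def find_gutter(page, min_width=2):
    """Return (start, end) of the widest blank column run inside the text area.

    `end` is exclusive. Returns None when no run of at least `min_width`
    blank columns exists between the first and last inked columns.
    """
    width = len(page[0]) if page else 0
    # Vertical projection profile: number of ink pixels in each column.
    profile = [sum(row[x] for row in page) for x in range(width)]
    inked = [x for x, count in enumerate(profile) if count > 0]
    if not inked:
        return None
    best = None
    run_start = None
    # Scan only between the outermost inked columns so margins are ignored.
    for x in range(inked[0], inked[-1] + 1):
        if profile[x] == 0:
            if run_start is None:
                run_start = x
        elif run_start is not None:
            length = x - run_start
            if length >= min_width and (best is None or length > best[1] - best[0]):
                best = (run_start, x)
            run_start = None
    return best


def split_columns(page, min_width=2):
    """Split the page at the middle of the gutter, or return it whole."""
    gutter = find_gutter(page, min_width)
    if gutter is None:
        return [page]
    middle = (gutter[0] + gutter[1]) // 2
    return [[row[:middle] for row in page], [row[middle:] for row in page]]
```

On real scans the profile is rarely exactly zero in the gutter because of noise and skew, so a small threshold in place of the `== 0` test would probably be needed.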
According to Wikipedia, "The accurate recognition of Latin-script, typewritten text is now considered largely a solved problem on applications where clear imaging is available such as scanning of printed documents." However, it gives no citation.
My question is: is this true? Is the current state-of-the-art so good that - for a good sca...
Given a region defined by a rectangle and a url, is there any way to determine what elements lie within the given rectangle on the page at the given url?
EDIT: Screen resolution, font size, etc. can all be set to reasonable defaults.
...
Consider a *NIX executable, dvi2rtf, whose contents are:
#!/bin/sh
# Quote the arguments so file names containing spaces survive.
TMPX=$(mktemp /tmp/dvi2rtf.XXXXXX)
dvitty "$1" "$TMPX" # CTAN
txt2rtf "$TMPX" "$2" # CTAN, in rtfutils
If my head is working this morning and the right executables are on the PATH, this clobbers the second argument with an RTF file whose text contents will roughl...