
Has this been posted: (Google hires nutty right winger) ?

Posted: Wed May 25, 2005 9:26 pm
by rep

Posted: Wed May 25, 2005 10:39 pm
by Massive Quasars

Posted: Wed May 25, 2005 11:34 pm
by rep
OCR is incredible, but there needs to be a blend between OCR and the original document scan...

What I mean is this: instead of scanning a page and then converting it to text set in computer fonts, which can at best only approximate the original even when it was printed in an easily read typeface, the software should tag each character on the scanned image and leave it at that.

That way, if you are looking at the Declaration of Independence, you can search for "John" and John Hancock's signature gets selected on the scan itself.
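To make that a bit more concrete, here is a rough Python sketch of how searching tagged characters might work. Everything in it is my own assumption, not how any real OCR product does it: the `TaggedChar` record, the `find_regions` function, and the pixel coordinates are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedChar:
    """Hypothetical record: one recognized character and its box on the scan."""
    char: str  # what the OCR engine thinks this glyph is
    x: int     # left edge of the box, in pixels
    y: int     # top edge of the box
    w: int     # box width
    h: int     # box height


def find_regions(chars, query):
    """Return one (x, y, w, h) box per case-insensitive match of query."""
    if not query:
        return []
    lowered = [c.char.lower() for c in chars]
    target = [ch.lower() for ch in query]
    n = len(target)
    regions = []
    for start in range(len(chars) - n + 1):
        if lowered[start:start + n] == target:
            hit = chars[start:start + n]
            left = min(c.x for c in hit)
            top = min(c.y for c in hit)
            right = max(c.x + c.w for c in hit)
            bottom = max(c.y + c.h for c in hit)
            # The viewer would highlight this rectangle on the original image.
            regions.append((left, top, right - left, bottom - top))
    return regions


# Invented coordinates standing in for a signature on a scanned page.
signature = [TaggedChar(ch, 10 * i, 100, 8, 20)
             for i, ch in enumerate("John Hancock")]
print(find_regions(signature, "john"))
```

The search runs over the recognized characters, but what comes back is a rectangle on the scan, so the handwriting itself is what gets selected.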

If a book is just rudimentary text, then to save space let the OCR program convert it completely to font-based computer text.

If it's highly illustrated, historical, or uses interesting layouts and typefaces, then leave it as an image with tagged characters, much like an HTML image map.
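For the image map comparison, something like the sketch below could turn tagged characters into real HTML `<map>` and `<area>` elements laid over the scan. The `image_map` function, the file name, and the coordinates are assumptions of mine for the sake of the example; only the HTML elements themselves are standard.

```python
from html import escape


def image_map(scan_url, map_name, tagged_chars):
    """Build an <img> plus a <map> with one rect <area> per tagged character.

    tagged_chars is a list of (char, x, y, w, h) tuples in pixels (hypothetical format).
    """
    name = escape(map_name, quote=True)
    lines = [
        f'<img src="{escape(scan_url, quote=True)}" usemap="#{name}" alt="">',
        f'<map name="{name}">',
    ]
    for char, x, y, w, h in tagged_chars:
        # A rect area takes its coords as left,top,right,bottom.
        coords = f"{x},{y},{x + w},{y + h}"
        title = escape(char, quote=True)
        lines.append(f'  <area shape="rect" coords="{coords}" title="{title}">')
    lines.append("</map>")
    return "\n".join(lines)


# Invented boxes for the first two letters of a signature.
print(image_map("declaration.png", "page1",
                [("J", 10, 100, 8, 20), ("o", 20, 100, 8, 20)]))
```

The page stays an untouched image, and the character tags ride along as a separate layer that a viewer or search tool can read.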