Has this been posted: (Google hires nutty right winger)?

rep
Posts: 2910
Joined: Fri Aug 30, 2002 7:00 am

Post by rep »

[img]http://members.cox.net/anticsensue/rep_june.gif[/img]
Massive Quasars
Posts: 8696
Joined: Fri Dec 15, 2000 8:00 am

Post by Massive Quasars »

[url=http://www.marxists.org/][img]http://img442.imageshack.us/img442/3050/avatarmy7.gif[/img][img]http://img506.imageshack.us/img506/1736/leninzbp5.gif[/img][img]http://img506.imageshack.us/img506/1076/modulestalinat6.jpg[/img][img]http://img506.imageshack.us/img506/9239/cheds1.jpg[/img][/url]
rep
Posts: 2910
Joined: Fri Aug 30, 2002 7:00 am

Post by rep »

OCR is incredible, but there needs to be a blend between OCR and the original document scan...

What I mean is this: instead of scanning a page and then converting it to text set in computer fonts, which at best only approximates the original even when it was printed in an easily read typeface, the software should tag each character on the scanned image and leave it at that.

That way, if you are looking at the Declaration of Independence and search for "John", John Hancock's signature gets selected.

If a book is just plain text, then leave it at that and, to save space, let the OCR program convert it completely to font-based computer text.

If it's highly illustrated, historical, or uses interesting layouts and typefaces, then leave it as an image with tagged characters, much like an HTML image map.
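
To make the tagged-character idea a bit more concrete, here is a rough Python sketch of how a search could map back onto the scan. Everything in it is an assumption for illustration: the `TaggedChar` record, the `find_regions` function, and the pixel coordinates are all made up, and a real OCR engine would supply the boxes. The page stays an image; each recognized character only carries the box it occupies, and a search returns the regions to highlight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedChar:
    """One recognized character plus the box it occupies on the scan (hypothetical)."""
    char: str  # character the OCR step recognized
    x: int     # left edge of the glyph's box, in pixels
    y: int     # top edge of the glyph's box, in pixels
    w: int     # box width in pixels
    h: int     # box height in pixels


def find_regions(tags, query):
    """Return one (x, y, w, h) box per case-insensitive match of query.

    The match is done tag by tag, so match positions always line up
    with positions in the tag list.
    """
    chars = [t.char.lower() for t in tags]
    needle = [c.lower() for c in query]
    n = len(needle)
    regions = []
    if n == 0:
        return regions
    for start in range(len(chars) - n + 1):
        if chars[start:start + n] != needle:
            continue
        hit = tags[start:start + n]
        # Merge the glyph boxes of the match into one enclosing box.
        left = min(t.x for t in hit)
        top = min(t.y for t in hit)
        right = max(t.x + t.w for t in hit)
        bottom = max(t.y + t.h for t in hit)
        regions.append((left, top, right - left, bottom - top))
    return regions


# Made-up boxes: every glyph is 10 px wide and 20 px tall on a line at y=300.
page = [TaggedChar(c, 100 + 10 * i, 300, 10, 20)
        for i, c in enumerate("John Hancock")]

print(find_regions(page, "john"))  # [(100, 300, 40, 20)]
```

A viewer would then draw a highlight over the returned box on the original scan, which is roughly what an image map does with its clickable areas.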