A General OCR Question - Post ID 2049

User 458800 Photo


Registered User
78 posts

Greetings!

I'm not sure this is the proper forum section for this question, but here goes anyway.

Does anybody know of a tool that I could use to read a PDF or scanned document and produce a "text searchable" PDF on the other end?

The source material is a Homeowners' Association bylaws document that was created in the early 1990s... on a typewriter. Since then, it's been scanned, and we have it available as a PDF download. But it's essentially one massive "gif" or picture document, and therefore not text searchable.

I want to change that; make it possible for our homeowners to do searches on the content.

The document probably runs on for 50 pages or so, so manually retyping it would not be desirable.

The bylaws document contains both text and pictures. I could rescan the pictures if necessary, and manually place them in the correct locations of the new document.

But what I really need is a good tool that can recognise that old "IBM Selectric Courier" typewriter font (used for the main body of the document). That would really help reduce the manual work.

I'm on a budget for this project. Open source would be best, a bargain next-best. I have a three- or four-year-old Epson 4490 scanner, but I don't see any OCR (Optical Character Recognition) software on the Epson download page. Still digging, though.

Thanks in advance for any ideas you may have...
Windows 11
Intel i9 (workstation) and i9 (laptop). Gobs of RAM and acres of SSD space on both machines.
User 244141 Photo


Ambassador
1,209 posts

Hi,

Use OpenOffice 3.x, then download and install the PDF extension. That will allow you to edit and publish PDFs. Most PDF readers have text search built in, so searching the PDF you make shouldn't be a problem.
Web Design: https://www.websnoogie.com
Member - BBB: Websnoogie, LLC
User 458800 Photo


Registered User
78 posts

Thanks, I am downloading 3.0 now. I was on 2.4; not aware of a new version.

I can't find any info on whether or not it can optically translate characters (that appear in pictures) into typeset, but I only just started looking.
User 244141 Photo


Ambassador
1,209 posts

You should be able to open the PDF and edit it directly, or print it and scan it back in (maybe an option). If CoffeeCup could just do this as a desktop app... lol

:)

PS: Most newer printers have OCR software included. Just select something like the "read document" option.
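
For the original question (scanned PDF in, text-searchable PDF out, open source preferred), here is a minimal sketch of one possible approach. It is not something anyone in this thread tested. It assumes the open-source OCRmyPDF command-line tool, which is not mentioned above, is installed together with Tesseract and Ghostscript. The function names and file names are hypothetical. OCRmyPDF adds a hidden text layer behind the scanned page images, so the pictures in the bylaws should stay where they are; how well it reads Selectric Courier type would need to be checked on a few sample pages.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(src, dst, language="eng"):
    """Return the argument list for one OCRmyPDF run (nothing is executed here)."""
    return [
        "ocrmypdf",
        "--language", language,  # Tesseract language pack to use
        "--deskew",              # straighten pages that were scanned crooked
        "--rotate-pages",        # fix pages scanned sideways or upside down
        str(src),
        str(dst),
    ]


def make_searchable(src, dst, language="eng"):
    """Run OCRmyPDF on src and write a text-searchable copy to dst."""
    # Assumption: the ocrmypdf executable is installed and on PATH.
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf was not found on PATH; install it first")
    if not Path(src).is_file():
        raise FileNotFoundError(src)
    # check=True raises CalledProcessError if OCRmyPDF exits with an error.
    subprocess.run(build_ocr_command(src, dst, language), check=True)
    return Path(dst)
```

Calling `make_searchable("bylaws_scan.pdf", "bylaws_searchable.pdf")` (placeholder file names) would leave the original scan untouched and write the searchable copy alongside it.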