Your boss gives you a hardcopy of a company document that needs updating. Your client hands you a printed magazine article and asks you to create an editable text version. You receive an electronic image of a brochure and need to update the text.
What do all these situations have in common? They could all involve you spending hours retyping manually and correcting typos. Or you could take a more modern approach and convert any and all of them into a digital format with fully editable text in a matter of minutes.
All you need is a scanner or digital camera (to create an image file of any printed document) or an electronic image (if you've already got a .pdf, .jpg, .eps, .png or similar file, you're in business), plus Optical Character Recognition (OCR) software, such as the OCR tools built into many PDF editors.
What is OCR?
OCR is a software technology that converts scanned documents and other images of text into documents with "live text," aka readable, searchable text that you can change, copy, edit and otherwise work with just like any other text.
How does OCR work?
There are two main methods used for OCR: matrix matching (the simpler and more common of the two) and feature extraction.
Matrix Matching compares each character your OCR software detects against a library of character templates. When it finds a match, bingo! The software converts that image into the corresponding ASCII character.
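To make the idea concrete, here is a minimal Python sketch of matrix matching. It assumes each character has already been isolated and binarized into a tiny 5x5 bitmap ("1" = ink, "0" = paper). The template set, the pixel-agreement score and the 0.8 acceptance threshold are all illustrative assumptions, not how any particular OCR product works.

```python
# Minimal matrix-matching sketch. The 5x5 templates below are hypothetical;
# real OCR engines use much larger, font-specific template libraries.
TEMPLATES = {
    "I": ("11111",
          "00100",
          "00100",
          "00100",
          "11111"),
    "L": ("10000",
          "10000",
          "10000",
          "10000",
          "11111"),
    "T": ("11111",
          "00100",
          "00100",
          "00100",
          "00100"),
}


def similarity(glyph, template):
    """Return the fraction of pixels that agree between two equal-sized bitmaps."""
    total = agree = 0
    for glyph_row, template_row in zip(glyph, template):
        for g, t in zip(glyph_row, template_row):
            total += 1
            agree += g == t  # True counts as 1
    return agree / total


def match_character(glyph, threshold=0.8):
    """Return the best-matching character, or None if no template is close enough."""
    best_char, best_score = None, 0.0
    for char, template in TEMPLATES.items():
        score = similarity(glyph, template)
        if score > best_score:
            best_char, best_score = char, score
    return best_char if best_score >= threshold else None


# A scanned "T" with one stray dark pixel in the bottom row.
noisy_t = ("11111",
           "00100",
           "00100",
           "00100",
           "00110")
char = match_character(noisy_t)
print(char, ord(char))  # T 84
```

Because the score is simply the share of matching pixels, a glyph in an unfamiliar font can score poorly against every template, which is why matrix matching relies on having templates that resemble the fonts it will encounter.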
Feature Extraction is a type of OCR that uses more sophisticated pattern analysis to look for general features such as open areas, closed shapes, diagonal lines and line intersections. It's a much more versatile method, but it has more requirements for a successful outcome, such as a clean, straight image with a resolution of at least 300 dpi. Matrix matching can still work well on less-than-ideal images, and it's the method most commonly used in OCR tools like PhantomPDF.
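Feature extraction can be sketched the same way. The Python below is a hypothetical, hand-rolled illustration: it measures three simple features of a binarized 5x5 glyph (how many closed shapes it contains, and how many strokes a horizontal and a vertical line through its center cross), then picks the reference character whose features are nearest. Real feature-based OCR uses far richer features and trained classifiers; the glyphs, the three features and the nearest-match rule here are assumptions for illustration only.

```python
from collections import deque

# Hypothetical 5x5 reference glyphs ("1" = ink, "0" = paper), used only to
# derive one feature vector per character.
REFERENCE_GLYPHS = {
    "O": ("01110", "10001", "10001", "10001", "01110"),
    "C": ("01111", "10000", "10000", "10000", "01111"),
    "H": ("10001", "10001", "11111", "10001", "10001"),
    "U": ("10001", "10001", "10001", "10001", "01110"),
}


def count_holes(glyph):
    """Count background regions fully enclosed by ink (closed shapes)."""
    rows, cols = len(glyph), len(glyph[0])
    seen = set()

    def flood(start):
        # Breadth-first fill of one 4-connected background region.
        queue = deque([start])
        seen.add(start)
        while queue:
            r, c = queue.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and glyph[nr][nc] == "0" and (nr, nc) not in seen):
                    seen.add((nr, nc))
                    queue.append((nr, nc))

    # Background reachable from the border is "outside", not a hole.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and glyph[r][c] == "0" and (r, c) not in seen:
                flood((r, c))
    # Every remaining background region is enclosed by ink.
    holes = 0
    for r in range(rows):
        for c in range(cols):
            if glyph[r][c] == "0" and (r, c) not in seen:
                flood((r, c))
                holes += 1
    return holes


def count_runs(cells):
    """Count separate ink strokes crossed along one line of pixels."""
    runs, previous = 0, "0"
    for cell in cells:
        if cell == "1" and previous == "0":
            runs += 1
        previous = cell
    return runs


def extract_features(glyph):
    """Return (closed shapes, strokes on middle row, strokes on middle column)."""
    mid_row = glyph[len(glyph) // 2]
    mid_col = [row[len(row) // 2] for row in glyph]
    return (count_holes(glyph), count_runs(mid_row), count_runs(mid_col))


PROTOTYPES = {char: extract_features(g) for char, g in REFERENCE_GLYPHS.items()}


def classify(glyph):
    """Return the reference character whose feature vector is closest."""
    features = extract_features(glyph)

    def distance(char):
        return sum((a - b) ** 2 for a, b in zip(features, PROTOTYPES[char]))

    return min(PROTOTYPES, key=distance)


# A square-cornered "O" from a different font: its pixels differ from the
# reference "O", but its features (one closed shape, two strokes crossed) match.
square_o = ("11111", "10001", "10001", "10001", "11111")
print(extract_features(square_o), classify(square_o))  # (1, 2, 2) O
```

The design choice illustrates the trade-off described above: comparing features rather than raw pixels makes the classifier tolerant of font differences, but the features themselves (such as counting closed shapes) can be thrown off by broken or smudged strokes in a poor-quality scan.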
Advantages of OCR
From faster searches and easier editing to saving digital and physical storage space, you’ll find many benefits to using OCR software to turn document images into searchable, editable text:
- Au revoir retyping – Unless you’re a fan of extra time at the keyboard recreating documents that exist in printed or scanned format, you’ll love the time savings you get when converting those image files into searchable, editable text via OCR.
- Speedy digital searches – By converting scanned text into a word processing file, OCR lets you search through documents using keywords or phrases. Got a few hundred invoices? Let your PC search for the client name you need faster than you can say “coffee break.”
- Typing new text – If you need that image of a document to function like real text, where you can add new paragraphs, copy and paste, edit out an old reference, etc., OCR lets you do it. It’s ideal for everything from updating contracts to making changes to your archive of family recipes.
- Saving space – If you’ve got reams of paper documents taking up space in your office, you can scan them into PDF files with the confidence that your OCR software will let you retrieve any of the text you need to work with, whenever you may need it. Goodbye big file cabinets, hello tidy little CDs of archived documents.
- Accessibility – If you or someone you know is vision-impaired, OCR software can help turn books, magazines and other printed documents into accessible files that can be read aloud by screen readers and other text-to-speech utilities.
So why not use the power of OCR in your PDF software to increase efficiency in your office? Once you start, you'll likely find plenty of ways to put it to work, and you'll wonder how you ever managed without it.
To learn more about how to scan and OCR documents, visit the Foxit PhantomPDF product page.