How good is the OCR text recognition in your document?

by Laura Silva, Marketing Program Manager

ocr-text-recognition-website

If you scan documents, you probably know that the typical scanning software just produces an image of your document. It doesn’t create editable, searchable text. That’s fine if all you want to do is digitize the file. If, however, you want to create intelligent documents that you can edit, reuse, search and make findable in your archive, you’ll want something more. That’s where Optical Character Recognition (OCR) comes in.

OCR converts scans into editable, searchable documents, but there’s a catch

OCR turns your scanned documents into editable and searchable ones by converting static images of words into real, searchable text.

Trouble is, while today’s OCR engines are quite sophisticated and do their best to detect the characters in a document, by nature, the recognition is never 100%. Often, it depends on the quality of your scanned document as well as the OCR software and a number of other variables. Which leaves you needing to check your resulting document after you OCR it.

Of course, if you’ve got to review any document you perform OCR on, you’ll want to do that quickly so you can move on to other tasks. Here’s how to find out what, if anything, needs fixing.

Turn “hidden” OCR text into text you can view

Technically, OCR text is also called “hidden text” in a PDF as you typically see the image in

PDF editors and the OCR text is “underlying” or “sitting behind” the image.

If you’re like most users, your first instinct is to copy-and-paste text from the resulting PDF file into Word in order to read or edit it. While that works, there’s a better way.

Foxit PhantomPDF offers a handy feature that lets you stay in the PDF editor and just click to see the OCR text. Simply click Text Viewer.

With Foxit Text Viewer, you can work on all PDF documents in pure text view mode. It allows you to easily reuse the text scattered among images and tables, and also acts like Notepad.

foxit-reader-text-viewer

To Enter Text View mode, do one of the following:

  • Choose View Document Views Text Viewer
  • Press the shortcut key Ctrl 6.

Then, once you can view the OCR’d text, if you want to correct OCR errors, you can do so using PhantomPDF to edit anything that needs to be changed.

This method works well for the occasional scan you’re converting to OCR. If, however, you need a high OCR recognition rate for a big volume of documents, we can help your organization perform very high quality OCR at scale. Talk to us about Foxit Server solutions.


Leave a Reply

Your email address will not be published. Required fields are marked *


*