Hybrid Records: Optimizing Hybrid Document Capture

What is a Hybrid Record?

A hybrid document, or hybrid record, is a single file that stores data in more than one file format. Hybrid records are a specific kind of metafile, that is, a file format capable of storing multiple types of data. Examples include a contract generated electronically as a PDF that also contains a scanned image of the signature page, or an image-based PDF with a CSV file embedded in it. Hybrid records function essentially as container files: they carry data in several formats within one document.

Hybrid PDFs and ISO Specifications

A commonly used metafile format is PDF/A, the ISO standard (ISO 19005) for long-term archiving of electronic documents. The parts of the standard differ in what they allow you to embed, which matters if you are considering PDF/A as a hybrid document. PDF/A-2 allows embedding only of files that are themselves valid PDF/A, whereas PDF/A-3 permits embedding of any file type, including CAD, CSV, XML, and image files. Of the two, PDF/A-3 is therefore the part suited to serving as a hybrid document.
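The embedding rules above can be written down as a small decision function. This is only an illustrative sketch: the names `PdfAPart` and `embedding_allowed` are hypothetical, the check assumes the attachment has already been validated as PDF/A (or not) by some other tool, and the PDF/A-1 rule (no embedded files at all) is added for completeness rather than taken from the text.

```python
from enum import Enum


class PdfAPart(Enum):
    """PDF/A parts (ISO 19005-1, -2, -3); hypothetical helper type."""
    PDFA_1 = 1
    PDFA_2 = 2
    PDFA_3 = 3


def embedding_allowed(container: PdfAPart, attachment_is_pdfa: bool) -> bool:
    """Return True if the container's PDF/A part permits the attachment.

    attachment_is_pdfa: whether the attachment is itself a valid PDF/A
    file (assumed to have been validated elsewhere).
    """
    if container is PdfAPart.PDFA_1:
        return False               # PDF/A-1 forbids embedded files
    if container is PdfAPart.PDFA_2:
        return attachment_is_pdfa  # only valid PDF/A attachments
    return True                    # PDF/A-3: any file type


# Example: an XML invoice attached to an archival PDF.
print(embedding_allowed(PdfAPart.PDFA_2, attachment_is_pdfa=False))
print(embedding_allowed(PdfAPart.PDFA_3, attachment_is_pdfa=False))
```

Under these rules, an XML or CSV attachment is rejected for a PDF/A-2 container and accepted for a PDF/A-3 container.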

Industries Adopting Hybrid Records

Hybrid records are used across many industries. For instance, XML files are often attached to PDF files for electronic invoicing and billing; CSV files are embedded within financial documents; and CAD drawings are placed within PDFs in manufacturing and engineering. A prominent example comes from healthcare: many institutions are coming to rely on “hybrid health records,” or HHRs. By combining electronic health records (EHRs) with scanned images of paper charts, the HHR eases the transition to paperless, digital-first workflows in the medical industry. HHRs reduce the cost of physical paper storage and make it convenient to assemble image-based data within one document.

Optimizing Document Capture Processes for Hybrid Records

The main benefit of creating, using, and distributing hybrid documents is convenience: as a compound file format, a hybrid record holds multiple streams of data in one document file. However, merging file types comes with a trade-off. It can be difficult to process image and electronic components together without the right capture software, since capture software may be designed to handle only image documents such as scanned paper. Being unable to process the electronic components along with the image portions of a hybrid record slows down document processes, adds manual exception-handling costs, and disrupts workflows. Ideally, document capture software has built-in support for an array of file types, so that hybrid document processing stays within a single workflow. Foxit’s PDF Compressor, for instance, comes equipped with automatic processing support for many input file types, such as XFA PDF forms and hybrid PDFs. Broader input file type compatibility can greatly reduce the time spent manually processing document types that many capture solutions cannot ingest, enable faster business workflows, and minimize missed revenue from document-based transactions.
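To make the idea of a single workflow concrete, the sketch below routes each component of a hybrid record to a processing step by file extension and falls back to manual exception handling for anything unsupported. The routing table, the step names, and the `route` function are all hypothetical; they do not describe how PDF Compressor or any other product works.

```python
from pathlib import PurePath

# Hypothetical routing table: file extension -> processing step.
ROUTES = {
    ".tif": "ocr",
    ".tiff": "ocr",
    ".jpg": "ocr",
    ".png": "ocr",
    ".pdf": "inspect_pdf",
    ".xml": "parse_data",
    ".csv": "parse_data",
}


def route(filename: str) -> str:
    """Pick a processing step for one component, case-insensitively."""
    suffix = PurePath(filename).suffix.lower()
    # Unsupported types become manual exceptions, the costly path.
    return ROUTES.get(suffix, "manual_exception")


components = ["scan_001.tif", "invoice.XML", "ledger.csv", "drawing.dwg"]
for name in components:
    print(name, "->", route(name))
```

In this toy table the CAD drawing is the only manual exception; every file type added to the table is one fewer document type that has to be handled by hand.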

Data extraction and analytics, archiving, and forms processing of hybrid documents are only as accurate as the data fed into them. Conversion software will often rasterize all content regardless of its source. In the process, these solutions flatten text to image and re-OCR already-indexable text from born-digital documents, which wastes time, decreases performance, and exposes the hybrid record to re-recognition errors. Foxit’s PDF Compressor is designed to detect pre-existing text layers from born-digital documents and automatically bypass the OCR phase for those portions of text. This is especially useful for hybrid documents with born-digital text content: it reduces the risk of inaccurate OCR and allows these documents to be used as data-rich assets. Furthermore, unattended auto-detection of electronically produced text removes the need for manual effort, special coding, or expensive professional services to separate born-digital files from image documents.
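The general idea of skipping OCR where a text layer already exists can be sketched as a per-page decision. This is a minimal illustration under stated assumptions, not a description of PDF Compressor's internals: the `Page` summary, the `min_chars` threshold, and both function names are hypothetical, and a real pipeline would obtain the text layer and image flags from a PDF library.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Page:
    """Hypothetical page summary; a real PDF library would supply these."""
    number: int
    text_layer: str   # text already extractable from the page ("" if none)
    has_images: bool  # True if the page contains raster image content


def needs_ocr(page: Page, min_chars: int = 20) -> bool:
    """OCR only pages with image content and no usable text layer."""
    has_text = len(page.text_layer.strip()) >= min_chars
    return page.has_images and not has_text


def plan_ocr(pages: List[Page]) -> Dict[str, List[int]]:
    """Split page numbers into those to OCR and those to leave untouched."""
    plan: Dict[str, List[int]] = {"ocr": [], "skip": []}
    for page in pages:
        plan["ocr" if needs_ocr(page) else "skip"].append(page.number)
    return plan


pages = [
    Page(1, "This agreement is entered into by the parties below.", False),
    Page(2, "", True),  # scanned signature page, no text layer
]
print(plan_ocr(pages))  # {'ocr': [2], 'skip': [1]}
```

For the contract example, the born-digital first page keeps its original text and only the scanned signature page is sent to OCR, so existing text is never exposed to re-recognition errors.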

