Archiving with PDF/A

If you’ve dealt with archiving before, you’ve probably heard of PDF/A, a file format meant for recordkeeping. We here at Foxit are big proponents of the format for long-term document preservation. After all, it is an ISO standard (ISO 19005) for good reason.

Background

PDF/A is a restricted subset of the standard PDF file format. Certain rules are enforced: for example, no JavaScript or executable files, no references to fonts that are not embedded in the file, and no encryption. Others fall under best practices: for example, image downsampling, while not explicitly prohibited, is not recommended.
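To make the enforced rules a little more concrete, here is a minimal Python sketch that scans the raw bytes of a PDF for two of the features mentioned above: JavaScript and encryption. The function name and marker table are our own illustrative assumptions, and this is only a rough heuristic, not a PDF/A validator: it does not check fonts, and it can miss markers stored inside compressed object streams or report harmless look-alike names.

```python
# Heuristic markers only. The table and function name are illustrative
# assumptions, not part of any PDF/A tool or library.
FORBIDDEN_MARKERS = {
    b"/JavaScript": "JavaScript action",
    b"/JS": "JavaScript action",
    b"/Encrypt": "encryption dictionary",
}


def find_pdfa_red_flags(pdf_bytes):
    """Return descriptions of features that PDF/A does not allow.

    This is a plain byte search, so it only sees uncompressed parts of
    the file. Treat an empty result as "nothing obvious found", not as
    proof of conformance.
    """
    found = []
    for marker, description in FORBIDDEN_MARKERS.items():
        if marker in pdf_bytes and description not in found:
            found.append(description)
    return found


if __name__ == "__main__":
    sample = (
        b"%PDF-1.4\n"
        b"1 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj\n"
        b"trailer << /Encrypt 5 0 R >>\n"
    )
    for flag in find_pdfa_red_flags(sample):
        print("Not allowed in PDF/A:", flag)
```

For real archiving workflows, a dedicated PDF/A validation tool is the right choice. The sketch only illustrates the kind of content the rules exclude.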

These guidelines, though, do not address the broader topic of document creation versus document archiving. If the documents are created in a process unrelated to archiving, from multiple sources of varying resolution and quality, then it’s not advisable to downsample or re-quantize them (as JPEG compression does), since the documents have already been created. In this case, from a strictly archiving perspective, we would not want to introduce any further degradation.

Best Practices

When captured documents (or documents that include images) are first created, the process almost universally involves downsampling, re-quantization, or both. This can be a factor in hospitals that produce MRI and CAT scans, or in government offices that scan using an MFP to JPEG, TIFF, or JPEG-based PDF, to name just a few examples. If conversion to PDF/A is part of a larger document workflow that includes document creation, then using JPEG or another lossy image compression method wouldn’t violate best practices, since the document archiver in this case is also the document creator. The document creator (or the document creation workflow) always has complete freedom to determine what constitutes informational loss within their application or organization.

Combining document creation and archiving into a single step achieves the same result as first capturing the documents in a lossy image format and then archiving them in PDF/A. In that second method, the PDF/A conversion directly embeds the JPEG image stream from the capture process. It is technically consistent with PDF/A best practices, since the archiving step itself is completely lossless, but its output PDF is equivalent to that of the first method, which creates a “lossy” PDF/A document in a single process.
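Embedding the JPEG stream directly can be sketched in a few lines of Python. PDF can store JPEG data as-is in an image object that uses the DCTDecode filter, so the bytes from the capture step are copied unchanged rather than decoded and recompressed. The function name, parameters, and the choice of an 8-bit RGB image are assumptions for illustration. The result is a single image object, not a complete or conforming PDF/A file, which also needs document structure, embedded metadata, and color information.

```python
def jpeg_image_object(jpeg_bytes, width, height, obj_num=1):
    """Wrap JPEG data in a PDF image object without re-encoding it.

    Hypothetical helper: the caller supplies the pixel dimensions, and
    the image is assumed to be 8-bit RGB.
    """
    # Every JPEG stream starts with the SOI marker 0xFF 0xD8.
    if not jpeg_bytes.startswith(b"\xff\xd8"):
        raise ValueError("not a JPEG stream (missing SOI marker)")

    header = (
        f"{obj_num} 0 obj\n"
        f"<< /Type /XObject /Subtype /Image "
        f"/Width {width} /Height {height} "
        f"/ColorSpace /DeviceRGB /BitsPerComponent 8 "
        f"/Filter /DCTDecode /Length {len(jpeg_bytes)} >>\n"
        "stream\n"
    ).encode("ascii")

    # The JPEG bytes are appended untouched, so this step adds no loss.
    return header + jpeg_bytes + b"\nendstream\nendobj\n"


if __name__ == "__main__":
    # Stand-in bytes with JPEG start and end markers, not a real image.
    fake_jpeg = b"\xff\xd8" + b"\x00" * 10 + b"\xff\xd9"
    obj = jpeg_image_object(fake_jpeg, width=4, height=4)
    print(fake_jpeg in obj)
```

Because the original bytes appear verbatim inside the object, any loss in the final file comes from the capture step, not from archiving.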

Benefits

PDF/A is an international ISO standard, so it is widely used and supported. If you’ve struggled to open old documents before, whether due to file corruption, an unsupported file format, or any other reason, you know how frustrating an unreliable system can be. It’s essential to be able to access important records for years to come, whether for regulatory compliance or for other reasons. PDF/A is meant to keep files looking exactly the same no matter how many years go by, so it is ideal for long-term storage.

Contact us to learn more.
