Data capture is the identification and extraction of data from a scanned document, often to be sent to a workflow for routing and action as part of a business process.
Data capture involves optical character recognition (OCR) and is sometimes confused with it. However, data capture software is more complex and more valuable, because it captures specific, targeted data, usually from a form, that is required to support a business process. In comparison, OCR is the basic conversion of any scanned alphanumeric information into a machine-readable digital form.
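To make the distinction concrete, here is a minimal sketch in Python. It assumes OCR has already turned a page into plain text, and it pulls out only the targeted fields. The sample text, the field names and the patterns are all hypothetical and chosen for illustration; real capture software relies on much richer rules than a few regular expressions.

```python
import re

# Hypothetical raw OCR output: every character on the page, with no notion of fields.
ocr_text = """ACME SUPPLY CO.
Invoice No: INV-1042
Date: 2024-03-15
Total Due: $1,250.00
"""

# Hypothetical patterns for the fields a business process needs.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice No:\s*(\S+)"),
    "date": re.compile(r"Date:\s*(\d{4}-\d{2}-\d{2})"),
    "total": re.compile(r"Total Due:\s*\$([\d,]+\.\d{2})"),
}


def capture_fields(text):
    """Return only the targeted fields found in the OCR text."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1)
    return fields


captured = capture_fields(ocr_text)
print(captured)
```

The OCR step yields the whole block of text; the capture step keeps only the invoice number, date and total, which is the part a downstream workflow can act on.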
The basic capture of data from structured forms is a well-understood process. (A structured form is one where both the type of information and its location on the form are known in advance.) However, most companies also receive a large number of forms, such as invoices, from other organizations; the relevant data on these forms could be almost anywhere on the page.
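The structured case can be sketched as a lookup against a template of known zones. The example below is an illustration under stated assumptions, not any product's method: the Word type, the TEMPLATE coordinates and the extract_by_zone helper are hypothetical, and the OCR engine is assumed to report a position for each recognized word.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the word on the page
    y: float  # top edge of the word on the page


# Hypothetical template: field name -> (left, top, right, bottom) zone on the form.
TEMPLATE = {
    "name": (100, 50, 400, 80),
    "account": (100, 100, 400, 130),
}


def extract_by_zone(words, template):
    """Collect the words whose top-left corner falls inside each field's zone."""
    fields = {}
    for name, (left, top, right, bottom) in template.items():
        in_zone = [w for w in words if left <= w.x < right and top <= w.y < bottom]
        # Restore reading order: top to bottom, then left to right.
        in_zone.sort(key=lambda w: (w.y, w.x))
        fields[name] = " ".join(w.text for w in in_zone)
    return fields


# Hypothetical OCR output for one scanned form.
words = [
    Word("Smith", 180, 55),
    Word("Jane", 110, 55),
    Word("48213", 110, 105),
    Word("Page", 500, 700),  # outside every zone, so it is ignored
]
result = extract_by_zone(words, TEMPLATE)
print(result)
```

This works only because the layout is fixed. On an invoice from another organization the same zones would point at the wrong text, which is why forms with unpredictable layouts are the harder problem.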
In the case of invoices, data capture alone does not identify where the important pieces of information (vendor, address, items, prices, payment terms, and so on) are on the page. Nor does it match the invoices with the corresponding purchase orders.
Also, data capture results depend on the image quality of the scanned documents. Documents that have colored or patterned backgrounds, that have been marked with highlighter pens, or that are crooked when scanned can yield poor OCR results. Fixing these bad results means either adjusting the scanner settings and rescanning the document (perhaps multiple times) or manually keying in corrections to the electronic data.
Kofax software goes beyond data capture to automate the transformation of business-critical information from paper documents, faxes and electronic formats into process-ready information, and to deliver it into business systems, databases, workflows and document archives. Kofax software also works with document scanners to automatically straighten and improve image quality for even the toughest documents, providing dramatically better data capture results and eliminating the need to rescan documents.