By Guest Blogger: Jim Wanner, CEO, KeyMark, Inc.
Bad implementations, false positives, slow implementations – sound familiar? If you know the document
capture industry, you could say these phrases describe many early implementations. Fortunately, the situation
is taking a turn for the better.
Document capture is a dynamic industry that is on the verge of going mainstream.
Hardware and software enhancements have created a nirvana that will benefit both customers and solution providers. Let’s take a look at where we’re heading with document capture.
Decentralized scanning will continue to grow
No matter how hard we try to contain scanning in a centralized mailroom, the concept of scanning remotely is here to
stay. The reasons why are very clear.
• Hardware prices have plummeted. You can now purchase a reliable scanner for the cost of a printer.
• Multifunction devices are now usable. For years, multifunction devices were for printing only. Now these devices
are opening up a brand new opportunity that will change the future: the ability to scan. The hardware
has improved its scanning capabilities, and the devices are now powered by software that automates document
ingestion.
• Software for remote capture has improved dramatically. In the past, if you didn’t scan with a high-quality,
centralized device, the image quality would suffer. The hardware has improved, but the real improvement
came with the advent of new software capabilities. Image cleanup is now a basic technology available to
everyone.
• Simple features that we take for granted such as image scanning confirmation, hardware integration, and
enterprise solutions are now part of the core repertoire of all high-end, remote scanning applications.
Optical Character Recognition (OCR) technology will be mainstream
OCR technology used to carry the stigma of a very high percentage of failed implementations and of producing
far too many false reads (mistaking one character for another). The technology is better now, but if you don’t have
the right people implementing the solution and you do not have a clean way to handle exceptions, you will still run
into significant implementation problems even with the best technology.
• Select the right product. There are many products in the market today that can scan and OCR documents. The
key is finding the product that meets your needs, is easy to configure, and can handle exceptions.
• Ensure you have integration with the core systems to handle lookups. No OCR system is perfect since most of
them utilize the same OCR engines. Consequently, the tool’s ability to link back to the original information to
validate data is crucial to minimizing false reads.
• Have the proper methods to deal with exceptions. The majority of significant issues that can happen after an
OCR system implementation revolve around bad data being uploaded into the legacy system. Consequently,
you must ensure you have the proper method of dealing with exceptions easily.
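As a rough illustration of the last two points, the sketch below checks OCR results against a core-system lookup and routes anything doubtful to an exception queue instead of uploading it. It is a minimal example, not any product's actual method: the `KNOWN_INVOICES` table, the `OcrField` type, the confidence scores, and the 0.90 threshold are all assumptions made up for this example.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a core-system lookup (for example, an invoice table).
KNOWN_INVOICES = {"INV-1001": 250.00, "INV-1002": 1299.50}


@dataclass
class OcrField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical OCR engine


def validate_invoice(number, amount, min_confidence=0.90):
    """Return ("accept", reason) or ("exception", reason) for one document."""
    # Low-confidence reads go to a person rather than into the legacy system.
    for field in (number, amount):
        if field.confidence < min_confidence:
            return ("exception", f"low confidence on {field.name}")

    # Link back to the original information to catch false reads.
    expected = KNOWN_INVOICES.get(number.value)
    if expected is None:
        return ("exception", "invoice number not found in core system")

    try:
        read_amount = float(amount.value.replace(",", ""))
    except ValueError:
        return ("exception", "amount is not numeric")

    if abs(read_amount - expected) > 0.005:
        return ("exception", "amount does not match core system")
    return ("accept", "matched core system")
```

A false read such as "INV-1O01" (a letter O in place of a zero) fails the lookup and lands in the exception queue, even when the OCR engine reported high confidence.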
Separation and classification will exponentially improve
Fifteen years ago, a great install would have used patch pages and achieved a 90% read rate for machine-printed
documents and 70% for hand-printed documents, and the forms could not be altered.
This is not the case today. You should now expect dramatic improvements when you implement a separation and
classification solution. Separation is accurately knowing when one document ends and the next document in a batch
begins. Classification is accurately identifying the type of each document in a batch of related documents.
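To make the two terms concrete, here is a deliberately simple sketch that works on the OCR text of each page. It is not how commercial products do it (they learn from samples); the keyword table, the document types, and the "page 1 of" heuristic are assumptions chosen only for illustration.

```python
# Hypothetical keyword table; real products learn these from sample documents.
KEYWORDS = {
    "invoice": ("invoice", "amount due", "remit"),
    "w2": ("wage and tax statement", "employer identification"),
    "bank_statement": ("beginning balance", "ending balance", "statement period"),
}


def classify_page(text):
    """Classification: return the best-matching document type, or None."""
    lowered = text.lower()
    scores = {
        doc_type: sum(keyword in lowered for keyword in keywords)
        for doc_type, keywords in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def separate_batch(pages):
    """Separation: group a batch of page texts into documents, no patch pages."""
    documents = []
    for text in pages:
        doc_type = classify_page(text)
        starts_new = (
            not documents
            or (doc_type is not None and doc_type != documents[-1]["type"])
            or "page 1 of" in text.lower()
        )
        if starts_new:
            documents.append({"type": doc_type, "pages": [text]})
        else:
            # Unrecognized or same-type pages continue the current document.
            documents[-1]["pages"].append(text)
    return documents
```

Given a batch containing a two-page invoice followed by a one-page bank statement, `separate_batch` returns two documents, each tagged with its type and holding its own pages.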
The most significant technical problem we face today is gathering enough samples so the OCR software that reads the
letters and numbers off the document can effectively identify the various document types. This problem will
shrink as software comes to require only a few samples to identify documents. The end result is a shorter
implementation time, measured in weeks rather than months, and accuracy greater than 90%. You will also
reduce preparation time and have no need for patch pages.
Regulation will push capture technology to quickly identify document types
Years ago, records managers spent a majority of their time making sure documents were filed in the correct folders.
Today, records managers are busy providing corporate legal counsel due to government regulation and the increased
awareness of important data.
Companies need a way to quickly identify all different types of data on the fly without the need for a human to review the
results. The competitive landscape will be fierce. Capture technology will be competing with search engines for this
market. The document types that need to be identified range from scanned documents and electronic documents to
emails. The winner in this space will be corporations establishing a records management policy.
Business Process Management (BPM) will be incorporated with capture
Capture companies are just now integrating BPM technology, and the benefits are significant in several ways.
• Data is far more important to companies today than it was 10 years ago. This trend will continue for years to
come. In certain industries, such as the mortgage space, confirming the quality and accuracy of the
data is far more vital than it was when lenders processed mortgages 10 years ago.
• Companies are bringing knowledge-level decisions to the capture platform. Instead of waiting until after the
process is completed, companies want to ensure the data is accurate and catch errors prior to processing the
data.
• Standardization of knowledge is documented in the BPM solution. Smart companies are implementing rules into
their capture platform so they can select the good business from the business they would prefer not to have on
their books. Capture integration with BPM gives corporations this flexibility to make quick and accurate
decisions.
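The idea of putting rules into the capture platform can be sketched as a small check that runs before captured data moves on to processing. This is only an illustration using the mortgage example: the field names, the `MAX_LTV` limit of 0.80, and the rules themselves are hypothetical and not taken from any real BPM product.

```python
# Hypothetical business rule: maximum loan-to-value ratio the lender accepts.
MAX_LTV = 0.80


def check_loan_application(app):
    """Return a list of rule violations; an empty list means the data can proceed."""
    problems = []
    loan_amount = app.get("loan_amount", 0)
    appraised_value = app.get("appraised_value", 0)

    if loan_amount <= 0:
        problems.append("loan amount must be positive")
    if appraised_value <= 0:
        problems.append("appraised value must be positive")
    elif loan_amount / appraised_value > MAX_LTV:
        problems.append("loan-to-value ratio exceeds limit")
    if not app.get("borrower_name", "").strip():
        problems.append("borrower name is missing")
    return problems
```

Because the check runs at capture time, an application with a missing borrower name or an out-of-policy ratio is flagged before it reaches the back-end system, which is the point of catching errors prior to processing.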
Over the next couple of years, we’ll see ECM companies continue to push their capture solutions and the search engine
organizations will enter the market with similar technology. More information will be accurately identified on a real-time
basis. We will all benefit greatly from the improvements since catching errors will improve the quality of our
organizations.