Unleashing the Value of Unstructured Data for Federal Agencies Through Automation
Robotic process automation (RPA) is beginning to blossom across federal agencies. Many federal leaders now acknowledge the considerable value that comes with automating low-value tasks and empowering their staff to focus on higher-value, more strategic work. Thanks to software robots, data can be validated between systems of record; rote, manual data entry can be automated; and data can flow between applications that are otherwise stove-piped.
RPA tools reliably manage and execute the critical, countless data-centric tasks—interacting with websites, business and desktop applications, databases and people—that would otherwise be performed manually. By automating this manual work, RPA and automation tools empower human employees to shift from being data handlers to data consumers. For federal employees, this translates into more time spent on strategic business initiatives, better constituent service and higher job satisfaction.
But what about the unstructured and semi-structured data that often resides in documents and is critical for agencies looking to exploit their data for better insights and decision-making? RPA by itself cannot handle this data; it requires complementary technologies to acquire and process it.
Fortunately, the ability to exploit this data and automate its ingestion is available through cognitive document automation (CDA). CDA is enabled by artificial intelligence (AI), and while it leverages optical character recognition (OCR), it is much more than that. CDA is a suite of capabilities that gives users the power to quickly and accurately digitize and extract data from any document. With CDA, agencies can automatically ingest, analyze and make decisions based on documents such as benefits applications, loan applications, grant proposals, personnel forms, emails, memos, tax forms, insurance claim forms, onboarding documents, invoices, contract proposals and many others.
CDA has potential applications across many government programs. CDA performs the work of understanding what a document is about, what information it contains and what to do with it. Done right, CDA reduces document processing costs, increases efficiency, productivity and data quality, and enhances compliance. When RPA and CDA are combined, they can automate document- and electronic data-intensive processes from end to end, speeding service and improving the customer experience. One example is the Healthcare Exchange program within the Centers for Medicare & Medicaid Services (CMS). CDA aids in the processing of more than 40 million documents a year, automatically recognizing, classifying and intelligently extracting data; RPA software robots then take the data and deliver it to a case management system of record.
So, how do you get started?
As you evaluate CDA solutions, be sure they offer features and capabilities that will be critical to any successful deployment. These include:
Data extraction: The solution must be able to extract data from any document, in any language, and in any format, such as structured forms, semi-structured documents or unstructured documents. Likewise, a CDA solution must extract all types of information fields, including machine print (in any font), handprint, cursive, barcodes, bubbles and checkboxes. For maximum automation, a solution should be able to leverage multiple OCR engines to derive the optimal results. Merely obtaining raw OCR data is insufficient: a CDA solution must locate, format and interpret that OCR data to make it business-actionable, and it must provide a confidence score for each extraction.
Integration with other process automation tools: RPA applications and requirements are expanding to include a variety of process automation capabilities, including CDA, business process management (BPM) and dynamic case management (DCM). Process automation is necessary to handle business rules, user forms and exception handling capabilities, at the very least. Any CDA solution should be a part of a broader platform that delivers these robotic and process capabilities to minimize complexities in procurement, licensing, operation and maintenance, and to ensure consistent strategic direction from the vendor offering these components and unifying platforms.
Machine learning of documents and data: Machine learning is used to train the system to understand various documents and to keep the system continuously learning after it becomes operational. This means the system’s document classification and data extraction intelligence is constantly honing and improving itself over time—without the cost of maintaining rules.
Natural language processing (NLP) of unstructured content: NLP drives a better understanding of the content and sentiment of unstructured documents, such as emails, letters and contracts, so humans do not need to intervene. A CDA solution should either include NLP natively or call out to third-party cloud providers like Microsoft and Google via REST services.
Document classification and separation: Federal employees should not spend their time applying barcode stickers and inserting covers and separator sheets. A CDA solution should employ AI-enabled automation to classify and separate documents of any type using multiple methods (e.g., document layout, document content and regular expression-based rules). It should also employ machine learning to continuously improve how it performs these tasks over time.
Data validation/database matching: A CDA solution should be able to check its work and enable federal employees to quickly spot and correct any erroneously extracted characters and fields. Validation rules should be supported at the field level (e.g., Field1 + Field2 = Field3), as should database lookup shortcuts. “Fuzzy” database matching for extraction and validation (for vendor and PO lookups, for example) is also important and should scale to databases of more than a million records.
Process intelligence: A CDA solution should possess process discovery and analytics capabilities to identify automation opportunities and track performance; track document sources, classification and extraction automation rates; and monitor productivity and costs per document and per channel.
Integration with systems of record: A CDA solution must be able to export documents and data to enterprise content management (ECM) and enterprise resource planning (ERP) systems—without needing to write and maintain integration code. Look for pre-built export connectors in the CDA solution that can make such integrations easy, even for unsupported systems and systems lacking exposed APIs. Also consider that RPA is ideal for integrating with hard-to-reach systems that lack exposed APIs.
Distributed and multi-channel capture: Tools and solutions that automate document and data processing must work across distributed locations, such as field and branch offices, as well as across various channels, including mobile, email, web, fax, scanner, folder and multifunction printer (MFP) front-panel integration. Any CDA solution should be able to support centralized back-office document capture using production scanners, as well as central administration, licensing, reporting and scanner profile management to minimize total cost of ownership. Mobile support should allow developers to integrate a full suite of capabilities, including image capture, compression, perfection, classification, recognition, extraction and data validation, into their own websites and apps.
Project customization: No two projects are the same. A CDA solution should make it easy to perform common functions while also enabling (via scripting) more specialized, application-specific projects. The ability to add scripts to CDA projects and easily debug them is essential to making the system do exactly what the business needs.
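To make the scripting and validation criteria above more concrete, here is a minimal Python sketch of the kind of custom checks a project script might perform: flagging low-confidence fields for human review, applying a field-level rule of the form Field1 + Field2 = Field3, and doing a simple fuzzy vendor lookup. It is an illustration only and does not reflect any particular CDA product. The ExtractedField type, the function names, the thresholds and the sample vendor list are all assumptions made for this example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class ExtractedField:
    """One extracted value plus the engine's confidence (hypothetical shape)."""
    name: str
    value: str
    confidence: float  # assumed to be a score between 0.0 and 1.0


def needs_review(fields, threshold=0.85):
    """Return the names of fields whose confidence is below the threshold."""
    return [f.name for f in fields if f.confidence < threshold]


def sum_rule_holds(field1, field2, field3, tolerance=0.005):
    """Field-level rule: Field1 + Field2 = Field3, within a rounding tolerance."""
    return abs(float(field1) + float(field2) - float(field3)) <= tolerance


def fuzzy_lookup(candidate, known_names, min_ratio=0.8):
    """Return the most similar known name, or None if nothing is close enough."""
    best_name, best_ratio = None, 0.0
    for name in known_names:
        ratio = SequenceMatcher(None, candidate.lower(), name.lower()).ratio()
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    return best_name if best_ratio >= min_ratio else None


# Sample data invented for illustration (an invoice with a misread vendor name).
fields = [
    ExtractedField("subtotal", "100.00", 0.98),
    ExtractedField("tax", "8.25", 0.97),
    ExtractedField("total", "108.25", 0.62),
    ExtractedField("vendor", "Acme Suppiles Inc", 0.91),
]
vendors = ["Acme Supplies Inc", "Apex Supply Co", "General Dynamics"]

print(needs_review(fields))
print(sum_rule_holds("100.00", "8.25", "108.25"))
print(fuzzy_lookup("Acme Suppiles Inc", vendors))
```

The linear scan in fuzzy_lookup is fine for a handful of names but would not scale to the million-record databases mentioned earlier; a production system would more likely rely on an indexed or purpose-built matching engine.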