OmniPage Forms Template Editor (FTE) is a powerful tool that allows anyone to quickly become a forms processing expert. FTE includes a UI for defining templates as well as API calls that make it easy to extract data from documents such as invoices, mortgage loan documents, and applications.
A key first step in many document extraction workflows is classifying a document so it can be routed properly. The OmniPage Document Classification (DC) engine uses AI, computer vision, and machine learning to automatically sort documents into the correct category based on a training set. The Document Classification module includes a UI for designing classification models as well as API calls to programmatically classify pages. Assigning each document to the proper category in your taxonomy allows for better decision making at a competitive cost.
The OmniPage toolkits include the ability to compress PDFs using MRC (Mixed Raster Content) compression technology. Traditional compression methods apply a single compression setting to the entire file, which may be too aggressive for some elements, too mild for others, and optimal for neither. MRC separates the text elements from the pictures or backgrounds and applies the optimal compression to each. This can yield PDF files up to ten times smaller than traditional methods, with less loss of quality.
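To make the layer-separation idea concrete, here is a toy sketch in plain Python. It is not OmniPage's MRC implementation or API; every function name is hypothetical, the fixed threshold of 128 is an assumption, and `zlib` plus 2x downsampling stand in for the specialized codecs a real MRC encoder would use (typically a bilevel codec for the text mask and a lossy codec such as JPEG for the background). The text mask is kept lossless so character edges stay sharp, while the background tolerates lossy reduction.

```python
import zlib

def split_mrc_layers(image, threshold=128):
    """Split a grayscale image (list of rows, values 0-255) into a 1-bit
    text mask and a background layer with the text pixels filled in."""
    mask = [[1 if px < threshold else 0 for px in row] for row in image]
    # Fill text pixels with the mean of the non-text pixels so the background
    # layer is smooth and compresses well (a crude stand-in for inpainting).
    bg_pixels = [px for row in image for px in row if px >= threshold]
    fill = sum(bg_pixels) // len(bg_pixels) if bg_pixels else 255
    background = [[fill if m else px for px, m in zip(row, mrow)]
                  for row, mrow in zip(image, mask)]
    return mask, background

def downsample(layer, factor=2):
    """Lossy step for the background: average each factor x factor block."""
    h, w = len(layer), len(layer[0])
    out = []
    for y in range(0, h - h % factor, factor):
        out_row = []
        for x in range(0, w - w % factor, factor):
            block = [layer[y + dy][x + dx] for dy in range(factor) for dx in range(factor)]
            out_row.append(sum(block) // len(block))
        out.append(out_row)
    return out

def pack_bits(mask):
    """Pack the 1-bit mask eight pixels per byte, row-major."""
    flat = [b for row in mask for b in row]
    out = bytearray()
    for i in range(0, len(flat), 8):
        chunk = flat[i:i + 8]
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        byte <<= 8 - len(chunk)  # pad a short final byte with zeros
        out.append(byte)
    return bytes(out)

def compress_mrc(image, threshold=128, factor=2):
    """Compress each layer with a method suited to its content."""
    mask, background = split_mrc_layers(image, threshold)
    # Lossless compression for the text mask keeps character edges sharp.
    mask_data = zlib.compress(pack_bits(mask), 9)
    # The background is reduced lossily (downsampled) before compression.
    bg_small = downsample(background, factor)
    bg_data = zlib.compress(bytes(px for row in bg_small for px in row), 9)
    return mask, bg_small, mask_data, bg_data
```

A decoder would upsample the background and paint the mask's text pixels over it. Because each layer is compressed according to its content, neither the text nor the background is forced into a compromise setting.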
The first step in achieving the highest OCR accuracy is providing the OCR engine with the best possible image. Kofax is well regarded as the leader in image preprocessing, and OmniPage provides the image enhancement tools required to achieve your goals. Because organizations receive documents from a variety of scanners, MFPs, phones, and tablets at varying resolutions (DPI), the right preprocessing technology is essential to achieve the best OCR results. The OmniPage toolkits include algorithms for the following:
3D Deskewing (correcting distortion found in photographs)
Adaptive Noise Removal (advanced despeckling)
Adaptive Binarization (converting to black and white)
Book Page Handling
Screen Capture Processing
Screenshots are notoriously difficult to extract data from. They are typically low resolution (72-96 DPI), and pixels that look black to the human eye are often actually blue or orange in the image itself. In version 21 of the OmniPage Capture SDK, we have introduced a new screen capture processing mode that automatically detects screen capture images and enables algorithms that enhance the resolution of the input image and binarize it (convert it to black and white) in an optimal manner. The result is better accuracy when recognizing screen captures. If you are building a Data Loss Prevention (DLP) or Robotic Process Automation (RPA) solution, OCR of screen captures is likely a critical component of your workflow, and the new Screen Capture Processing mode helps ensure that you get the best accuracy possible.
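The two steps described above, raising the resolution and then binarizing, can be illustrated with a generic sketch. This is not the SDK's screen capture algorithm or API: the function names are hypothetical, and bilinear interpolation and Otsu's global threshold are swapped-in textbook techniques. Converting RGB to luma first is what lets colored text (for example, dark blue) end up on the black side of the threshold.

```python
def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to luma using ITU-R BT.601 weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]

def upscale_bilinear(gray, factor=2):
    """Enlarge a grayscale image by an integer factor with bilinear interpolation,
    producing intermediate gray levels along edges instead of blocky steps."""
    h, w = len(gray), len(gray[0])
    out = []
    for oy in range(h * factor):
        sy = min(oy / factor, h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for ox in range(w * factor):
            sx = min(ox / factor, w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = gray[y0][x0] * (1 - fx) + gray[y0][x1] * fx
            bottom = gray[y1][x0] * (1 - fx) + gray[y1][x1] * fx
            row.append(round(top * (1 - fy) + bottom * fy))
        out.append(row)
    return out

def otsu_threshold(gray):
    """Return the threshold t that maximizes between-class variance,
    where the dark class is all pixels <= t (Otsu's method)."""
    hist = [0] * 256
    for row in gray:
        for px in row:
            hist[px] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))
    sum_dark = 0
    w_dark = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_dark += hist[t]
        if w_dark == 0:
            continue
        w_light = total - w_dark
        if w_light == 0:
            break
        sum_dark += t * hist[t]
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize_screen_capture(rgb_image, factor=2):
    """Upscale a low-resolution capture, then map it to pure black (0) or white (255)."""
    gray = upscale_bilinear(to_gray(rgb_image), factor)
    t = otsu_threshold(gray)
    return [[0 if px <= t else 255 for px in row] for row in gray]
```

Otsu's method picks the threshold from the image's own histogram, so a blue-on-white or orange-on-gray capture is split without a hand-tuned cutoff; a production pipeline would likely use a locally adaptive threshold and a stronger upscaler.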
The OmniPage Capture SDK v21 is available in the following programming language/OS configurations:
Windows – C/C++, .NET Framework, .NET Core, Java
Linux – C/C++, .NET Core, Java (coming soon)
MacOS – C/C++
The OmniPage Capture SDK is capable of recognizing documents in over 120 different languages. These include:
Virtually all Latin script languages
Cyrillic script languages
Asian languages such as Japanese, Chinese (simplified/traditional), Korean, Thai, Vietnamese
Middle Eastern languages such as Arabic and Hebrew
The toolkit also includes automatic language detection, which allows you to process batches of documents that contain mixed languages.