
OmniPage Developer Portal

Features & Technology

Forms Processing

OmniPage Forms Template Editor (FTE) is a powerful tool that allows anyone to quickly become a forms processing expert. FTE includes a UI for defining templates as well as API calls that make it easy to extract data from documents such as invoices, mortgage loan documents, and applications.
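
Template-based forms processing generally comes down to zonal extraction. A template names each field and marks the region of the page where that field appears, and extraction keeps the recognized words that fall inside each region. The sketch below illustrates only that idea and does not use the FTE API. The `Box`, `Word` and `extract_fields` names, the center-point rule, and the pixel coordinates are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page pixel coordinates (hypothetical layout)."""
    left: int
    top: int
    right: int
    bottom: int

    def contains_center_of(self, other: "Box") -> bool:
        # A word belongs to a zone if the word's center point lies inside the zone.
        cx = (other.left + other.right) / 2
        cy = (other.top + other.bottom) / 2
        return self.left <= cx <= self.right and self.top <= cy <= self.bottom


@dataclass(frozen=True)
class Word:
    """One recognized word and its bounding box, as an OCR engine might report it."""
    text: str
    box: Box


def extract_fields(template: dict[str, Box], words: list[Word]) -> dict[str, str]:
    """Return {field name: text} for every zone in a hypothetical template."""
    result = {}
    for field, zone in template.items():
        # Collect the words inside the zone in reading order: top to bottom, then left to right.
        inside = sorted(
            (w for w in words if zone.contains_center_of(w.box)),
            key=lambda w: (w.box.top, w.box.left),
        )
        result[field] = " ".join(w.text for w in inside)
    return result


if __name__ == "__main__":
    invoice_template = {
        "invoice_number": Box(400, 50, 600, 80),
        "total": Box(400, 700, 600, 740),
    }
    ocr_words = [
        Word("INV-1042", Box(410, 55, 500, 75)),
        Word("Total:", Box(300, 705, 380, 735)),  # the label sits outside the zone
        Word("$1,250.00", Box(420, 705, 540, 735)),
    ]
    print(extract_fields(invoice_template, ocr_words))
    # {'invoice_number': 'INV-1042', 'total': '$1,250.00'}
```

The sketch assigns a word to a zone by its center point instead of requiring the whole box to fit inside. This choice tolerates small misalignments between the template and the scanned page.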

Document Classification

A key first step in many document extraction workflows is classifying a document so that it can be routed properly. The OmniPage Document Classification (DC) engine uses AI, computer vision, and machine learning to automatically sort documents into the correct category based on a training set. The Document Classification module includes a UI for designing classification models as well as API calls to programmatically classify pages. Assigning each document to the proper category in your taxonomy allows for better decision making at a cost-competitive price.
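
A very small supervised model can show what classifying "based on a training set" means. The sketch below is a hypothetical nearest-centroid classifier that works on bag-of-words vectors built from the OCR text of labeled sample pages. It is not the DC engine, which also uses computer vision features. The function names, category names and sample texts are invented for illustration.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; a production classifier would use far richer features."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(samples: list[tuple[str, str]]) -> dict[str, Counter]:
    """Sum the word counts of each category's training pages.

    Cosine similarity ignores vector length, so the summed vector points the
    same way as the category centroid (the mean vector).
    """
    centroids: dict[str, Counter] = {}
    for text, label in samples:
        centroids.setdefault(label, Counter()).update(bag_of_words(text))
    return centroids


def classify(centroids: dict[str, Counter], text: str) -> str:
    """Assign the category whose centroid is most similar to the page's word vector."""
    vec = bag_of_words(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


if __name__ == "__main__":
    training_set = [
        ("invoice number total amount due", "invoice"),
        ("invoice date bill to total", "invoice"),
        ("loan amount borrower interest rate", "mortgage"),
        ("borrower property mortgage lender", "mortgage"),
    ]
    model = train(training_set)
    print(classify(model, "amount due on this invoice"))  # invoice
    print(classify(model, "borrower interest rate"))      # mortgage
```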

PDF Super-Compression

The OmniPage toolkits include the ability to compress PDFs using MRC (Mixed Raster Content) compression technology. Traditional compression methods apply a single compression setting to the whole file, which may be too aggressive for some elements, too mild for others, and optimal for neither. MRC separates the text elements from the pictures and backgrounds and applies the optimal compression to each. This can yield PDF files up to ten times smaller than those produced by traditional methods, with less loss of quality.
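
A tiny grayscale page is enough to illustrate the layer separation that MRC relies on. This toy sketch makes three simplifying assumptions. A fixed darkness threshold marks text, a single gray level stands in for all text, and the background is kept at half resolution. Real MRC encoders segment far more carefully and preserve color, so treat `split_mrc`, `reconstruct` and the 128 threshold as illustrative assumptions.

```python
from statistics import mean

# Grayscale page as rows of pixel values: 0 = black, 255 = white.
Image = list[list[int]]


def split_mrc(img: Image, threshold: int = 128):
    """Toy MRC split: 1-bit text mask, one foreground gray level, half-resolution background."""
    mask = [[1 if p < threshold else 0 for p in row] for row in img]
    pairs = [(p, m) for row, mrow in zip(img, mask) for p, m in zip(row, mrow)]
    text_pixels = [p for p, m in pairs if m]
    bg_pixels = [p for p, m in pairs if not m]
    fg_value = round(mean(text_pixels)) if text_pixels else 0
    bg_fill = round(mean(bg_pixels)) if bg_pixels else 255

    # Fill the holes left by text with the average background value so the
    # background layer is smooth and compresses well.
    background = [[bg_fill if m else p for p, m in zip(row, mrow)]
                  for row, mrow in zip(img, mask)]

    # Downsample the background 2x by averaging 2x2 blocks; backgrounds and
    # pictures tolerate low resolution far better than text edges do.
    h, w = len(background), len(background[0])
    small = [[round(mean(background[y][x]
                         for y in range(by, min(by + 2, h))
                         for x in range(bx, min(bx + 2, w))))
              for bx in range(0, w, 2)]
             for by in range(0, h, 2)]
    return mask, fg_value, small


def reconstruct(mask: Image, fg_value: int, small: Image) -> Image:
    """Recompose the page: upsample the background, then paint masked pixels in the foreground value."""
    return [[fg_value if m else small[y // 2][x // 2] for x, m in enumerate(mrow)]
            for y, mrow in enumerate(mask)]


if __name__ == "__main__":
    page = [
        [210, 210, 210, 210],
        [210,  20,  30, 210],
        [210, 210, 210, 210],
        [210, 210, 210, 210],
    ]
    mask, fg, bg = split_mrc(page)
    print(reconstruct(mask, fg, bg))
```

The mask needs only 1 bit per pixel, and the background keeps a quarter of the original pixels. In an MRC PDF, each layer is then handed to a codec suited to it. For example, a bi-level codec might handle the mask and JPEG the background, and that pairing is where the size savings come from. The reconstruction is lossy: both text pixels come back as their average gray level (25).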

Image Pre-processing

The first step in achieving the highest OCR accuracy is providing the OCR engine with the best quality image. Kofax is well regarded as the leader in image preprocessing, and OmniPage provides the image enhancement tools required to achieve your goals. Because organizations receive documents from a variety of scanners, MFPs, phones, and tablets at varying resolutions (DPI), the right preprocessing technology is essential to achieving the best OCR results. The OmniPage toolkits include algorithms for the following:

  • Auto Rotation
  • Auto Deskewing
  • 3D Deskewing (correcting distortion found in photographs)
  • Adaptive Noise Removal (advanced despeckling)
  • Resolution Enhancement
  • Adaptive Binarization (converting to black and white)
  • Punch-hole Removal
  • Auto Cropping
  • Book Page Handling
  • Image Erosion/Dilation
  • Fax Correction
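
Two of the listed operations, adaptive binarization and erosion/dilation, are easy to sketch. The code below uses one common way to binarize unevenly lit pages: a local-mean threshold computed with an integral image, the approach popularized by Bradley and Roth. OmniPage's own algorithms are not documented here. The function names and the `window` and `t` parameters are assumptions.

```python
Image = list[list[int]]  # grayscale rows, 0 = black, 255 = white


def integral_image(img: Image) -> Image:
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def adaptive_binarize(img: Image, window: int = 15, t: float = 0.15) -> Image:
    """Return 1 (ink) where a pixel is more than fraction t darker than its window's mean."""
    h, w = len(img), len(img[0])
    ii = integral_image(img)
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            # Window sum in O(1) from four integral-image lookups.
            total = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
            local_mean = total / ((y1 - y0) * (x1 - x0))
            out[y][x] = 1 if img[y][x] < local_mean * (1 - t) else 0
    return out


def _neighbourhood(b: Image, y: int, x: int) -> list[int]:
    """Values in the in-bounds 3x3 neighbourhood around (y, x)."""
    h, w = len(b), len(b[0])
    return [b[j][i] for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]


def dilate(b: Image) -> Image:
    """3x3 dilation: a pixel becomes ink if any neighbour is ink (thickens strokes)."""
    return [[1 if any(_neighbourhood(b, y, x)) else 0 for x in range(len(b[0]))]
            for y in range(len(b))]


def erode(b: Image) -> Image:
    """3x3 erosion: a pixel stays ink only if all in-bounds neighbours are ink (thins strokes)."""
    return [[1 if all(_neighbourhood(b, y, x)) else 0 for x in range(len(b[0]))]
            for y in range(len(b))]


if __name__ == "__main__":
    # Background brightens from 100 to 235 left to right; two dark marks sit on it.
    page = [[100 + 15 * x for x in range(10)] for _ in range(5)]
    page[2][2] = 40   # ink on the dark side
    page[2][7] = 140  # ink on the bright side, lighter than the left background
    ink = adaptive_binarize(page, window=3)
    print([(y, x) for y in range(5) for x in range(10) if ink[y][x]])  # [(2, 2), (2, 7)]
```

On this sample page, a single global threshold of 128 would fail in two ways. It would mark the darker left-hand background (100 and 115) as ink and would miss the mark at 140. The local threshold finds exactly the two marks. Applying `dilate` and then `erode` is a morphological closing, which fills small gaps between strokes, and it leaves these two isolated marks unchanged.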

Screen Capture Processing

Screenshots are notoriously difficult to extract data from. They are typically low resolution (72-96 DPI), and pixels that look black to the human eye are often actually blue or orange in the image itself. In version 21 of the OmniPage Capture SDK, we have introduced a new Screen Capture Processing mode that automatically detects screen capture images and applies algorithms that enhance the resolution of the input image and binarize it (convert it to black and white) in an optimal manner. The end result is better accuracy when recognizing screen captures. If you are building a Data Loss Prevention (DLP) or Robotic Process Automation (RPA) solution, then running OCR on screen captures is likely a critical part of your workflow. The new Screen Capture Processing mode will ensure that you are getting the best accuracy possible.
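
A pipeline of the kind described (detect, upscale, then binarize) can be sketched in a few lines. Everything here is an assumption for illustration. The heuristic in `looks_like_screenshot` combines DPI and palette size, the upscaling is nearest-neighbour, and binarization measures color distance from the dominant background color instead of brightness. That last choice is what handles colored text. Orange (255, 140, 0) has a Rec. 601 luminance of about 158, so a plain brightness threshold of 128 would drop it as background.

```python
from collections import Counter

RGB = tuple[int, int, int]
Image = list[list[RGB]]


def looks_like_screenshot(img: Image, dpi: int, max_colors: int = 256) -> bool:
    """Hypothetical heuristic: screen-resolution DPI plus a small colour palette."""
    colors = {p for row in img for p in row}
    return dpi <= 96 and len(colors) <= max_colors


def upscale(img: Image, factor: int = 3) -> Image:
    """Nearest-neighbour upscaling; a real engine would use a smarter interpolator."""
    return [[p for p in row for _ in range(factor)]
            for row in img for _ in range(factor)]


def binarize_against_background(img: Image, min_distance: int = 100) -> list[list[int]]:
    """1 = text: any pixel far enough from the dominant (background) colour, whatever its hue."""
    background = Counter(p for row in img for p in row).most_common(1)[0][0]

    def distance(a: RGB, b: RGB) -> int:
        # Manhattan distance in RGB space.
        return sum(abs(x - y) for x, y in zip(a, b))

    return [[1 if distance(p, background) >= min_distance else 0 for p in row]
            for row in img]


if __name__ == "__main__":
    W, BLUE, ORANGE = (255, 255, 255), (0, 0, 200), (255, 140, 0)
    shot = [[W, BLUE, W],
            [W, ORANGE, W]]
    if looks_like_screenshot(shot, dpi=72):
        for row in binarize_against_background(upscale(shot, factor=2)):
            print(row)
```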

Programming API

The OmniPage Capture SDK v21 is available in the following programming language/OS configurations:

  • Windows – C/C++, .NET Framework, .NET Core, Java
  • Linux – C/C++, .NET Core, Java (coming soon)
  • macOS – C/C++

Language Support

The OmniPage Capture SDK is capable of recognizing documents in over 120 different languages. These include:

  • Virtually all Latin script languages
  • Cyrillic script languages
  • Asian languages such as Japanese, Chinese (Simplified and Traditional), Korean, Thai, and Vietnamese
  • Middle Eastern languages such as Arabic and Hebrew

The toolkit also includes automatic language detection which allows you to deal with batches of documents that contain mixed languages.
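
Automatic language detection usually begins by identifying the writing system. The sketch below does only that coarse first step, using the character names from Python's `unicodedata` module. It works on text that has already been recognized, which sidesteps part of the problem an OCR engine faces. It also cannot tell apart languages that share a script: English, French and Vietnamese are all written in Latin script. The script table is an illustrative subset, and the function names are hypothetical.

```python
import unicodedata
from collections import Counter

# First word of a character's Unicode name -> script label (an illustrative subset).
SCRIPT_BY_NAME_PREFIX = {
    "LATIN": "Latin",
    "CYRILLIC": "Cyrillic",
    "ARABIC": "Arabic",
    "HEBREW": "Hebrew",
    "CJK": "Han",
    "HIRAGANA": "Kana",
    "KATAKANA": "Kana",
    "HANGUL": "Hangul",
    "THAI": "Thai",
}


def char_script(ch: str):
    """Script label for one character, or None for digits, spaces, punctuation, etc."""
    name = unicodedata.name(ch, "")
    return SCRIPT_BY_NAME_PREFIX.get(name.split(" ", 1)[0])


def dominant_script(text: str):
    """Most frequent script among the letters of text, or None if there are none."""
    counts = Counter(s for s in map(char_script, text) if s)
    return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    batch = ["Invoice total: 1,250.00", "Счёт-фактура № 17", "請求書の合計", "שלום עולם"]
    for line in batch:
        print(dominant_script(line), "|", line)
```

Running this on each page or line of a mixed batch gives a per-item script label, which could then narrow the set of candidate recognition languages. The Japanese sample comes out as "Han" because most of its characters are kanji. Distinguishing Japanese from Chinese would need a further step, such as checking for kana.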