Ascent Advanced Forms 3.8 - SR01

Extraction Module

Part Number 10001536-000
Revision A

April 6, 2006

These notes apply to Ascent Advanced Forms 3.8 - SR01 and include the following sections:

Introduction

User Documentation

Installation Note

New Features

Compatibility Issues

Restrictions and Known Problems

Resolved Problems

Problems Resolved in Ascent Advanced Forms 3.7 Service Packs

Kofax Technical Support

Trademark/Copyright Statements

Introduction

Ascent Advanced Forms provides intelligent document recognition. It can be used to determine document types, perform document separation, and extract data from index fields. It can also be set to automatically rotate images and sort the pages of each document. Ascent Advanced Forms Extraction leverages the DOKuStar Extraction technology.

Ascent Advanced Forms can be added to your Ascent Capture workflow like any other custom module. You can use it as a replacement for the standard Ascent Capture Recognition Server module, or use it in conjunction with the Ascent Capture Recognition Server module.

For details, refer to the documentation provided with the Ascent Advanced Forms software.

User Documentation

Ascent Advanced Forms Extraction includes the following documentation. (The default location is C:\Program Files\ODT-OCE\Ascent Advanced Forms\Doc.)

You can also access the manuals and help files on the Ascent Advanced Forms CD.

Ascent Advanced Forms Extraction Administrator Manual (AdminManual.pdf)

This manual contains essential information on installing and running the Ascent Advanced Forms software. It also provides steps for registering the custom module with Ascent Capture and running the custom module as a service.

Ascent Advanced Forms Extraction User's Manual (UserManual.pdf)

This manual introduces the Ascent Advanced Forms Extraction custom module and provides a quick tour of its interface. It also provides steps for using the custom module in your Ascent Capture workflow.

Ascent Advanced Forms Extraction Design Studio User's Manual (Help.pdf)

This manual introduces the Ascent Advanced Forms Extraction Design Studio and provides a quick tour of its interface. It includes details about extraction, classification, indexing, field types, testing, and more.

Ascent Advanced Forms Extraction Interfaces and File Formats (Interface.pdf)

This manual describes the various programming interfaces that are provided with Ascent Advanced Forms Extraction.

Ascent Advanced Forms Extraction Tutorial (Tutorial.pdf)

This tutorial provides instructions for the Ascent Advanced Forms Design Studio and introduces various examples provided with the software.

Examples

Several compiled examples of projects and Visual Basic programs are installed with the software.

For details about the examples, refer to the readme files and commented source code in the Examples folder. In addition, refer to the Ascent Advanced Forms Tutorial for descriptions of the examples.

Online Help

Several Help systems are available:

Installation Note

Complete installation requirements and instructions can be found in the Ascent Advanced Forms Extraction Administrator Manual.

New Features

The following section lists new features added for Ascent Advanced Forms 3.8 - SR01.

Improved Installation and Upgrade

The installation process no longer installs two separate components for the Extraction software and custom module.

If you are upgrading from Ascent Advanced Forms 3.7, the installation path, program files, and settings will be retained. The upgrade process will remove version 3.7 and then install version 3.8.

Ascent Capture Table and "Multi-value Field" Support

Ascent Advanced Forms supports Ascent Capture Tables. The reserved Ascent Capture "(TABLE)" field type is used for the table, and each column within the table is defined as a table field within the table.

Note that the previous method for defining tables using a single placeholder field for the entire table is no longer supported.

Ascent Capture Internet Server (ACI Server) Support

Ascent Advanced Forms can be used with the Ascent Capture Internet Server (ACI Server) on a remote workstation. All data is now stored within the Ascent Capture database.

Note that for compatibility, the following Advanced Forms files will still be generated:

AC Setup Wizard

The AC Setup Wizard can simplify the steps of setting up the Ascent Capture document classes, form types, and index fields. The wizard will automatically generate Ascent Capture document classes, index fields, and form types based on an existing Ascent Advanced Forms project file.

User Interface for the Ascent Advanced Forms Design Studio

The Ascent Advanced Forms Design Studio's user interface has been redesigned and extended with many new features. For example:

To help you become familiar with the Ascent Advanced Forms Design Studio, you should read the Ascent Advanced Forms Extraction Design Studio User's Manual and the Ascent Advanced Forms Extraction Tutorial.

Creating Document Types and Document Classes from Directories

A new Document wizard allows you to create new document types and corresponding new document classes using prepared directories with test documents. The directory names are used as test document names so that you do not need to enter the names. For each document type a corresponding new document class can be created, or an existing document class can be chosen. The TestDirectory property of the test documents is set to the respective directory.
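
The directory-based naming can be pictured with a short Python sketch. It only illustrates the idea under stated assumptions: the function name and the returned dictionary are hypothetical and are not part of the Design Studio, which performs this step through the Document wizard.

```python
from pathlib import Path


def document_types_from_directories(root):
    """Map each subdirectory name to its path (hypothetical helper).

    The key is the name taken from the directory; the value is what the
    TestDirectory property would point to.
    """
    return {
        entry.name: str(entry)
        for entry in sorted(Path(root).iterdir())
        if entry.is_dir()  # plain files in the root folder are ignored
    }
```

Calling the helper on a folder that contains the subdirectories Invoice and Order would yield one entry per subdirectory, so no name has to be typed by hand.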

Color Image Processing

Color images can now be processed. A new image processing node Color Processing allows you to control conversion of color images via gray images to binary images.
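
As a rough illustration of the color, gray, binary chain, the following Python sketch converts RGB pixels to gray values with the widely used ITU-R BT.601 luma weights and then applies a fixed threshold. The weights, the threshold of 128, and the function names are assumptions made for this example; these notes do not describe the algorithm that the Color Processing node actually uses.

```python
def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 gray values."""
    # Assumption: BT.601 luma weights; the product may weight channels differently.
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def to_binary(gray_image, threshold=128):
    """Map gray values to 1 (dark, treated as ink) or 0 (light, background)."""
    return [[1 if value < threshold else 0 for value in row] for row in gray_image]


image = [
    [(255, 255, 255), (0, 0, 0)],
    [(200, 30, 30), (250, 250, 210)],
]
binary = to_binary(to_gray(image))
```

In this sketch the dark red pixel ends up as ink and the pale yellow pixel as background, which is the kind of decision a configurable conversion step lets you influence.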

Field-Specific Image Processing

Different sets of image processing functions for field-specific image processing can be specified in the Image Processing register of the project. For each index field a new property allows you to specify which of the image processing sets should be applied before the field is processed.

New Image Processing Functions

The Cut function allows you to use your mouse to cut a rectangle out of an image. The following functions are used to process the rectangle:

Improved Character Recognition

Due to the integration of RecoStar Professional Plus v3.0, the following improvements have been made to character recognition:

Redesign of Keyword and Phrase Search

The Keyword field is no longer available, because the Advanced Forms search strategies for keywords and phrases have been combined in a new Phrase field. You no longer need to distinguish between looking for a single keyword and looking for a phrase consisting of several words. The Phrase field can be used for both cases without restriction.

The WordSet field has also been replaced by the PhraseSet field. This allows you to search for sets of keywords or phrases.

Note that older projects containing Keyword or WordSet fields will still run.

New ItemCount Field

The ItemCount field returns the number of results of a subfield. This field returns a state of OK if the number of results is greater than or equal to a specified threshold. The standard subfield is a Text field. In this case, the number of words read by the Text field is returned. The Text field can be replaced by other field types. The ItemCount field may return text specified for empty or OK results instead of the number of results.
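
The threshold logic can be sketched in a few lines of Python. This is an illustration only: the function, its parameter names, and the state name "Empty" for counts below the threshold are assumptions, since these notes name only the OK state.

```python
def item_count(results, threshold, ok_text=None, empty_text=None):
    """Return (state, value) for a list of subfield results (hypothetical helper)."""
    count = len(results)
    if count >= threshold:
        # OK: return the configured text if there is one, otherwise the count.
        return ("OK", ok_text if ok_text is not None else str(count))
    # Assumption: anything below the threshold is reported as "Empty".
    return ("Empty", empty_text if empty_text is not None else str(count))


words = "total amount due".split()  # stands in for words read by a Text subfield
state, value = item_count(words, threshold=2)
```

With three words and a threshold of 2, the sketch reports the OK state together with the count; passing ok_text or empty_text replaces the number with the configured text.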

New KeyMultiValue Field

The KeyMultiValue field is similar to the Key Value field with the following important differences:

Extension of the Regular Expression Field

The Regular Expression field has a new property, ExclusionExpression. If an exclusion expression is entered, the field does not return found strings that match it.

Note that the Regular Expression Test dialog box now allows you to manage a list of test strings that can be used to test the regular expression.
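
The effect of an exclusion expression can be illustrated with Python's re module. This is a sketch under assumptions: it uses Python regular expression syntax rather than the product's own expression syntax, the function name is hypothetical, and it treats a found string as excluded only when the exclusion expression matches the whole string.

```python
import re


def find_with_exclusion(text, expression, exclusion_expression=None):
    """Return all matches of expression, minus those matching the exclusion."""
    found = [match.group(0) for match in re.finditer(expression, text)]
    if not exclusion_expression:
        return found
    exclusion = re.compile(exclusion_expression)
    # Assumption: a found string is dropped when the exclusion matches it entirely.
    return [item for item in found if not exclusion.fullmatch(item)]


hits = find_with_exclusion(
    "Order 12345, ref 00000, invoice 67890",
    expression=r"\d{5}",
    exclusion_expression=r"0+",
)
```

Here the all-zero number is found by the main expression but filtered out by the exclusion, leaving only the two real numbers.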

New USInvoiceTotals Field

The new USInvoiceTotals field returns the total amount, net amount, tax amount(s), and expenses from American invoices. Note that its result format is similar to that of the InvoiceTotals field.

Extension of the Option Invoice

The option Invoice can now be used for invoices from Australia, Finland, Norway, and Sweden. This option has also been optimized for invoices from Switzerland and Italy.

Extension of the InvoiceVendor Field

The new property ExclusionVendors of the InvoiceVendor field prevents the field from finding undesired results in the invoice vendor database (for example, an entry of the recipient). This field has been optimized. Substitutions have been reduced by taking the VAT-IDs of all European countries into account. This field also supports VAT-ID numbers with 9 and 10 digits.

The InvoiceVendor field now permits the use of company codes that are contained in the invoice vendor database file.

A new property OutputMultipleVendorRecords can be used. If this property is set to true, then the field returns all database records that have the same value in the first database column as the main field result. The additional records are returned as alternatives with a Confidence value of 0.
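
The behavior of OutputMultipleVendorRecords can be pictured with the following Python sketch. The data layout (a list of rows), the function name, and the dictionary result shape are assumptions for illustration; the real field returns its results in the DOKuStar result format.

```python
def vendor_results(records, best_index, best_confidence, output_multiple=False):
    """Return the main vendor record plus optional alternatives (hypothetical helper).

    records: list of database rows, each a list of column values.
    best_index: index of the row chosen as the main field result.
    """
    main = records[best_index]
    results = [{"record": main, "confidence": best_confidence}]
    if output_multiple:
        for index, row in enumerate(records):
            # Rows sharing the first column value become alternatives
            # with a Confidence value of 0.
            if index != best_index and row[0] == main[0]:
                results.append({"record": row, "confidence": 0})
    return results


database = [
    ["4711", "Acme GmbH", "Berlin"],
    ["4711", "Acme GmbH", "Hamburg"],
    ["4712", "Other AG", "Bonn"],
]
with_alternatives = vendor_results(database, 0, 87, output_multiple=True)
```

In this example the Hamburg row shares the first column value with the main result and is therefore returned as an alternative, while the unrelated third row is not.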

The InvoiceVendor field supports the following new database columns:

A new value of the property DbLoad allows you to switch to a new database file dynamically using an attribute file.

Extension of the US Invoice Items Fields

The fields InvoiceItems, USInvoiceItems, InvoiceItemsCustom, and USInvoiceItemsCustom will now return an additional column named ItemDescription. Note that this column sometimes cannot be separated from adjacent columns due to the variety of invoice table formats. Therefore, the column may contain text from adjacent invoice columns.

Extension of the InvoiceItemsCustom and USInvoiceItemsCustom Fields

The fields InvoiceItemsCustom and USInvoiceItemsCustom have been extended. It is now possible to add new user-defined columns and modify existing columns.

In addition, both fields show a SearchOrder node that allows you to control column search.

Extensions of the Amount Field

The Amount field has a new property, InputFormat. The format ".23" searches for amount values starting with a dot or comma with no leading zeros. The format "1.23" searches for all standard amount values that were originally supported by the Amount field. The format "1:23" searches for amounts with the colon as the decimal separator. Note that the formats can be combined. The InvoiceTotals field uses this new feature, which supports amounts with a leading dot or comma.

The MinimalPrecision and MaximalPrecision properties define a range for the number of decimal places for valid amount values. You can search for amounts with 4 to 8 decimal places.
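
The interplay of InputFormat and the precision range can be sketched with regular expressions. The patterns below are assumptions made for illustration: they ignore thousands separators and currency symbols, accept both dot and comma for the "1.23" format, and default to two decimal places, none of which is specified in these notes.

```python
import re

# Hypothetical patterns, one per InputFormat value named in the notes.
FORMAT_PATTERNS = {
    ".23": re.compile(r"[.,](\d+)"),      # leading dot or comma, no integer part
    "1.23": re.compile(r"\d+[.,](\d+)"),  # dot or comma as decimal separator
    "1:23": re.compile(r"\d+:(\d+)"),     # colon as decimal separator
}


def match_amount(token, input_formats, min_precision=2, max_precision=2):
    """Return the first format that accepts the token, or None."""
    for name in input_formats:
        match = FORMAT_PATTERNS[name].fullmatch(token)
        # The captured group holds the decimal places; check the allowed range.
        if match and min_precision <= len(match.group(1)) <= max_precision:
            return name
    return None
```

Passing several format names models combined formats: match_amount(".23", [".23", "1.23"]) accepts an amount without an integer part, while match_amount("0.12345", ["1.23"], 4, 8) accepts a value only because its five decimal places fall inside the configured range.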

Extensions of the Address Field

The Address field has been optimized with regard to additional lines. The new property IgnoreUmlauts allows you to match names with different spellings of umlauts.

The Address field now supports the DbOutputColumns property. This property specifies which database columns should be contained in the result.
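
What matching names with different umlaut spellings can mean is illustrated by the following Python sketch for the IgnoreUmlauts property. The folding table (umlaut to vowel plus "e") and both function names are assumptions; the notes do not state which spelling variants the Address field treats as equal.

```python
# Assumption: umlauts are folded to their common two-letter transcriptions.
UMLAUT_FOLDS = str.maketrans(
    {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}
)


def fold_umlauts(name):
    """Replace each umlaut with its two-letter transcription."""
    return name.translate(UMLAUT_FOLDS)


def names_match(first, second, ignore_umlauts=False):
    """Compare two names, optionally ignoring umlaut spelling differences."""
    if ignore_umlauts:
        first, second = fold_umlauts(first), fold_umlauts(second)
    return first == second
```

With ignore_umlauts enabled, "Müller" and "Mueller" compare as equal; without it they remain different strings.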

Extension of the Phrase Field

In previous versions, the Phrase field could only find phrases within a single line. The new property Scope allows you to include phrases crossing table cells or search phrases arranged in adjacent lines as they may appear in a table column or table header.

Extension of the Table Field

The FontName property of the OcrParam node of Table fields can now be set to hand print. This allows the classifier to read tables containing hand printed characters.

For table header rows, you can now specify a phrase list and a phrase file instead of a keyword list. The field will identify header phrases even if they consist of several lines.

The new subnode LineConditions allows you to specify which table cells must be present in a valid table row.

Extension of the FuzzyDb Field

If the Phrase field is used as a subfield of the FuzzyDb field, the contents of the database column can be used as the phrase list. If no phrases are specified for the Phrase field, the phrases found in the database column specified by the DbColumnName property will be used.

Extension of the Key Value Field

The new property AddKeyOnlyResults allows you to specify which key field results should be included in the field results when no corresponding value could be found.

The KeyValue field can be used to control processing of an index field using an attribute. Set the Source property of the key field to an attribute and set the Zone property of the KeyValue field to "0 0". The key field will then process the current value of the attribute, and the value field will only be processed if the key field is successful (for example, the attribute has a valid value).

Compatibility Issues

The following sections list the compatibility issues for Ascent Advanced Forms 3.8 - SR01.

Parallel Installation of Different Versions is No Longer Supported

Do not install Ascent Advanced Forms 3.8 together with an earlier Ascent Advanced Forms 3.x version. If you attempt to install Ascent Advanced Forms 3.8 on a workstation that contains an Ascent Advanced Forms 3.7 installation, the upgrade will remove the Ascent Advanced Forms 3.7 program and install the Ascent Advanced Forms 3.8 program.

Warning: You must use the Add or Remove Programs utility from the Windows Control Panel to uninstall Ascent Advanced Forms 3.8.

Specification of Tables Has Been Changed

Tables must be defined according to Ascent Capture 7.0. Use "(TABLE)" as the field type and define every column as a subfield. (SPR 00015361)

Note that the previous method for defining tables with a single placeholder field for the entire table is no longer supported. Therefore, when you upgrade from a previous version, you will need to define your tables according to Ascent Capture.

Automatic Dirt Removal May Cause Problems

Character recognition now includes automatic dirt removal that can, under certain circumstances, degrade recognition results. If you have a very clean image, some of the dots belonging to the text can be removed, which reduces recognition quality. If you have a clean image, it is recommended that you set the option "Image is clean" in the Project Settings dialog box.

Result Format of the US Invoice Items Custom Fields Has Been Modified

The values in the ItemDiscount column of the invoice items fields (returning a discount value specified by a percent value) are now formatted the same way as amount values (four decimal places) with a trailing percent sign.
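
A minimal Python sketch of the new ItemDiscount format follows. The function name is hypothetical, and the use of a dot as the decimal separator is an assumption; the notes specify only four decimal places and a trailing percent sign.

```python
def format_item_discount(percent):
    """Format a discount percentage with four decimal places and a percent sign."""
    # Assumption: dot as decimal separator, no space before the percent sign.
    return f"{percent:.4f}%"


example = format_item_discount(2.5)
```

Applications that parse the ItemDiscount column may need to strip the trailing percent sign before converting the value to a number.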

Result Format of the InvoiceVendor Field Has Been Modified

The result file format for the InvoiceVendor field has been changed concerning alternatives with revert reading results. The <revertreadingresults> element that previously followed the <dbrecords> element is now included in the <dbrecords> element as an alternative.

InvoiceVendor Field with the Value Germany

The Country property now only has the value WestEurope. Previous projects created with the value Germany will still run. Projects with the value WestEurope will yield the same results as projects with the value Germany did in the previous version.

Currency Property of the Amount Field Has Been Changed

The Currency property now only has a standardized 3-letter code value. Different abbreviations and symbols for the same currency are included. Specifying special currency symbols separately is no longer supported. If you attempt to specify a symbol from a previous project, it will be replaced by the corresponding standard code. Note that the field may return additional amount values with other currency values.
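
The replacement of older abbreviations and symbols by a standardized code can be sketched as a simple lookup. The alias table below is an assumption made for illustration. USD, EUR, GBP, and CHF are ISO 4217 codes, but these notes neither name the standard the product follows nor list the abbreviations and symbols it recognizes.

```python
# Hypothetical alias table: several spellings map to one 3-letter code.
CURRENCY_ALIASES = {
    "$": "USD", "US$": "USD", "USD": "USD",
    "€": "EUR", "EURO": "EUR", "EUR": "EUR",
    "£": "GBP", "GBP": "GBP",
    "SFR": "CHF", "CHF": "CHF",
}


def normalize_currency(value):
    """Return the 3-letter code for a symbol or abbreviation, or None if unknown."""
    key = value.strip().upper()
    return CURRENCY_ALIASES.get(key)
```

For example, normalize_currency("€") and normalize_currency("Euro") both resolve to the same code, which mirrors how a symbol from an older project is replaced by its standard code.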

Keyword and WordSet Fields Are No Longer Available

Ascent Advanced Forms will automatically replace the unavailable Keyword and WordSet fields with Phrase or PhraseSet fields. Therefore, if you need to convert a project, simply open and save the project with the Ascent Advanced Forms Design Studio.

Note that if word sets are specified in a file using the WordSetFile property of the WordSet field, this file will not change for compatibility reasons. In this case, the WordSet field is converted to a special PhraseSet field that shows an additional WordSetFile property. WordSetFile, PhraseList, and PhraseFile properties cannot be changed. This allows you to use and maintain the single word set file in existing applications.

LineCandidates Node of Table Fields is No Longer Available

The LineCandidates node will no longer display in the Table field. If line candidates are specified in an older project, then the node will still display and the specified conditions will still take effect.

Restrictions and Known Problems

The following section contains information about restrictions and known problems with Ascent Advanced Forms 3.8 - SR01.

Bar Code Field with Regular Expression

A dictionary (expression element <D...>) is not supported in a regular expression of a bar code feature.

Function dscf_rpcCreateServer

If a computer name is specified as the value of the host parameter, the function call may fail. In this case, use the IP address as the host parameter.

Image Processing Function Inverse Text Correction

A fatal error may occur with images containing extraneous noise. This occurs because Inverse Text Correction is an older image processing function. To avoid this error, use the new function RemoveLinesAndInverseText.

Links in the Online Help No Longer Work On Windows 2000

This is a problem with the HTML Help viewer or with Internet Explorer 6 on a Microsoft Windows 2000 operating system with Service Pack 4. To fix this problem, do the following:

  1. Enter the following commands at the DOS prompt:

    regsvr32 /u <installation folder>:\winnt\system32\hhctrl.ocx

    regsvr32 <installation folder>:\winnt\system32\hhctrl.ocx

  2. Install Microsoft Critical Update 811630 from the Microsoft Web site. Refer to http://support.microsoft.com/?kbid=811630 for more information.

Problems Accessing Help Files From the Network

Due to a vulnerability in HTML Help, access to Help files on the network has been restricted by Windows security updates and service packs. As a result, you may no longer be able to view the Help topics. Refer to the Microsoft Web site for more information. To resolve this problem, copy the Help files to a local disk, or add the registry entries that allow remote access as described on the Microsoft Web site.

Resolved Problems

The following section contains information about issues that have been resolved with Ascent Advanced Forms 3.8 - SR01.

Problem With BarCode Field

The BarCode field no longer returns an image metrics error.

TabQuantity Column of Table Field Could Contain Erroneous Results

The maximum width for the TabQuantity column has been increased. Prior to this release, unusually long quantity values would simply be truncated.

InvoiceVendor Could Return Erroneous Results With Older Projects

The InvoiceVendor field no longer returns erroneous results with projects created with Ascent Advanced Forms version 3.6 or earlier.

DOKuStarResult.dtd

The DOKuStarResult.dtd file no longer specifies a single OCR for the ItemText column.

Wrong Values For InvoiceVendor Field

The InvoiceVendor field no longer contains erroneous values for the dbcolumns parameter.

Problems Resolved in Ascent Advanced Forms 3.7 Service Packs

The following section contains information about issues resolved with Ascent Advanced Forms 3.7 Service Packs.

InvoiceDate Field Returning Erroneous Result

The InvoiceDate field no longer returns "Maerz" for the invoice month when "Mai" was in the document.

Licensing

Ascent Advanced Forms no longer fails when only the Classify license is available.

Global Table Field

Ascent Advanced Forms no longer fails when the global Table field is linked more than once.

Ascent Advanced Forms Design Studio Did Not Find Attribute File for Test Document of Unknown Type

In the previous release, if the test directory was specified under the TestDocument node of the Unknown document type and an image in the test directory needed a corresponding attribute file, the Ascent Advanced Forms Design Studio could not find the attribute file when processing the image during a Run or a Run Classification command.

InvoiceNumber Field Caused High Runtime and Memory Usage for Certain Documents

In the prior release, the InvoiceNumber field required high runtime and memory usage on documents that contained bar codes.

(US)InvoiceVendor Field Had Problems With Dollar Signs in Database Files

In the previous release, an error would occur when the database file of an InvoiceVendor or USInvoiceVendor field contained the dollar sign ($) character.

InvoiceTotals Result Could Contain Incorrect Currency Value

An incorrect currency value would sometimes appear in the TotalAmount.

Fields Could Not be Added to FirstOf Feature Using Paste

In the previous release, the FirstOf feature did not allow you to add the following features using the paste operation:

Field Name With Spaces Would Fail

In the prior release, if the field name contained spaces, Ascent Advanced Forms would fail. (SPR 00016331)

Field Name Page Would Fail

When Page was used as a field name, Ascent Advanced Forms would fail with the "Invalid procedure call or argument" error message.

Problem With RemoveLinesAndInverseText

In the prior release, the image processing function RemoveLinesAndInverseText would not detect areas with inverse text.

Table Field Would Contain Illegal Data

The Table field would include results that only intersected the table search area.

HorizontalPosition Property of Table Field Was Not Applied Correctly

In the previous release, if HorizontalPositionActive was switched to "on" for a Table field column that was used as the value field of a KeyValue field, the HorizontalPosition property was not applied correctly.

Illegal Regular Expression Would Fail

In regular expressions, the "Start of line" character "^" was accepted at illegal positions and would cause Ascent Advanced Forms to fail at runtime.

Problem With PatchCode Field

In the previous release, if the PatchCode field was used, a bar code error sometimes occurred with documents containing a horizontal black bar.

Problem With the Estonia Language

In the prior release, some characters could not be recognized when the Estonia language was selected, which was due to the wrong classifier and/or character set.

Problem When Using Two Extraction Stages

If the notation "Field/element" for access to elements of complex fields was used in a configuration with two extraction stages, the second stage would yield error messages concerning slashes in an attribute.

RemoveLinesAndInverseText Inverted Complete Image

Images with a black border were sometimes completely inverted by the image processing function RemoveLinesAndInverseText.

Confidence Property Not Applied Correctly

The Confidence property value on the Font sub-node of the OcrParams node was not applied correctly. As a result, characters with a lower confidence value would not be rejected. (SPR 00013639)

Changes to Project Files

In the previous release, Ascent Advanced Forms would need to be restarted for changes to project files to take effect. (SPR 00009413)

Running Several Instances as Services

When several instances of Ascent Advanced Forms were run as services, images and attribute files would be replaced with pages from another batch. (SPR 00017899)

Long VAT-ID Numbers for InvoiceVendor Field

In the prior release, the InvoiceVendor field did not support VAT-ID numbers with 10 digits. (SPR 00019091)

Missing Database File

When a database file was missing, Ascent Advanced Forms would display the following error message. (SPR 00011939)

An unexpected internal error occurred. The system's state is undefined!

It is recommended to restart Ascent Advanced Forms Design Studio!

In the previous release, when the database file was not available, the project could not be opened.

Kofax Technical Support

For additional technical information about Kofax products, visit the Kofax Web site at www.kofax.com and select an appropriate option from the Support menu. The Kofax Support pages provide product-specific information, such as current revision levels, the latest drivers and software patches, online documentation and user manuals, updates to product release notes (if any), technical tips, and an extensive searchable knowledgebase.

The Kofax Web site also contains information that describes support options for Kofax products. Please review the site for details about the available support options.

If you need to contact Kofax Technical Support, please have the following information available:

Trademark/Copyright Statements

Copyright

Copyright © 2006 Kofax Image Products, Inc. All rights reserved.

The information contained in this document is the property of Kofax Image Products, Inc. Neither receipt nor possession hereof confers or transfers any right to reproduce or disclose any part of the contents hereof, without the prior written consent of Kofax Image Products, Inc. No patent liability is assumed, however, with respect to the use of the information contained herein.

Trademarks

Kofax, the Kofax logo, Ascent, and Ascent Capture are registered trademarks of Kofax Image Products, Inc.

Ascent Advanced Forms uses DOKuStar Extraction. Copyright Océ Document Technologies GmbH 2002-2006. Océ DOKuStar is a registered trademark of Océ Document Technologies GmbH.

Microsoft and Visual Basic for Applications are trademarks of Microsoft Corporation.

All other product names and logos are trade and service marks of their respective companies.

Disclaimer

The instructions and descriptions contained in this document were accurate at the time they were written. However, succeeding products and documents are subject to change without notice. Therefore, Kofax Image Products, Inc. assumes no liability for damages incurred directly or indirectly from errors, omissions, or discrepancies between the product and this document.

An attempt has been made to state all allowable values where applicable throughout this document. Any values or parameters used beyond those stated might have unpredictable results.