Semantic information extraction from images of complex documents

Peanho, Claudio Antonio; Stagni, Henrique; Silva, Flavio Soares Correa da

Journal articles

Date

2012-12

Recorded in:

APPLIED INTELLIGENCE, DORDRECHT, v. 37, n. 4, suppl. 1, Part 1, pp. 543-557, DEC, 2012

ISSN 0924-669X

http://www.producao.usp.br/handle/BDPI/32527

DOI: 10.1007/s10489-012-0348-x

http://dx.doi.org/10.1007/s10489-012-0348-x

http://repositorioslatinoamericanos.uchile.cl/handle/2250/1629415

Authors

Peanho, Claudio Antonio

Stagni, Henrique

Silva, Flavio Soares Correa da

Institution

Universidade de São Paulo (Brazil)

Abstract

Even though the digital processing of documents is increasingly widespread in industry, printed documents are still widely used. To process the contents of printed documents electronically, information must be extracted from digital images of those documents. For complex documents, whose regions and fields can be highly heterogeneous in layout, printing quality, and use of fonts and typing standards, reconstructing the contents of a document from its digital image can be difficult. In this article we present an efficient solution to this problem, in which the semantic contents of the fields of a complex document are extracted from its digital image.

Subjects

DOCUMENT IMAGE PROCESSING

INFORMATION EXTRACTION FROM DOCUMENTS
