Plenary Lecture

New Approach for Pre-processing and Efficient Archiving of Scanned Documents

Professor Roumen Kountchev
Faculty of Telecommunications
Technical University of Sofia

Abstract: The efficient archiving of scanned documents is one of the major contemporary challenges in this area. The still image compression standards JPEG and JPEG 2000 are very efficient for natural images, but they perform less well on text and graphics: at relatively high compression ratios, the quality of the restored images deteriorates significantly. Efficient compression, and hence archiving, of document images requires a more flexible approach adapted to the characteristics of the processed images. Additional problems arise when compound images, containing both text and pictures, have to be processed. The best solution is to process each part in the way that gives maximum efficiency for it.
A new approach for efficient archiving of scanned documents comprising text and pictures is presented in this lecture. The proposed approach compresses pictures and text in different ways: the pictures with lossy coding based on a decomposition called the Inverse Difference Pyramid (IDP), and the parts containing text (graphics) with lossless Adaptive Run-Length (ARL) coding.
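As a rough illustration of the run-length idea behind the text branch, the sketch below encodes one row of a binarized text region as alternating run lengths and restores it without loss. It is plain, non-adaptive run-length coding written in Python, not the ARL coder of the lecture, whose details this abstract does not give; the function names `rle_encode` and `rle_decode` are hypothetical.

```python
from typing import List, Tuple


def rle_encode(row: List[int]) -> Tuple[int, List[int]]:
    """Encode a binary row (values 0 or 1) as (first_value, run_lengths).

    Because the values alternate between runs, only the first value
    and the length of each run need to be stored.
    """
    if not row:
        return 0, []
    runs: List[int] = []
    current, length = row[0], 1
    for pixel in row[1:]:
        if pixel == current:
            length += 1
        else:
            runs.append(length)
            current, length = pixel, 1
    runs.append(length)  # close the final run
    return row[0], runs


def rle_decode(first: int, runs: List[int]) -> List[int]:
    """Rebuild the binary row from its first value and run lengths."""
    row: List[int] = []
    value = first
    for length in runs:
        row.extend([value] * length)
        value = 1 - value  # runs alternate between 0 and 1
    return row
```

Long uniform runs, which are typical of white background between characters, collapse to single integers, which is why run-length methods suit text and line graphics better than natural pictures.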
The processing comprises the following main steps:
- Image preprocessing, comprising background filtering (aimed at noise removal) and histogram analysis and modification;
- Image segmentation: recognition of the text and picture regions;
- Image compression: an adaptive approach that compresses the pictures with a variant of lossy IDP compression and the text with lossless ARL coding.
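The steps above can be sketched in Python under strong simplifying assumptions. The abstract does not specify the actual algorithms, so everything here is a stand-in: Otsu's method represents the histogram analysis, a hypothetical "two-level block" heuristic (`classify_block`, with made-up defaults `margin` and `min_fraction`) represents text/picture recognition, and the labels only indicate which coder each block would be routed to.

```python
from typing import List, Tuple

Image = List[List[int]]  # 8-bit grayscale: rows of pixel values 0..255


def otsu_threshold(image: Image) -> int:
    """Histogram analysis: pick the gray level that best separates
    dark from bright pixels (Otsu's between-class variance criterion)."""
    hist = [0] * 256
    for row in image:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]            # pixels with value <= t
        sum0 += t * hist[t]
        w1 = total - w0          # pixels with value > t
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(block: Image, threshold: int) -> Image:
    """Map dark pixels (ink) to 1 and bright pixels (paper) to 0."""
    return [[1 if p <= threshold else 0 for p in row] for row in block]


def classify_block(block: Image, margin: int = 32,
                   min_fraction: float = 0.9) -> str:
    """Assumed heuristic: a block is 'text' when almost all pixels lie
    near its darkest or brightest value, otherwise it is 'picture'."""
    pixels = [p for row in block for p in row]
    lo, hi = min(pixels), max(pixels)
    near = sum(1 for p in pixels if p - lo <= margin or hi - p <= margin)
    return "text" if near / len(pixels) >= min_fraction else "picture"


def segment(image: Image, block_size: int = 8) -> List[Tuple[int, int, str]]:
    """Split the image into square blocks and label each one.
    'text' blocks would go to a lossless coder, 'picture' blocks
    to a lossy one."""
    regions = []
    for top in range(0, len(image), block_size):
        for left in range(0, len(image[0]), block_size):
            block = [r[left:left + block_size]
                     for r in image[top:top + block_size]]
            regions.append((top, left, classify_block(block)))
    return regions
```

In this sketch a low-contrast picture block would also be labeled as text, so a practical segmenter would need stronger features; the point is only to show how per-block labels let each part of a compound image be sent to the coder that suits it.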
The experimental results obtained for a large number of example documents processed with JPEG, JPEG 2000 and the new method demonstrate the advantages of the presented adaptive approach. The same approach is also very efficient for archiving old handwritten documents.
The presented approach is based on research and patents developed by the lecturer and his team at the Technical University of Sofia, Bulgaria.

Brief Biography of the Speaker:
Roumen Kountchev, Ph.D., D.Sc., is a professor at the Faculty of Telecommunications at the Technical University of Sofia, Bulgaria, and the head of the Image Processing Laboratory.
His main areas of interest are digital image processing, image compression, multimedia watermarking, video communications via the Internet, pattern recognition and neural networks. He has published 259 papers in journals and conference proceedings and 12 books and book chapters, holds 20 patents, and has participated in 46 scientific research projects (in 38 of them as the principal investigator).
He is the President of the Bulgarian Association for Pattern Recognition (BAPR), a member of the International Association for Pattern Recognition (IAPR), a member of the editorial board of the “International Journal of Reasoning-based Intelligent Systems” (IJRIS), a member of the Scientific Expert Commission of the Bulgarian Ministry of Education and Science, the President of the Technological Council of the Bulgarian National Radio, and a member of the Higher Attestation Commission of the Council of Ministers of Bulgaria.

