ICFHR 2010 Contest
Quantitative evaluation of binarization algorithms of images of historical documents with bleeding noise
The evaluation and comparison of binarization algorithms has proved to be a difficult task, since there is no objective way to compare the results. Several review papers have tried to compare binarization algorithms by precision and recall analysis of the resulting foreground words, or by evaluating their effect on end-to-end character or word recognition performance in a complete archive document recognition system using OCR. Each of these comparisons presented some very interesting conclusions. However, in all cases they rely on results from subsequent tasks in the document processing hierarchy in order to assess algorithm performance. Although in many cases this is the objective goal, it is not always possible. Historical documents are such a case: their quality often obstructs recognition, and sometimes even word segmentation, so this kind of evaluation can prove problematic.
On the other hand, a different evaluation technique is needed, since the processing of historical documents is one of the hardest cases, and binarization may be required to remove the noise and facilitate an appropriate presentation. The ideal evaluation should be able to decide, for each pixel, whether it ended up with the right color (black or white) after binarization. This is an easy task for a human observer but very difficult to do automatically for every pixel of several images.
The proposed method experiments on document archives by constructing noisy images, using image mosaicing techniques that combine blank pages of old historical documents with noise-free PDF documents. After applying the binarization algorithms to the synthetic images, it is easy to evaluate the results by comparing the resulting image with the original document, which serves as the ground truth image. This way, we can count the exact number of remaining wrong pixels, in both the background and the foreground.
As already mentioned, our intention is to check, for every pixel, whether it is right or wrong. Thus, pixel error will be used, i.e. the total number of pixels of the image that have the wrong color in the output image. The number of black pixels correctly classified as black is denoted by "b", and the number of white pixels correctly classified as white by "w". The total numbers of black and white pixels are B and W respectively. Document binarization can be considered an unbalanced problem, where the number of black pixels is much lower than the number of white pixels. In this case a better measure is the geometric mean. The geometric-mean pixel accuracy is: sqrt((b/B) * (w/W))
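The measure above can be sketched as follows. This is a minimal illustration, assuming binary images are given as NumPy arrays with 0 for black (foreground) and 255 for white (background); the function and variable names are our own, not part of the contest tools.

```python
import numpy as np

def geometric_mean_accuracy(ground_truth, binarized):
    """sqrt((b/B) * (w/W)), where b and w are the black and white
    pixels classified correctly, and B and W are the total numbers
    of black and white pixels in the ground truth image."""
    gt_black = ground_truth == 0
    gt_white = ~gt_black
    b = np.count_nonzero(gt_black & (binarized == 0))
    w = np.count_nonzero(gt_white & (binarized == 255))
    B = np.count_nonzero(gt_black)
    W = np.count_nonzero(gt_white)
    return np.sqrt((b / B) * (w / W))

# Toy example: 2x2 ground truth, output with one black pixel lost.
gt  = np.array([[0, 255], [0, 255]])
out = np.array([[0, 255], [255, 255]])
print(geometric_mean_accuracy(gt, out))  # sqrt((1/2)*(2/2)) ≈ 0.7071
```

Note that misclassifying even a few black pixels hurts the score strongly when B is small, which is exactly why the geometric mean is preferred over plain pixel accuracy here.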
The evaluation of the binarization methods will be made on synthetic images. That is, starting from a clean document image (doc), which is considered the ground truth image, noise of different types is added (noisy images). This way, during the evaluation, it is possible to decide objectively, for every single pixel, whether its value is correct by comparing it with the corresponding pixel in the original image.
For training purposes, 60 document images obtained with image mosaicing techniques are available. The doc set consists of three document images in PDF format, including tables, graphics, columns and many of the elements that can be found in a document. The noisy set consists of ten blank pages taken from a digitized document archive of the 18th century. These include most kinds of problems that can be met in old documents: stains and strains, strong background variations and uneven illumination, ink seepage, etc. The docs are used as target images and all the noisy images are resized to A4 size. Then, two different blending techniques are used: the maximum intensity and the image averaging approaches. For the test set, different ground truth documents will be considered and a different number of images will be generated.
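The two blending techniques can be sketched as below. This is only a plausible reading of the construction, assuming grayscale images as equally-sized NumPy uint8 arrays (0 = black, 255 = white); the exact pixel operations used by the organizers are not specified here, and the function names are illustrative.

```python
import numpy as np

def blend_average(doc, noise):
    """Image averaging: per-pixel mean of the document and the noise
    layer (computed in uint16 to avoid uint8 overflow)."""
    return ((doc.astype(np.uint16) + noise.astype(np.uint16)) // 2).astype(np.uint8)

def blend_max(doc, noise):
    """'Maximum intensity' blending, read here as keeping the darker
    (more intense in ink) of the two pixels, so that both the text and
    the stains survive in the degraded image. If 'intensity' instead
    refers to gray level, np.maximum would apply."""
    return np.minimum(doc, noise)

# Toy example: one ink pixel and one background pixel of the doc,
# combined with a dark stain and a lighter stain from the noise page.
doc   = np.array([[0, 255]], dtype=np.uint8)
noise = np.array([[200, 100]], dtype=np.uint8)
print(blend_average(doc, noise))  # [[100 177]]
print(blend_max(doc, noise))      # [[  0 100]]
```

Averaging softens both text and stains, while the darker-pixel blend preserves them at full contrast, giving two degradation styles per noise page.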
Registration open: mail to email@example.com
April 6th: Training data available
3 groundtruth images
60 degraded images: 20 images (10 avg + 10 max) for each groundtruth
May 1st: Test data available:
7 groundtruth images
210 degraded images: 30 images (15 avg + 15 max) for each groundtruth
How to cheat (but please do not): Since we provide several degraded versions of each test groundtruth image, you could estimate each pixel's class (foreground or background) by combining the information in all these degraded images. To be fair, you have to process each degraded image as the only available example. So please do not cheat!
The methods have to be fully automatic, without user cooperation.
June 1st: Test results submission.
An uploading system will open very soon.
You have to submit your images using the following rules:
Use a unique file (.zip)
The images have to be placed in directories following the same structure as provided for the test data.
Each resulting image has to have the same name as the original degraded image.
The image format has to be monochrome BMP.
Ergina Kavallieratou - firstname.lastname@example.org - University of the Aegean
Rafael Dueire Lins - email@example.com - Universidade Federal de Pernambuco
Roberto Paredes - firstname.lastname@example.org PRHLT - Universidad Politécnica de Valencia