Binarization and Segmentation Framework for Sundanese Ancient Documents

Erick Paulus, Department of Computer Science, Universitas Padjadjaran, Indonesia
Mira Suryani, Department of Computer Science, Universitas Padjadjaran, Indonesia
Setiawan Hadi, Department of Computer Science, Universitas Padjadjaran, Indonesia
Rahmat Sopian, Sundanese Culture Studies, Universitas Padjadjaran, Indonesia
Akik Hidayat, Department of Computer Science, Universitas Padjadjaran, Indonesia

Abstract


Binarization and segmentation are the first two important stages of an optical character recognition (OCR) system. For images of handwritten ancient documents, binarization remains a major challenge, generally because the images are badly degraded and contain various kinds of noise in the non-text areas. After binarization, line-based segmentation is performed to separate each text line from the others. We propose a novel binarization and segmentation framework that enhances the performance of the Niblack binarization method and uses the minimum of an energy function to find the path of the separator line between two text lines. For the experiments, we use 22 images taken from the Sundanese ancient documents Kropak 18 and Kropak 22. The evaluation metrics show that the proposed binarization improves the F-measure over the original Niblack method by 20% for Kropak 22 and by 50% for Kropak 18. We then present the influence of different input images, both true-color and binary, on text-line segmentation. In the line segmentation process, the binarized images from the proposed framework yield the same number of text lines as the number of target lines. Overall, the proposed framework produces promising results, so its output can be used as input for the subsequent OCR process.
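For readers unfamiliar with the baseline, the following is a minimal sketch of the classic Niblack threshold, T = m + k·s over a local window, together with the F-measure used to score a binarization against ground truth. It is not the enhanced method proposed in this paper. The function names, the window size, the value k = -0.2, and the toy image are illustrative assumptions and are not taken from the article.

```python
from math import sqrt


def niblack_binarize(gray, window=3, k=-0.2):
    """Classic Niblack thresholding (baseline only, not the paper's variant).

    gray is a list of rows of intensities in 0..255. For each pixel the
    threshold is T = mean + k * std of its local window. Pixels darker
    than T are labelled 1 (text); all others are labelled 0 (background).
    """
    h, w = len(gray), len(gray[0])
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Neighbourhood of the pixel, clipped at the image border.
            vals = [gray[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            mean = sum(vals) / len(vals)
            std = sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            threshold = mean + k * std
            out[y][x] = 1 if gray[y][x] < threshold else 0
    return out


def f_measure(pred, truth):
    """F-measure of predicted text pixels (1) against a ground-truth mask."""
    pairs = [(p, t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy 5x5 "page": bright background with one dark vertical stroke.
    page = [[200, 200, 20, 200, 200] for _ in range(5)]
    truth = [[0, 0, 1, 0, 0] for _ in range(5)]
    binary = niblack_binarize(page)
    for row in binary:
        print(row)
    print("F-measure:", f_measure(binary, truth))
```

On real manuscript images the window would be much larger than 3 pixels, and a library implementation would normally replace the nested loops; the pure-Python version is kept here only so the arithmetic is visible.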

Keywords


binarization, segmentation, ancient document

DOI: https://doi.org/10.21831/jsd.v6i2.15314


Copyright (c) 2017 Erick Paulus, Mira Suryani, Setiawan Hadi, Rahmat Sopian, Akik Hidayat

Printed ISSN (p-ISSN): 2085-9872
Online ISSN (e-ISSN): 2443-1273
