Research Article Volume 3 Issue 1
Department of CSE, Acharya Nagarjuna University, India
Correspondence: Anupama Namburu, Department of CSE, Acharya Nagarjuna University, India
Received: July 08, 2017 | Published: August 23, 2017
Citation: Talari D, Namburu A. Indus image segmentation using watershed and histogram projections. Int Rob Auto J. 2017;3(1):242-245. DOI: 10.15406/iratj.2017.01.00042
Character segmentation is a major step in document image analysis and optical character recognition (OCR); it is needed to detect all the character regions in a document image. The proposed method preprocesses the document image with edge detection techniques to enhance the character edges. The watershed algorithm is then applied to identify the character regions, and multiple histogram projections are also used to identify the characters. The watershed regions and the histogram projections are compared to determine the actual character regions, improving the accuracy of character recognition. The proposed method is evaluated on Telugu and Indus images and extracted the characters accurately.
Keywords: watershed model, multiple projections, edge detection
The recognition process generally involves the segmentation of text lines, words, and characters. Segmentation of handwritten documents remains one of the most challenging problems because of the complexity of handwritten text. The success of recognition thus depends on how well characters are segmented from text lines and words while the background is eliminated. In this work, the scope is limited to segmenting characters from text lines detected by a segmentation method, as character segmentation is a challenging step in the recognition process. Conventional character segmentation methods, such as projection profile based methods, may not work on text lines from degraded historical documents because the spacing between character components is irregular. Therefore, a new method for segmenting such characters is needed. Anupama et al.1 proposed a projection-based method for Telugu script document segmentation; it fails to extract the characters in the presence of touching lines. Character segmentation in degraded text lines such as Indus document images was proposed by Aladhahalli.2 As a conventional technique for text line segmentation, global horizontal projection analysis of black pixels has been utilized.3–6 Partial or piece–wise horizontal projection analysis of black pixels, a modification of the global projection technique, has been employed by many researchers to segment text pages of different languages.7–9 In this paper, to detect touching characters, the Sobel and Prewitt edge detection techniques are applied first, followed by the watershed algorithm to obtain the character regions. Multiple projections are then applied to the watershed images to obtain the projections of the character regions. The watershed regions and the histogram projections are compared to extract the exact character regions. This eliminates false line detections for characters in overlapped text lines.
[Figure: Sobel mask and Prewitt mask]
The literature offers various approaches to character segmentation. However, Indus documents, as shown in Figure 1, have distorted backgrounds, and the characters are cursive in nature because hand tools were used to engrave picture-like text on hard materials (http://en.wikipedia.org/wiki/Indus_scripts). Therefore, the decipherment of Indus documents remains a research issue in the field of document image analysis. Since there are not many methods for Indus character segmentation in the literature, this section reviews work on segmenting characters from degraded, historical, and handwritten document images. Most of the methods in1,2,7,10–12 are based on projection profiles, whereas12 uses component grouping. These methods work well for plain, high–resolution texts with clear spaces between characters. They segment lines using the horizontal projection profile and then use the vertical projection profile to segment words or characters. Such a method scans vertically for black pixels and, as a result, may not be suitable for Indus documents. Watershed algorithms are considered for Indus document images in the literature because of the lack of spacing between the characters. There are methods that explore the watershed algorithm for segmenting text lines.12–15 Most of them use the results of morphological operations as the input to the watershed. However, the performance of a morphological operation depends on the mask size and the binary output. Therefore, these methods do not perform well for complex documents such as Indus.
Here, a new technique is presented that automatically identifies and segments the text line regions of handwritten documents (Figure 2).
Edge detection
Pre–processing aims to produce data that can be segmented easily and accurately. Indus character images often have a degraded background, so the background needs to be separated from the foreground characters. To do so, the Sobel edge detection algorithm is used to extract and highlight the characters against the background (Figure 3a–3c).
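As an illustration of this edge detection step, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes the common textbook form of the 3x3 Sobel and Prewitt kernels and a grayscale image stored as a list of rows. The function names `correlate3x3`, `gradient_magnitude`, and `edge_map`, and the `threshold` parameter, are hypothetical.

```python
import math

# Textbook 3x3 kernels (assumed form, not taken from the paper).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
PREWITT_X = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
PREWITT_Y = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]


def correlate3x3(image, kernel):
    """Apply a 3x3 kernel to a 2-D list image; border pixels stay 0."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = sum(
                kernel[i][j] * image[r + i - 1][c + j - 1]
                for i in range(3)
                for j in range(3)
            )
    return out


def gradient_magnitude(image, kx=SOBEL_X, ky=SOBEL_Y):
    """Combine the horizontal and vertical responses into one magnitude map."""
    gx = correlate3x3(image, kx)
    gy = correlate3x3(image, ky)
    return [
        [math.hypot(gx[r][c], gy[r][c]) for c in range(len(image[0]))]
        for r in range(len(image))
    ]


def edge_map(image, threshold, kx=SOBEL_X, ky=SOBEL_Y):
    """Mark pixels whose gradient magnitude reaches the threshold (assumed)."""
    magnitude = gradient_magnitude(image, kx, ky)
    return [[1 if v >= threshold else 0 for v in row] for row in magnitude]


# A vertical step edge: dark on the left, bright on the right.
image = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
edges = edge_map(image, threshold=20)
```

Passing `PREWITT_X` and `PREWITT_Y` instead of the defaults gives the Prewitt response on the same image. The threshold would need tuning for real document images.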
Segmentation
Once the character edges are highlighted, the watershed algorithm is applied to obtain the character regions. Morphological watersheds provide a complementary approach to the segmentation of objects and are especially useful for segmenting objects that touch one another. To understand the watershed transform, the image is considered as a topographic surface in which the intensity values correspond to heights. Inspired by two characteristics of the watershed algorithm, namely water flow and the volume of collected water, a watershed algorithm is proposed to detect the spaces between character components. The algorithm finds water flow and a high volume of collected water wherever there is a space between two character components. These two properties work well even when character components touch. In this way, the watershed algorithm helps segment characters from Indus text lines by finding the non–linear spacing between character components (Figure 4) (Figure 5).
L = Watershed(F), where L is the label matrix.
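The idea of flooding a height surface to obtain a label matrix can be sketched as follows. This is a simplified marker-based priority-flood variant written for illustration, not the authors' implementation. The `markers` argument (seed labels placed inside each basin) and the 4-connected neighbourhood are assumptions, and no separate watershed-line label is produced: every pixel receives the label of the basin that reaches it first.

```python
import heapq


def watershed(F, markers):
    """Flood the surface F from the seed markers and return a label matrix L.

    F: 2-D list of heights (for example, gradient magnitudes).
    markers: 2-D list, positive integers at seed pixels and 0 elsewhere
             (an assumption of this sketch).
    """
    rows, cols = len(F), len(F[0])
    L = [row[:] for row in markers]  # copy so the input is not modified
    heap, order = [], 0
    for r in range(rows):
        for c in range(cols):
            if L[r][c] > 0:
                heapq.heappush(heap, (F[r][c], order, r, c))
                order += 1
    while heap:
        # Always grow from the lowest water level reached so far.
        level, _, r, c = heapq.heappop(heap)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and L[nr][nc] == 0:
                L[nr][nc] = L[r][c]
                heapq.heappush(heap, (max(level, F[nr][nc]), order, nr, nc))
                order += 1
    return L


# Two low basins separated by a high ridge in column 3.
F = [[1, 0, 1, 5, 1, 0, 1] for _ in range(3)]
markers = [[0] * 7 for _ in range(3)]
markers[1][1], markers[1][5] = 1, 2
L = watershed(F, markers)
```

In this toy surface, the columns left of the ridge end up with label 1 and the columns right of it with label 2. The ridge pixels themselves go to whichever basin reaches them first.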
Once the watershed image is obtained, histogram projections are calculated for it. The procedure for creating the histogram projections is given in the following steps. Because each character in the Indus image can easily be identified after the watershed is applied, vertical histogram projections are used to obtain the character regions (Figure 6).
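A projection histogram counts foreground pixels along one axis, and runs of non-empty columns then delimit candidate character regions. The sketch below shows that general idea under stated assumptions: the image is binary (1 for foreground), and the helper names `vertical_projection`, `horizontal_projection`, and `segments`, as well as the `min_count` parameter, are hypothetical rather than taken from the paper.

```python
def vertical_projection(binary):
    """Count foreground pixels (value 1) in each column."""
    return [sum(column) for column in zip(*binary)]


def horizontal_projection(binary):
    """Count foreground pixels (value 1) in each row."""
    return [sum(row) for row in binary]


def segments(profile, min_count=1):
    """Return (start, end) pairs, end exclusive, of runs with profile >= min_count."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value >= min_count and start is None:
            start = i  # a new run of occupied columns begins
        elif value < min_count and start is not None:
            runs.append((start, i))  # the run ended at an empty column
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


# Two small strokes separated by empty columns.
binary = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 1],
    [0, 1, 1, 0, 0, 1, 0],
]
profile = vertical_projection(binary)
regions = segments(profile)
```

Here `profile` is `[0, 3, 2, 0, 0, 3, 1]` and `regions` is `[(1, 3), (5, 7)]`, that is, two column ranges with a gap between them. Raising `min_count` would suppress thin noise columns at the cost of possibly splitting faint strokes.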
Follow these steps for Word segmentation:
The experimental results of all the segmentation steps for an Indus image are shown in Figure 7a–7g.
This paper presents a new method for segmenting characters from text lines in degraded document images such as Indus. The proposed algorithm was tested on several document images and gives robust results, with a detection rate (DR) of 98% and a recognition accuracy (RA) of 98%. We have proposed the watershed model for identifying non–linear spacing between characters by exploiting catchment basins and the flow of water. Experimental results and comparisons with existing methods show that the proposed method outperforms them in terms of recall and precision. Future work will extend the method to blurred images and to images with multiple touching character components in multi–scale or multi–oriented environments.
None.
The authors declare that there are no conflicts of interest.
©2017 Talari, et al. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work non-commercially.