I need simple MATLAB code for line, word, and character segmentation in a text document. Please help.
Abstract
Character segmentation is the most crucial step in any Optical Character Recognition (OCR) system. The choice of segmentation algorithm is a key factor in the accuracy of an OCR system: if characters are segmented well, recognition accuracy will also be high. Segmenting words into characters is very difficult because of the cursive and unconstrained nature of handwritten script. This paper proposes a new vertical segmentation algorithm in which the segmentation points are located after thinning the word image to a stroke width of a single pixel. Knowledge of the shape and geometry of English characters is used during segmentation to detect ligatures. The proposed approach is tested on a local benchmark database and is found to achieve high segmentation accuracy.
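The abstract does not spell out how the segmentation points are located, so the following is only a rough sketch of one common heuristic, not the paper's algorithm. It assumes the word image has already been thinned to one-pixel strokes and is given as a list of rows of 0/1 values. Columns that cross at most one stroke pixel are treated as candidate cut positions (a ligature between two letters is usually a single thin stroke). The function names and the `max_pixels` threshold are made up for illustration, and the sketch is in Python rather than MATLAB.

```python
def candidate_cut_columns(thinned, max_pixels=1):
    """Return indices of columns with at most max_pixels foreground pixels.

    `thinned` is a list of rows, each a list of 0/1 values (1 = stroke).
    The threshold is an assumption, not a value taken from the paper.
    """
    if not thinned:
        return []
    width = len(thinned[0])
    counts = [sum(row[x] for row in thinned) for x in range(width)]
    return [x for x, count in enumerate(counts) if count <= max_pixels]


def merge_runs(columns):
    """Collapse each run of consecutive columns into its middle column."""
    cuts = []
    run = []
    for x in columns:
        if run and x != run[-1] + 1:
            cuts.append(run[len(run) // 2])
            run = []
        run.append(x)
    if run:
        cuts.append(run[len(run) // 2])
    return cuts


# Tiny hand-made "thinned" word image: two loops joined by a thin stroke.
word = [
    [0, 1, 0, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 1],
    [0, 1, 0, 0, 0, 1, 0],
]
candidates = candidate_cut_columns(word)
cuts = merge_runs(candidates)
print(candidates, cuts)
```

A heuristic like this over-segments open letters such as "u" or "w", which is presumably where the shape and geometry rules mentioned in the abstract come in; those rules are not reproduced here.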
Introduction
Segmentation of lines, words, and characters is one of the critical phases of optical character recognition (OCR). Because of imperfect segmentation, most recognition systems produce poor recognition rates. In this paper we discuss a novel approach to line, word, and character segmentation of printed Manipuri documents. Some work has been done on optical character recognition for other Indian scripts; for the Manipuri language, however, it is almost negligible. To the best of our knowledge, this is the first report on segmentation of documents containing Manipuri script. We first describe the structure of the Manipuri language, then discuss ideas for segmenting lines, words, and characters from degraded Manipuri documents. Finally, we discuss various existing recognition techniques.
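For the original question, a minimal sketch of the classic projection-profile approach to line, word, and character segmentation might look like the following. This is a generic technique and not necessarily the method used in the papers quoted above. It assumes a clean, deskewed, binarized page given as a list of rows of 0/1 values (1 = ink), and the `word_gap` threshold is a made-up illustrative value that would need tuning per document. It is written in Python; the same logic should carry over to MATLAB using row and column sums of a binary image.

```python
def runs(profile, min_gap=1):
    """Return half-open (start, end) spans where profile > 0.

    Spans separated by fewer than min_gap empty positions are merged.
    """
    spans = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))

    merged = []
    for s, e in spans:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


def segment(img, word_gap=3):
    """Split a binary page into lines, then words and characters per line.

    word_gap is an assumed threshold: blank gaps narrower than this many
    columns are treated as spacing between characters of the same word.
    """
    result = []
    row_profile = [sum(row) for row in img]          # ink per row
    for top, bottom in runs(row_profile):
        line = img[top:bottom]
        col_profile = [sum(row[x] for row in line)   # ink per column
                       for x in range(len(line[0]))]
        result.append({
            "rows": (top, bottom),
            "words": runs(col_profile, word_gap),
            "chars": runs(col_profile, 1),
        })
    return result


page = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
]
layout = segment(page)
for line in layout:
    print(line)
```

Plain projection profiles only work when characters are separated by blank columns, so touching or cursive characters, skewed scans, and noisy pages need extra steps (deskewing, noise removal, or a ligature-based method like the one described in the abstract).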