29-06-2010, 07:55 PM
Abstract
India is a multi-lingual, multi-script country. Considerably less work has been done on handwritten character recognition for Indian languages than for other languages. This dissertation describes an optical character recognition system for printed text documents in Kannada, a South Indian language. The input to the system is the scanned image of a page of text, and the output is a machine-editable file compatible with most typesetting software. The system first extracts words from the document image and then segments the words into sub-character-level pieces. The segmentation algorithm is motivated by the structure of the script. We propose a novel set of features for the recognition problem that are computationally simple to extract. The recognition is independent of the font and size of the printed text, and the system delivers reasonable performance.

In this dissertation, a neural-network-based invariant character recognition system is proposed. The proposed model consists of two parts. The first is a preprocessor intended to produce a translation-, rotation- and scale-invariant representation of the input pattern. The preprocessed output is then classified by a neural network classifier trained with a relatively new learning algorithm called backpropagation. The recognition system was tested on the ten numeric digits (0-9). The test set included rotated, scaled and translated versions of the exemplar patterns. This simple recognizer with a backpropagation classifier successfully recognized nearly 97% of the test patterns.
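
The abstract does not say how the preprocessor achieves invariance, so the following is only a minimal sketch of the general idea, not the dissertation's method. It assumes a binary image given as a list of strings ('#' for ink, '.' for background) and uses the first Hu moment invariant as a stand-in for an invariant representation. All function names here (foreground, phi1, translate, rotate90, scale2) are hypothetical and chosen for illustration.

```python
def foreground(image):
    """Return (x, y) coordinates of every ink ('#') pixel."""
    return [(x, y)
            for y, row in enumerate(image)
            for x, ch in enumerate(row) if ch == "#"]


def phi1(image):
    """First Hu moment invariant: eta20 + eta02.

    Central moments remove the effect of translation, dividing by
    mu00**2 normalizes for scale, and the sum mu20 + mu02 does not
    change when the pattern is rotated about its centroid.
    """
    pts = foreground(image)
    m00 = len(pts)                       # zeroth moment = ink area
    xc = sum(x for x, _ in pts) / m00    # centroid
    yc = sum(y for _, y in pts) / m00
    mu20 = sum((x - xc) ** 2 for x, _ in pts)
    mu02 = sum((y - yc) ** 2 for _, y in pts)
    return (mu20 + mu02) / m00 ** 2


# Helper transforms used only to exercise the feature (assumed, illustrative).
def translate(image, dx, dy):
    """Shift the pattern right by dx and down by dy, padding with background."""
    width = len(image[0]) + dx
    return ["." * width] * dy + ["." * dx + row for row in image]


def rotate90(image):
    """Rotate the image 90 degrees clockwise."""
    return ["".join(col) for col in zip(*image[::-1])]


def scale2(image):
    """Double the size by replicating every pixel 2x2."""
    out = []
    for row in image:
        wide = "".join(ch * 2 for ch in row)
        out += [wide, wide]
    return out


pattern = ["#..",
           "#..",
           "###"]

base = phi1(pattern)
shifted = phi1(translate(pattern, 4, 2))
rotated = phi1(rotate90(pattern))
scaled = phi1(scale2(pattern))
```

In this sketch the feature is unchanged (up to floating-point error) under translation and 90-degree rotation, and only approximately unchanged under pixel-replication scaling because of discretization. A practical system would feed a vector of several such features, or a normalized image, to the classifier rather than a single number.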
read full report
http://eprints.iisc.ernet7573/1/a_font.pdf