23-01-2010, 08:16 PM
Abstract
An OCR system constitutes a natural way of providing input to a machine translation system. The document is optically scanned. The front-end OCR system extracts the text zones and reads the text. The translator, when presented with the recognized source text, produces a corresponding document with the text content translated into the specified language.

Indian scripts are two-dimensional compositions of symbols. Therefore, a text reading system first segments the text into symbols. The segmented units are classified with the help of prototypes for the symbols. The words are composed back from the output of the recognition process. Words are verified with the help of a word dictionary.

It is at this stage that the interface to the translation system is of great help. All translation systems have a dictionary and a sentence analyzer. A sentence analyzer, in the form of a formal grammar parser, will reject sentences where substitution errors have occurred, except in those cases where the substitution errors lead to other valid words of the same syntactic category. A sentence analyzer that is expectation-driven provides clues for alternative words in case of substitution errors.
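
The dictionary verification step described above can be illustrated with a rough sketch. It is not the system's actual implementation. It assumes the classifier returns, for each segmented symbol, a ranked list of candidate labels (best first), and it uses Latin letters as stand-ins for script symbols. The names `candidate_words` and `verify_word` and the tiny dictionary are hypothetical.

```python
from itertools import product


def candidate_words(symbol_candidates):
    """Yield every word obtainable by picking one candidate per symbol.

    symbol_candidates is a list of ranked candidate lists, one per
    segmented symbol. The first word yielded uses the top-ranked
    candidate at every position.
    """
    for combo in product(*symbol_candidates):
        yield "".join(combo)


def verify_word(symbol_candidates, dictionary):
    """Return (word, verified) for one recognized word.

    The first candidate word found in the dictionary is accepted.
    If none is found, the top-ranked reading is returned unverified,
    so a later stage (for example a sentence analyzer) can handle it.
    """
    top_choice = "".join(ranked[0] for ranked in symbol_candidates)
    for word in candidate_words(symbol_candidates):
        if word in dictionary:
            return word, True
    return top_choice, False


# Hypothetical dictionary and classifier output, for illustration only.
dictionary = {"cat", "cot", "dog"}

# The top-ranked reading "cut" is not a dictionary word, but swapping
# the second symbol for its next-ranked candidate gives "cat".
recognized = [["c", "e"], ["u", "a"], ["t"]]
result = verify_word(recognized, dictionary)
```

Here `result` is `("cat", True)`. A substitution error that produces another valid dictionary word, such as "cot" for "cat", passes this check unnoticed, which is the case where sentence-level analysis becomes useful.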