Project Silpa Python Based Indian Language Processing Framework

Project Silpa
Python Based Indian Language Processing Framework

Prepared by
Anish A & Arun Anson
S6 CSE, Mohandas College of Engineering and Technology,
Anad, Thiruvananthapuram

Abstract
Silpa (Swathanthra Indian Language Processing Applications) is a web platform for easily
hosting free (as in freedom) software language processing applications. It is a web
framework and a set of applications for processing Indian languages in many ways. In
other words, it is a platform for porting existing and upcoming language processing
applications to the web. Silpa can also be used as a Python library or as a web service from
other applications.

Introduction
Silpa is the abbreviation of Swathanthra Indian
Language Processing Applications. Silpa can be used
as
• A web framework for hosting Indian language
processing applications.
• A JSON[1]-RPC[2] based web service that makes the
Silpa services available to other applications.
• A Python library for Indian language
processing.
The Silpa project is released under the GNU Affero General
Public License Version 3[3][4]. It is free (as in freedom)
software. Its lead developer is Santhosh Thottingal. I
am also a developer.

Architecture
The Silpa architecture consists of the following
components.
• Common Utils: Utilities common to the
applications in Silpa.
• Settings: Handles configuration files and
settings for Silpa.
• Templating: Handles the templates, style
sheets, etc. for Silpa.
• URL Mapping: Maps URLs to modules
according to the settings.
• JSON Encoder/Decoder: Interprets JSON
data for processing in Silpa and converts the
internal representation to JSON for
communicating with applications.
• RPC Handler: Handles remote procedure
calls.
• Action Dispatcher: Handles the actions taken by
modules.
• WSGI[5] Handler: Acts as an interface
between Silpa and the web server.
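
To make the interplay of the WSGI handler, the JSON encoder/decoder, the RPC handler and the dispatcher more concrete, here is a minimal sketch using only the Python standard library. It is not Silpa's actual code: the MODULES registry, the method name "echo.reverse", the dispatch() and call() helpers and the JSON-RPC 1.0 style message layout are all assumptions made for illustration.

```python
import io
import json

# Hypothetical module registry (assumption): RPC method name -> callable.
MODULES = {
    "echo.reverse": lambda text: text[::-1],
}


def dispatch(request):
    """RPC handler + dispatcher: find the requested method and call it."""
    method = MODULES.get(request.get("method"))
    if method is None:
        return {"result": None, "error": "method not found", "id": request.get("id")}
    result = method(*request.get("params", []))
    return {"result": result, "error": None, "id": request.get("id")}


def application(environ, start_response):
    """WSGI entry point: decode the JSON request, dispatch it, encode the reply."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length)
    try:
        request = json.loads(body.decode("utf-8"))
        if not isinstance(request, dict):
            raise ValueError("request must be a JSON object")
        response = dispatch(request)
    except ValueError:
        response = {"result": None, "error": "invalid request", "id": None}
    payload = json.dumps(response, ensure_ascii=False).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]


def call(method, params):
    """Drive the WSGI application in-process with a fake POST request."""
    body = json.dumps({"method": method, "params": params, "id": 1}).encode("utf-8")
    environ = {
        "REQUEST_METHOD": "POST",
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    chunks = application(environ, lambda status, headers: None)
    return json.loads(b"".join(chunks).decode("utf-8"))
```

The call() helper only exists so that the sketch can be tried without a web server; in a deployment the web server would invoke application() directly.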

Modules
Silpa contains many modules, some of which are
stable and some of which are experimental. Modules extend
the functionality of Silpa. In other words, modules
make Silpa usable.
Sort
Unicode Collation Algorithm (UCA)[6] based sorting for
all languages defined in Unicode. The collation weights
used in this application are a modified version of the Default
Unicode Collation Element Table (DUCET). The current
version is modified only for the Malayalam language. For
other languages, it uses the default weights defined by
Unicode. Malayalam sorting is compatible with the GNU C
library collation definition.
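
The idea of a default weight table with language-specific modifications can be sketched as below. This is a deliberately simplified, single-level illustration and not the UCA itself (which compares several weight levels) nor Silpa's weight table: the default weight here is just the code point, and the TAILORING table with its single entry is a made-up example.

```python
# Hypothetical tailoring table (assumption): characters whose weight differs
# from the default. Here U+00E4 (a with diaeresis) is placed right after "a".
TAILORING = {"\u00e4": ord("a") + 0.5}


def weight(ch):
    """Return the tailored weight if one exists, else the default (code point)."""
    return TAILORING.get(ch, ord(ch))


def collation_key(word):
    """Build a sort key: the list of weights of the word's characters."""
    return [weight(ch) for ch in word]


def collate(words):
    """Sort words by their collation keys instead of raw code points."""
    return sorted(words, key=collation_key)


words = ["zebra", "\u00e4pfel", "apple"]
print(sorted(words))   # plain code point order
print(collate(words))  # order with the tailored weight applied
```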

Indic Soundex
Soundex is a phonetic indexing algorithm. It is used to
search for and retrieve words that have similar pronunciation but
slightly different spelling. Soundex was developed by
Robert C. Russell and Margaret K. Odell. The Soundex
code for a word is a letter of the English alphabet followed by a
number of digits.
With this algorithm, if a name is written as Santhosh,
Santosh, Santhos or Santos, the Soundex code
remains the same: a5B20000.
The original Soundex algorithm is not multilingual. Our
algorithm is not an exact copy of English Soundex,
but it uses the same concept.

Algorithm
1. For each letter in the word except the first letter,
get the corresponding soundex digit from the
character map, which is simply a lookup table.
2. If the letter is not found in the character map, the
soundex digit for that letter is 0.
3. Duplicate consecutive soundex codes are
skipped, i.e., a repeated क is effectively treated as a
single क.
4. Replace the first digit with the first alphabetic character.
5. Remove all 0s from the soundex code.
6. Return the soundex code padded to the required
length (i.e., if the required length of the code is 5 and
the soundex is സBCD, then the soundex returned will
be സBCD0).
Example
കാരതിക്, കാരതിക്, കാരതിഗ് = കAPKBF00
கார்திக் = கAPKBF00
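
The steps of the algorithm can be sketched in Python as follows. This is an illustration under assumptions, not Silpa's implementation: CHAR_MAP is a small made-up table covering only a few Latin letters (Silpa's real character map covers Indian scripts and uses its own codes), and the default length of 8 is likewise an assumption.

```python
# Hypothetical character map (assumption), for illustration only.
CHAR_MAP = {
    "b": "1", "f": "1", "p": "1", "v": "1",
    "c": "2", "g": "2", "j": "2", "k": "2", "q": "2", "s": "2", "x": "2", "z": "2",
    "d": "3", "t": "3",
    "l": "4",
    "m": "5", "n": "5",
    "r": "6",
}


def soundex(word, length=8):
    """Return a soundex-style code for word, padded to length characters."""
    if not word:
        return ""
    codes = []
    previous = None
    for letter in word[1:]:
        # Steps 1 and 2: look the letter up; unknown letters get "0".
        digit = CHAR_MAP.get(letter.lower(), "0")
        # Step 3: skip duplicate consecutive codes.
        if digit != previous:
            codes.append(digit)
        previous = digit
    # Step 4: the code starts with the first character of the word.
    # Step 5: drop all zeros from the remaining codes.
    code = word[0] + "".join(d for d in codes if d != "0")
    # Step 6: truncate or pad with zeros to the required length.
    return code[:length].ljust(length, "0")


for name in ("Santhosh", "Santosh", "Santhos", "Santos"):
    print(name, soundex(name))
```

With this made-up map the four spellings receive one common code; the code itself differs from what Silpa returns because the table is different.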

Spell checker
Spell check[7] provides a language-independent spell
checking service across Indian languages and English.
The spell check service looks each word up in the dictionary.
If the word is found, it is reported as correct. If it is not
found, the service fetches similar words from the dictionary.
A spell checker customarily consists of two parts[7]:
1. A set of routines for scanning text and
extracting words, and
2. An algorithm for comparing the extracted
words against a known list of correctly spelled
words (i.e., the dictionary).
Eg : കാക
Wrong Spelling. Suggestions :
• കാ
• കാകളി
• കാകാ
• കാക
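
The two parts of a spell checker can be sketched with the Python standard library as shown below. This is only an illustration: the DICTIONARY word list is made up, and difflib.get_close_matches is used here as a stand-in for whatever similarity search the Silpa module actually performs.

```python
import difflib
import string

# Hypothetical word list (assumption) standing in for a real dictionary file.
DICTIONARY = {"hello", "help", "world", "word"}


def extract_words(text):
    """Part 1: split the text on whitespace and strip ASCII punctuation."""
    words = (token.strip(string.punctuation) for token in text.split())
    return [word for word in words if word]


def check(word, max_suggestions=4):
    """Part 2: return (is_correct, suggestions) for a single word."""
    if word in DICTIONARY:
        return True, []
    suggestions = difflib.get_close_matches(
        word, sorted(DICTIONARY), n=max_suggestions
    )
    return False, suggestions


for word in extract_words("helo, world!"):
    print(word, check(word))
```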

Dictionary
The Dictionary module provides a dictionary service for Indic
languages. It also looks up meanings in Wiktionary.
Eg :
Definitions from SILPA Dictionary
human
മനഷയസംബനമായ
മനഷയഗണങളള
മാനഷികമായ
Definitions from Wiktionary
മാനഷികമലയങോളാടകടിയ

Text Similarity
This module compares two texts for their similarity.
Based on the similarity, it returns a number between 0
and 1. A value of 1 means the texts are similar, 0 means they are
completely different, and a value between 0 and 1
indicates how similar they are.
The algorithm uses an n-gram[8] model and cosine
similarity[9].
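
A minimal sketch of the n-gram and cosine similarity idea is given below. It assumes character bigrams and raw counts as vector components; whether the Silpa module uses character or word n-grams, or any weighting, is not stated here, so those choices are assumptions.

```python
import math
from collections import Counter


def ngrams(text, n=2):
    """Count the character n-grams of text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def similarity(text_a, text_b, n=2):
    """Cosine similarity of the n-gram count vectors of two texts (0 to 1)."""
    a, b = ngrams(text_a, n), ngrams(text_b, n)
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a text shorter than n has no n-grams
    return dot / (norm_a * norm_b)


print(similarity("night", "nacht"))
print(similarity("abcd", "wxyz"))
```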

Transliteration
Transliteration is the practice of converting a text from
one writing system into another in a systematic way[10].
This application transliterates text from any
Indian language to any other Indian language.
The language of each word is detected automatically, so you can give
text in any language, or even in a mix of
languages. Transliteration to the International Phonetic
Alphabet[11] is also available.
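
One simple way to approximate script detection and script-to-script conversion is to exploit the fact that the Unicode blocks of the major Indian scripts are laid out in parallel, so a character can be moved to another script by changing its block offset. The sketch below does exactly that. It is a naive illustration and not Silpa's method: it works per character rather than per word, it does not cover IPA, and some positions are unassigned in some scripts, so the output can contain characters that do not exist in the target script. The names BLOCK_START, detect_script and transliterate are assumptions.

```python
# Start of the Unicode block of each script; every block is 0x80 wide.
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def detect_script(ch):
    """Return the script name of a character, or None if it is not Indic."""
    code_point = ord(ch)
    for name, start in BLOCK_START.items():
        if start <= code_point < start + BLOCK_SIZE:
            return name
    return None


def transliterate(text, target):
    """Move every Indic character of text into the target script's block."""
    target_start = BLOCK_START[target]
    output = []
    for ch in text:
        script = detect_script(ch)
        if script is None:
            output.append(ch)  # leave non-Indic characters untouched
        else:
            output.append(chr(ord(ch) - BLOCK_START[script] + target_start))
    return "".join(output)


# Devanagari KA (U+0915) moved into the Malayalam block (U+0D15).
print(transliterate("\u0915", "malayalam"))
```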

Syllabification
A syllable[12] is a unit of organization for a sequence of
speech sounds. Syllabification is the separation of a
word into syllables, whether spoken or written. In most
languages, the actually spoken syllables are the basis
of syllabification in writing too.
Eg : അനീഷ്
അ, നീ, ഷ
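
For Indic scripts, a rough written syllabification can be sketched by keeping every combining mark with the letter before it and by joining a consonant to the preceding cluster when that cluster ends in a virama. The code below is a simplification built on the standard unicodedata module and is not Silpa's implementation: it ignores zero-width joiners, chillu letters and other language-specific rules.

```python
import unicodedata

VIRAMA_CLASS = 9  # canonical combining class of virama signs in Unicode


def syllabify(word):
    """Split a word into orthographic syllables (simplified)."""
    syllables = []
    for ch in word:
        # Vowel signs, anusvara, virama etc. are combining marks (M* categories).
        is_mark = unicodedata.category(ch).startswith("M")
        # A letter following a virama continues the same consonant cluster.
        after_virama = (
            bool(syllables)
            and unicodedata.combining(syllables[-1][-1]) == VIRAMA_CLASS
        )
        if syllables and (is_mark or after_virama):
            syllables[-1] += ch
        else:
            syllables.append(ch)
    return syllables


# The Malayalam word from the example above, written with escapes.
print(syllabify("\u0d05\u0d28\u0d40\u0d37\u0d4d"))
```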

References
1. http://en.wikipediawiki/JSON
2. http://en.wikipediawiki/Remote_procedure_call
3. http://gnulicenses/agpl.html
4. http://en.wikipediawiki/Affero_General_Public_License
5. http://en.wikipediawiki/Web_Server_Gateway_Interface
6. http://en.wikipediawiki/Unicode_collation_algorithm
7. http://en.wikipediawiki/Spell_checker
8. http://en.wikipediawiki/N-gram
9. http://en.wikipediawiki/Cosine_similarity
10. http://en.wikipediawiki/Transliteration
11. http://en.wikipediawiki/IPA
12. http://en.wikipediawiki/Syllable
13. http://en.wikipediawiki/Katapayadi_sankhya
14. http://ml.wikipediawiki/Paralperu
15. http://en.wikipediawiki/Stemming
16. http://en.wikipediawiki/Fuzzy_string_searching
17. http://en.wikipediawiki/Levenshtein_distance
18. http://en.wikipediawiki/Fortune_(Unix)
19. http://en.wikipediawiki/Speech_synthesis
20. http://dhvani.sourceforge
21. http://en.wikipediawiki/Hyphenation_algorithm
22. http://thottingalblog/2008/12/16/hyphenation-of-indian-languages-in-webpages/
23. http://ftp.twarenUnix/NonGNU/smc/hyphenation/web/example.html
24. http://lists.nongnumailman/listinfo/silpa-discuss
25. https://savannah.nongnutask/?group=silpa

This work is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License.
To view a copy of this license, visit http://creativecommonslicenses/by-sa/3.0