
Computer Science Clay · 01-03-2009, 11:01 AM

AI for Speech Recognition

AI is the study of how computers can perform tasks that are currently done better by humans. It is an interdisciplinary field where computer science intersects with philosophy, psychology, engineering and other fields. Humans make decisions based on experience and intention. Integrating this learning process into computers, so that they mimic it, is known as Artificial Intelligence Integration.
When you dial the telephone number of a big company, you are likely to hear the sonorous voice of a cultured lady who responds to your call with great courtesy, saying "Welcome to company X. Please give me the extension number you want." You say the extension number, your name, and the name of the person you want to contact. If the called person accepts the call, the connection is made quickly. This is artificial intelligence at work: an automatic call-handling system is used without employing any telephone operator.

The Technology
Artificial intelligence (AI) involves two basic ideas. First, it involves studying the thought processes of human beings. Second, it deals with representing those processes via machines (like computers, robots, etc.). AI is the behaviour of a machine which, if performed by a human being, would be called intelligent. It makes machines smarter and more useful, and is less expensive than natural intelligence.

Natural language processing (NLP) refers to artificial intelligence methods of communicating with a computer in a natural language like English. The main objective of an NLP program is to understand input and initiate action. The input words are scanned and matched against internally stored known words. Identification of a keyword causes some action to be taken. In this way, one can communicate with the computer in one's own language. No special commands or computer language are required, and there is no need to enter programs in a special language for creating software.
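The keyword matching described above can be sketched in a few lines of Python. This is only an illustration, not how any particular NLP product works: the keyword table, the action names and the function find_action are all made up for the example.

```python
# Hypothetical keyword table: stored known words mapped to action names.
KEYWORD_ACTIONS = {
    "open": "open_file",
    "print": "print_document",
    "delete": "delete_file",
}


def find_action(sentence):
    """Scan the input words and return the action for the first known keyword."""
    for word in sentence.lower().split():
        word = word.strip(".,!?")  # drop punctuation attached to the word
        if word in KEYWORD_ACTIONS:
            return KEYWORD_ACTIONS[word]
    return None  # no stored keyword was found


action = find_action("Please print the report.")
```

Here the words "please", "the" and "report" are ignored because they are not in the stored table; only "print" triggers an action.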

VoiceXML takes speech recognition even further. Instead of talking to your computer, you're essentially talking to a web site, and you're doing this over the phone. OK, you say, but what exactly is speech recognition? Simply put, it is the process of converting spoken input to text. Speech recognition is thus sometimes referred to as speech-to-text. Speech recognition allows you to provide input to an application with your voice. Just as clicking with your mouse, typing on your keyboard, or pressing a key on the phone keypad provides input to an application, speech recognition allows you to provide input by talking. In the desktop world, you need a microphone to be able to do this. In the VoiceXML world, all you need is a telephone.

The speech recognition process is performed by a software component known as the speech recognition engine. The primary function of the speech recognition engine is to process spoken input and translate it into text that an application understands. The application can then do one of two things. It can interpret the result of the recognition as a command; in this case, the application is a command and control application. Or it can handle the recognized text simply as text; in this case, it is considered a dictation application.
The user speaks to the computer through a microphone. The recognizer identifies the words and sends them to an NLP component for further processing. Once recognized, the words can be used in a variety of applications like display, robotics, commands to computers, and dictation.

seminar class · 23-03-2011, 09:38 AM

PRESENTED BY
TAHA MOHAMMED GALIB

ABSTRACT
Artificial Intelligence is the study of how computers can perform tasks that are currently done better by humans. Artificial intelligence (AI) involves two basic ideas. Firstly, it involves studying the thought processes of human beings. Secondly, it deals with representing those processes via machines (like computers, robots, etc.). AI is the behaviour of a machine which, if performed by a human being, would be called intelligent. It makes machines smarter and more useful, and is less expensive than natural intelligence.
Speech recognition allows you to provide input to an application with your voice. Just as clicking with your mouse, typing on your keyboard, or pressing a key on the phone keypad provides input to an application, speech recognition allows you to provide input by talking. One of the main benefits of a speech recognition system is that it lets the user do other work at the same time. The user can concentrate on observation and manual operations, and still control the machinery by voice input commands.
Speech recognition will revolutionize the way people conduct business over the Web and will, ultimately, differentiate world-class e-businesses. VoiceXML ties speech recognition and telephony together and provides the technology with which businesses can develop and deploy voice-enabled Web solutions today. These solutions can greatly expand the accessibility of Web-based self-service transactions to customers who would otherwise not have access and, at the same time, leverage a business's existing Web investments. Speech recognition and VoiceXML clearly represent the next wave of the Web.
1. Introduction:
When you dial the telephone number of a big company, you are likely to hear the sonorous voice of a cultured lady who responds to your call with great courtesy, saying “Welcome to company X. Please give me the extension number you want.” You say the extension number, your name, and the name of the person you want to contact. If the called person accepts the call, the connection is made quickly. This is artificial intelligence at work: an automatic call-handling system is used without employing any telephone operator.
AI is the study of how computers can perform tasks that are currently done better by humans. It is an interdisciplinary field where computer science intersects with philosophy, psychology, engineering and other fields. Humans make decisions based on experience and intention. Integrating this learning process into computers, so that they mimic it, is known as Artificial Intelligence Integration.
2. The Technology:
Artificial intelligence (AI) involves two basic ideas. First, it involves studying the thought processes of human beings. Second, it deals with representing those processes via machines (like computers, robots, etc.). AI is the behaviour of a machine which, if performed by a human being, would be called intelligent. It makes machines smarter and more useful, and is less expensive than natural intelligence.
Natural language processing (NLP) refers to artificial intelligence methods of communicating with a computer in a natural language like English. The main objective of an NLP program is to understand input and initiate action.
The input words are scanned and matched against internally stored known words. Identification of a keyword causes some action to be taken. In this way, one can communicate with the computer in one’s language. No special commands or computer language are required. There is no need to enter programs in a special language for creating software.
VoiceXML takes speech recognition even further. Instead of talking to your computer, you're essentially talking to a web site, and you're doing this over the phone. OK, you say, but what exactly is speech recognition? Simply put, it is the process of converting spoken input to text. Speech recognition is thus sometimes referred to as speech-to-text.
Speech recognition allows you to provide input to an application with your voice. Just like clicking with your mouse, typing on your keyboard, or pressing a key on the phone keypad provides input to an application; speech recognition allows you to provide input by talking. In the desktop world, you need a microphone to be able to do this. In the VoiceXML world, all you need is a telephone.
The speech recognition process is performed by a software component known as the speech recognition engine. The primary function of the speech recognition engine is to process spoken input and translate it into text that an application understands. The application can then do one of two things:
It can interpret the result of the recognition as a command. In this case, the application is a command and control application.
It can handle the recognized text simply as text. In this case, it is considered a dictation application.
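The difference between the two kinds of application can be shown with a small Python sketch. It assumes the engine has already produced the recognized text; the command set, the mode names and the function handle_recognized_text are hypothetical and chosen only for illustration.

```python
# Hypothetical command set for a command and control application.
COMMANDS = {"open mail", "close window", "save file"}


def handle_recognized_text(text, mode, document):
    """Treat recognized text as a command or as dictated text, depending on mode."""
    if mode == "command":
        normalized = text.lower().strip()
        if normalized in COMMANDS:
            return "executed: " + normalized
        return "unknown command"
    # Dictation: the text is kept as text and appended to the document.
    document.append(text)
    return "dictated"


doc = []
r1 = handle_recognized_text("Save file", "command", doc)
r2 = handle_recognized_text("Dear Sir", "dictation", doc)
```

The same recognized string goes through both paths; only what the application does with it changes.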
3. Speech Recognition:
The user speaks to the computer through a microphone. The recognizer identifies the words and sends them to an NLP component for further processing. Once recognized, the words can be used in a variety of applications like display, robotics, commands to computers, and dictation.
A word recognizer is a speech recognition system that identifies individual words. Early pioneering systems could recognize only individual letters and digits. Today, the majority of speech recognition systems are word recognizers, with recognition accuracy above 95%. Such systems can recognize a small vocabulary of single words or simple phrases. To enter data, one must speak in clearly separated single words, with a pause between words.
Continuous speech recognizers are far more difficult to build than word recognizers. You speak complete sentences to the computer; the input is recognized and then processed by NLP. Such recognizers employ sophisticated, complex techniques, because in continuous speech most words slur together and it is difficult for the system to know where one word ends and the next begins. Unlike with word recognizers, the spoken information is not recognized instantly.
3.1 What is a speech recognition system?
A speech recognition system is a type of software that allows the user to have their spoken words converted into written text in a computer application such as a word processor or spreadsheet. The computer can also be controlled by the use of spoken commands.
Speech recognition software can be installed on a personal computer of appropriate specification. The user speaks into a microphone (a headphone microphone is usually supplied with the product). The software generally requires an initial training and enrolment process in order to teach the software to recognise the voice of the user. A voice profile is then produced that is unique to that individual. This procedure also helps the user to learn how to ‘speak’ to a computer.
After the training process, the user’s spoken words will produce text; the accuracy of this will improve with further dictation and conscientious use of the correction procedure. With a well-trained system, around 95% of the words spoken could be correctly interpreted. The system can be trained to identify certain words and phrases and examine the user’s standard documents in order to develop an accurate voice file for the individual.
However, there are many other factors that need to be considered in order to achieve a high recognition rate. There is no doubt that the software works and can liberate many learners, but the process can be far more time consuming than first time users may appreciate and the results can often be poor. This can be very demotivating, and many users give up at this stage. Quality support from someone who is able to show the user the most effective ways of using the software is essential.
When using speech recognition software, the user’s expectations and the advertising on the box may well be far higher than what will realistically be achieved. ‘You talk and it types’ can be achieved by some people only after a great deal of perseverance and hard work.
3.2 Terms and Concepts
Following are a few of the basic terms and concepts that are fundamental to speech recognition. It is important to have a good understanding of these concepts when developing VoiceXML applications.
3.2.1 Utterances
When the user says something, this is known as an utterance. An utterance is any stream of speech between two periods of silence. Utterances are sent to the speech engine to be processed. Silence, in speech recognition, is almost as important as what is spoken, because silence delineates the start and end of an utterance. Here's how it works. The speech recognition engine is "listening" for speech input. When the engine detects audio input (in other words, a lack of silence), the beginning of an utterance is signaled. Similarly, when the engine detects a certain amount of silence following the audio, the end of the utterance occurs.
If the user doesn’t say anything, the engine returns what is known as a silence timeout, an indication that no speech was detected within the expected timeframe, and the application takes an appropriate action, such as re-prompting the user for input. An utterance can be a single word, or it can contain multiple words (a phrase or a sentence).
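To make the role of silence concrete, here is a minimal Python sketch of utterance detection. It assumes the audio has already been reduced to one energy value per frame; the threshold, the number of silent frames that ends an utterance, and both function names are assumptions for the example. Real engines use more robust speech detection than a fixed threshold.

```python
def find_utterances(energies, threshold=0.1, end_silence_frames=3):
    """Return (start, end) frame index pairs for each utterance.

    A frame counts as speech when its energy exceeds threshold. An utterance
    ends once end_silence_frames consecutive silent frames follow speech.
    """
    utterances = []
    start = None        # index of the first speech frame of the open utterance
    last_speech = None  # index of the most recent speech frame
    silent_run = 0      # consecutive silent frames since the last speech frame
    for i, energy in enumerate(energies):
        if energy > threshold:
            if start is None:
                start = i  # lack of silence: an utterance begins
            last_speech = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= end_silence_frames:
                utterances.append((start, last_speech))  # enough silence: it ends
                start = None
                silent_run = 0
    if start is not None:  # audio ended while an utterance was still open
        utterances.append((start, last_speech))
    return utterances


def is_silence_timeout(energies, threshold=0.1):
    """True when no speech frame was detected in the listening window."""
    return all(e <= threshold for e in energies)


frames = [0.0, 0.0, 0.5, 0.6, 0.0, 0.4, 0.0, 0.0, 0.0, 0.7, 0.0]
segments = find_utterances(frames)
```

Note that the single silent frame at index 4 does not end the first utterance, because it is shorter than the required run of silence. An application would re-prompt the user when is_silence_timeout returns True.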
3.2.2 Pronunciations
The speech recognition engine uses all sorts of data, statistical models, and algorithms to convert spoken input into text. One piece of information that the speech recognition engine uses to process a word is its pronunciation, which represents what the speech engine thinks a word should sound like. Words can have multiple pronunciations associated with them. For example, the word “the” has at least two pronunciations in U.S. English: “thee” and “thuh.” As a VoiceXML application developer, you may want to provide multiple pronunciations for certain words and phrases to allow for variations in the ways your callers may speak them.
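One simple way to picture multiple pronunciations is a lookup table from words to lists of pronunciations. The Python sketch below is an assumption-laden illustration: the lexicon, the informal spellings and the function words_matching are invented for the example, and real engines store pronunciations as phoneme sequences, not respellings.

```python
# Hypothetical lexicon: each word maps to one or more pronunciations.
# Spellings like "thee" and "thuh" follow the informal style used in the text,
# not a real phonetic alphabet.
LEXICON = {
    "the": ["thee", "thuh"],
    "either": ["ee-ther", "eye-ther"],
    "yes": ["yes"],
}


def words_matching(heard):
    """Return every word whose pronunciation list contains what was heard."""
    return sorted(word for word, prons in LEXICON.items() if heard in prons)
```

With this table, both "thee" and "thuh" lead to the same word, which is the effect a developer wants when adding alternative pronunciations for callers who speak differently.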
3.2.3 Grammars
As a VoiceXML application developer, you must specify the words and phrases that users can say to your application. These words and phrases are defined to the speech recognition engine and are used in the recognition process. You can specify the valid words and phrases in a number of different ways, but in VoiceXML, you do this by specifying a grammar. A grammar uses a particular syntax, or set of rules, to define the words and phrases that can be recognized by the engine. A grammar can be as simple as a list of words, or it can be flexible enough to allow such variability in what can be said that it approaches natural language capability.
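The idea of a grammar as a set of rules defining what can be said can be sketched in Python. This is not VoiceXML grammar syntax; it is a toy model in which the grammar is a sequence of slots, each with a list of allowed alternatives. The slot contents and the names GRAMMAR, expand and in_grammar are assumptions for the example.

```python
from itertools import product

# Hypothetical grammar: a sequence of slots, each with its allowed alternatives.
# An empty string marks an optional slot.
GRAMMAR = [
    ["", "please"],
    ["call", "dial"],
    ["sales", "support"],
]


def expand(grammar):
    """List every phrase the grammar allows."""
    phrases = set()
    for combo in product(*grammar):
        phrases.add(" ".join(word for word in combo if word))
    return phrases


ALLOWED = expand(GRAMMAR)


def in_grammar(utterance):
    """True when the recognized utterance is one of the allowed phrases."""
    return " ".join(utterance.lower().split()) in ALLOWED
```

Three short slots already allow eight phrases, which shows how a grammar can stay small on paper while accepting many variations of what a caller might say.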
3.2.4 Accuracy
The performance of a speech recognition system is measurable. Perhaps the most widely used measurement is accuracy. It is typically a quantitative measurement and can be calculated in several ways. Arguably the most important measurement of accuracy is whether the desired end result occurred. This measurement is useful in validating application design.
Another measurement of recognition accuracy is whether the engine recognized the utterance exactly as spoken. This measure of recognition accuracy is expressed as a percentage and represents the number of utterances recognized correctly out of the total number of utterances spoken. It is a useful measurement when validating grammar design.
Recognition accuracy is an important measure for all speech recognition applications. It is tied to grammar design and to the acoustic environment of the user. You need to measure the recognition accuracy for your application, and may want to adjust your application and its grammars based on the results obtained when you test your application with typical users.
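The percentage measure described above, the number of utterances recognized exactly as spoken out of the total spoken, is easy to compute once test results are collected. The Python sketch below assumes two parallel lists, one of what test users said and one of what the engine returned; the sample data and the function name recognition_accuracy are made up for the example.

```python
def recognition_accuracy(spoken, recognized):
    """Percentage of utterances recognized exactly as spoken."""
    if len(spoken) != len(recognized):
        raise ValueError("spoken and recognized must have the same length")
    if not spoken:
        raise ValueError("no utterances to score")
    correct = sum(1 for s, r in zip(spoken, recognized) if s == r)
    return 100.0 * correct / len(spoken)


# Hypothetical test session: the third utterance was misrecognized.
spoken = ["call sales", "dial support", "call support", "dial sales"]
recognized = ["call sales", "dial support", "call sales", "dial sales"]
accuracy = recognition_accuracy(spoken, recognized)
```

Three of the four utterances match, so the session scores 75 percent. Running the same calculation before and after a grammar change gives a simple way to see whether the change helped.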
