20-04-2011, 12:41 PM
Speech Recognition and its clinical applications
Speech recognition ?
Speech recognition technologies are of particular interest for their support of direct communication between humans and computers, through a communication mode humans commonly use among themselves and at which they are highly skilled.
– Rudnicky, Hauptmann, and Lee
Timeline of Speech recognition
1936 – AT&T’s Bell Labs began studying speech recognition
1974 – optical character recognition
1975 – text-to-speech synthesis (Kurzweil reading machine)
1978 – Speak & Spell toy released by Texas Instruments
1980 – Xerox began producing the TextBridge reading machine
1997 – Dragon Systems released the first continuous speech recognition product
Types of speech recognition
Isolated words
Connected words
Continuous speech
Spontaneous speech (automatic speech recognition)
Voice verification and identification
Speech recognition – uses and applications
Dictation
Command and control
Telephony
Medical/disabilities
Challenges of speech recognition
Ease of use
Robust performance
Automatic learning of new words and sounds
Grammar for spoken language
Control of synthesized voice quality
Integrated learning for speech recognition and synthesis
SpeechActs
Why develop SpeechActs?
Integrated conversational applications
No specialized language expertise
Technology independence
Information flow in SpeechActs
SpeechActs - Framework
Audio server presents raw digitized audio to speech recognizer
Swiftus parses the word list to produce a set of feature-value pairs
Discourse manager maintains a stack of information about the current conversation
Discourse manager and application respond to the user by sending a text string to the text-to-speech manager
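The four steps above can be sketched end to end. This is a hypothetical illustration of the information flow, not the original SpeechActs implementation; the component behaviour and data shapes are invented stand-ins.

```python
# Hypothetical sketch of the SpeechActs information flow described above.
# All component behaviour here is an illustrative stand-in.

def recognize(audio: bytes) -> list:
    """Stand-in for the speech recognizer: raw digitized audio -> word list."""
    return ["read", "my", "new", "messages"]

def swiftus_parse(words: list) -> dict:
    """Stand-in for Swiftus: word list -> feature-value pairs."""
    return {"action": "read", "object": "messages", "filter": "new"}

def discourse_respond(features: dict) -> str:
    """Stand-in for the discourse manager / application response."""
    return f"Reading your {features['filter']} {features['object']}."

def speak(text: str) -> str:
    """Stand-in for the text-to-speech manager."""
    return text

audio = b"\x00\x01"                  # raw audio from the audio server
words = recognize(audio)             # recognizer output: word list
features = swiftus_parse(words)      # Swiftus output: feature-value pairs
print(speak(discourse_respond(features)))  # "Reading your new messages."
```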
SpeechActs: A Spoken Language Framework
Continuous-speech recognizers require grammars that specify every possible utterance a user could say to the application
The recognizer grammar should closely synchronize with the Swiftus semantic grammar
Solved by inventing the Unified Grammar
Unified grammar
Collection of rules
Each rule consists of a pattern in a Backus-Naur Form-like notation, followed by augmentations: statements written in a Pascal-like syntax
A compiler produces both a grammar specific to the speech recognizer and the corresponding Swiftus grammar
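The compile-once, target-both idea can be illustrated with a toy rule. The rule format and both compilers below are invented for illustration; the actual Unified Grammar syntax is not shown in this text.

```python
# Illustrative sketch of the Unified Grammar idea: one rule, written once,
# is compiled into (a) a recognizer grammar pattern and (b) a semantic
# rule that yields feature-value pairs. The rule format is hypothetical.

rule = {
    "pattern": "<command> ::= read | delete",  # BNF-style pattern
    "augmentation": "action := matched_word",  # Pascal-like statement
}

def compile_for_recognizer(r: dict) -> str:
    """Emit only the pattern; that is all the recognizer needs."""
    return r["pattern"]

def compile_for_swiftus(r: dict):
    """Emit a semantic rule that applies the augmentation to a match."""
    def semantic_rule(matched_word: str) -> dict:
        return {"action": matched_word}  # action := matched_word
    return semantic_rule

print(compile_for_recognizer(rule))       # <command> ::= read | delete
print(compile_for_swiftus(rule)("read"))  # {'action': 'read'}
```

Because both targets come from the same source rule, the recognizer grammar and the Swiftus semantic grammar stay synchronized by construction.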
Swiftus – the natural language processor
Semantic representation generated in real time to facilitate conversation
Accurate understanding
Tolerance of misrecognized words
Wide variation among applications
Ease of use
Swiftus performance - Solved
Discourse management
To support more natural speech, we need at least rudimentary discourse management
Should support discourse-segment pushing and popping
Prompt design
Error-correcting mechanism
Discourse manager
a discourse is represented as a data structure consisting of functions for handling user output
the manager maintains a stack of these structures; the top one handles the default discourse for the current application or dialogue
the current application or dialogue is popped off the stack when the user cancels the activity or the problem is resolved
keeps a simple stack of referenced items to avoid entering into a subdialogue
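The stack behaviour described above can be sketched minimally. This is an assumed structure for illustration; the original implementation details are not given in the text.

```python
# Minimal sketch of the discourse stack described above. The class and
# handler shapes are assumptions for illustration, not the original code.

class Discourse:
    def __init__(self, name, handler):
        self.name = name
        self.handler = handler  # function that handles user input for this dialogue

class DiscourseManager:
    def __init__(self):
        self.stack = []
    def push(self, discourse):   # entering a subdialogue
        self.stack.append(discourse)
    def pop(self):               # activity cancelled or problem resolved
        return self.stack.pop()
    def handle(self, utterance):
        # The top structure handles the current application or dialogue.
        return self.stack[-1].handler(utterance)

dm = DiscourseManager()
dm.push(Discourse("mail", lambda u: f"mail: {u}"))
dm.push(Discourse("confirm-delete", lambda u: f"confirm: {u}"))
print(dm.handle("yes"))   # subdialogue on top handles it -> "confirm: yes"
dm.pop()                  # subdialogue resolved, pop it off
print(dm.handle("next"))  # back to the mail dialogue -> "mail: next"
```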
To simulate human conversation….
conversational pacing
explicit error corrections
define the functional boundaries of an application
Clinical applications
Medical transcription mainly in radiology and pathology
First use of speech recognition in the field of radiology in 1981
Mean accuracy rate of reading pathology reports using IBM ViaVoice Pro software – 93.6%, compared with human transcription at 99.6%
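The gap between those two accuracy figures is larger than it looks. A quick calculation from the percentages quoted above:

```python
# Arithmetic behind the accuracy figures quoted above: at 93.6% mean
# accuracy, the software makes roughly 16x as many errors as human
# transcription at 99.6%.

asr_accuracy = 0.936
human_accuracy = 0.996

asr_errors_per_1000 = round((1 - asr_accuracy) * 1000)
human_errors_per_1000 = round((1 - human_accuracy) * 1000)

print(asr_errors_per_1000)    # 64 errors per 1000 words
print(human_errors_per_1000)  # 4 errors per 1000 words
print(asr_errors_per_1000 / human_errors_per_1000)  # 16.0
```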
Speech recognition in clinical dentistry?
13% used voice recognition
16% discontinued using voice recognition
21% believed chairside computer use could be improved with better voice recognition
Automatic speech recognition will be the way to go!