12-10-2010, 01:40 PM
Natural Language Processing
NATURAL LANGUAGE PROCESSING (NLP) is one of the emerging applications of AI. Its goal is to design and build software that can analyze, understand, and generate the languages humans use naturally, so that eventually you will be able to address your computer as though you were addressing another person.
This goal is not easy to reach. "Understanding" language means, among other things, knowing what concepts a word or phrase stands for and knowing how to link those concepts together in a meaningful way. It is ironic that natural language, the symbol system that is easiest for humans to learn and use, is the hardest for a computer to master. Long after machines have proven capable of inverting large matrices with speed and grace, they still fail to master the basics of our spoken and written languages.
The challenges we face stem from the highly ambiguous nature of natural language. As an English speaker you effortlessly understand a sentence like "Flying planes can be dangerous". Yet this sentence presents difficulties to a software program that lacks both your knowledge of the world and your experience with linguistic structures. Is the more plausible interpretation that the pilot is at risk, or that the danger is to people on the ground? Should "can" be analyzed as a verb or as a noun? Which of the many possible meanings of "plane" is relevant? Depending on context, "plane" could refer to, among other things, an airplane, a geometric object, or a woodworking tool. How much and what sort of context needs to be brought to bear on these questions in order to adequately disambiguate the sentence?
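To make the ambiguity concrete, the Python sketch below counts the candidate analyses a naive program would face for this one sentence. The lexicon, its tag choices (Penn Treebank abbreviations) and the helper names are hypothetical, picked only for illustration; a real system would use a much larger lexicon and rely on context to rank or prune these alternatives rather than simply listing them.

```python
from itertools import product

# Hypothetical toy lexicon: the part-of-speech tags each word could plausibly take.
LEXICON = {
    "flying": ["VBG", "JJ"],      # gerund ("flying is fun") or adjective ("flying machines")
    "planes": ["NNS", "VBZ"],     # plural noun, or verb ("he planes the board")
    "can": ["MD", "NN", "VB"],    # modal, noun (a tin can), or verb ("to can fruit")
    "be": ["VB"],
    "dangerous": ["JJ"],
}

# Noun senses of "plane" named in the text.
PLANE_SENSES = ["airplane", "geometric object", "woodworking tool"]


def tag_sequences(words):
    """Every combination of tags the lexicon allows, one tag per word."""
    return list(product(*(LEXICON[w] for w in words)))


def candidate_readings(words):
    """Pair each tag sequence with a sense of 'planes' whenever it is tagged as a noun."""
    readings = []
    i = words.index("planes")
    for tags in tag_sequences(words):
        if tags[i] == "NNS":
            readings.extend((tags, sense) for sense in PLANE_SENSES)
        else:
            readings.append((tags, None))
    return readings


sentence = "flying planes can be dangerous".split()
print(len(tag_sequences(sentence)))       # 12
print(len(candidate_readings(sentence)))  # 24
```

Even with this tiny lexicon, a five-word sentence yields 12 tag sequences and 24 tag-and-sense combinations before any world knowledge is applied. Most of them are absurd to a human reader; the program's job is to find the one that fits the context.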
We address these problems using a mix of knowledge-engineered and statistical/machine-learning techniques to disambiguate and respond to natural language input. Our work has implications for applications such as text critiquing, information retrieval, question answering, summarization, gaming, and translation. The grammar checkers in Microsoft Office for English, French, German, and Spanish are outgrowths of our research; Encarta uses our technology to retrieve answers to user questions; Intellishrink uses natural language technology to compress cellphone messages; and Microsoft Product Support uses our machine translation software to translate the Microsoft Knowledge Base into other languages. As our work evolves, we expect it to benefit any area in which users gain from communicating with their computers in a natural way.
Extensive research in NLP over the past decade has brought us one of the most useful applications of AI: machine translation. If we could one day create a program that translates, for example, English text into Japanese and vice versa without needing polishing by a professional translator, communication across language barriers would become far easier. Our current translation programs have not yet reached this level, but they may do so very soon. NLP research also deals with speech recognition: programs that convert speech into text are already widely used and fairly dependable.
Recent research in Machine Translation (MT) has focused on “data-driven” systems. Such systems are “self-customizing” in the sense that they can learn the translations of terminology and even stylistic phrasing from already translated materials. Microsoft Research’s MT (MSR-MT) system is such a data-driven system, and it has been customized to translate Microsoft technical materials through the automatic processing of hundreds of thousands of sentences from Microsoft product documentation and support articles, together with their corresponding translations. This customization processing can be completed in a single night, and yields an MT system that is capable of producing output on par with systems that have required months of costly human customization.
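The text does not describe how MSR-MT learns its translations, so the Python sketch below illustrates only the general idea of learning terminology from already translated material, using a much simpler technique swapped in for illustration: scoring each English/Spanish word pair by the Dice coefficient of how often the two words appear in the same aligned sentence pair. The six sentence pairs and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical aligned sentence pairs (English, Spanish), standing in for
# previously translated documentation.
PAIRS = [
    ("open the file", "abra el archivo"),
    ("save the file", "guarde el archivo"),
    ("open the window", "abra la ventana"),
    ("close the window", "cierre la ventana"),
    ("open the document", "abra el documento"),
    ("close the folder", "cierre la carpeta"),
]


def dice_scores(pairs):
    """Score each (source, target) word pair by the Dice coefficient
    2 * C(s, t) / (C(s) + C(t)), where C counts sentence pairs containing the words."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        joint.update(product(src_words, tgt_words))  # every co-occurring word pair
    return {(s, t): 2 * n / (src_count[s] + tgt_count[t]) for (s, t), n in joint.items()}


def best_translation(scores, word):
    """Return the target word with the highest score for a given source word."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get)


scores = dice_scores(PAIRS)
for word in ["open", "save", "close", "file", "window", "document", "folder"]:
    print(word, "->", best_translation(scores, word))
# Prints one line per word: open -> abra, save -> guarde, close -> cierre,
# file -> archivo, window -> ventana, document -> documento, folder -> carpeta
```

Here "file" scores 1.0 against "archivo" but only 0.8 against "el", because "el" also occurs in a sentence without "file". A function word such as "the" ties at 2/3 between "el", "la" and "abra", which is one reason practical data-driven systems go beyond raw co-occurrence counts.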