25-09-2010, 03:31 PM
Author: Dr. Andrea Mitofsky
Abstract
Here, students learn how to identify the language in which a text file is written, a necessary first step in automatically translating information from one language to another. In the laboratory, students write Java programs that compute histograms of letter frequencies in text files and explore how those frequencies differ among Italian, German, and English. Along the way, they learn about ASCII codes and about a sampling trade-off: processing a longer portion of a text takes more time but yields a more accurate estimate of that text's letter frequencies.
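As a rough illustration of the kind of program the laboratory describes, here is a minimal sketch of a letter-frequency histogram in Java. It is not the course's actual code: the class name `LetterHistogram`, the methods `countLetters` and `toFrequencies`, and the optional `maxChars` argument are all hypothetical names chosen for this example. The sketch also assumes that only the 26 unaccented ASCII letters are counted, so accented characters such as those found in Italian and German text are simply skipped, and that the file can be decoded as ISO-8859-1.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical example class; not taken from the original laboratory materials.
class LetterHistogram {

    // Count how often each letter a-z occurs among the first `limit`
    // characters of `text`. Upper and lower case are folded together.
    static int[] countLetters(String text, int limit) {
        int[] counts = new int[26];
        int n = Math.min(limit, text.length());
        for (int i = 0; i < n; i++) {
            char c = text.charAt(i);
            // In ASCII, 'A'..'Z' are codes 65..90 and 'a'..'z' are 97..122,
            // so subtracting the first letter gives an index from 0 to 25.
            if (c >= 'A' && c <= 'Z') {
                counts[c - 'A']++;
            } else if (c >= 'a' && c <= 'z') {
                counts[c - 'a']++;
            }
            // Assumption: everything else (digits, punctuation, accented
            // letters) is ignored in this sketch.
        }
        return counts;
    }

    // Convert raw counts to relative frequencies that sum to 1
    // (or all zeros if no letters were counted).
    static double[] toFrequencies(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        double[] freq = new double[counts.length];
        if (total == 0) {
            return freq;
        }
        for (int i = 0; i < counts.length; i++) {
            freq[i] = (double) counts[i] / total;
        }
        return freq;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java LetterHistogram <file> [maxChars]");
            return;
        }
        // Assumption: ISO-8859-1 maps every byte to one character,
        // which keeps the ASCII letter codes unchanged.
        String text = new String(Files.readAllBytes(Paths.get(args[0])),
                                 StandardCharsets.ISO_8859_1);
        // Optional second argument: how many characters to sample.
        int limit = args.length > 1 ? Integer.parseInt(args[1]) : text.length();
        double[] freq = toFrequencies(countLetters(text, limit));
        for (int i = 0; i < freq.length; i++) {
            System.out.printf("%c %.4f%n", (char) ('a' + i), freq[i]);
        }
    }
}
```

Running the program on the same file with a small and then a large `maxChars` value is one way to see the trade-off mentioned above: the larger sample takes longer to process, but its frequencies settle closer to those of the whole text.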