19-12-2011, 08:36 PM
Aim
The aim of this project is to:
Produce a system, based on a Bayesian text classifier, that will classify emails into two categories: spam and non-spam.
The classifier can be based on the one described in detail in "Machine Learning" by Tom Mitchell. No prior knowledge of Bayesian classifiers is needed before you start.
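As an illustration of what such a classifier involves, here is a minimal naive Bayes sketch. It loosely follows the text-classification example in the Bayesian learning chapter of Mitchell's book: a vocabulary is built from the training emails, class priors are estimated from document counts, and word probabilities use the add-one estimate (n_k + 1) / (n + |Vocabulary|). The whitespace tokenizer, the class name `NaiveBayesText` and the labels "spam" / "non-spam" are assumptions made for this sketch; real email would need proper header and body parsing.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace (illustrative only).
    return text.lower().split()


class NaiveBayesText:
    """Hypothetical minimal naive Bayes text classifier (sketch)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.vocab.update(tokens)
            self.word_counts[label].update(tokens)
        # Class prior P(v_j) = |docs of class j| / |all docs|, stored as a log.
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(docs)) for c in self.classes}
        # n = total number of word positions seen in each class.
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_likelihood(self, word, c):
        # Add-one estimate: (n_k + 1) / (n + |Vocabulary|).
        return math.log((self.word_counts[c][word] + 1) / (self.total[c] + len(self.vocab)))

    def predict(self, doc):
        # Only words seen during training contribute to the score.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        # Sum log-probabilities instead of multiplying, to avoid underflow.
        scores = {
            c: self.log_prior[c] + sum(self.log_likelihood(t, c) for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)


train_docs = ["win cash now", "cheap pills win big", "meeting at noon", "lunch tomorrow at noon"]
train_labels = ["spam", "spam", "non-spam", "non-spam"]
clf = NaiveBayesText().fit(train_docs, train_labels)
print(clf.predict("win cheap cash"))  # spam
print(clf.predict("lunch at noon"))   # non-spam
```

Working in log space is a common design choice here: multiplying many small word probabilities would quickly underflow for long emails.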
Alternatively, the project can be undertaken using a supervised neural network, which need not be written from scratch: an existing implementation could be used.
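To show roughly what the neural-network alternative means, here is a sketch of the simplest supervised network, a single perceptron trained on binary bag-of-words features. The feature representation, the function names and the +1 (spam) / -1 (non-spam) label encoding are assumptions for illustration; a real project would more likely use a multi-layer network, possibly from an existing library.

```python
from collections import defaultdict


def features(text):
    # Assumed representation: the set of lowercase words present in the email.
    return set(text.lower().split())


def train_perceptron(docs, labels, epochs=10):
    # labels: +1 for spam, -1 for non-spam (assumed encoding).
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for doc, y in zip(docs, labels):
            x = features(doc)
            activation = bias + sum(weights[w] for w in x)
            # Perceptron rule: update only when the example is misclassified.
            if y * activation <= 0:
                for w in x:
                    weights[w] += y
                bias += y
    return weights, bias


def predict(weights, bias, doc):
    # .get avoids adding unseen words to the weight table.
    activation = bias + sum(weights.get(w, 0.0) for w in features(doc))
    return 1 if activation > 0 else -1


docs = ["win cash now", "cheap pills win big", "meeting at noon", "lunch tomorrow at noon"]
labels = [1, 1, -1, -1]
w, b = train_perceptron(docs, labels)
print(predict(w, b, "win cheap cash"))  # 1 (spam)
print(predict(w, b, "lunch at noon"))   # -1 (non-spam)
```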
Objectives / major tasks
To complete this project fully, a student will have to:
Investigate Bayesian classifiers.
Collect a quantity of data on which the system can be trained and tested.
Design and program the Bayesian classifier.
Train the classifier to set the parameters of the system.
Evaluate the system.
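The training and evaluation steps above could be organised along the lines of the following sketch: the collected emails are split into a training set and a held-out test set, and the classifier's predictions on the test set are scored. The function names, the 25% test fraction and the choice of accuracy, precision and recall as metrics are assumptions, not requirements of the project. Precision on the spam class is often worth reporting separately, since marking a legitimate email as spam is usually more costly than letting a spam email through.

```python
import random


def train_test_split(items, test_fraction=0.25, seed=0):
    # Shuffle a copy with a fixed seed so the split is reproducible.
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def evaluate(predicted, actual, positive="spam"):
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    correct = sum(p == a for p, a in pairs)
    return {
        "accuracy": correct / len(pairs),
        # Precision: of the emails flagged as spam, how many really were spam.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Recall: of the real spam emails, how many were caught.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


predicted = ["spam", "spam", "spam", "non-spam"]
actual = ["spam", "spam", "non-spam", "non-spam"]
print(evaluate(predicted, actual))
```

In practice the classifier would be trained only on the training portion and `evaluate` called on its predictions for the test portion, so that the reported figures reflect unseen emails.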
Topic areas
(this project will involve study / work in all the topic areas marked with a cross below)
X 1 Intelligent Systems (AI / ANNs etc)
2 Computer Systems & Networks
3 Multimedia
X 4 Software Engineering
5 Databases / Computer Based Information Systems
X 6 General Computer Science
7 Human Computer Interaction
8 Internet
Principal practical skills that will be required
(this project will involve application of all the skills marked with a cross below)
X 1 Program Design / Programming
X 2 Data Analysis / Data Modelling
3 Database Design / Database Construction
4 Specification Writing (formal, semi-formal, structured)
X 5 Program Testing / Formal Verification
6 User Interface Design / Interface Programming
7 Requirements Elicitation / Analysis
X 8 Experimental Design / Results Analysis
9 Simulation / Emulation / Animation
10 Hardware Design
X 11 Other (see notes below)