Homework 3

Due: March 22, 2006

You may work with others on this assignment, but you should turn in separate writeups, and you should understand the solutions. Consult the book and your professor for help if you need it.

This assignment must be written in LaTeX and turned in as a printout of the resulting PostScript or PDF file.

Announcements

Tue Feb 28 14:13:29 CST 2006
The assignment has been posted. Please start on it now, and ask questions if something is unclear. This should be a fun assignment!

Assignment

  1. Reading. This assignment deals with chapters 4 and 12.
  2. Problems from the text. Do problems 4.2 (a), (b).
  3. Implement LDA. Implement linear discriminant analysis in Matlab. Your book (section 4.3) and the code I handed out in class should be good resources for this.
  4. Simple LDA experiments. Using your LDA implementation, run LDA on this data. The data has 5 classes. The class label is in the first column, and the remaining two columns are the two input features. Show a plot similar to this one that shows the decision boundaries along with the training data:

    [Figure: plot of linear discriminant analysis classification on a 2-D, 5-class dataset]

    I think the easiest way to produce this plot is to classify every point on a dense grid covering the input space into one of the five classes, and then plot all points of the same class in the same color.

  5. LDA on vowel classification. Use your LDA software to classify the vowel classification dataset, which can be found on the textbook's website. You should read the info file to become familiar with what the data means (see the section labeled "Application to Vowel Recognition," which is halfway through the document). Note that this is a difficult problem! The best error rates on the test data for these types of classifiers are above 50% (see page 85 in your textbook). How does your LDA classifier compare to the error rates given in the book? Describe your experiments in detail, and analyze the results. Use visual elements like graphs, plots, and confusion matrices to explain your results.
  6. Variance of LDA. Use 10-fold cross-validation on the vowel training set to estimate the variance of the error of LDA. Describe your experiments in detail, and analyze the results.
  7. Support vector machine classification. Download the SVMLight software and use it to perform classification on two classes of the vowel dataset you tried earlier. Try various kernels, kernel parameters, and gamma values in order to find the best classifier. How does it compare to LDA for the two classes? How many support vectors are chosen? Show your results for various classifiers.
  8. SVM overfitting. Replicate the experiments in section 12.3.4 on the "Skin of the Orange" dataset (also available from the textbook's dataset webpage). Discuss your findings.
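
As a rough illustration of the ideas in items 3, 4, and 6, here is a minimal sketch of LDA with a pooled covariance estimate, grid classification for the boundary plot, and 10-fold cross-validation. It is written in Python with NumPy rather than the Matlab the assignment requires, and it is not the code handed out in class. All function names (lda_fit, lda_predict, classify_grid, kfold_errors, make_toy_data) are hypothetical, and the toy data is synthetic, made up only so that the sketch runs on its own.

```python
import numpy as np

def lda_fit(X, y):
    """Estimate class priors, class means, and the pooled covariance."""
    classes = np.unique(y)
    n, p = X.shape
    priors = np.array([np.mean(y == k) for k in classes])
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    # Pooled within-class covariance, normalized by N - K.
    cov = np.zeros((p, p))
    for k, mu in zip(classes, means):
        d = X[y == k] - mu
        cov += d.T @ d
    cov /= (n - len(classes))
    return classes, priors, means, cov

def lda_predict(model, X):
    """Assign each row of X to the class with the largest linear discriminant."""
    classes, priors, means, cov = model
    a = np.linalg.solve(cov, means.T)                        # columns: inv(Sigma) mu_k
    b = -0.5 * np.sum(means.T * a, axis=0) + np.log(priors)  # one offset per class
    scores = X @ a + b                                       # delta_k(x) for every row
    return classes[np.argmax(scores, axis=1)]

def classify_grid(model, X, steps=200, margin=1.0):
    """Classify a dense grid covering the range of the two input features."""
    x0 = np.linspace(X[:, 0].min() - margin, X[:, 0].max() + margin, steps)
    x1 = np.linspace(X[:, 1].min() - margin, X[:, 1].max() + margin, steps)
    g0, g1 = np.meshgrid(x0, x1)
    grid = np.column_stack([g0.ravel(), g1.ravel()])
    return grid, lda_predict(model, grid)

def kfold_errors(X, y, k=10, seed=0):
    """Return the held-out error rate of LDA on each of k random folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = lda_fit(X[train], y[train])
        errors.append(np.mean(lda_predict(model, X[test]) != y[test]))
    return np.array(errors)

def make_toy_data(n_per_class=40, seed=0):
    """Synthetic 2-D, 5-class data (an assumption; not the assignment's data)."""
    rng = np.random.default_rng(seed)
    centers = np.array([[0, 0], [6, 0], [0, 6], [6, 6], [3, 12]], dtype=float)
    X = np.vstack([c + 0.5 * rng.standard_normal((n_per_class, 2)) for c in centers])
    y = np.repeat(np.arange(1, 6), n_per_class)  # labels 1..5
    return X, y
```

With the assignment's data, the labels would come from the first column and the features from the remaining two. The grid points returned by classify_grid can then be drawn as a scatter plot colored by predicted label, with the training points on top. The mean and variance of the array returned by kfold_errors give one estimate of the error and its variability across folds.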

Copyright © 2006 Greg Hamerly.
Computer Science Department
Baylor University
