Homework 1
Due: January 20, 2006
You may work with others on this assignment, but you should turn in separate writeups, and you should understand the solutions. Consult the book and your professor for help if you need it.
This assignment must be done in LaTeX and submitted in PostScript or PDF format. All good computer science research papers are written using LaTeX. All of your assignments will be in LaTeX, so this first assignment is partially to give you experience with it. Please use good grammar, correct spelling, and complete sentences.
- Reading. You should read chapters 1 and 2 thoroughly.
- Exercises.
- Do exercise 2.1
- The uniform distribution U(a,b) has probability density f(x) = 1/(b-a) if a ≤ x ≤ b, and f(x) = 0 otherwise.
The expected value of a variable under a distribution is the integral, from -∞ to +∞, of the variable times its probability density. You can find more information about expected values here.
Derive the following: the expected value (or mean) of U(a,b), the variance of U(a,b), and the cumulative distribution function F(x) = Pr(X ≤ x) of U(a,b). Show your work (please typeset your final equations in LaTeX).
You may find these definitions useful:
μ ≡ E[x] = ∫ x Pr(x) dx (the integral is over -∞ to +∞, and Pr(x) is the probability density of x)
σ^2 ≡ Var[x] = E[(x - μ)^2]
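Since your final equations must be in LaTeX, the two definitions above could be typeset roughly as follows. This is only a sketch: it assumes the amsmath package is loaded in your preamble, which has not been checked against the homework template.

```latex
% Assumes \usepackage{amsmath} in the preamble (an assumption about the template).
\begin{align*}
  \mu      &\equiv E[x] = \int_{-\infty}^{+\infty} x \, \Pr(x) \, dx \\
  \sigma^2 &\equiv \operatorname{Var}[x] = E\left[(x - \mu)^2\right]
\end{align*}
```

The `align*` environment lines up both definitions at the `\equiv` sign and leaves them unnumbered.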
- Installing and using LaTeX. If you use Linux, you
probably have LaTeX installed already. If not, you can download it from
links you find here or install it
using your Linux distribution's package manager. If you use Windows, you can
download MiKTeX. If you use Mac OS X,
this looks like a good place
to start.
Once you have downloaded and installed LaTeX, use it to produce a DVI version and then a PostScript (or PDF) version of this LaTeX file: homework_01_template.tex. This is the file that you will edit to produce your homework writeup. Here is the image file that the template file includes: simple_plot.ps.
LaTeX can be confusing, but there are many helpful command references on the web. For instance, see this reference.
- Using Matlab. You should have access to Matlab
software. If not, you may download and use Octave under Linux; it may also be
possible to use Octave under Windows. The Matlab installation that Baylor
has may be limited. One serious limitation for this course is the lack of
the statistical package. Please let me know if you run into these
limitations.
Please familiarize yourself with Matlab; we will be using it throughout the semester. Matlab is a high-level programming language that can be a very powerful tool for implementing machine learning techniques. Here is a Matlab tutorial I found on the web.
Use Matlab to do the following:
- Generate a sample of 10 numbers from the uniform distribution U(5,10) and find the mean and variance of the sample. Do the same thing for 100 numbers and 1,000 numbers. How do they compare to your earlier analysis?
- Implement simple linear regression and create a plot showing your
results. Generate 30 points along the X axis between 0 and 10,
and generate corresponding Y values which have a relationship to the X
values, but the relationship is noisy:
    X = rand(30, 1) * 10;
    slope = 3;
    intercept = 2;
    noise = randn(30, 1) * 3;
    Y = X * slope + intercept + noise;
    plot(X, Y, 'x'); % plot with little x points
Then use equation 2.6 from your textbook to get an estimate of the slope and intercept. Note that you will have to add a constant dummy variable (a column of ones) to X to obtain the intercept. Try the same thing with 60 points and 200 points. How do your estimates change?
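As a starting point for the two Matlab tasks above, here is a minimal sketch that should also run under Octave. The variable names (a, b, n, sample, A, w) are illustrative and not part of the assignment. The regression step assumes that equation 2.6 is the usual least-squares solution obtained from the normal equations; check this against your textbook before relying on it.

```matlab
% Sketch only: names such as sample, A, and w are illustrative assumptions.

% Part 1: draw from U(a,b) by shifting and scaling rand, which samples U(0,1).
a = 5;
b = 10;
for n = [10 100 1000]
    sample = a + (b - a) * rand(n, 1);
    % var normalizes by n-1 by default (the unbiased sample variance).
    fprintf('n = %4d: mean = %.4f, variance = %.4f\n', ...
            n, mean(sample), var(sample));
end

% Part 2: least-squares line fit with a constant dummy variable.
n = 30;
X = rand(n, 1) * 10;
Y = X * 3 + 2 + randn(n, 1) * 3;   % same slope, intercept, and noise as above
A = [ones(n, 1), X];               % column of ones first, then the X values
w = (A' * A) \ (A' * Y);           % solve the normal equations for w
fprintf('estimated intercept = %.4f, estimated slope = %.4f\n', w(1), w(2));
```

Here w(1) estimates the intercept and w(2) the slope, because the column of ones comes first in A. To show the fit, you could evaluate A * w and draw it over the scatter plot with plot.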
Copyright © 2006 Greg Hamerly.
Computer Science Department
Baylor University