Dimensionality reduction

This assignment involves translating several methods from pseudocode into real code. As you do, please be sure to pay attention to what you are instructing the computer to do.

  1. Implement a function that computes the first principal component of some data. Here is pseudocode:
    Let X be a matrix, where each row is a point-vector.
    Let c be the number of columns in X.
    Let p be a random vector of size c drawn from a spherical distribution.
    Let m = 0.
    for i from 0 to 199, do:
    	Let t be a vector of size c.
    	Set each element of t to 0. // See Vec.fill
    	for each row x in X, do:
    		t = t + (x * p)x    // * = inner product
    	d = ||t||                   // d = magnitude of t
    	p = t / d                   // See Vec.normalize
    	if i < 6 or d - m > .0001   // if significant progress was made
    		m = d
    	else
    		break
    return p
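
    For comparison, here is a minimal NumPy sketch of the same power-iteration idea (the function name, and the use of NumPy rather than the Vec class mentioned in the comments, are my own choices; the 200-iteration cap and 0.0001 threshold come from the pseudocode):

    import numpy as np

    def first_principal_component(X, iterations=200, tol=1e-4):
        """Estimate the first principal component of X (one point per row) by power iteration."""
        rng = np.random.default_rng()
        p = rng.normal(size=X.shape[1])     # random direction drawn from a spherical (Gaussian) distribution
        p /= np.linalg.norm(p)
        m = 0.0
        for i in range(iterations):
            t = X.T @ (X @ p)               # equals the sum over rows x of (x * p) x
            d = np.linalg.norm(t)
            p = t / d
            if i < 6 or d - m > tol:        # keep iterating while significant progress is made
                m = d
            else:
                break
        return p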
    


  2. Implement a function that uses the Gram-Schmidt process to remove a component from some data. Here is pseudocode:
    Let X be a matrix, where each row is a point-vector.
    Let r be the number of rows in X.
    Let v be the component vector that you wish to remove from X.
    (v should be a unit vector. That is, it should be normalized.)
    for each point-vector (row) x in X, do:
    	Let d = x * v       // * = inner product
    	x = x - dv
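
    As a rough NumPy sketch (the function name is mine; it assumes v has already been normalized, as noted above):

    import numpy as np

    def remove_component(X, v):
        """Project the unit-length direction v out of every row of X, in place."""
        for i in range(X.shape[0]):
            d = X[i] @ v            # d = x * v (inner product)
            X[i] = X[i] - d * v
        return X

    The loop can also be written as a single vectorized statement, X -= np.outer(X @ v, v), which performs the same subtraction on every row at once.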
    


  3. Implement Principal Component Analysis. Here is some pseudocode:
    Let X be a set of points in high-dimensional space that we want to reduce.
    Let n be the number of points (rows) in X.
    Let d be the current number of dimensions (columns) in X.
    Let t be the target number of dimensions into which we want to reduce X.
    
    // Training phase
    Let c be the centroid of X.
    Subtract c from each point in X.
    Let M be a matrix with 0 rows and d columns.
    for i from 0 to t - 1, do:
    	Compute p as the first principal component in X.
    	Use the Gram-Schmidt process to remove the p component from X.
    	Add p as a new row to M.
    
    // Reducing phase
    Restore X to the original set of points it was before the training phase.
    Let Y be a matrix with 0 rows and t columns.
    for each row x in X, do:
    	y = M * (x - c)     // * = Matrix-vector multiplication
    	Add y as a row to Y.
    return Y
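
    Combining the two previous pieces, a hedged NumPy sketch of PCA might look like this (first_principal_component and remove_component are the sketches from steps 1 and 2; everything else follows the pseudocode):

    import numpy as np

    def pca_reduce(X, t):
        """Reduce the rows of X to t dimensions by repeatedly extracting principal components."""
        X = np.asarray(X, dtype=float)
        c = X.mean(axis=0)                  # centroid of the data
        Xc = X - c                          # centered working copy; the original X is left untouched
        M = np.zeros((0, X.shape[1]))       # one principal component per row
        for _ in range(t):
            p = first_principal_component(Xc)
            Xc = remove_component(Xc, p)    # Gram-Schmidt: strip the p component out of the data
            M = np.vstack([M, p])
        return (X - c) @ M.T                # y = M * (x - c) for every row x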
    


  4. Implement an iterative form of multi-dimensional scaling (MDS). Here is some pseudocode:
    Let X be a set of points we wish to adjust.
    Let d(a,b) be the target distance between points a and b.
    Let c(a,b) be the current distance between a and b.
    Do until convergence is detected:
    	Pick a random point, a from X.
    	Pick a random point, b from X.
    	Adjust a and b minimally and equally
    	to make c(a,b)=d(a,b).
    
    How does one adjust a and b? Maybe this picture will help:
    If you write a function to adjust a and b, I recommend returning a value that indicates how far they moved. This value could be useful for detecting convergence. (For example, you might detect convergence if the total amount of movement over 1000 iterations is less than 0.01.)
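
    Assuming "minimally and equally" means sliding both points along the line between them, each covering half of the needed correction, a NumPy sketch could look like the following (the function names and the 1000-iteration / 0.01 convergence test are only the suggestion above, not requirements):

    import numpy as np

    def adjust_pair(Y, i, j, target):
        """Move Y[i] and Y[j] equally along the line joining them so that their
        distance becomes `target`. Returns how far each point moved."""
        diff = Y[j] - Y[i]
        current = np.linalg.norm(diff)
        if current < 1e-12:
            return 0.0                      # coincident points: nothing sensible to do
        step = 0.5 * (current - target)     # each point covers half of the correction
        u = diff / current                  # unit vector pointing from Y[i] toward Y[j]
        Y[i] += step * u
        Y[j] -= step * u
        return abs(step)

    def iterative_mds(Y, target_dist, window=1000, threshold=0.01, rng=None):
        """Repeatedly pick random pairs and adjust them; stop when movement stalls.
        target_dist[i, j] holds the desired distance d(i, j)."""
        rng = rng or np.random.default_rng()
        n = Y.shape[0]
        while True:
            moved = 0.0
            for _ in range(window):
                i, j = rng.choice(n, size=2, replace=False)
                moved += adjust_pair(Y, i, j, target_dist[i, j])
            if moved < threshold:           # total movement over `window` pairs is negligible
                break
        return Y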


  5. Implement an incremental version of Isomap that builds on your iterative MDS algorithm. Here is some pseudocode:
    Let X be a set of points in high-dimensional space that we want to reduce.
    Let n be the number of points (rows) in X.
    Let t be the number of target dimensions into which we want to reduce X.
    Let k be about 14.
    
    // Training phase
    Find the k-nearest neighbors of each point in X.
    Measure the distance to each neighbor.
    Apply the Floyd-Warshall algorithm to compute the "geodesic"
    	distances between all non-neighboring points.
    
    // Reducing phase
    Let Y be a set of n points, each in t-dimensional space.
    Initialize Y with small random values.
    Apply MDS to Y. (Use the geodesic distances as the target distances.)
    
    Geodesic distance refers to the shortest path hopping from neighbor to neighbor between two points, as computed by the Floyd-Warshall algorithm. (Geodesic distance is usually longer than Euclidean distance, which hops directly between the two points in one shot.)
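
    A hedged NumPy sketch of the whole pipeline, reusing iterative_mds from step 4 (k = 14 and the small random initialization come from the pseudocode; the neighbor graph is assumed to be connected, otherwise some geodesic distances stay infinite):

    import numpy as np

    def isomap(X, t, k=14, rng=None):
        """Incremental Isomap: k-nearest-neighbor graph -> Floyd-Warshall geodesics -> iterative MDS."""
        rng = rng or np.random.default_rng()
        n = X.shape[0]
        euclid = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # all pairwise distances
        geo = np.full((n, n), np.inf)               # start with "no path" everywhere
        np.fill_diagonal(geo, 0.0)
        for i in range(n):
            nbrs = np.argsort(euclid[i])[1:k + 1]   # k nearest neighbors (index 0 is the point itself)
            geo[i, nbrs] = euclid[i, nbrs]
            geo[nbrs, i] = euclid[i, nbrs]          # keep the graph symmetric
        for m in range(n):                          # Floyd-Warshall: shortest neighbor-to-neighbor paths
            geo = np.minimum(geo, geo[:, m:m + 1] + geo[m:m + 1, :])
        Y = rng.normal(scale=0.01, size=(n, t))     # small random starting values
        return iterative_mds(Y, geo)                # geodesic distances become the MDS targets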


  6. Implement a simplified version of Maximum Variance Unfolding (MVU). Here is some pseudocode:
    Let X be a set of points in high-dimensional space that we want to reduce.
    Let n be the number of points (rows) in X.
    Let t be the target number of dimensions into which we want to reduce X.
    Let k be about 14.
    
    // Training phase
    Find the k-nearest neighbors of each point in X.
    Measure the distance to each neighbor.
    Let d(a,b) be the original distance between points a and b.
    
    // Reducing phase
    Do until convergence is detected:
    	Multiply all the values in X by some scaling factor, such as 1.05.
    	Do 100*k*n times:
    		Pick a random point, a, from X.
    		Pick one of a's neighbors, b.
    		Let c(a,b) be the current distance between a and b.
    		Adjust a and b minimally and equally,
    		to make c(a,b)=d(a,b).
    Apply PCA to reduce X into t dimensions.
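
    A rough NumPy sketch, reusing adjust_pair from step 4 and pca_reduce from step 3 (the 1.05 scale, k = 14, and the 100*k*n inner loop come from the pseudocode; the fixed epoch count standing in for convergence detection is my own simplification):

    import numpy as np

    def simplified_mvu(X, t, k=14, scale=1.05, epochs=30, rng=None):
        """Simplified MVU: repeatedly inflate the data and pull neighbors back to their
        original distances, then reduce the result with PCA."""
        rng = rng or np.random.default_rng()
        X = np.asarray(X, dtype=float).copy()
        n = X.shape[0]
        original = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)    # d(a, b) before any scaling
        neighbors = [np.argsort(original[i])[1:k + 1] for i in range(n)]    # k nearest neighbors of each point
        for _ in range(epochs):             # stands in for "until convergence is detected"
            X *= scale                      # stretch everything a little
            for _ in range(100 * k * n):
                a = rng.integers(n)
                b = rng.choice(neighbors[a])
                adjust_pair(X, a, b, original[a, b])    # same adjustment step as in MDS
        return pca_reduce(X, t)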
    


  7. Download this Swiss roll data. Reduce it into 2 dimensions using PCA, Isomap, and MVU. Plot each of the results. (Your plot will look nicer if you use color to indicate the row index of each point, but this is not required; a small plotting sketch follows after this item.)

    The original data represents points on a Swiss roll manifold, shown here from 9 different perspectives:


    The ideal solution should look something like this:
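
    If the data loads as a plain numeric matrix (an assumption; adjust np.loadtxt to the actual file format), coloring by row index can be as short as this, reusing the reducers sketched above:

    import numpy as np
    import matplotlib.pyplot as plt

    X = np.loadtxt("swissroll.txt")         # hypothetical filename; point this at the downloaded file
    Y = pca_reduce(X, 2)                    # or isomap(X, 2), or simplified_mvu(X, 2)
    plt.scatter(Y[:, 0], Y[:, 1], c=np.arange(Y.shape[0]), s=8)    # color = row index
    plt.title("Swiss roll, PCA")
    plt.savefig("swissroll_pca.png")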


  8. Download this Crane data. Reduce it into 2 dimensions using PCA and Isomap. Plot each of the results.

    Each row in the original data represents a ray-traced image of a crane sampled from a space like this:


    The ideal solution should look something like this:


  9. Make a PDF document named plots.pdf containing your 5 plots. Label them, so I can tell which plot is which. (You do not need to write a report. Just show me your 5 plots.) Zip it up with your source code and submit it in the usual manner.