waffles_dimred
A command-line tool for dimensionality reduction, manifold learning, attribute selection, and tools related to NLDR. Here's the usage information:

Full Usage Information
[Square brackets] are used to indicate required arguments. <Angled brackets> are used to indicate optional arguments.

waffles_dimred [command]
    Reduce dimensionality, attribute selection, operations related to manifold learning, NLDR, etc.

attributeselector [dataset] <data_opts> <options>
    Make a ranked list of attributes from most to least salient. The ranked list is printed to stdout. Attributes are zero-indexed.
    [dataset]
        The filename of a dataset.
    <data_opts>
        -labels [attr_list]
            Specify which attributes to use as labels. (If not specified, the default is to use the last attribute for the label.) [attr_list] is a comma-separated list of zero-indexed columns. A hyphen may be used to specify a range of columns. A '*' preceding a value means to index from the right instead of the left. For example, "0,2-5" refers to columns 0, 2, 3, 4, and 5. "*0" refers to the last column. "0-*1" refers to all but the last column.
        -ignore [attr_list]
            Specify attributes to ignore. [attr_list] is a comma-separated list of zero-indexed columns. A hyphen may be used to specify a range of columns. A '*' preceding a value means to index from the right instead of the left. For example, "0,2-5" refers to columns 0, 2, 3, 4, and 5. "*0" refers to the last column. "0-*1" refers to all but the last column.
    <options>
        -out [n] [filename]
            Save a dataset containing only the [n] most salient features to [filename].
        -seed [value]
            Specify a seed for the random number generator.
        -labeldims [n]
            Specify the number of dimensions in the label (output) vector. The default is 1. (Don't confuse this with the number of class labels. It only takes one dimension to specify a class label, even if there are k possible labels.)
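The [attr_list] syntax described above (comma-separated zero-indexed columns, hyphenated ranges, and a '*' prefix for right-relative indexing) can be sketched as a small parser. This is an illustration of the documented syntax only, not the actual waffles implementation; the function name is hypothetical:

```python
def parse_attr_list(spec, n_cols):
    """Expand an attr_list such as "0,2-5" or "0-*1" into column indices.

    A '*' prefix counts from the right: "*0" is the last of n_cols columns.
    Illustrative sketch of the documented syntax, not waffles code.
    """
    def index(token):
        # "*k" means the k-th column from the right end
        if token.startswith('*'):
            return n_cols - 1 - int(token[1:])
        return int(token)

    cols = []
    for part in spec.split(','):
        if '-' in part:
            lo, hi = (index(t) for t in part.split('-'))
            cols.extend(range(lo, hi + 1))  # ranges are inclusive
        else:
            cols.append(index(part))
    return cols
```

With ten columns, "0,2-5" expands to [0, 2, 3, 4, 5], "*0" to [9], and "0-*1" to every column except the last, matching the examples in the text.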
blendembeddings [data-orig] [neighbor-count] [neighbor-finder] [data-a] [data-b] <options>
    Compute a blended "average" embedding from two reduced-dimensionality embeddings of some data.
    [data-orig]
        The filename of the original high-dimensional data.
    [neighbor-count]
        The number of neighbors to use.
    [data-a]
        The first reduced-dimensional embedding of [data-orig].
    [data-b]
        The second reduced-dimensional embedding of [data-orig].
    <options>
        -seed [value]
            Specify a seed for the random number generator.

breadthfirstunfolding [dataset] [neighbor-count] [neighbor-finder] [target_dims] <options>
    A manifold learning algorithm.
    [dataset]
        The filename of the high-dimensional data to reduce.
    [neighbor-count]
        The number of neighbors to use.
    [target_dims]
        The number of dimensions to reduce the data into.
    <options>
        -seed [value]
            Specify a seed for the random number generator.
        -reps [n]
            The number of times to compute the embedding and blend the results together. If not specified, the default is 1.

isomap [dataset] [neighbor-count] [neighbor-finder] [target_dims] <options>
    Use the Isomap algorithm to reduce dimensionality.
    [dataset]
        The filename of the high-dimensional data to reduce.
    [neighbor-count]
        The number of neighbors to use.
    [target_dims]
        The number of dimensions to reduce the data into.
    <options>
        -seed [value]
            Specify a seed for the random number generator.
        -tolerant
            If there are points that are disconnected from the rest of the graph, just drop them from the data. (This may cause the results to contain fewer rows than the input.)

scalingunfolder [dataset] [neighbor-count] [neighbor-finder] [target_dims] <options>
    Use the ScalingUnfolder algorithm to reduce dimensionality. (This algorithm was inspired by Maximum Variance Unfolding (MVU). It iteratively scales up the data, then restores distances in local neighborhoods. Unlike MVU, however, it does not use semidefinite programming.)
    <options>
        -seed [value]
            Specify a seed for the random number generator.
    [dataset]
        The filename of the high-dimensional data to reduce.
    [neighbor-count]
        The number of neighbors to use.
    [target_dims]
        The number of dimensions to reduce the data into.

som [dataset] [dimensions] <options>
    Give the output of a Kohonen self-organizing map with the given dimensions trained on the input dataset. Ex: "som foo 10 11" would train a 10x11 map on the input data and then give its 2D output for each of the input points as a row in the output file.
    [dataset]
        The filename of a .arff file to be transformed.
    [dimensions]
        A list of integers, one for each dimension of the map being created, giving the number of nodes in that dimension.
    <options>
        -tofile [filename]
            Write the trained map to the given filename.
        -fromfile [filename]
            Read a map from the file rather than training it.
        -seed [integer]
            Seed the random number generator with integer to obtain reproducible results.
        -neighborhood [gaussian|uniform]
            Use the specified neighborhood type to determine the influence of a node on its neighbors.
        -printMeshEvery [numIter] [baseFilename] [xDim] [yDim] <showTrain>
            Print a 2D-mesh visualization every numIter training iterations to an svg file generated from baseFilename. The x dimension and y dimension will be chosen from the zero-indexed dimensions of the input using xDim and yDim. If the option "showTrain" is present, then the training data is displayed along with the mesh. Ex. "-printMeshEvery 2 foo 0 1 showTrain" will write foo_01.svg, foo_02.svg, etc. every other iteration using the first two dimensions of the input and also display the training data in the svg image. Note that including this option twice will create two different printing actions, allowing multiple dimension pairs to be visualized at once.
        -batchTrain [startWidth] [endWidth] [numEpochs] [numConverge]
            Trains the network using the batch training algorithm. The neighborhood width decreases exponentially from startWidth to endWidth over numEpochs epochs.
            Each epoch lasts at most numConverge passes through the dataset, waiting for the network to converge. Do not overlook numConverge=1; it has given good performance on some datasets. This is the default training algorithm.
        -stdTrain [startWidth] [endWidth] [startRate] [endRate] [numIter]
            Trains the network using the standard incremental training algorithm, with the network width decreasing exponentially from startWidth to endWidth and the learning rate also decreasing exponentially from startRate to endRate. Training takes exactly numIter data-point presentations.

svd [matrix] <options>
    Compute the singular value decomposition of a matrix.
    [matrix]
        The filename of the matrix.
    <options>
        -ufilename [filename]
            Set the filename to which U will be saved. U is the matrix in which the columns are the eigenvectors of [matrix] times its transpose. The default is u.arff.
        -sigmafilename [filename]
            Set the filename to which Sigma will be saved. Sigma is the matrix that contains the singular values on its diagonal. All values in Sigma except the diagonal will be zero. If this option is not specified, the default is to only print the diagonal values (not the whole matrix) to stdout. If this option is specified, nothing is printed to stdout.
        -vfilename [filename]
            Set the filename to which V will be saved. V is the matrix in which the rows are the eigenvectors of the transpose of [matrix] times [matrix]. The default is v.arff.
        -maxiters [n]
            Specify the number of times to iterate before giving up. The default is 100, which should be sufficient for most problems.

lle [dataset] [neighbor-count] [neighbor-finder] [target_dims] <options>
    Use the LLE algorithm to reduce dimensionality.
    [dataset]
        The filename of the high-dimensional data to reduce.
    [neighbor-count]
        The number of neighbors to use.
    [target_dims]
        The number of dimensions to reduce the data into.
    <options>
        -seed [value]
            Specify a seed for the random number generator.
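The relationships among the svd command's outputs (the eigenvector properties of U and V, and the diagonal Sigma) can be checked numerically with numpy. This is a sketch of the decomposition the documentation describes, not of the waffles code; note that numpy returns V already transposed, so its rows match the eigenvector convention described for -vfilename:

```python
import numpy as np

# A small example matrix standing in for [matrix].
M = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# numpy returns U, the singular values s, and V^T, with M = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Columns of U are eigenvectors of M M^T; rows of Vt are eigenvectors of
# M^T M. In both cases the eigenvalues are the squared singular values.
for i, sigma in enumerate(s):
    assert np.allclose(M @ M.T @ U[:, i], sigma**2 * U[:, i])
    assert np.allclose(M.T @ M @ Vt[i], sigma**2 * Vt[i])

# The decomposition reconstructs the original matrix exactly.
assert np.allclose(U @ np.diag(s) @ Vt, M)
```

The diagonal of diag(s) holds the singular values in decreasing order, which is what the command prints to stdout when -sigmafilename is not given.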
manifoldsculpting [dataset] [neighbor-count] [neighbor-finder] [target_dims] <options>
    Use the Manifold Sculpting algorithm to reduce dimensionality. (This algorithm is specified in Gashler, Michael S., Ventura, Dan, and Martinez, Tony. Iterative non-linear dimensionality reduction with manifold sculpting. In Advances in Neural Information Processing Systems 20, pages 513-520, MIT Press, Cambridge, MA, 2008.)
    [dataset]
        The filename of the high-dimensional data to reduce.
    [neighbor-count]
        The number of neighbors to use.
    [target_dims]
        The number of dimensions to reduce the data into.
    <options>
        -seed [value]
            Specify a seed for the random number generator.
        -continue [dataset]
            Continue refining the specified reduced-dimensional results. (This feature enables Manifold Sculpting to improve upon its own results, or to refine the results from another dimensionality reduction algorithm.)
        -scalerate [value]
            Specify the scaling rate. If not specified, the default is 0.999. A value close to 1 will give better results, but will cause the algorithm to take longer.

multidimensionalscaling [distance-matrix] [target-dims] <options>
    Perform MDS on the specified [distance-matrix].
    [distance-matrix]
        The filename of an arff file that contains the pair-wise distances (or dissimilarities) between every pair of points. It must be a square matrix of real values. Only the upper triangle of this matrix is actually used; the lower triangle and diagonal are ignored.
    <options>
        -squareddistances
            The distances in the distance matrix are squared distances, instead of just distances.

pca [dataset] [target_dims] <options>
    Projects the data into the specified number of dimensions with principal component analysis. (Prints results to stdout. The input file is not modified.)
    <options>
        -seed [value]
            Specify a seed for the random number generator.
        -roundtrip [filename]
            Do a lossy round-trip of the data and save the results to the specified file.
        -eigenvalues [filename]
            Save the eigenvalues to the specified file.
        -components [filename]
            Save the centroid and principal component vectors (in order of decreasing corresponding eigenvalue) to the specified file.
        -aboutorigin
            Compute the principal components about the origin. (The default is to compute them relative to the centroid.)
        -modelin [filename]
            Load the PCA model from a json file.
        -modelout [filename]
            Save the trained PCA model to a json file.
    [dataset]
        The filename of the high-dimensional data to reduce.
    [target_dims]
        The number of dimensions to reduce the data into.

usage
    Print usage information.
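The quantities the pca command's options refer to (the projection printed to stdout, the eigenvalues, the components about the centroid or the origin, and the lossy round-trip) can be sketched in numpy. The function and variable names here are illustrative only, not part of waffles:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data standing in for [dataset]: 100 rows, 3 correlated columns.
data = rng.normal(size=(100, 3)) @ np.array([[2.0, 0.3, 0.1],
                                             [0.0, 1.0, 0.2],
                                             [0.0, 0.0, 0.1]])

def pca_project(X, target_dims, about_origin=False):
    """Project X onto its top principal components.

    about_origin=False mirrors the default behavior (components relative to
    the centroid); about_origin=True mirrors the -aboutorigin option.
    Illustrative sketch, not the waffles implementation.
    """
    centroid = np.zeros(X.shape[1]) if about_origin else X.mean(axis=0)
    centered = X - centroid
    # Principal components via SVD of the centered data.
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    eigenvalues = s**2 / (X.shape[0] - 1)   # what -eigenvalues would save
    components = Vt[:target_dims]           # what -components would save
    projected = centered @ components.T     # the reduced data
    # Lossy round-trip back to the original space, as with -roundtrip.
    roundtrip = projected @ components + centroid
    return projected, eigenvalues, roundtrip

projected, eigenvalues, roundtrip = pca_project(data, target_dims=2)
```

With target_dims equal to the full dimensionality the round-trip is exact; with fewer dimensions it loses only the variance along the discarded components.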