waffles_dimred

A command-line tool for dimensionality reduction, manifold learning, attribute selection, and related non-linear dimensionality reduction (NLDR) operations. Here's the usage information:

Full Usage Information
[Square brackets] are used to indicate required arguments.
<Angled brackets> are used to indicate optional arguments.

waffles_dimred [command]
   Reduce dimensionality, attribute selection, operations related to manifold
   learning, NLDR, etc.
   attributeselector [dataset] <data_opts> <options>
      Make a ranked list of attributes from most to least salient. The ranked
      list is printed to stdout. Attributes are zero-indexed.
      [dataset]
         The filename of a dataset.
      <data_opts>
         -labels [attr_list]
            Specify which attributes to use as labels. (If not specified, the
            default is to use the last attribute for the label.) [attr_list] is
            a comma-separated list of zero-indexed columns. A hyphen may be used
            to specify a range of columns.  A '*' preceding a value means to
            index from the right instead of the left. For example, "0,2-5"
            refers to columns 0, 2, 3, 4, and 5. "*0" refers to the last
            column. "0-*1" refers to all but the last column.
         -ignore [attr_list]
            Specify attributes to ignore. [attr_list] is a comma-separated list
            of zero-indexed columns. A hyphen may be used to specify a range of
            columns.  A '*' preceding a value means to index from the right
            instead of the left. For example, "0,2-5" refers to columns 0, 2,
            3, 4, and 5. "*0" refers to the last column. "0-*1" refers to all
            but the last column.
      <options>
         -out [n] [filename]
            Save a dataset containing only the [n]-most salient features to
            [filename].
         -seed [value]
            Specify a seed for the random number generator.
         -labeldims [n]
            Specify the number of dimensions in the label (output) vector. The
            default is 1. (Don't confuse this with the number of class labels.
            It only takes one dimension to specify a class label, even if there
            are k possible labels.)
   blendembeddings [data-orig] [neighbor-count] [neighbor-finder] [data-a] [data-b] <options>
      Compute a blended "average" embedding from two reduced-dimensionality
      embeddings of some data.
      [data-orig]
         The filename of the original high-dimensional data.
      [neighbor-count]
         The number of neighbors to use.
      [data-a]
         The first reduced dimensional embedding of [data-orig]
      [data-b]
         The second reduced dimensional embedding of [data-orig]
      <options>
         -seed [value]
            Specify a seed for the random number generator.
   breadthfirstunfolding [dataset] [neighbor-count] [neighbor-finder] [target_dims] <options>
      A manifold learning algorithm.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -reps [n]
            The number of times to compute the embedding and blend the results
            together. If not specified, the default is 1.
      [dataset]
         The filename of the high-dimensional data to reduce.
      [neighbor-count]
         The number of neighbors to use.
      [target_dims]
         The number of dimensions to reduce the data into.
   isomap [dataset] [neighbor-count] [neighbor-finder] [target_dims] <options>
      Use the Isomap algorithm to reduce dimensionality.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -tolerant
            If there are points that are disconnected from the rest of the
            graph, just drop them from the data. (This may cause the results to
            contain fewer rows than the input.)
      [dataset]
         The filename of the high-dimensional data to reduce.
      [neighbor-count]
         The number of neighbors to use.
      [target_dims]
         The number of dimensions to reduce the data into.
   scalingunfolder [dataset] [neighbor-count] [neighbor-finder] [target_dims] <options>
      Use the ScalingUnfolder algorithm to reduce dimensionality. (This
      algorithm was inspired by Maximum Variance Unfolding (MVU). It
      iteratively scales up the data, then restores distances in local
      neighborhoods. Unlike MVU, however, it does not use semidefinite
      programming.)
      <options>
         -seed [value]
            Specify a seed for the random number generator.
      [dataset]
         The filename of the high-dimensional data to reduce.
      [neighbor-count]
         The number of neighbors to use.
      [target_dims]
         The number of dimensions to reduce the data into.
   som [dataset] [dimensions] <options>
      Give the output of a Kohonen self-organizing map with the given
      dimensions trained on the input dataset.  Ex: "som foo 10 11" would train
      a 10x11 map on the input data and then give its 2D output for each of the
      input points as a row in the output file.
      [dataset]
         The filename of a .arff file to be transformed.
      [dimensions]
         A list of integers, one for each dimension of the map being created,
         giving the number of nodes in that dimension.
      <options>
         -tofile [filename]
            Write the trained map to the given filename
         -fromfile [filename]
            Read a map from the file rather than training it
         -seed [integer]
            Seed the random number generator with integer to obtain
            reproducible results
         -neighborhood [gaussian|uniform]
            Use the specified neighborhood type to determine the influence of a
            node on its neighbors.
         -printMeshEvery [numIter] [baseFilename] [xDim] [yDim] <showTrain>
             Print a 2D-Mesh visualization every numIter training iterations to
            an svg file generated from baseFilename. The x dimension and y
            dimension will be chosen from the zero-indexed dimensions of the
            input using xDim and yDim.  If the option "showTrain" is present
            then the training data is displayed along with the mesh. Ex.
            "-printMeshEvery 2 foo 0 1 showTrain" will write foo_01.svg
            foo_02.svg etc. every other iteration using the first two
            dimensions of the input and also display the training data in the
            svg image. Note that including this option twice will create two
            different printing actions, allowing multiple dimension pairs to be
            visualized at once.
         -batchTrain [startWidth] [endWidth] [numEpochs] [numConverge]
            Trains the network using the batch training algorithm. Neighborhood
            decreases exponentially from startWidth to endWidth over numEpochs
            epochs.  Each epoch lasts at most numConverge passes through the
             dataset, waiting for the network to converge.  Do not overlook
             numConverge=1; it has given good performance on some datasets.
             This is the default training algorithm.
         -stdTrain [startWidth] [endWidth] [startRate] [endRate] [numIter]
            Trains the network using the standard incremental training
            algorithm with the network width decreasing exponentially from
            startWidth to endWidth and the learning rate also decreasing
             exponentially from startRate to endRate. Training takes exactly
             numIter data-point presentations.
   svd [matrix] <options>
      Compute the singular value decomposition of a matrix.
      [matrix]
         The filename of the matrix.
      <options>
         -ufilename [filename]
            Set the filename to which U will be saved. U is the matrix in which
            the columns are the eigenvectors of [matrix] times its transpose.
            The default is u.arff.
         -sigmafilename [filename]
            Set the filename to which Sigma will be saved. Sigma is the matrix
            that contains the singular values on its diagonal. All values in
            Sigma except the diagonal will be zero. If this option is not
            specified, the default is to only print the diagonal values (not
             the whole matrix) to stdout. If this option is specified, nothing
            is printed to stdout.
         -vfilename [filename]
            Set the filename to which V will be saved. V is the matrix in which
             the rows are the eigenvectors of the transpose of [matrix] times
            [matrix]. The default is v.arff.
         -maxiters [n]
            Specify the number of times to iterate before giving up. The
            default is 100, which should be sufficient for most problems.
   lle [dataset] [neighbor-count] [neighbor-finder] [target_dims] <options>
      Use the LLE algorithm to reduce dimensionality.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
      [dataset]
         The filename of the high-dimensional data to reduce.
      [neighbor-count]
         The number of neighbors to use.
      [target_dims]
         The number of dimensions to reduce the data into.
   manifoldsculpting [dataset] [neighbor-count] [neighbor-finder] [target_dims] <options>
      Use the Manifold Sculpting algorithm to reduce dimensionality. (This
      algorithm is specified in Gashler, Michael S. and Ventura, Dan and
      Martinez, Tony. Iterative non-linear dimensionality reduction with
      manifold sculpting. In Advances in Neural Information Processing Systems
      20, pages 513-520, MIT Press, Cambridge, MA, 2008.)
      [dataset]
         The filename of the high-dimensional data to reduce.
      [neighbor-count]
         The number of neighbors to use.
      [target_dims]
         The number of dimensions to reduce the data into.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -continue [dataset]
            Continue refining the specified reduced-dimensional results. (This
            feature enables Manifold Sculpting to improve upon its own results,
            or to refine the results from another dimensionality reduction
            algorithm.)
         -scalerate [value]
            Specify the scaling rate. If not specified, the default is 0.999. A
            value close to 1 will give better results, but will cause the
            algorithm to take longer.
   multidimensionalscaling [distance-matrix] [target-dims] <options>
      Perform MDS on the specified [distance-matrix].
      [distance-matrix]
         The filename of an arff file that contains the pair-wise distances (or
         dissimilarities) between every pair of points. It must be a square
         matrix of real values. Only the upper-triangle of this matrix is
         actually used. The lower-triangle and diagonal are ignored.
      [target-dims]
         The number of dimensions to reduce the data into.
      <options>
         -squareddistances
            The distances in the distance matrix are squared distances, instead
            of just distances.
   pca [dataset] [target_dims] <options>
      Projects the data into the specified number of dimensions with principal
      component analysis. (Prints results to stdout. The input file is not
      modified.)
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -roundtrip [filename]
            Do a lossy round-trip of the data and save the results to the
            specified file.
         -eigenvalues [filename]
            Save the eigenvalues to the specified file.
         -components [filename]
            Save the centroid and principal component vectors (in order of
            decreasing corresponding eigenvalue) to the specified file.
         -aboutorigin
            Compute the principal components about the origin. (The default is
            to compute them relative to the centroid.)
         -modelin [filename]
            Load the PCA model from a json file.
         -modelout [filename]
            Save the trained PCA model to a json file.
      [dataset]
         The filename of the high-dimensional data to reduce.
      [target_dims]
         The number of dimensions to reduce the data into.
   usage
      Print usage information.
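A few illustrations may help clarify the commands above. First, the [attr_list] syntax accepted by -labels and -ignore can be sketched as a small parser (parse_attr_list is a hypothetical helper written for clarity; it is not part of the waffles toolkit):

```python
def parse_attr_list(spec, n_cols):
    """Expand a spec like "0,2-5", "*0", or "0-*1" into zero-indexed columns.

    A value preceded by '*' indexes from the right: "*0" is the last column.
    """
    def resolve(token):
        if token.startswith('*'):
            return n_cols - 1 - int(token[1:])
        return int(token)

    cols = []
    for part in spec.split(','):
        if '-' in part:
            lo, hi = part.split('-')
            cols.extend(range(resolve(lo), resolve(hi) + 1))
        else:
            cols.append(resolve(part))
    return cols

# With 10 columns: "0,2-5" gives [0, 2, 3, 4, 5], "*0" gives [9],
# and "0-*1" gives columns 0 through 8 (all but the last).
```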
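For intuition about what the isomap command computes, here is a simplified NumPy sketch of the algorithm (not the waffles implementation; it assumes a brute-force neighbor search, Floyd-Warshall shortest paths, and classical MDS, and that the neighborhood graph is connected):

```python
import numpy as np

def isomap(X, k, target_dims):
    """Embed rows of X into target_dims dimensions, preserving geodesics."""
    n = len(X)
    # Pairwise Euclidean distances.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    # Keep only each point's k nearest neighbors (symmetrized graph).
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        nbrs = np.argsort(D[i])[1:k + 1]
        G[i, nbrs] = D[i, nbrs]
        G[nbrs, i] = D[i, nbrs]
    # Geodesic distances via Floyd-Warshall.
    for m in range(n):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    # Classical MDS on the geodesic distances.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:target_dims]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))
```

The -tolerant option described above handles the case this sketch does not: points left disconnected from the neighborhood graph.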
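The standard incremental training described under som's -stdTrain option can be sketched as follows (a simplified illustration, not the waffles implementation; train_som and its Gaussian neighborhood are assumptions made for clarity):

```python
import numpy as np

def train_som(data, width, height, start_width, end_width,
              start_rate, end_rate, num_iter, seed=0):
    """Train a width x height self-organizing map on the rows of data."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of each node, and randomly initialized weights.
    grid = np.array([[x, y] for y in range(height) for x in range(width)], float)
    nodes = rng.standard_normal((width * height, data.shape[1]))
    for t in range(num_iter):
        frac = t / max(num_iter - 1, 1)
        # Neighborhood width and learning rate decay exponentially.
        w = start_width * (end_width / start_width) ** frac
        rate = start_rate * (end_rate / start_rate) ** frac
        x = data[rng.integers(len(data))]
        # Best-matching unit, then pull nodes toward x with Gaussian falloff.
        best = np.argmin(((nodes - x) ** 2).sum(axis=1))
        d2 = ((grid - grid[best]) ** 2).sum(axis=1)
        influence = np.exp(-d2 / (2.0 * w * w))
        nodes += rate * influence[:, None] * (x - nodes)
    return nodes.reshape(height, width, -1)
```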
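The U, Sigma, and V relationships described under the svd command can be checked concretely with NumPy (an illustration with a made-up matrix, independent of waffles_dimred):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Sigma = np.diag(s)
# A is recovered as U * Sigma * V^T.
assert np.allclose(U @ Sigma @ Vt, A)
# Columns of U are eigenvectors of A A^T, and columns of V (rows of V^T)
# are eigenvectors of A^T A, with eigenvalues = singular values squared.
assert np.allclose(A @ A.T @ U, U @ Sigma**2)
assert np.allclose(A.T @ A @ Vt.T, Vt.T @ Sigma**2)
```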
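The core computation behind the multidimensionalscaling command, classical MDS, can be sketched in NumPy (a simplified illustration that uses the full symmetric distance matrix rather than only the upper triangle, as waffles does):

```python
import numpy as np

def classical_mds(D, target_dims, squared=False):
    """Embed points into target_dims dims from a pairwise distance matrix D."""
    D2 = D if squared else D ** 2
    n = len(D2)
    # Double-center the squared distances to get the Gram matrix B.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # Coordinates come from the top eigenvectors, scaled by sqrt(eigenvalue).
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:target_dims]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))
```

The squared flag mirrors the -squareddistances option: when the input matrix already holds squared distances, the squaring step is skipped.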
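Finally, the projection performed by the pca command (about the centroid by default, or about the origin with -aboutorigin) can be sketched as (a minimal illustration, not the waffles implementation):

```python
import numpy as np

def pca_project(X, target_dims, about_origin=False):
    """Project rows of X onto their top target_dims principal components."""
    center = np.zeros(X.shape[1]) if about_origin else X.mean(axis=0)
    Xc = X - center
    # Right singular vectors of the centered data are the principal
    # components, ordered by decreasing singular value (eigenvalue).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:target_dims].T
```

Projecting with target_dims equal to the original dimensionality is just a rotation of the centered data; the lossy round trip of -roundtrip corresponds to projecting with fewer dimensions and mapping back.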
