waffles_generate

A command-line tool to help generate various types of data. (Most of the datasets it generates are for testing manifold learning algorithms. I add them as I need them.) Here's the usage information:

Full Usage Information
[Square brackets] are used to indicate required arguments.
<Angled brackets> are used to indicate optional arguments.

waffles_generate [command]
   Generate certain useful datasets
   3d [dataset] <options>
      Make a 3d scatter plot. Points are colored with a spectrum according to
      their order in the dataset.
      [dataset]
         The filename of a dataset to plot. It must have exactly 3 continuous
         attributes.
      <options>
         -blast
            Produce a 5-by-5 grid of renderings, each time using a random point
            of view. It will print the random camera directions that it selects
            to stdout.
         -seed [value]
            Specify a seed for the random number generator.
         -size [width] [height]
            Sets the size of the image. The default is 1000 1000.
         -pointradius [radius]
            Set the size of the points. The default is 40.0.
         -bgcolor [color]
            Set the background color. If not specified, the default is ffffff.
         -cameradistance [dist]
            Set the distance between the camera and the mean of the data. This
            value is specified as a factor, which is multiplied by the distance
            between the min and max corners of the data. If not specified, the
            default is 1.5. (If the camera is too close to the data, make this
            value bigger.)
         -cameradirection [dx] [dy] [dz]
            Specifies the direction from the camera to the mean of the data.
            (The camera always looks at the mean.) The default is 0.6 -0.3
            -0.8.
         -out [filename]
            Specify the name of the output file. (The default is plot.png.) It
            should have the .png extension because other image formats are not
            yet supported.
         -nolabels
            Don't put axis labels on the bounding box.
         -nobox
            Don't draw a bounding box around the plot.
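The -cameradistance and -cameradirection descriptions above imply a simple way to place the camera: step back from the mean of the data, against the viewing direction, by the factor times the length of the bounding-box diagonal. The following is a minimal sketch of that arithmetic, not the tool's own code. The function name is hypothetical, and normalizing the direction vector is an assumption.

```python
import math

def camera_position(points, direction=(0.6, -0.3, -0.8), factor=1.5):
    """Place a camera looking at the mean of 3D points (illustrative only)."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    lo = [min(p[i] for p in points) for i in range(3)]
    hi = [max(p[i] for p in points) for i in range(3)]
    # Distance between the min and max corners of the data.
    diagonal = math.dist(lo, hi)
    # Assumption: the direction is normalized before it is used.
    length = math.hypot(*direction)
    # Step back from the mean, against the viewing direction.
    return [m - d / length * factor * diagonal for m, d in zip(mean, direction)]

print(camera_position([(0, 0, 0), (1, 1, 1), (0.5, 0.2, 0.9)]))
```

Raising the factor moves the camera farther from the data, which matches the advice to increase -cameradistance when the camera is too close.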
   crane <options>
      Generate a dataset where each row represents a ray-traced image of a
      crane with a ball.
      <options>
         -saveimage [filename]
            Save an image showing all the frames.
         -ballradius [size]
            Specify the size of the ball. The default is 0.3.
         -frames [horiz] [vert]
            Specify the number of frames to render.
         -size [wid] [hgt]
            Specify the size of each frame.
         -blur [radius]
            Blurs the images. A good starting value might be 5.0.
         -gray
            Use a single grayscale value for every pixel instead of three (red,
            green, blue) channel values.
   cube [n]
      Returns data evenly distributed on the surface of a unit cube. Each side
      is sampled with [n]x[n] points. The total number of points in the dataset
      will be 6*[n]*[n]-12*[n]+8.
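The point count for cube follows from inclusion-exclusion: the 6 faces contribute [n]*[n] points each, the 12 shared edges are then subtracted once each, and the 8 corners are added back. The sketch below checks the formula by brute force on an evenly spaced grid. It illustrates the count only and is not necessarily how the tool generates its points; the function name is hypothetical.

```python
from itertools import product

def cube_surface_points(n):
    """Return grid points on the surface of the unit cube, n samples per edge."""
    if n < 2:
        raise ValueError("n must be at least 2")
    coords = [i / (n - 1) for i in range(n)]  # n evenly spaced values in [0, 1]
    # A grid point lies on the surface if any coordinate is exactly 0 or 1.
    return [p for p in product(coords, repeat=3)
            if any(c == 0.0 or c == 1.0 for c in p)]

for n in (2, 3, 5, 10):
    assert len(cube_surface_points(n)) == 6 * n * n - 12 * n + 8
```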
   entwinedspirals [points] <options>
      Generates points that lie on an entwined spirals manifold.
      [points]
         The number of points with which to sample the manifold.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -reduced
            Generate intrinsic values instead of extrinsic values. (This might
            be useful to empirically measure the accuracy of a manifold
            learner.)
   fishbowl [n] <options>
      Generate samples on the surface of a fish-bowl manifold.
      [n]
         The number of samples to draw.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -opening [size]
            The size of the opening. (0.0 = no opening. 0.25 = default. 1.0 =
            half of the sphere.)
   gridrandomwalk [arff-file] [width] [samples] <options>
      Generate a sequence of action-observation pairs by randomly walking
      around on a grid of observation vectors. Assumes there are four possible
      actions consisting of up, down, left, right.
      [arff-file]
         The filename of an arff file containing observation vectors arranged
         in a grid.
      [width]
         The width of the grid.
      [samples]
         The number of samples to take. In other words, the length of the
         random walk.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -start [x] [y]
            Specifies the starting state. The default is to start in the center
            of the grid.
         -obsfile [filename]
            Specify the filename for the observation sequence data. The default
            is observations.arff.
         -actionfile [filename]
            Specify the filename for the actions data. The default is
            actions.arff.
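The kind of walk gridrandomwalk describes can be sketched as follows. This is a rough illustration under stated assumptions, not the tool's implementation: the observation vectors are assumed to be stored in row-major order, the integer action encoding is made up for the example, and a move that would leave the grid is assumed to keep the walker in place.

```python
import random

# Hypothetical action encoding: up, down, left, right.
ACTIONS = {0: (0, -1), 1: (0, 1), 2: (-1, 0), 3: (1, 0)}

def grid_random_walk(observations, width, samples, seed=0, start=None):
    """Return (actions, observed_vectors) for a random walk on a grid."""
    height = len(observations) // width
    rng = random.Random(seed)
    # Default start: the center of the grid.
    x, y = start if start is not None else (width // 2, height // 2)
    actions, observed = [], []
    for _ in range(samples):
        observed.append(observations[y * width + x])  # row-major lookup (assumed)
        a = rng.randrange(4)
        dx, dy = ACTIONS[a]
        # Assumed boundary rule: clamp so the walker never leaves the grid.
        x = min(max(x + dx, 0), width - 1)
        y = min(max(y + dy, 0), height - 1)
        actions.append(a)
    return actions, observed

grid = [[float(i)] for i in range(12)]  # a 4-by-3 grid of 1-d observations
actions, observed = grid_random_walk(grid, width=4, samples=10, seed=1)
print(actions)
print(observed)
```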
   imagestoarff <options>
      Converts the numbered PNG image files in the current directory to an ARFF
      file, where each row encodes one image. (For example, you might use
      ffmpeg to convert a video to a sequence of images, and then use this tool
      to convert the images to an ARFF file.)
      <options>
         -inc [val]
            Specify how the frame numbering is incremented. By default, each
            frame increments the number by 1.
         -start [val]
            Specify the lowest number of any image. For example, if the first
            frame is named frame001.png, then this value should be 1. (Note
            that it will stop as soon as it cannot find the next frame, so if
            this starting value is incorrect, there may be no results.)
         -pre [str]
            Specify the prefix that comes before the number in the filenames.
            For example, if frame001.png is the name of one of the files, then
            the prefix should be "frame".
         -suf [str]
            Specify the suffix that comes after the number in the filenames.
            For example, if f001yo.png is the name of one of the files, then
            the suffix should be "yo.png".
         -digits [n]
            Specify the number of digits (including zero padding) used in the
            frame numbering. For example, if frame00001.png is one of the
            frames, then this value should be 5.
         -channels [c]
            Specify the number of color channels to encode. For grayscale, use
            1. For RGB, use 3.
         -range [r]
            Specify the largest possible value in the encoding of a single
            channel. Typical values are 1.0 and 255.0.
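The -pre, -suf, -digits, -start, and -inc options of imagestoarff combine to form the filenames that are searched for. The sketch below shows one plausible reading of that naming scheme, including the documented behavior of stopping at the first missing frame. The function names and the default values are hypothetical, chosen only to match the examples in the option descriptions.

```python
def frame_name(number, pre="frame", suf=".png", digits=3):
    """Build a filename such as frame001.png from its parts."""
    return f"{pre}{number:0{digits}d}{suf}"

def find_frames(existing, start=1, inc=1, **naming):
    """Collect frame names present in `existing`, stopping at the first gap."""
    found = []
    number = start
    while frame_name(number, **naming) in existing:
        found.append(frame_name(number, **naming))
        number += inc
    return found

files = {"frame001.png", "frame002.png", "frame004.png"}
print(find_frames(files))           # stops before the missing frame003.png
print(find_frames(files, start=0))  # wrong start value, so nothing is found
```

The second call shows why an incorrect -start value can produce no results: the very first lookup fails, so the search ends immediately.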
   imagetranslatedovernoise [png-file] <options>
      Sample a manifold by translating an image over a background of noise.
      [png-file]
         The filename of a png image.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -reduced
            Generate intrinsic values instead of extrinsic values. (This might
            be useful to empirically measure the accuracy of a manifold
            learner.)
   manifold [samples] <options> [equations]
      Generate sample points randomly distributed on the surface of a manifold.
      [samples]
         The number of points with which to sample the manifold.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
      [equations]
         A set of equations that define the manifold. The equations that define
         the manifold must be named y1, y2, ..., but helper equations may be
         included. The manifold-defining equations must all have the same
         number of parameters. The parameters will be drawn from a uniform
         distribution (from 0 to 1). Usually it is a good idea to wrap
         the equations in quotes. Example:
         "y1(x1,x2)=x1;y2(x1,x2)=sqrt(x1*x2);h(x)=sqrt(1-x);y3(x1,x2)=x2*x2-h(x1)"
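To make the example equations for manifold concrete, here is a rough Python rendering of what they compute for each sample. It assumes the parameters x1 and x2 lie in the range 0 to 1 (which keeps both square roots real) and draws them uniformly; the function names are hypothetical, and this is not the tool's equation parser.

```python
import math
import random

def h(x):
    """Helper equation h(x)=sqrt(1-x) from the example."""
    return math.sqrt(1 - x)

def sample_manifold(samples, seed=0):
    """Evaluate y1, y2, y3 from the example at random parameter values."""
    rng = random.Random(seed)
    rows = []
    for _ in range(samples):
        x1, x2 = rng.random(), rng.random()  # assumed: parameters in [0, 1)
        y1 = x1
        y2 = math.sqrt(x1 * x2)
        y3 = x2 * x2 - h(x1)
        rows.append((y1, y2, y3))
    return rows

for row in sample_manifold(3, seed=42):
    print(row)
```

Each output row has one value per manifold-defining equation (y1, y2, y3), while the helper h contributes no column of its own.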
   map [in] [equations]
      Map a dataset using the specified equations.
      [in]
         The input dataset.
      [equations]
         A set of equations that define the mapping. The number of input values
         should match the number of columns in the input data. The number of
         equations will determine the number of columns in the output data. The
         equations that define the mapping must be named y1, y2, ..., but
         helper equations may be included. Usually it is a good idea to wrap
         the equations in quotes. Example:
         "y1(x1,x2)=x1;y2(x1,x2)=sqrt(x1*x2);h(x)=sqrt(1-x);y3(x1,x2)=x2*x2-h(x1)"
   model [model-file] [dataset] [attr-x] [attr-y] <options>
      Plot the model space of a trained supervised learning algorithm.
      [model-file]
         The filename of the trained model. (You can use "waffles_learn train"
         to make a model file.)
      [dataset]
         The filename of a dataset to be plotted. It can be the training set
         that was used to train the model, or a test set that it hasn't yet
         seen.
      [attr-x]
         The zero-based index of a continuous feature attribute for the
         horizontal axis.
      [attr-y]
         The zero-based index of a continuous feature attribute for the
         vertical axis.
      <options>
         -out [filename]
            Specify the name of the output file. (The default is plot.png.) It
            should have the .png extension because other image formats are not
            yet supported.
         -size [width] [height]
            Specify the size of the image.
         -pointradius [size]
            Specify the size of the dots used to represent each instance.
   noise [rows] <options>
      Generate random data by sampling from a distribution.
      [rows]
         The number of patterns to generate.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -dist [distribution]
            Specify the distribution. The default is normal 0 1. The choices are:
            beta [alpha] [beta]
            binomial [n] [p]
            categorical 3 [p0] [p1] [p2]
               A categorical distribution with 3 classes. [p0], [p1], and [p2]
               specify the probabilities of each of the 3 classes. (This is
               just an example. Other values besides 3 may be used for the
               number of classes.)
            cauchy [median] [scale]
            chisquare [t]
            exponential [beta]
            f [t] [u]
            gamma [alpha] [beta]
            gaussian [mean] [deviation]
            geometric [p]
            logistic [mu] [s]
            lognormal [mu] [sigma]
            normal [mean] [deviation]
            poisson [mu]
            softimpulse [s]
            spherical [dims] [radius]
            student [t]
            uniform [a] [b]
            weibull [gamma]
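As an illustration of the categorical entry in the list above, the following sketch draws class indices with the given probabilities. It uses Python's standard library rather than the tool's own sampler, and the function name is hypothetical.

```python
import random

def sample_categorical(rows, probabilities, seed=0):
    """Draw `rows` class indices, one class per entry in `probabilities`."""
    rng = random.Random(seed)
    classes = range(len(probabilities))
    return rng.choices(classes, weights=probabilities, k=rows)

# Roughly corresponds to: categorical 3 0.2 0.3 0.5
draws = sample_categorical(1000, [0.2, 0.3, 0.5], seed=1)
print([draws.count(c) / len(draws) for c in range(3)])
```

With enough rows, the printed frequencies should approach the requested probabilities.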
   overview [dataset] <options>
      Generate a matrix of plots of attribute distributions and correlations.
      This is a useful chart for becoming acquainted with a dataset.
      [dataset]
         The filename of a dataset to be charted.
      <options>
         -out [filename]
            Specify the name of the output file. (The default is plot.png.) It
            should have the .png extension because other image formats are not
            yet supported.
         -cellsize [value]
            Change the size of each cell. The default is 100.
         -jitter [value]
            Specify how much to jitter the plotted points. The default is 0.03.
         -maxattrs [value]
            Specifies the maximum number of attributes to plot. The default is
            20.
   randomsequence [length] <options>
      Generates a sequential list of integer values, shuffles them randomly,
      and then prints the shuffled list to stdout.
      [length]
         The number of values in the random sequence.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -start [value]
            Specify the smallest value in the sequence.
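The behavior described for randomsequence amounts to shuffling a run of consecutive integers. A minimal sketch, assuming a default starting value of 0 (the usage text does not state the default) and using a hypothetical function name:

```python
import random

def random_sequence(length, start=0, seed=0):
    """Return the integers start..start+length-1 in shuffled order."""
    values = list(range(start, start + length))
    random.Random(seed).shuffle(values)
    return values

for value in random_sequence(5, start=10, seed=3):
    print(value)
```

Every value appears exactly once, so the output is a permutation and not a list of independent random draws.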
   randomwalk [samples] <options>
      Perform a random walk to generate the specified number of sample points.
      The walk is bounded within the unit cube (or hypercube). If you need a
      random walk with different bounds, you can use the map tool to convert it
      to the desired bounds.
      [samples]
         The number of samples to take.
      <options>
         -seed [n]
            Specify a seed for the random number generator.
         -dims [n]
            Specify the number of dimensions in the samples.
         -stepscale [dim] [scale]
            Scale the steps in a particular dimension by a scalar. For example,
            if you plan to map the samples to a wider space, you may want to
            make horizontal steps smaller so that they will be uniform after
            the mapping. This option may be used any number of times.
            [dim]
               The dimension to scale.
            [scale]
               The amount to scale the steps in that dimension.
         -start [val]
            A scalar to initialize all dimensions of the starting position. For
            example, if there are 3 sample dimensions, and this value is 0.2,
            then the starting point will be <0.2,0.2,0.2>.
         -continuous
            Specify to use continuous actions. That is, steps will move in
            arbitrary directions instead of axis-aligned directions.
         -step [size]
            Specify the step size. (Note that the -stepscale option is applied
            to scale the step size in particular dimensions after this option
            is applied.)
         -delib [d]
            Specify a deliberateness factor between 0 and 1. A value of 0 will
            make every step completely random, and independent of previous
            steps. A value close to 1 will travel in a straight line until a
            boundary is encountered, then choose a new random direction. Values
            in between will do something in between.
         -actions [filename]
            Generate a dataset containing a representation of the action
            performed at each step. Note that the first action is performed
            AFTER the first sample is taken. No sample is taken after the last
            action is performed.
         -perturb [dev]
            Perturb the sample with Gaussian noise. Note that this noise is
            additive, in that the next step will be taken from the perturbed
            location. If you want non-additive noise, you can postprocess the
            data with the waffles_transform addnoise tool.
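Several of the randomwalk options interact, so a small sketch may help. It covers only axis-aligned steps with -dims, -step, -stepscale, and -start, and mirrors the documented relationship between samples and actions (one fewer action than samples). The default values, the clamping at the boundary, and the (dimension, sign) action encoding are all assumptions made for illustration; -delib, -continuous, and -perturb are not modeled.

```python
import random

def random_walk(samples, dims=2, step=0.1, start=0.5, stepscale=None, seed=0):
    """Axis-aligned random walk inside the unit hypercube (illustrative only)."""
    if samples < 1:
        raise ValueError("samples must be at least 1")
    rng = random.Random(seed)
    scale = [1.0] * dims
    for dim, factor in (stepscale or {}).items():
        scale[dim] = factor  # per-dimension scaling, applied after the step size
    pos = [start] * dims
    points, actions = [list(pos)], []  # the first sample precedes any action
    for _ in range(samples - 1):
        dim = rng.randrange(dims)
        sign = rng.choice((-1.0, 1.0))
        # Assumed boundary rule: clamp to the unit hypercube.
        pos[dim] = min(1.0, max(0.0, pos[dim] + sign * step * scale[dim]))
        actions.append((dim, sign))
        points.append(list(pos))
    return points, actions

points, actions = random_walk(5, dims=3, start=0.2, stepscale={0: 0.5}, seed=7)
print(points)
print(actions)
```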
   scalerotate [png-file] <options>
      Generate a dataset where each row represents an image that has been
      scaled and rotated by various amounts. Thus, these images form an
      open-cylinder (although somewhat cone-shaped) manifold.
      [png-file]
         The filename of a PNG image.
      <options>
         -saveimage [filename]
            Save a composite image showing all the frames in a grid.
         -frames [rotate-frames] [scale-frames]
            Specify the number of frames. The default is 40 15.
         -arc [radians]
            Specify the rotation amount. If not specified, the default is
            6.2831853... (2*PI).
   scurve [points] <options>
      Generate points that lie on an s-curve manifold.
      [points]
         The number of points with which to sample the manifold.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -reduced
            Generate intrinsic values instead of extrinsic values. (This might
            be useful to empirically measure the accuracy of a manifold
            learner.)
   selfintersectingribbon [points] <options>
      Generate points that lie on a self-intersecting ribbon manifold.
      [points]
         The number of points with which to sample the manifold.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
   swissroll [points] <options>
      Generate points that lie on a swiss roll manifold.
      [points]
         The number of points with which to sample the manifold.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -reduced
            Generate intrinsic values instead of extrinsic values. (This might
            be useful to empirically measure the accuracy of a manifold
            learner.)
         -cutoutstar
            Don't sample within a star-shaped region on the manifold.
   windowedimage [png-file] <options>
      Sample a manifold by translating a window over an image. Each pattern
      represents the windowed portion of the image.
      [png-file]
         The filename of the png image from which to generate the data.
      <options>
         -reduced
            Generate intrinsic values instead of extrinsic values. (This might
            be useful to empirically measure the accuracy of a manifold
            learner.)
         -stepsizes [horiz] [vert]
            Specify the horizontal and vertical step sizes (how many pixels to
            move the window between samples).
         -windowsize [width] [height]
            Specify the size of the window. The default is half the width and
            height of [png-file].
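The windowing idea behind windowedimage can be sketched with a plain nested list standing in for the image. Each window position yields one flattened row. The function name, the row-by-row flattening order, and the default step of one pixel are assumptions for the example, not documented behavior.

```python
def windowed_rows(image, window_w, window_h, step_x=1, step_y=1):
    """Slide a window over `image` (a list of pixel rows); one output row per position."""
    height, width = len(image), len(image[0])
    rows = []
    for top in range(0, height - window_h + 1, step_y):
        for left in range(0, width - window_w + 1, step_x):
            # Flatten the window row by row (assumed ordering).
            rows.append([image[top + dy][left + dx]
                         for dy in range(window_h) for dx in range(window_w)])
    return rows

image = [[r * 4 + c for c in range(4)] for r in range(4)]  # a 4-by-4 test image
# Window of half the width and height, mirroring the documented default size.
for row in windowed_rows(image, 2, 2, step_x=2, step_y=2):
    print(row)
```

Larger step sizes produce fewer, less overlapping samples of the manifold.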
   usage
      Print usage information.
