waffles_generate
A command-line tool to help generate various types of data. (Most of the datasets it generates are for testing manifold learning algorithms. I add them as I need them.) Here's the usage information:
Full Usage Information
[Square brackets] are used to indicate required arguments.
<Angled brackets> are used to indicate optional arguments.
waffles_generate [command]
Generate certain useful datasets
3d [dataset] <options>
Make a 3d scatter plot. Points are colored with a spectrum according to
their order in the dataset.
[dataset]
The filename of a dataset to plot. It must have exactly 3 continuous
attributes.
<options>
-blast
Produce a 5-by-5 grid of renderings, each time using a random point
of view. It will print the random camera directions that it selects
to stdout.
-seed [value]
Specify a seed for the random number generator.
-size [width] [height]
Set the size of the image. The default is 1000 1000.
-pointradius [radius]
Set the size of the points. The default is 40.0.
-bgcolor [color]
Set the background color. If not specified, the default is ffffff.
-cameradistance [dist]
Set the distance between the camera and the mean of the data. This
value is specified as a factor, which is multiplied by the distance
between the min and max corners of the data. If not specified, the
default is 1.5. (If the camera is too close to the data, make this
value bigger.)
-cameradirection [dx] [dy] [dz]
Specifies the direction from the camera to the mean of the data.
(The camera always looks at the mean.) The default is 0.6 -0.3
-0.8.
-out [filename]
Specify the name of the output file. (The default is plot.png.) It
should have the .png extension because other image formats are not
yet supported.
-nolabels
Don't put axis labels on the bounding box.
-nobox
Don't draw a bounding box around the plot.
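To show how -cameradistance and -cameradirection fit together, here is a minimal Python sketch based only on the descriptions above. It assumes the camera sits at the data mean minus (factor times the min-to-max corner distance) times the normalized camera direction. The function name camera_position is hypothetical and this is not the tool's actual rendering code.

```python
import math

def camera_position(points, factor=1.5, direction=(0.6, -0.3, -0.8)):
    """Hypothetical sketch: where a camera that looks at the data mean would sit."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    lo = [min(p[i] for p in points) for i in range(3)]
    hi = [max(p[i] for p in points) for i in range(3)]
    # -cameradistance is a multiple of the distance between the min and max corners.
    dist = factor * math.dist(lo, hi)
    # -cameradirection points from the camera toward the mean,
    # so the camera sits behind the mean along that direction.
    length = math.sqrt(sum(d * d for d in direction))
    unit = [d / length for d in direction]
    return [m - dist * u for m, u in zip(mean, unit)]

cube_corners = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
print(camera_position(cube_corners))
```

With the defaults, the camera ends up 1.5 * sqrt(3) units from the center of a unit cube, which is why raising the factor moves it further away.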
crane <options>
Generate a dataset where each row represents a ray-traced image of a
crane with a ball.
<options>
-saveimage [filename]
Save an image showing all the frames.
-ballradius [size]
Specify the size of the ball. The default is 0.3.
-frames [horiz] [vert]
Specify the number of frames to render.
-size [wid] [hgt]
Specify the size of each frame.
-blur [radius]
Blur the images. A good starting value might be 5.0.
-gray
Use a single grayscale value for every pixel instead of three (red,
green, blue) channel values.
cube [n]
Generate data evenly distributed on the surface of a unit cube. Each face
is sampled with [n]x[n] points. Because points on shared edges and corners
are included only once, the total number of points in the dataset will be
6*[n]*[n]-12*[n]+8.
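The point count follows from taking an n-by-n-by-n lattice and keeping only the points on its surface, so the count is n^3 - (n-2)^3 = 6n^2 - 12n + 8. The Python sketch below checks this. It assumes the cube spans [0, 1] on each axis; the tool's actual placement and point ordering may differ, and cube_surface is a hypothetical name.

```python
def cube_surface(n):
    """Sketch: sample the surface of the unit cube [0,1]^3 on an n-by-n grid per face."""
    if n < 2:
        raise ValueError("n must be at least 2")
    ticks = [i / (n - 1) for i in range(n)]
    points = set()
    for x in ticks:
        for y in ticks:
            for z in ticks:
                # Keep a lattice point only if it lies on at least one face.
                if x in (0.0, 1.0) or y in (0.0, 1.0) or z in (0.0, 1.0):
                    points.add((x, y, z))
    return sorted(points)

print(len(cube_surface(5)))  # 98 = 6*25 - 60 + 8
```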
entwinedspirals [points] <options>
Generates points that lie on an entwined spirals manifold.
[points]
The number of points with which to sample the manifold.
<options>
-seed [value]
Specify a seed for the random number generator.
-reduced
Generate intrinsic values instead of extrinsic values. (This might
be useful to empirically measure the accuracy of a manifold
learner.)
fishbowl [n] <options>
Generate samples on the surface of a fish-bowl manifold.
[n]
The number of samples to draw.
<options>
-seed [value]
Specify a seed for the random number generator.
-opening [size]
The size of the opening. (0.0 = no opening, 0.25 = default, 1.0 =
half of the sphere.)
gridrandomwalk [arff-file] [width] [samples] <options>
Generate a sequence of action-observation pairs by randomly walking
around on a grid of observation vectors. It assumes there are four possible
actions: up, down, left, and right.
[arff-file]
The filename of an arff file containing observation vectors arranged
in a grid.
[width]
The width of the grid.
[samples]
The number of samples to take. In other words, the length of the
random walk.
<options>
-seed [value]
Specify a seed for the random number generator.
-start [x] [y]
Specifies the starting state. The default is to start in the center
of the grid.
-obsfile [filename]
Specify the filename for the observation sequence data. The default
is observations.arff.
-actionfile [filename]
Specify the filename for the actions data. The default is
actions.arff.
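A minimal Python sketch of such a walk follows. Several details are assumptions not stated above: observations are stored row-major (index = y * width + x), each action is paired with the observation it leads to, and a move that would leave the grid leaves the walker in place. The names grid_random_walk and ACTIONS are hypothetical.

```python
import random

# Assumed action encoding: (dx, dy) offsets on the grid.
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def grid_random_walk(num_observations, width, samples, start=None, seed=0):
    """Hypothetical sketch: random walk over observations stored row-major in a grid.

    Returns parallel lists of actions and the observation index reached by each.
    """
    if num_observations % width != 0:
        raise ValueError("the number of observations must be a multiple of the width")
    height = num_observations // width
    rng = random.Random(seed)
    # Default start: the center of the grid, as the usage text describes.
    x, y = start if start is not None else (width // 2, height // 2)
    actions, observations = [], []
    for _ in range(samples):
        name = rng.choice(list(ACTIONS))
        dx, dy = ACTIONS[name]
        # Assumption: a move that would leave the grid leaves the walker in place.
        x = min(max(x + dx, 0), width - 1)
        y = min(max(y + dy, 0), height - 1)
        actions.append(name)
        observations.append(y * width + x)
    return actions, observations

acts, obs = grid_random_walk(num_observations=25, width=5, samples=10, seed=1)
print(list(zip(acts, obs)))
```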
imagestoarff <options>
Converts the numbered PNG image files in the current directory to an ARFF
file, where each row encodes one image. (For example, you might use
ffmpeg to convert a video to a sequence of images, and then use this tool
to convert the images to an ARFF file.)
<options>
-inc [val]
Specify how the frame numbering is incremented. By default, each
frame increments the number by 1.
-start [val]
Specify the lowest number of any image. For example, if the first
frame is named frame001.png, then this value should be 1. (Note
that it will stop as soon as it cannot find the next frame, so if
this starting value is incorrect, there may be no results.)
-pre [str]
Specify the prefix that comes before the number in the filenames.
For example, if frame001.png is the name of one of the files, then
the prefix should be "frame".
-suf [str]
Specify the suffix that comes after the number in the filenames.
For example, if f001yo.png is the name of one of the files, then
the suffix should be "yo.png".
-digits [n]
Specify the number of digits (including zero padding) used in the
frame numbering. For example, if frame00001.png is one of the
frames, then this value should be 5.
-channels [c]
Specify the number of color channels to encode. For grayscale, use
1. For RGB, use 3.
-range [r]
Specify the largest possible value in the encoding of a single
channel. Typical values are 1.0 and 255.0.
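The -pre, -digits, -suf, -start, and -inc options together describe how each frame filename is built. The Python sketch below reproduces that naming scheme and the "stop at the first missing frame" rule. The default values in find_frames are illustrative choices, not the tool's documented defaults, and both function names are hypothetical.

```python
import os

def frame_name(pre, number, digits, suf):
    """Build one frame filename, zero-padding the number to `digits` places."""
    return f"{pre}{number:0{digits}d}{suf}"

def find_frames(pre, suf, start=1, inc=1, digits=3, directory="."):
    """Sketch: collect consecutive frame files, stopping at the first missing one."""
    names = []
    number = start
    while True:
        name = frame_name(pre, number, digits, suf)
        if not os.path.isfile(os.path.join(directory, name)):
            break  # stop as soon as the next frame cannot be found
        names.append(name)
        number += inc
    return names

print(frame_name("frame", 1, 3, ".png"))  # frame001.png
print(frame_name("f", 1, 3, "yo.png"))    # f001yo.png
```

Note that the suffix includes the file extension, which matches the f001yo.png example above.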
imagetranslatedovernoise [png-file] <options>
Sample a manifold by translating an image over a background of noise.
[png-file]
The filename of a png image.
<options>
-seed [value]
Specify a seed for the random number generator.
-reduced
Generate intrinsic values instead of extrinsic values. (This might
be useful to empirically measure the accuracy of a manifold
learner.)
manifold [samples] <options> [equations]
Generate sample points randomly distributed on the surface of a manifold.
[samples]
The number of points with which to sample the manifold.
<options>
-seed [value]
Specify a seed for the random number generator.
[equations]
A set of equations that define the manifold. The equations that define
the manifold must be named y1, y2, ..., but helper equations may be
included. The manifold-defining equations must all have the same
number of parameters. The parameters will be drawn from a uniform
distribution between 0 and 1. Usually it is a good idea to wrap
the equations in quotes. Example:
"y1(x1,x2)=x1;y2(x1,x2)=sqrt(x1*x2);h(x)=sqrt(1-x);y3(x1,x2)=x2*x2-h(x1)"
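The sampling procedure can be sketched in Python with the example equations written as ordinary functions. This sketch assumes, as the "from 0 to 1" wording suggests, that each parameter is drawn uniformly from [0, 1); that range also keeps sqrt(x1*x2) and sqrt(1-x) real-valued. The name sample_manifold is hypothetical and this is not the tool's equation parser.

```python
import math
import random

def sample_manifold(samples, equations, num_params, seed=0):
    """Sketch: draw each parameter uniformly from [0, 1) and evaluate y1, y2, ..."""
    rng = random.Random(seed)
    rows = []
    for _ in range(samples):
        params = [rng.random() for _ in range(num_params)]
        rows.append([f(*params) for f in equations])
    return rows

# The example from the usage text, written as Python functions.
def h(x):
    return math.sqrt(1 - x)  # helper equation h(x)=sqrt(1-x)

equations = [
    lambda x1, x2: x1,                  # y1(x1,x2)=x1
    lambda x1, x2: math.sqrt(x1 * x2),  # y2(x1,x2)=sqrt(x1*x2)
    lambda x1, x2: x2 * x2 - h(x1),     # y3(x1,x2)=x2*x2-h(x1)
]
data = sample_manifold(5, equations, num_params=2)
print(data)
```

Each row has one column per y-equation, while the helper h contributes no column of its own.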
map [in] [equations]
Map a dataset using the specified equations.
[in]
The input dataset.
[equations]
A set of equations that define the mapping. The number of input values
should match the number of columns in the input data. The number of
equations will determine the number of columns in the output data. The
equations that define the mapping must be named y1, y2, ..., but
helper equations may be included. Usually it is a good idea to wrap the
equations in quotes. Example:
"y1(x1,x2)=x1;y2(x1,x2)=sqrt(x1*x2);h(x)=sqrt(1-x);y3(x1,x2)=x2*x2-h(x1)"
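Conceptually, map evaluates every y-equation on each input row, so the output has one column per equation. The Python sketch below shows that row-by-row behavior with a hypothetical rescaling of a 2-D unit square to [-2, 2] x [0, 1], the kind of bound change the randomwalk entry suggests doing with map. In the equation syntax above it might be written "y1(x1,x2)=4*x1-2;y2(x1,x2)=x2", but that string is an untested illustration.

```python
def map_rows(rows, equations):
    """Sketch: apply each output equation y1, y2, ... to every input row.

    Calling f(*row) raises TypeError if an equation's parameter count
    does not match the number of input columns.
    """
    return [[f(*row) for f in equations] for row in rows]

# Hypothetical mapping: stretch x1 from [0, 1] to [-2, 2], keep x2 unchanged.
stretch = [lambda x1, x2: 4 * x1 - 2, lambda x1, x2: x2]
print(map_rows([[0.0, 0.5], [1.0, 0.25]], stretch))  # [[-2.0, 0.5], [2.0, 0.25]]
```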
model [model-file] [dataset] [attr-x] [attr-y] <options>
Plot the model space of a trained supervised learning algorithm.
[model-file]
The filename of the trained model. (You can use "waffles_learn train"
to make a model file.)
[dataset]
The filename of a dataset to be plotted. It can be the training set
that was used to train the model, or a test set that it hasn't yet
seen.
[attr-x]
The zero-based index of a continuous feature attribute for the
horizontal axis.
[attr-y]
The zero-based index of a continuous feature attribute for the
vertical axis.
<options>
-out [filename]
Specify the name of the output file. (The default is plot.png.) It
should have the .png extension because other image formats are not
yet supported.
-size [width] [height]
Specify the size of the image.
-pointradius [size]
Specify the size of the dots used to represent each instance.
noise [rows] <options>
Generate random data by sampling from a distribution.
[rows]
The number of patterns to generate.
<options>
-seed [value]
Specify a seed for the random number generator.
-dist [distribution]
Specify the distribution. The default is "normal 0 1". The supported
distributions and their parameters are:
beta [alpha] [beta]
binomial [n] [p]
categorical 3 [p0] [p1] [p2]
A categorical distribution with 3 classes. [p0], [p1], and [p2]
specify the probabilities of each of the 3 classes. (This is
just an example. Other values besides 3 may be used for the
number of classes.)
cauchy [median] [scale]
chisquare [t]
exponential [beta]
f [t] [u]
gamma [alpha] [beta]
gaussian [mean] [deviation]
geometric [p]
logistic [mu] [s]
lognormal [mu] [sigma]
normal [mean] [deviation]
poisson [mu]
softimpulse [s]
spherical [dims] [radius]
student [t]
uniform [a] [b]
weibull [gamma]
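Most of these distributions have direct counterparts in standard libraries, but the categorical entry is easy to get wrong by hand. The Python sketch below shows inverse-CDF sampling for "categorical 3 [p0] [p1] [p2]": draw a uniform number and return the first class whose cumulative probability exceeds it. It illustrates the technique only; it is not the tool's implementation, and sample_categorical is a hypothetical name.

```python
import random
from collections import Counter

def sample_categorical(probs, rng):
    """Sketch: inverse-CDF sampling from a categorical distribution."""
    u = rng.random() * sum(probs)  # tolerate probabilities that don't sum exactly to 1
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return index
    return len(probs) - 1  # guard against floating-point round-off

rng = random.Random(0)
draws = [sample_categorical([0.2, 0.5, 0.3], rng) for _ in range(10000)]
print(Counter(draws))  # roughly 2000 / 5000 / 3000 draws of classes 0 / 1 / 2
```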
overview [dataset] <options>
Generate a matrix of plots of attribute distributions and correlations.
This is a useful chart for becoming acquainted with a dataset.
[dataset]
The filename of a dataset to be charted.
<options>
-out [filename]
Specify the name of the output file. (The default is plot.png.) It
should have the .png extension because other image formats are not
yet supported.
-cellsize [value]
Change the size of each cell. The default is 100.
-jitter [value]
Specify how much to jitter the plotted points. The default is 0.03.
-maxattrs [value]
Specifies the maximum number of attributes to plot. The default is
20.
randomsequence [length] <options>
Generates a sequential list of integer values, shuffles them randomly,
and then prints the shuffled list to stdout.
[length]
The number of values in the random sequence.
<options>
-seed [value]
Specify a seed for the random number generator.
-start [value]
Specify the smallest value in the sequence.
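A shuffled sequence like this is typically produced with a Fisher-Yates shuffle. The Python sketch below builds start, start+1, ..., start+length-1 and shuffles it in place. The default start of 0 is an assumption (the usage text does not state the default), and random_sequence is a hypothetical name.

```python
import random

def random_sequence(length, start=0, seed=None):
    """Sketch: shuffle start, start+1, ..., start+length-1 with Fisher-Yates."""
    rng = random.Random(seed)
    values = list(range(start, start + length))
    for i in range(length - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        values[i], values[j] = values[j], values[i]
    return values

print(random_sequence(10, start=1, seed=42))
```

Using a fixed seed makes the permutation reproducible, which is the same purpose the -seed option serves.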
randomwalk [samples] <options>
Perform a random walk to generate the specified number of sample points.
The walk is bounded within the unit cube (or hypercube). If you need a
random walk with different bounds, you can use the map tool to convert it
to the desired bounds.
[samples]
The number of samples to take.
<options>
-seed [n]
Specify a seed for the random number generator.
-dims [n]
Specify the number of dimensions in the samples.
-stepscale [dim] [scale]
Scale the steps in a particular dimension by a scalar. For example,
if you plan to map the samples to a wider space, you may want to
make horizontal steps smaller so that they will be uniform after
the mapping. This option may be used any number of times.
[dim]
The dimension to scale.
[scale]
The amount to scale the steps in that dimension.
-start [val]
A scalar to initialize all dimensions of the starting position. For
example, if there are 3 sample dimensions, and this value is 0.2,
then the starting point will be <0.2,0.2,0.2>.
-continuous
Use continuous actions. That is, steps will move in
arbitrary directions instead of axis-aligned directions.
-step [size]
Specify the step size. (Note that the -stepscale option is applied
to scale the step size in particular dimensions after this option
is applied.)
-delib [d]
Specify a deliberateness factor between 0 and 1. A value of 0 will
make every step completely random, and independent of previous
steps. A value close to 1 will travel in a straight line until a
boundary is encountered, then choose a new random direction. Values
in between will do something in between.
-actions [filename]
Generate a dataset containing a representation of the action
performed at each step. Note that the first action is performed
AFTER the first sample is taken. No sample is taken after the last
action is performed.
-perturb [dev]
Perturb the sample with Gaussian noise. Note that this noise is
additive, in that the next step will be taken from the perturbed
location. If you want non-additive noise, you can postprocess the
data with the waffles_transform addnoise tool.
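The interaction between -continuous, -step, -start, and -delib is easier to see in code. The Python sketch below is one plausible reading of the text, not the tool's algorithm. It assumes that on each step the walker keeps its previous direction with probability delib, picks a fresh random direction otherwise, and also picks a fresh one whenever a step would leave the unit hypercube. It omits -stepscale, -perturb, and -actions, and its default values are illustrative.

```python
import math
import random

def random_direction(dims, rng):
    """Uniformly random unit vector (a normalized Gaussian sample)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dims)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0:
            return [c / norm for c in v]

def random_walk(samples, dims=2, step=0.05, delib=0.0, start=0.5, seed=0):
    """Sketch of a continuous random walk bounded to the unit hypercube."""
    if not 0 < step < 0.5:
        raise ValueError("this sketch requires 0 < step < 0.5")
    rng = random.Random(seed)
    pos = [start] * dims
    direction = random_direction(dims, rng)
    points = []
    for _ in range(samples):
        points.append(list(pos))
        # Assumption: keep the old direction with probability delib.
        if rng.random() >= delib:
            direction = random_direction(dims, rng)
        candidate = [p + step * d for p, d in zip(pos, direction)]
        while not all(0.0 <= c <= 1.0 for c in candidate):
            # Hit a boundary: choose a new random direction and try again.
            direction = random_direction(dims, rng)
            candidate = [p + step * d for p, d in zip(pos, direction)]
        pos = candidate
    return points

walk = random_walk(100, dims=3, delib=0.9, seed=7)
print(walk[:3])
```

With delib close to 1, the walker travels in straight lines and turns only at the boundary; with delib at 0, every step picks a new direction.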
scalerotate [png-file] <options>
Generate a dataset where each row represents an image that has been
scaled and rotated by various amounts. Thus, these images form an
open-cylinder (although somewhat cone-shaped) manifold.
[png-file]
The filename of a PNG image.
<options>
-saveimage [filename]
Save a composite image showing all the frames in a grid.
-frames [rotate-frames] [scale-frames]
Specify the number of frames. The default is 40 15.
-arc [radians]
Specify the rotation amount. If not specified, the default is
6.2831853... (2*PI).
scurve [points] <options>
Generate points that lie on an s-curve manifold.
[points]
The number of points with which to sample the manifold.
<options>
-seed [value]
Specify a seed for the random number generator.
-reduced
Generate intrinsic values instead of extrinsic values. (This might
be useful to empirically measure the accuracy of a manifold
learner.)
selfintersectingribbon [points] <options>
Generate points that lie on a self-intersecting ribbon manifold.
[points]
The number of points with which to sample the manifold.
<options>
-seed [value]
Specify a seed for the random number generator.
swissroll [points] <options>
Generate points that lie on a swiss roll manifold.
[points]
The number of points with which to sample the manifold.
<options>
-seed [value]
Specify a seed for the random number generator.
-reduced
Generate intrinsic values instead of extrinsic values. (This might
be useful to empirically measure the accuracy of a manifold
learner.)
-cutoutstar
Don't sample within a star-shaped region on the manifold.
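To illustrate the difference between extrinsic values and the intrinsic values produced by -reduced, the Python sketch below uses the common Swiss-roll construction found in, for example, scikit-learn's make_swiss_roll: an angle t in [1.5*pi, 4.5*pi) and a height h, mapped to (t*cos t, h, t*sin t). The tool may use different constants or ordering, and this sketch does not implement -cutoutstar. The name swiss_roll is hypothetical.

```python
import math
import random

def swiss_roll(points, reduced=False, seed=0):
    """Sketch of a common Swiss-roll construction (not necessarily this tool's)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(points):
        t = 1.5 * math.pi * (1.0 + 2.0 * rng.random())  # angle in [1.5*pi, 4.5*pi)
        height = 21.0 * rng.random()
        if reduced:
            rows.append([t, height])  # intrinsic 2-D coordinates
        else:
            rows.append([t * math.cos(t), height, t * math.sin(t)])  # extrinsic 3-D point
    return rows

print(swiss_roll(3, seed=1))
print(swiss_roll(3, reduced=True, seed=1))  # same points, intrinsic coordinates
```

Because both calls consume the random stream in the same order, the same seed yields matching intrinsic and extrinsic rows, which is what makes the intrinsic values usable as ground truth for a manifold learner.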
windowedimage [png-file] <options>
Sample a manifold by translating a window over an image. Each pattern
represents the windowed portion of the image.
[png-file]
The filename of the png image from which to generate the data.
<options>
-reduced
Generate intrinsic values instead of extrinsic values. (This might
be useful to empirically measure the accuracy of a manifold
learner.)
-stepsizes [horiz] [vert]
Specify the horizontal and vertical step sizes (that is, how many
pixels to move the window between samples).
-windowsize [width] [height]
Specify the size of the window. The default is half the width and
height of [png-file].
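The number of samples depends on the image size, window size, and step sizes. The Python sketch below enumerates the top-left corners of every window that fits entirely inside the image. It assumes windows never extend past the image edge and that the default window size is rounded down when the image dimensions are odd; both are assumptions, and window_positions is a hypothetical name. With -reduced, these (x, y) offsets are a natural candidate for the intrinsic values, though the text does not say so explicitly.

```python
def window_positions(image_w, image_h, step_x, step_y, win_w=None, win_h=None):
    """Sketch: top-left corners of every window that fits inside the image."""
    # Documented default: half the width and height of the image (rounded down here).
    win_w = image_w // 2 if win_w is None else win_w
    win_h = image_h // 2 if win_h is None else win_h
    positions = []
    for y in range(0, image_h - win_h + 1, step_y):
        for x in range(0, image_w - win_w + 1, step_x):
            positions.append((x, y))
    return positions

pos = window_positions(100, 80, 10, 10)
print(len(pos))  # 30: six horizontal by five vertical positions of a 50x40 window
```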
usage
Print usage information.