waffles_generate

A command-line tool to help generate various types of data. (Most of the datasets it generates are for testing manifold learning algorithms. I add them as I need them.) Here's the usage information:

Full Usage Information

[Square brackets] are used to indicate required arguments.
<Angled brackets> are used to indicate optional arguments.

waffles_generate [command]
   Generate certain useful datasets
   3d [dataset] <options>
      Make a 3d scatter plot. Points are colored with a spectrum according to their order in the dataset.
      [dataset]
         The filename of a dataset to plot. It must have exactly 3 continuous attributes.
      <options>
         -blast
            Produce a 5-by-5 grid of renderings, each time using a random point of view. It will print the random camera directions that it selects to stdout.
         -seed [value]
            Specify a seed for the random number generator.
         -size [width] [height]
            Sets the size of the image. The default is 1000 1000.
         -pointradius [radius]
            Set the size of the points. The default is 40.0.
         -bgcolor [color]
            Set the background color. If not specified, the default is ffffff.
         -cameradistance [dist]
            Set the distance between the camera and the mean of the data. This value is specified as a factor, which is multiplied by the distance between the min and max corners of the data. If not specified, the default is 1.5. (If the camera is too close to the data, make this value bigger.)
         -cameradirection [dx] [dy] [dz]
            Specifies the direction from the camera to the mean of the data. (The camera always looks at the mean.) The default is 0.6 -0.3 -0.8.
         -out [filename]
            Specify the name of the output file. (The default is plot.png.) It should have the .png extension because other image formats are not yet supported.
         -nolabels
            Don't put axis labels on the bounding box.
         -nobox
            Don't draw a bounding box around the plot.
   crane <options>
      Generate a dataset where each row represents a ray-traced image of a crane with a ball.
      <options>
         -saveimage [filename]
            Save an image showing all the frames.
         -ballradius [size]
            Specify the size of the ball. The default is 0.3.
         -frames [horiz] [vert]
            Specify the number of frames to render.
         -size [wid] [hgt]
            Specify the size of each frame.
         -blur [radius]
            Blur the images. A good starting value might be 5.0.
         -gray
            Use a single grayscale value for every pixel instead of three (red, green, blue) channel values.
   cube [n]
      Return data evenly distributed on the surface of a unit cube. Each side is sampled with [n]x[n] points. The total number of points in the dataset will be 6*[n]*[n]-12*[n]+8.
   entwinedspirals [points] <options>
      Generate points that lie on an entwined spirals manifold.
      [points]
         The number of points with which to sample the manifold.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -reduced
            Generate intrinsic values instead of extrinsic values. (This might be useful to empirically measure the accuracy of a manifold learner.)
   fishbowl [n] <options>
      Generate samples on the surface of a fish-bowl manifold.
      [n]
         The number of samples to draw.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -opening [size]
            The size of the opening. (0.0 = no opening. 0.25 = default. 1.0 = half of the sphere.)
   gridrandomwalk [arff-file] [width] [samples] <options>
      Generate a sequence of action-observation pairs by randomly walking around on a grid of observation vectors. Assumes there are four possible actions: up, down, left, and right.
      [arff-file]
         The filename of an ARFF file containing observation vectors arranged in a grid.
      [width]
         The width of the grid.
      [samples]
         The number of samples to take. In other words, the length of the random walk.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -start [x] [y]
            Specify the starting state. The default is to start in the center of the grid.
         -obsfile [filename]
            Specify the filename for the observation sequence data.
            The default is observations.arff.
         -actionfile [filename]
            Specify the filename for the actions data. The default is actions.arff.
   imagestoarff <options>
      Converts the numbered PNG image files in the current directory to an ARFF file, where each row encodes one image. (For example, you might use ffmpeg to convert a video to a sequence of images, and then use this tool to convert the images to an ARFF file.)
      <options>
         -inc [val]
            Specify how the frame numbering is incremented. By default, each frame increments the number by 1.
         -start [val]
            Specify the lowest number of any image. For example, if the first frame is named frame001.png, then this value should be 1. (Note that it will stop as soon as it cannot find the next frame, so if this starting value is incorrect, there may be no results.)
         -pre [str]
            Specify the prefix that comes before the number in the filenames. For example, if frame001.png is the name of one of the files, then the prefix should be "frame".
         -suf [str]
            Specify the suffix that comes after the number in the filenames. For example, if f001yo.png is the name of one of the files, then the suffix should be "yo.png".
         -digits [n]
            Specify the number of digits (including zero padding) used in the frame numbering. For example, if frame00001.png is one of the frames, then this value should be 5.
         -channels [c]
            Specify the number of color channels to encode. For grayscale, use 1. For RGB, use 3.
         -range [r]
            Specify the largest possible value in the encoding of a single channel. Typical values are 1.0 and 255.0.
   imagetranslatedovernoise [png-file] <options>
      Sample a manifold by translating an image over a background of noise.
      [png-file]
         The filename of a PNG image.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -reduced
            Generate intrinsic values instead of extrinsic values. (This might be useful to empirically measure the accuracy of a manifold learner.)
   manifold [samples] <options> [equations]
      Generate sample points randomly distributed on the surface of a manifold.
      [samples]
         The number of points with which to sample the manifold.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
      [equations]
         A set of equations that define the manifold. The equations that define the manifold must be named y1, y2, ..., but helper equations may be included. The manifold-defining equations must all have the same number of parameters. The parameters will be drawn from a standard uniform distribution (from 0 to 1). Usually it is a good idea to wrap the equations in quotes. Example: "y1(x1,x2)=x1;y2(x1,x2)=sqrt(x1*x2);h(x)=sqrt(1-x);y3(x1,x2)=x2*x2-h(x1)"
   map [in] [equations]
      Map a dataset using the specified equations.
      [in]
         The input dataset.
      [equations]
         A set of equations that define the mapping. The number of input values should match the number of columns in the input data. The number of equations will determine the number of columns in the output data. The equations that define the mapping must be named y1, y2, ..., but helper equations may be included. Usually it is a good idea to wrap the equations in quotes. Example: "y1(x1,x2)=x1;y2(x1,x2)=sqrt(x1*x2);h(x)=sqrt(1-x);y3(x1,x2)=x2*x2-h(x1)"
   model [model-file] [dataset] [attr-x] [attr-y] <options>
      Plot the model space of a trained supervised learning algorithm.
      [model-file]
         The filename of the trained model. (You can use "waffles_learn train" to make a model file.)
      [dataset]
         The filename of a dataset to be plotted. It can be the training set that was used to train the model, or a test set that it hasn't yet seen.
      [attr-x]
         The zero-based index of a continuous feature attribute for the horizontal axis.
      [attr-y]
         The zero-based index of a continuous feature attribute for the vertical axis.
      <options>
         -out [filename]
            Specify the name of the output file. (The default is plot.png.) It should have the .png extension because other image formats are not yet supported.
         -size [width] [height]
            Specify the size of the image.
         -pointradius [size]
            Specify the size of the dots used to represent each instance.
   noise [rows] <options>
      Generate random data by sampling from a distribution.
      [rows]
         The number of patterns to generate.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -dist [distribution]
            Specify the distribution. The default is normal 0 1.
               beta [alpha] [beta]
               binomial [n] [p]
               categorical 3 [p0] [p1] [p2]
                  A categorical distribution with 3 classes. [p0], [p1], and [p2] specify the probabilities of each of the 3 classes. (This is just an example. Other values besides 3 may be used for the number of classes.)
               cauchy [median] [scale]
               chisquare [t]
               exponential [beta]
               f [t] [u]
               gamma [alpha] [beta]
               gaussian [mean] [deviation]
               geometric [p]
               logistic [mu] [s]
               lognormal [mu] [sigma]
               normal [mean] [deviation]
               poisson [mu]
               softimpulse [s]
               spherical [dims] [radius]
               student [t]
               uniform [a] [b]
               weibull [gamma]
   overview [dataset] <options>
      Generate a matrix of plots of attribute distributions and correlations. This is a useful chart for becoming acquainted with a dataset.
      [dataset]
         The filename of a dataset to be charted.
      <options>
         -out [filename]
            Specify the name of the output file. (The default is plot.png.) It should have the .png extension because other image formats are not yet supported.
         -cellsize [value]
            Change the size of each cell. The default is 100.
         -jitter [value]
            Specify how much to jitter the plotted points. The default is 0.03.
         -maxattrs [value]
            Specifies the maximum number of attributes to plot. The default is 20.
   randomsequence [length] <options>
      Generates a sequential list of integer values, shuffles them randomly, and then prints the shuffled list to stdout.
      [length]
         The number of values in the random sequence.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -start [value]
            Specify the smallest value in the sequence.
   randomwalk [samples] <options>
      Perform a random walk to generate the specified number of sample points. The walk is bounded within the unit cube (or hypercube). If you need a random walk with different bounds, you can use the map tool to convert it to the desired bounds.
      [samples]
         The number of samples to take.
      <options>
         -seed [n]
            Specify a seed for the random number generator.
         -dims [n]
            Specify the number of dimensions in the samples.
         -stepscale [dim] [scale]
            Scale the steps in a particular dimension by a scalar. For example, if you plan to map the samples to a wider space, you may want to make horizontal steps smaller so that they will be uniform after the mapping. This option may be used any number of times.
            [dim]
               The dimension to scale.
            [scale]
               The amount to scale the steps in that dimension.
         -start [val]
            A scalar to initialize all dimensions of the starting position. For example, if there are 3 sample dimensions, and this value is 0.2, then the starting point will be <0.2,0.2,0.2>.
         -continuous
            Use continuous actions. That is, steps will move in arbitrary directions instead of axis-aligned directions.
         -step [size]
            Specify the step size. (Note that the -stepscale option is applied to scale the step size in particular dimensions after this option is applied.)
         -delib [d]
            Specify a deliberateness factor between 0 and 1. A value of 0 will make every step completely random, and independent of previous steps. A value close to 1 will travel in a straight line until a boundary is encountered, then choose a new random direction. Intermediate values produce behavior between these two extremes.
         -actions [filename]
            Generate a dataset containing a representation of the action performed at each step. Note that the first action is performed AFTER the first sample is taken. No sample is taken after the last action is performed.
         -perturb [dev]
            Perturb the sample with Gaussian noise. Note that this noise is additive, in that the next step will be taken from the perturbed location.
            If you want non-additive noise, you can postprocess the data with the waffles_transform addnoise tool.
   scalerotate [png-file] <options>
      Generate a dataset where each row represents an image that has been scaled and rotated by various amounts. Thus, these images form an open-cylinder (although somewhat cone-shaped) manifold.
      [png-file]
         The filename of a PNG image.
      <options>
         -saveimage [filename]
            Save a composite image showing all the frames in a grid.
         -frames [rotate-frames] [scale-frames]
            Specify the number of frames. The default is 40 15.
         -arc [radians]
            Specify the rotation amount. If not specified, the default is 6.2831853... (2*PI).
   scurve [points] <options>
      Generate points that lie on an s-curve manifold.
      [points]
         The number of points with which to sample the manifold.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -reduced
            Generate intrinsic values instead of extrinsic values. (This might be useful to empirically measure the accuracy of a manifold learner.)
   selfintersectingribbon [points] <options>
      Generate points that lie on a self-intersecting ribbon manifold.
      [points]
         The number of points with which to sample the manifold.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
   swissroll [points] <options>
      Generate points that lie on a swiss roll manifold.
      [points]
         The number of points with which to sample the manifold.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -reduced
            Generate intrinsic values instead of extrinsic values. (This might be useful to empirically measure the accuracy of a manifold learner.)
         -cutoutstar
            Don't sample within a star-shaped region on the manifold.
   windowedimage [png-file] <options>
      Sample a manifold by translating a window over an image. Each pattern represents the windowed portion of the image.
      [png-file]
         The filename of the PNG image from which to generate the data.
      <options>
         -reduced
            Generate intrinsic values instead of extrinsic values.
            (This might be useful to empirically measure the accuracy of a manifold learner.)
         -stepsizes [horiz] [vert]
            Specify the horizontal and vertical step sizes (how many pixels to move the window between samples).
         -windowsize [width] [height]
            Specify the size of the window. The default is half the width and height of [png-file].
   usage
      Print usage information.
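For the 3d command, the -cameradistance and -cameradirection options together determine where the camera sits. The following is a minimal Python sketch of one way to combine them. It assumes the direction vector is normalized and that the camera is placed by backing away from the data mean along that direction by the factor times the min-to-max corner distance. The function name camera_position is hypothetical, and this is not the waffles implementation.

```python
import math


def camera_position(points, distance_factor=1.5, direction=(0.6, -0.3, -0.8)):
    """Return an (x, y, z) camera position that looks at the mean of `points`.

    Assumption (not taken from the waffles source): the camera sits at
    mean - unit(direction) * distance_factor * diag, where diag is the
    distance between the min and max corners of the data.
    """
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    lo = [min(p[i] for p in points) for i in range(3)]
    hi = [max(p[i] for p in points) for i in range(3)]
    diag = math.dist(lo, hi)  # distance between the min and max corners
    length = math.sqrt(sum(d * d for d in direction))
    unit = [d / length for d in direction]
    # `direction` points from the camera toward the mean, so back away along it.
    return tuple(mean[i] - unit[i] * distance_factor * diag for i in range(3))


# A larger factor moves the camera farther from the data.
pts = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0), (2.0, 1.0, 0.5)]
print(camera_position(pts))
print(camera_position(pts, distance_factor=3.0))
```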
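The point count quoted for the cube command follows from inclusion-exclusion. The six faces contribute 6*n*n grid points, but each of the 12 edges has n-2 interior points shared by two faces, and each of the 8 corners is shared by three faces. That means 12*(n-2) + 2*8 = 12*n - 8 duplicates must be removed, leaving 6*n*n - 12*n + 8, which is the same as n^3 - (n-2)^3. The Python sketch below checks this by enumerating the surface points of an n-by-n-by-n lattice. It assumes grid coordinates i/(n-1), and cube_surface_points is a hypothetical name.

```python
def cube_surface_points(n):
    """Sample the surface of the unit cube with an n-by-n grid on each face.

    Assumption: each face uses grid coordinates i/(n-1) for i = 0..n-1
    (so n >= 2), and points shared by adjacent faces are kept only once.
    """
    ticks = [i / (n - 1) for i in range(n)]
    surface = set()
    for x in ticks:
        for y in ticks:
            for z in ticks:
                # A lattice point lies on the surface if any coordinate is 0 or 1.
                if 0.0 in (x, y, z) or 1.0 in (x, y, z):
                    surface.add((x, y, z))
    return sorted(surface)


for n in (2, 3, 10):
    assert len(cube_surface_points(n)) == 6 * n * n - 12 * n + 8
```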
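The gridrandomwalk command pairs each observation with the action taken from that cell. Here is a minimal Python sketch. It assumes the grid is stored row by row, that "up" decreases the row index, and that a move off the edge leaves the position unchanged. The MOVES table and the grid_random_walk function are hypothetical. Only the four actions and the centered default start come from the usage text.

```python
import random

# Hypothetical encoding of the four actions as (dx, dy) moves on the grid.
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def grid_random_walk(observations, width, samples, start=None, seed=None):
    """Random walk over a grid of observation vectors stored in row-major order.

    Returns (actions, observations): observations[k] is seen at step k and
    actions[k] is the move taken after it. As in the usage text, the walk
    starts at the center by default. Assumption: a move off the grid leaves
    the position unchanged.
    """
    rng = random.Random(seed)
    height = len(observations) // width
    x, y = start if start is not None else (width // 2, height // 2)
    act_seq, obs_seq = [], []
    for _ in range(samples):
        obs_seq.append(observations[y * width + x])
        action = rng.choice(sorted(MOVES))
        dx, dy = MOVES[action]
        x = min(width - 1, max(0, x + dx))
        y = min(height - 1, max(0, y + dy))
        act_seq.append(action)
    return act_seq, obs_seq


grid = [[float(i)] for i in range(9)]  # a 3x3 grid of one-value observation vectors
actions, seen = grid_random_walk(grid, width=3, samples=5, seed=0)
print(actions, seen)
```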
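The imagestoarff options describe each filename as a prefix, a zero-padded frame number, and a suffix, and scanning stops at the first missing frame. This Python sketch shows how -pre, -suf, -start, -inc, and -digits could combine. It uses a hypothetical frame_filenames helper with an arbitrary default of 3 digits, and it only generates names; it does not encode any pixels.

```python
import os


def frame_filenames(pre, suf, start, inc=1, digits=3, exists=os.path.exists):
    """Yield numbered frame filenames in order, stopping at the first missing one.

    Builds names as pre + zero-padded number + suf, e.g. "frame" + "001" + ".png".
    `exists` is injectable so the function can be tried without real files.
    """
    number = start
    while True:
        name = f"{pre}{number:0{digits}d}{suf}"
        if not exists(name):
            return  # a gap in the numbering ends the sequence
        yield name
        number += inc


files = {"frame001.png", "frame002.png", "frame004.png"}
print(list(frame_filenames("frame", ".png", 1, exists=files.__contains__)))
# ['frame001.png', 'frame002.png']
```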
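To make the manifold example concrete, here is a Python sketch that evaluates the same equations directly. It assumes the two parameters x1 and x2 are drawn uniformly from [0, 1), which keeps sqrt(x1*x2) and the helper h(x)=sqrt(1-x) real-valued. The function name is hypothetical. Only y1, y2, and y3 become output columns, and h is just a helper.

```python
import math
import random


def sample_example_manifold(samples, seed=None):
    """Evaluate the usage text's example equations at random parameters.

    y1(x1,x2) = x1
    y2(x1,x2) = sqrt(x1*x2)
    h(x)      = sqrt(1-x)        (a helper equation, not an output column)
    y3(x1,x2) = x2*x2 - h(x1)

    Assumption: x1 and x2 are drawn uniformly from [0, 1).
    """
    rng = random.Random(seed)

    def h(x):
        return math.sqrt(1.0 - x)

    rows = []
    for _ in range(samples):
        x1, x2 = rng.random(), rng.random()
        rows.append((x1, math.sqrt(x1 * x2), x2 * x2 - h(x1)))
    return rows


for row in sample_example_manifold(3, seed=42):
    print(row)
```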
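The randomsequence command amounts to shuffling a range of integers. Below is a minimal Python equivalent. It assumes a default start of 0, since the usage text does not state the default, and it relies on random.Random.shuffle, which performs an in-place Fisher-Yates shuffle.

```python
import random


def random_sequence(length, start=0, seed=None):
    """Return the integers start .. start+length-1 in a random order."""
    values = list(range(start, start + length))
    random.Random(seed).shuffle(values)  # in-place Fisher-Yates shuffle
    return values


print(random_sequence(10, start=1, seed=7))
```

Seeding a private random.Random instance mirrors the -seed option: the same seed reproduces the same order without touching the global generator.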
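The randomwalk options interact in ways that are easier to see in code. The following Python sketch models the -continuous mode together with -step, -stepscale, -delib, -start, -perturb, and the sample/action ordering described for -actions. How the deliberateness factor blends directions, and how boundaries are handled (clamping, then choosing a new random direction), are assumptions for illustration rather than the waffles algorithm. random_walk is a hypothetical name.

```python
import math
import random


def random_walk(samples, dims=2, step=0.05, delib=0.0, start=0.5,
                stepscale=None, perturb=0.0, seed=None):
    """Continuous-direction random walk bounded to the unit hypercube.

    Assumptions (not taken from the waffles source): each new direction is a
    blend of the previous direction (weight delib) and a fresh random one;
    positions are clamped to [0, 1]; hitting a boundary picks a new random
    direction. Returns (samples, actions), one action after each sample.
    """
    rng = random.Random(seed)
    scale = stepscale or {}  # {dimension: factor}, like repeated -stepscale options

    def unit(v):
        length = math.sqrt(sum(c * c for c in v)) or 1.0
        return [c / length for c in v]

    def random_direction():
        return unit([rng.gauss(0.0, 1.0) for _ in range(dims)])

    pos = [start] * dims
    direction = random_direction()
    points, actions = [], []
    for _ in range(samples):
        points.append(list(pos))  # the sample is taken before the action
        direction = unit([delib * d + (1.0 - delib) * f
                          for d, f in zip(direction, random_direction())])
        # -step sets the base size; -stepscale then rescales individual dimensions.
        move = [direction[i] * step * scale.get(i, 1.0) for i in range(dims)]
        actions.append(move)
        new = [p + m for p, m in zip(pos, move)]
        if perturb > 0.0:
            # Additive noise: the next step starts from the perturbed location.
            new = [c + rng.gauss(0.0, perturb) for c in new]
        if any(c < 0.0 or c > 1.0 for c in new):
            direction = random_direction()
            new = [min(1.0, max(0.0, c)) for c in new]
        pos = new
    return points, actions


pts, acts = random_walk(5, dims=3, step=0.1, delib=0.8, seed=1)
for p, a in zip(pts, acts):
    print(p, a)
```

With delib=1.0 every move repeats the previous direction until a boundary is hit, and with delib=0.0 each direction is drawn independently, which matches the two extremes the usage text describes.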
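The open-cylinder description of scalerotate comes from its two frame parameters: the rotation angle wraps around, while the scale does not. This hypothetical Python helper lists the (angle, scale) pair for each frame under the default 40 by 15 grid. The scale range of 0.5 to 1.0 is an assumption, since the usage text does not specify one.

```python
import math


def scale_rotate_params(rotate_frames=40, scale_frames=15, arc=2 * math.pi,
                        min_scale=0.5, max_scale=1.0):
    """Return one (angle, scale) pair per frame, with rotation varying fastest.

    min_scale and max_scale are hypothetical; the usage text does not give a
    scale range. The angle is a circular coordinate and the scale a linear
    one, so with arc = 2*pi the frames sample an open cylinder.
    """
    params = []
    for s in range(scale_frames):
        scale = min_scale + (max_scale - min_scale) * s / max(1, scale_frames - 1)
        for r in range(rotate_frames):
            # Dividing by rotate_frames (not rotate_frames - 1) keeps the last
            # angle short of 2*pi, which would duplicate angle 0.
            angle = arc * r / rotate_frames
            params.append((angle, scale))
    return params


params = scale_rotate_params()
print(len(params))  # 600 frames for the default 40 x 15 grid
```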
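For windowedimage, the samples correspond to the positions of a window slid across the image. The Python sketch below enumerates those positions using the documented default window of half the image size. The default step of 1 pixel and the rule that windows stay inside the image are assumptions, and window_positions is a hypothetical name. With -reduced, two-dimensional positions like these are the kind of intrinsic values one would expect, though the exact encoding is not documented here.

```python
def window_positions(img_w, img_h, step_x=1, step_y=1, win_w=None, win_h=None):
    """Top-left (x, y) corners of each window position, row by row.

    As in the usage text, the window defaults to half the image width and
    height. Assumptions: windows never extend past the image border, and
    the default step of 1 pixel is hypothetical.
    """
    win_w = img_w // 2 if win_w is None else win_w
    win_h = img_h // 2 if win_h is None else win_h
    return [(x, y)
            for y in range(0, img_h - win_h + 1, step_y)
            for x in range(0, img_w - win_w + 1, step_x)]


# An 8x6 image with the default 4x3 window, moving 2 pixels across and 3 down.
print(window_positions(8, 6, step_x=2, step_y=3))
# [(0, 0), (2, 0), (4, 0), (0, 3), (2, 3), (4, 3)]
```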