Data formats

One of your first tasks may be to get your data into a format that our tools can operate on.

Converting to ARFF format

Our preferred format for data is ARFF. An ARFF file is a text file of comma-separated values, preceded by a short header of meta-data that names each column and declares its type. Here is a simple example of some data in ARFF format:

	@RELATION mydata
	@ATTRIBUTE age continuous
	@ATTRIBUTE gender {male,female}
	@ATTRIBUTE hair {blonde,red,brown,black,none}
	@ATTRIBUTE weight continuous
	@DATA
	18, male,   red, 152
	84, male,   none, 138
	42, female, blonde, 168
	48, male,   black, 341
	5,  female, brown, 49
	24, female, red, 140
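
The relationship between the header and the data can be checked mechanically: each @ATTRIBUTE line declares one column, so every row after @DATA should have that many comma-separated values. The following is a minimal sketch of such a check using standard shell tools only (it is not a Waffles command), and it assumes values contain no embedded commas:

```shell
# Write the example above to a file.
cat > mydata.arff <<'EOF'
@RELATION mydata
@ATTRIBUTE age continuous
@ATTRIBUTE gender {male,female}
@ATTRIBUTE hair {blonde,red,brown,black,none}
@ATTRIBUTE weight continuous
@DATA
18, male,   red, 152
84, male,   none, 138
42, female, blonde, 168
48, male,   black, 341
5,  female, brown, 49
24, female, red, 140
EOF

# Number of declared columns (one @ATTRIBUTE line per column).
attrs=$(grep -ci '^@attribute' mydata.arff)

# Count non-empty rows after @DATA whose field count differs from the attribute count.
bad=$(awk -F, -v n="$attrs" '
    seen && NF && NF != n { c++ }
    toupper($0) ~ /^@DATA/ { seen = 1 }
    END { print c + 0 }' mydata.arff)

echo "attributes: $attrs, malformed rows: $bad"
```

For the example data this reports 4 attributes and 0 malformed rows.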

If your data is not in ARFF format, do not despair. We can work with other formats too. The following command will convert a simple text file of comma-separated values to ARFF format by automatically determining the meta-data.

	waffles_transform import mydata.csv > mydata.arff
If your data is separated by tabs instead of commas, we can handle that too:
	waffles_transform import mydata.csv -tab > mydata.arff
Or by arbitrary whitespace:
	waffles_transform import mydata.csv -whitespace > mydata.arff
Or by semicolons:
	waffles_transform import mydata.csv -semicolon > mydata.arff
For other separators, check the usage information for the waffles_transform tool.
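
To try the import step end to end, it helps to have a small input file on hand. The sketch below writes the rows from the ARFF example above as plain comma-separated values, then runs the import only if waffles_transform is found on the PATH. The file contents are illustrative, and the final grep simply assumes that the generated meta-data lines begin with '@', as in the example above.

```shell
# Write a small comma-separated file: the example rows, with no meta-data.
cat > mydata.csv <<'EOF'
18,male,red,152
84,male,none,138
42,female,blonde,168
48,male,black,341
5,female,brown,49
24,female,red,140
EOF

# Convert it only if the Waffles tools are installed and on the PATH.
if command -v waffles_transform >/dev/null 2>&1; then
    waffles_transform import mydata.csv > mydata.arff
    # Show the meta-data that was determined automatically.
    grep '^@' mydata.arff
else
    echo "waffles_transform not found; skipping conversion" >&2
fi
```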

Octave (or Matlab) example

Suppose you are familiar with Octave (or Matlab), but you want to use Waffles to do something with your data. Here's how you could do it. First, let's export your data, y, from Octave:

	save -ascii y.txt y
Next, we'll convert it to ARFF format:
	waffles_transform import y.txt -whitespace > y.arff
Now, use Waffles to do something with your data. There are many things you could do. Here is an arbitrary example:
	waffles_dimred breadthfirstunfolding y.arff 18 kdtree 2 -reps 20 > x.arff
Then, we'll convert the results back to the Octave format:
	waffles_transform export x.arff -tab > x.txt
Finally, go back into Octave and load your data:
	load x.txt
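
If you repeat this round trip often, the three Waffles steps in the middle can be wrapped in a small shell function. This is only a sketch: the function name waffles_roundtrip is hypothetical, the dimensionality-reduction command is just the arbitrary example from above, and the intermediate files y.arff and x.arff are written to the current directory.

```shell
# Hypothetical helper (not part of Waffles): runs the three Waffles steps
# that sit between "save -ascii" and "load" in Octave.
waffles_roundtrip() {
    in_txt="$1"     # file written by Octave, e.g. y.txt
    out_txt="$2"    # file to load back into Octave, e.g. x.txt

    if [ ! -f "$in_txt" ]; then
        echo "missing input file: $in_txt" >&2
        return 1
    fi
    if ! command -v waffles_transform >/dev/null 2>&1; then
        echo "waffles_transform not found on PATH" >&2
        return 2
    fi

    # Each step runs only if the previous one succeeded.
    waffles_transform import "$in_txt" -whitespace > y.arff &&
    waffles_dimred breadthfirstunfolding y.arff 18 kdtree 2 -reps 20 > x.arff &&
    waffles_transform export x.arff -tab > "$out_txt"
}
```

After saving y in Octave, calling "waffles_roundtrip y.txt x.txt" would leave x.txt ready for "load x.txt".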

Manipulating data

Maybe you'll need to tweak your dataset a little bit. We provide tools to drop columns, swap columns, fill in missing values, sort by a particular column, shuffle rows, and perform numerous other useful transformations. Here are a few examples. Each command name should make clear what it does.

	waffles_transform dropcolumns diabetes.arff 0,2-5,7
	waffles_transform swapcolumns mydata.arff 0 3
	waffles_transform fillmissingvalues mydata.arff
	waffles_transform sortcolumn mydata.arff 2
	waffles_transform shuffle mydata.arff
There are many other possible transformations that you can apply to your data. For a complete list, take a look at the usage information for the waffles_transform tool.
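
Several transformations can be applied in sequence by passing the result of one command to the next through an intermediate file. The sketch below assumes that each waffles_transform command prints the transformed dataset to standard output, as the import and export commands above do. The function name prepare_dataset and the choice of steps are hypothetical.

```shell
# Hypothetical helper (not part of Waffles): fill missing values, then shuffle rows.
# Assumes each waffles_transform command prints the transformed dataset to stdout.
prepare_dataset() {
    src="$1"          # input ARFF file
    dst="$2"          # output ARFF file
    tmp="$dst.tmp"    # intermediate result between the two steps

    waffles_transform fillmissingvalues "$src" > "$tmp" &&
    waffles_transform shuffle "$tmp" > "$dst"
    status=$?

    # Remove the intermediate file whether or not the steps succeeded.
    rm -f "$tmp"
    return "$status"
}
```

For example, "prepare_dataset mydata.arff prepared.arff" would write the filled and shuffled dataset to prepared.arff.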

