Visualization
To visualize ROSETTA data, you have to export the relevant structures to a format some external visualization engine can read. For example, decision tables can be exported to Matlab, and indiscernibility graphs can be exported to a format understood by the GraphViz suite of graph layout programs.
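To give a feel for what a GraphViz-readable file looks like, here is a minimal Python sketch that builds an indiscernibility graph for a toy table and writes it as DOT text. It does not reproduce ROSETTA's own exporter, whose exact output is not shown here; the function name `indiscernibility_dot` and the toy table are hypothetical.

```python
from itertools import combinations


def indiscernibility_dot(rows, attributes):
    """Return DOT text with an edge between every pair of objects
    that have equal values on all the given attributes."""
    lines = ["graph indiscernibility {"]
    # One node per object, named by its row index.
    for i in range(len(rows)):
        lines.append(f"  {i};")
    # One undirected edge per indiscernible pair.
    for i, j in combinations(range(len(rows)), 2):
        if all(rows[i][a] == rows[j][a] for a in attributes):
            lines.append(f"  {i} -- {j};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical toy table: each dict is one object.
table = [
    {"color": "red", "size": "small"},
    {"color": "red", "size": "small"},
    {"color": "blue", "size": "small"},
]
dot_text = indiscernibility_dot(table, ["color", "size"])
print(dot_text)
```

Saved to a file such as graph.dot, text like this can be laid out with GraphViz, for example with `dot -Tpng graph.dot -o graph.png`.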
Command Line Scripts
Command scripts are an invaluable tool for automating tasks, especially in a cross-validation setting. They can be run both from within the ROSETTA GUI and from the command-line version of ROSETTA. A small example of how to do the latter is available.
Note that command scripts can also be used for purposes other than cross-validation. For example, to automate data conversion:
% Test script, showing how one can convert
% from import format to internal format.
% Create an empty table...
StructureCreator
{OUTPUT = DecisionTable}
% ...fill it with data...
MyDecisionTableImporter
{FILENAME = s:/datasets/samples/iris/iris.import.ros}
% ...and save it.
Saver
{FILENAME = s:/datasets/samples/iris/iris.converted.ros}
or to randomly split a table in two:
% Test script, showing how one can do some hairy
% stuff that the simple scripting language was not
% really intended to support. We don't have variables,
% so we use the file system instead... Ugly as hell,
% but it works.
% Create an empty table...
StructureCreator
{OUTPUT = DecisionTable}
% ...fill it with data...
MyDecisionTableImporter
{FILENAME = s:/datasets/samples/iris/iris.import.ros}
% ...split it randomly in two halves. Append the two halves as children tables.
BinarySplitter
{SEED = 123; FACTOR = 0.5; APPEND = T}
% ...save the original table with the two children...
Saver
{FILENAME = s:/datasets/samples/iris/iris.3tables.ros}
% ...kidnap one of the children...
Kidnapper
{INDEX = 0}
% ...save the one child...
Saver
{FILENAME = s:/datasets/samples/iris/iris.half0.ros}
% ...load back all three tables...
Loader
{FILENAME = s:/datasets/samples/iris/iris.3tables.ros}
% ...kidnap the other child...
Kidnapper
{INDEX = 1}
% ...and save that child...
Saver
{FILENAME = s:/datasets/samples/iris/iris.half1.ros}
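Because the scripting language has no variables, one way to reuse the split script for other files or seeds is to generate the script text from an external program. The following Python sketch does this using only the commands and parameters that appear in the script above; the function name `make_split_script` and its arguments are hypothetical, and running the generated script is left to ROSETTA.

```python
def make_split_script(import_file, out_prefix, seed, factor=0.5):
    """Return command-script text that imports a table, splits it
    randomly in two, and saves each part to its own file."""
    steps = [
        ("StructureCreator", "OUTPUT = DecisionTable"),
        ("MyDecisionTableImporter", f"FILENAME = {import_file}"),
        ("BinarySplitter", f"SEED = {seed}; FACTOR = {factor}; APPEND = T"),
        ("Saver", f"FILENAME = {out_prefix}.3tables.ros"),
        ("Kidnapper", "INDEX = 0"),
        ("Saver", f"FILENAME = {out_prefix}.half0.ros"),
        ("Loader", f"FILENAME = {out_prefix}.3tables.ros"),
        ("Kidnapper", "INDEX = 1"),
        ("Saver", f"FILENAME = {out_prefix}.half1.ros"),
    ]
    # Each step is a command name followed by its {parameter} line.
    return "\n".join(f"{cmd}\n{{{params}}}" for cmd, params in steps) + "\n"


script = make_split_script(
    "s:/datasets/samples/iris/iris.import.ros",
    "s:/datasets/samples/iris/iris",
    seed=123,
)
print(script)
```

Calling the function in a loop with different seeds would yield one script per random split, which is the kind of repetition a cross-validation setup needs.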
Configuration File
To a modest degree, ROSETTA can be configured through a small configuration file:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% This is a Rosetta configuration file.
% Name this file "rosetta.cfg" and
% place it in the same directory as
% the Rosetta executable.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Size of font in grid views.
% DEFAULT = 8
GUI::FontSize = 8
% Append output text files as icons in the project tree?
% DEFAULT = TRUE
GUI::AppendFiles = true
% Show attribute types in decision table grid views?
% DEFAULT = FALSE
GUI::ShowTypes = false
% Show attribute units in decision table grid views?
% DEFAULT = TRUE
GUI::ShowUnits = true
% Verbose progress messages?
% DEFAULT = TRUE
Kernel::VerboseMessages = true
% Size of buffer used for loading long lines of text.
% DEFAULT = 10240
Kernel::IOKit::BufferSize = 50000
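If other tools need to read the same settings, the file is simple enough to parse by hand. The Python sketch below assumes, based only on the sample above, that lines starting with % are comments and that every setting has the form "name = value"; ROSETTA's actual parsing rules are not documented here, and `parse_rosetta_cfg` is a hypothetical helper.

```python
def parse_rosetta_cfg(text):
    """Parse 'name = value' lines into a dict, skipping % comments.
    Values are kept as strings; no type conversion is attempted."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a 'name = value' line; ignored in this sketch
        settings[key.strip()] = value.strip()
    return settings


sample = """\
% Size of font in grid views.
% DEFAULT = 8
GUI::FontSize = 8
% Size of buffer used for loading long lines of text.
Kernel::IOKit::BufferSize = 50000
"""
cfg = parse_rosetta_cfg(sample)
print(cfg)
```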
HypoClass
HypoClass is a small utility program for statistical hypothesis testing for binary outcome classifiers, available as part of the ROSETTA distribution. Currently, HypoClass can perform:
- Hanley-McNeil's test for comparing areas under correlated ROC curves.
- McNemar's test for comparing accuracy.
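As an illustration of the second test, the following Python sketch computes McNemar's statistic for two classifiers applied to the same objects. It is not HypoClass's implementation: the 0.5 decision threshold, the use of the uncorrected chi-square form (no continuity correction), and the function name `mcnemar` are all assumptions made for this example.

```python
import math


def mcnemar(actual, output_a, output_b, threshold=0.5):
    """Return (statistic, p_value) for McNemar's test on paired outputs.
    An output >= threshold is read as predicting outcome 1."""
    only_a = only_b = 0  # objects that exactly one classifier gets right
    for y, pa, pb in zip(actual, output_a, output_b):
        a_ok = (pa >= threshold) == (y == 1)
        b_ok = (pb >= threshold) == (y == 1)
        if a_ok and not b_ok:
            only_a += 1
        elif b_ok and not a_ok:
            only_b += 1
    if only_a + only_b == 0:
        return 0.0, 1.0  # the classifiers never disagree in correctness
    stat = (only_a - only_b) ** 2 / (only_a + only_b)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


actual = [1, 1, 0, 0, 1, 0]
out_a = [0.9, 0.8, 0.1, 0.2, 0.7, 0.3]
out_b = [0.4, 0.3, 0.6, 0.2, 0.7, 0.3]
stat, p_value = mcnemar(actual, out_a, out_b)
print(stat, p_value)
```

Only the objects on which the two classifiers differ in correctness enter the statistic, which is why the data from both classifiers must refer to the same objects in the same order.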
By default, each data line in an input file has the format:
Actual Output [Key [...]]
If the column order is flagged as swapped, the line format is instead:
Output Actual [Key [...]]
- The required field Actual is an integer that denotes the actual outcome for the object, and is assumed to be either 0 or 1.
- The required field Output is a float that denotes the classifier's output when applied to the object, and lies in the interval [0, 1]. The value indicates the classifier's degree of certainty that the object has outcome 1.
- The optional field Key is an integer that typically denotes the index of the current object. This field enables the data lines to be sorted so that the data from the two classifiers can be automatically "aligned". If this field is missing, no sorting takes place and the user is responsible for ensuring that the data lines are correctly ordered.
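The field rules above can be sketched as a small Python reader. It assumes the fields are separated by whitespace, which the description does not state explicitly, and the function name `parse_lines` and its `swapped` argument are hypothetical stand-ins for whatever flag the program uses.

```python
def parse_lines(text, swapped=False):
    """Parse data lines into (actual, output, key) tuples.
    If every line carries a key, the records are sorted by it."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or incomplete lines are skipped in this sketch
        first, second = fields[0], fields[1]
        actual, output = (second, first) if swapped else (first, second)
        key = int(fields[2]) if len(fields) > 2 else None
        actual, output = int(actual), float(output)
        if actual not in (0, 1) or not 0.0 <= output <= 1.0:
            raise ValueError(f"bad data line: {line!r}")
        records.append((actual, output, key))
    # Without keys the original line order is kept as-is.
    if records and all(r[2] is not None for r in records):
        records.sort(key=lambda r: r[2])
    return records


keyed = parse_lines("1 0.9 2\n0 0.2 1\n")
unkeyed = parse_lines("0.9 1\n0.2 0\n", swapped=True)
print(keyed)
print(unkeyed)
```

Reading both classifiers' files this way and sorting each by key puts matching objects at matching positions, which is the alignment described above.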
CurveClass
CurveClass is a small utility program for creating and saving various curves for binary outcome classifiers, available as part of the ROSETTA distribution. Currently, the program creates ROC curves and calibration curves.
Input to CurveClass is an ASCII file in the same format as for the HypoClass program.
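To show how a ROC curve follows from such data, here is a minimal Python sketch that derives the curve's points from Actual and Output values and integrates the area under it with the trapezoidal rule. It illustrates the general technique rather than CurveClass's own algorithm; the function names `roc_points` and `auc` are hypothetical, and the data must contain at least one object of each outcome.

```python
def roc_points(actual, output):
    """Return ROC points as (false positive rate, true positive rate),
    one per distinct output value used as a threshold, from (0, 0) to (1, 1)."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]
    # Lowering the threshold step by step classifies more objects as 1.
    for t in sorted(set(output), reverse=True):
        tp = sum(1 for y, p in zip(actual, output) if p >= t and y == 1)
        fp = sum(1 for y, p in zip(actual, output) if p >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


actual = [1, 1, 0, 1, 0, 0]
output = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
points = roc_points(actual, output)
area = auc(points)
print(points)
print(area)
```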