Tutorial

Pipeline

About

This pipeline was built to produce an interactive web page from a rule file in ROSETTA or line-by-line format, using the Circos software. It is designed as a web-based interface that runs under an Apache server and uses Perl, Bash, and SQLite. The web page is written in PHP and HTML.

Usage and parameters

To run the pipeline, the user submits a file in either the ROSETTA export format or a line-by-line format. The pipeline runs with default settings, and several optional parameters are available. A complete overview of the parameters is shown below.

Parameter         Type     Range  Description
Rule file         File     -      Local path to the rule file.
Threshold         Integer  0-99   If checked, connections below the percentile threshold value are not shown in the circle.
Rule format       Nominal  -      The format of the rule file.
E-mail            String   -      A link to the results will be sent to this e-mail address when the job is finished.
Minimum accuracy  Integer  0-99   If checked, rules with accuracy below the threshold are not considered.
Minimum support   Integer  ≥0     If checked, rules with support below the threshold are not considered.
Groups            File     -      Local path to a group file. Used to define nodes with similar color.
Colors            File     -      Local path to a color file. Used to define the colors of the nodes.
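
The effect of the filtering parameters can be sketched in a few lines of Python. This is only an illustration, not the pipeline's own code: the dictionary keys, the function names, the conversion of accuracy to a percentage, and the way the percentile cutoff is computed are all assumptions.

```python
def filter_rules(rules, min_accuracy=None, min_support=None):
    """Drop rules below the (assumed) accuracy and support thresholds.

    Assumes rule["accuracy"] is a fraction in [0, 1] and that the
    "Minimum accuracy" parameter (0-99) is a percentage.
    """
    kept = []
    for rule in rules:
        if min_accuracy is not None and rule["accuracy"] * 100 < min_accuracy:
            continue
        if min_support is not None and rule["support"] < min_support:
            continue
        kept.append(rule)
    return kept


def filter_connections(weights, threshold=None):
    """Hide connections below a percentile cutoff (one possible reading).

    `weights` maps a connection to a numeric weight; what the pipeline
    actually uses as the weight is not specified here.
    """
    if threshold is None or not weights:
        return dict(weights)
    ordered = sorted(weights.values())
    cutoff = ordered[int(len(ordered) * threshold / 100)]
    return {edge: w for edge, w in weights.items() if w >= cutoff}


example_rules = [
    {"accuracy": 1.0, "support": 47},
    {"accuracy": 0.6, "support": 15},
    {"accuracy": 0.9, "support": 3},
]
kept_rules = filter_rules(example_rules, min_accuracy=80, min_support=10)

example_weights = {("a", "b"): 1, ("a", "c"): 2, ("b", "c"): 3, ("b", "d"): 4}
kept_edges = filter_connections(example_weights, threshold=50)
```

With these inputs, only the first rule passes both thresholds, and the two heaviest connections remain visible.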

A list of all output files is given in the section below. The interactive web pages can be used to investigate the rules, and they support two interactions. Hovering the pointer over a node in the figure shows the attribute name. Clicking the edge that connects two conditions retrieves a list of the rules that contain both conditions.
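
The edge lookup described above can be pictured as an index from pairs of conditions to the rules that contain both. The following Python sketch is a hypothetical illustration (the pipeline itself uses Perl and SQLite); the rule representation and function names are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def index_rules_by_pair(rules):
    """Map each unordered pair of conditions to the rules containing both."""
    index = defaultdict(list)
    for rule in rules:
        # Sort so that (a, b) and (b, a) share one key.
        for pair in combinations(sorted(set(rule["conditions"])), 2):
            index[pair].append(rule)
    return index


def rules_for_edge(index, cond_a, cond_b):
    """Return the rules behind the edge between two conditions."""
    return index.get(tuple(sorted((cond_a, cond_b))), [])


example_rules = [
    {"conditions": ["sepal_length=*-5.9", "sepal_width=3.5-3.9"], "decision": "setosa"},
    {"conditions": ["petal_width=*-0.8"], "decision": "setosa"},
]
pair_index = index_rules_by_pair(example_rules)
edge_rules = rules_for_edge(pair_index, "sepal_width=3.5-3.9", "sepal_length=*-5.9")
```

A rule with a single condition contributes no pair and therefore no edge.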

Colors and groups

By default, the nodes (rule conditions) are grouped by the attribute name. A color for each group is generated automatically. Files that specify the groups (crv_groups.conf) and colors (crv_colors.conf) are generated by Ciruvis and can be downloaded for later use.

Step-by-step

We now provide a general step-by-step description of the pipeline. We only describe the operations that are performed on the server to obtain a functional web page.

Starting from a rule file:

Input format

Two input formats are currently supported. Contact us if you need support for additional formats.

ROSETTA

% Rules/patterns generated by ROSETTA.
% Exported 2012.01.04 18:44:15 by .
%
% Rules
% 68 rules.

petal_width([*, 0.8)) => class(setosa)
Supp. (LHS) = [47 object(s)]
Supp. (RHS) = [47 object(s)]
Acc.  (RHS) = [1]
Cov.  (LHS) = [0.348148]
Cov.  (RHS) = [1]
Stab. (LHS) = [1]
Stab. (RHS) = [1]

sepal_length([*, 5.9)) AND sepal_width([3.5, 3.9)) => class(setosa)
Supp. (LHS) = [15 object(s)]
Supp. (RHS) = [15 object(s)]
Acc.  (RHS) = [1]
Cov.  (LHS) = [0.111111]
Cov.  (RHS) = [0.319149]
Stab. (LHS) = [1]
Stab. (RHS) = [1]
The file header and two rules from a ROSETTA-formatted rule file.
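
As an illustration of how such a file could be read, the following Python sketch parses the header, rule lines, and statistics shown above. It is an assumption-laden example, not the pipeline's parser: the function name and the returned structure are hypothetical, and rules with more than one decision value are not handled.

```python
import re

# Matches lines such as "Supp. (LHS) = [47 object(s)]" or "Acc.  (RHS) = [1]".
STAT_RE = re.compile(r"^(\w+)\.\s*\((LHS|RHS)\)\s*=\s*\[([^\]]*)\]")


def parse_rosetta(text):
    """Parse ROSETTA-exported rules into a list of dictionaries."""
    rules = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and header comments
        if "=>" in line:
            lhs, rhs = line.split("=>", 1)
            current = {
                "conditions": [c.strip() for c in lhs.split(" AND ")],
                "decision": rhs.strip(),
                "stats": {},
            }
            rules.append(current)
            continue
        match = STAT_RE.match(line)
        if match and current is not None:
            name, side, value = match.groups()
            # Keep the leading number, dropping text such as "object(s)".
            current["stats"][(name, side)] = float(value.split()[0])
    return rules


sample = """% Rules/patterns generated by ROSETTA.
% 68 rules.

petal_width([*, 0.8)) => class(setosa)
Supp. (LHS) = [47 object(s)]
Acc.  (RHS) = [1]

sepal_length([*, 5.9)) AND sepal_width([3.5, 3.9)) => class(setosa)
Supp. (LHS) = [15 object(s)]
Acc.  (RHS) = [1]
"""
parsed = parse_rosetta(sample)
```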

Plain text

Rules may be submitted in a plain text tab-separated format using the following column structure:

Column 1: The left-hand side (LHS) of the rule expressed as a comma-separated list of the rule conditions, e.g. "Attribute1=value1,Attribute2=value2,Attribute3=value3".
Column 2: The right-hand side (RHS) of the rule expressed as the value of the decision attribute.
Column 3: Rule accuracy, defined as P(RHS|LHS).
Column 4: Rule support, defined as P(LHS)*N, where N is the number of objects in the data set.
Any additional columns are ignored.

petal_width=*-0.8	setosa	0.348148	47
sepal_length=*-5.9,sepal_width=3.5-3.9	setosa	1	15
Two rules from a plain text-formatted rule file.
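
The column structure above maps directly onto a small reader. The Python sketch below is a minimal example under stated assumptions: the function name and dictionary keys are hypothetical, and support is assumed to be a whole number.

```python
def parse_plain_rules(text):
    """Parse tab-separated rules: LHS, RHS, accuracy, support."""
    rules = []
    for line in text.splitlines():
        if not line.strip():
            continue
        columns = line.split("\t")
        if len(columns) < 4:
            raise ValueError(f"expected at least 4 columns: {line!r}")
        lhs, rhs, accuracy, support = columns[:4]  # extra columns are ignored
        rules.append({
            "conditions": [c.strip() for c in lhs.split(",")],
            "decision": rhs.strip(),
            "accuracy": float(accuracy),
            "support": int(support),
        })
    return rules


sample = (
    "petal_width=*-0.8\tsetosa\t0.348148\t47\n"
    "sepal_length=*-5.9,sepal_width=3.5-3.9\tsetosa\t1\t15\n"
)
plain_rules = parse_plain_rules(sample)
```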

List of output files

crv_colors.conf

The crv_colors.conf file lists the color for each group of conditions. The colors are specified in the r,g,b format, one color per line. The groups are numbered starting from 0, so line x defines the color of group x-1. If there are more groups than colors specified, the remaining groups are colored gray.

If crv_colors.conf is not submitted by the user, x+1 colors will be generated automatically, where x is the highest group number in crv_groups.conf.

255,51,51
255,173,51
214,255,51
92,255,51
51,255,133
51,255,255
51,133,255
92,51,255
214,51,255
255,51,173
A crv_colors.conf file defining ten different colors.
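
The lookup and fallback rules for colors can be sketched as follows in Python. The function names and the exact gray value are assumptions. The generator spaces hues evenly at a saturation of 0.8, a choice made here only because it appears consistent with the example file; how the pipeline actually generates colors is not documented in this tutorial.

```python
import colorsys

GRAY = (128, 128, 128)  # assumed fallback; the exact gray is not specified


def load_colors(text):
    """Read one r,g,b color per line; line x defines group x-1."""
    colors = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            r, g, b = (int(part) for part in line.split(","))
            colors.append((r, g, b))
    return colors


def color_for_group(colors, group):
    """Return the group's color, or gray if no color is defined for it."""
    return colors[group] if 0 <= group < len(colors) else GRAY


def generate_colors(n):
    """Generate n colors with evenly spaced hues (an assumed scheme)."""
    return [
        tuple(round(channel * 255) for channel in colorsys.hsv_to_rgb(i / n, 0.8, 1.0))
        for i in range(n)
    ]


loaded = load_colors("255,51,51\n255,173,51\n")
generated = generate_colors(10)
```

Here `color_for_group(loaded, 1)` returns the second line's color, while `color_for_group(loaded, 5)` falls back to gray because only two colors are defined.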

crv_groups.conf

The crv_groups.conf file contains a list of all conditions and the group to which each belongs (numbered starting from 0). The condition and the group number are separated by a tab. Multiple conditions may belong to the same group. Conditions that are not specified in this file are colored gray.

If crv_groups.conf is not submitted by the user, a file is generated that assigns all conditions with the same attribute to the same group.

C0_0=0	0
C0_0=1	0
C1_4=0	1
C1_4=1	1
C2_8=0	2
C2_8=1	2
C3_11=0	3
C3_11=1	3
C4_15=0	4
C4_15=1	4
R0_0=0	5
R0_0=1	5
R1_21=0	6
R1_21=1	6
R2_43=0	7
R2_43=1	7
R3_64=0	8
R3_64=1	8
R4_85=0	9
R4_85=1	9
S0_0=0	5
S0_0=1	5
S1_21=0	6
S1_21=1	6
S2_43=0	7
S2_43=1	7
S3_64=0	8
S3_64=1	8
S4_85=0	9
S4_85=1	9
A crv_groups.conf file that maps 30 conditions into ten groups. Note that with this configuration, conditions with the same attribute always have the same color, regardless of their value. Also note that each pair Ri and Si always has the same color.
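
Reading a group file and building the default grouping can be sketched in Python as follows. This is a hypothetical illustration: the function names are invented, conditions are assumed to have the form attribute=value, and numbering groups in order of first appearance is an assumption.

```python
def load_groups(text):
    """Read tab-separated 'condition<TAB>group' lines into a dictionary."""
    groups = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        condition, group = line.split("\t")
        groups[condition.strip()] = int(group)
    return groups


def default_groups(conditions):
    """Assign all conditions with the same attribute to the same group."""
    attribute_to_group = {}
    groups = {}
    for condition in conditions:
        attribute = condition.split("=", 1)[0]
        if attribute not in attribute_to_group:
            # Assumed: groups are numbered from 0 in order of first appearance.
            attribute_to_group[attribute] = len(attribute_to_group)
        groups[condition] = attribute_to_group[attribute]
    return groups


loaded_groups = load_groups("C0_0=0\t0\nC0_0=1\t0\nR0_0=0\t5\nS0_0=0\t5\n")
generated_groups = default_groups(["C0_0=0", "C0_0=1", "C1_4=0", "C1_4=1"])
```

A condition missing from the resulting dictionary has no group and would be drawn in gray.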