Effectidor produces both tabular and graphical outputs.
The tabular output consists of on-screen tables and downloadable files.
The graphical output shows an analysis of the features. The 10 features that contribute most are compared between the two classes, effectors and non-effectors, in violin plots.
You can see an example of the output in the running example here.
The main result, the predictions file, contains the ORFs sorted by their likelihood of encoding T3Es.
To identify candidate novel T3Es, look for ORFs with a high score in the top rows of the table.
If you used an input Effectors file, look for samples that are not already known to be effectors. These samples will have a question mark (?) in the column "is_effector".
If you did not use an input Effectors file, the samples marked with "yes" in the "is_effector" column are homologous to known T3Es, and the highly scored samples marked with a question mark (?) are potentially novel T3Es.
For your convenience, the highly scored samples are highlighted in dark green, which becomes lighter as the score decreases.
The amino acid sequence is also available for each ORF, as well as the annotation (if it exists in the input file).
In general, an ORF with a high score and an ambiguous annotation is a good candidate to be a novel T3E.
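The filtering described above can be sketched in a few lines of Python. This is only an illustration on a made-up table: the "is_effector" column name comes from the text above, while the other column names ("locus", "score", "annotation"), the example rows, and the 0.5 score cut-off are assumptions and may not match the actual predictions file.

```python
import csv
import io

# Hypothetical predictions table. Only the "is_effector" column name is taken
# from the text; "locus", "score" and "annotation" are assumed names.
EXAMPLE_PREDICTIONS = """locus,score,is_effector,annotation
orf_001,0.98,yes,type III effector
orf_002,0.91,?,hypothetical protein
orf_003,0.40,?,hypothetical protein
orf_004,0.87,?,DNA polymerase III subunit
orf_005,0.05,no,ribosomal protein
"""


def candidate_novel_effectors(csv_text, min_score=0.5):
    """Return rows that are not known effectors ("?") and score at least min_score.

    min_score is an arbitrary illustrative threshold, not an Effectidor default.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    candidates = [
        row for row in rows
        if row["is_effector"] == "?" and float(row["score"]) >= min_score
    ]
    # Highest score first, mirroring the ordering of the predictions file.
    candidates.sort(key=lambda row: float(row["score"]), reverse=True)
    return candidates


candidates = candidate_novel_effectors(EXAMPLE_PREDICTIONS)
for row in candidates:
    print(row["locus"], row["score"], row["annotation"])
```

In this toy table, the annotation column would then be inspected by eye: a high-scoring ORF annotated as a hypothetical protein is a more interesting candidate than one with a clear, unrelated annotation.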
During the learning process, Effectidor randomly sets aside 20% of the labeled data (effectors and non-effectors) as a test set. Each learning algorithm is trained on the remaining 80% using cross-validation and is then evaluated on the untouched 20% test set.
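The hold-out scheme described above can be illustrated with a minimal Python sketch. It is not Effectidor's code: the function names, the fixed random seed, the toy labels, and the choice of 5 folds are all assumptions made for the example, since the text does not state how many folds are used.

```python
import random


def holdout_split(samples, test_fraction=0.2, seed=0):
    """Randomly set aside test_fraction of the samples as a test set."""
    rng = random.Random(seed)  # fixed seed only to make the example repeatable
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)


def k_folds(samples, k=5):
    """Yield (training, validation) pairs for k-fold cross-validation.

    k=5 is an assumption; the number of folds is not given in the text.
    """
    folds = [samples[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield training, validation


# Toy labeled data: (ORF name, is_effector). Purely illustrative.
labeled = [(f"orf_{i:03d}", i % 10 == 0) for i in range(100)]

train, test = holdout_split(labeled)
fold_sizes = [(len(tr), len(va)) for tr, va in k_folds(train)]
print(len(train), len(test), fold_sizes)
```

The important property is that the test set is never seen during cross-validation, so the score measured on it is not inflated by model selection.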
The metric used to evaluate Effectidor's performance is the Area Under the Precision-Recall Curve (AUPRC). The closer it is to 1, the more accurate the classifier. The AUPRC achieved on the test set is reported in the downloadable predictions file.
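To make the metric concrete, here is a small sketch that estimates the AUPRC as average precision, which is one common way to summarise the precision-recall curve. Whether Effectidor computes the AUPRC in exactly this way is an assumption; the function name and the toy labels and scores are invented for the example, and tied scores are not treated specially.

```python
def average_precision(labels, scores):
    """Estimate the AUPRC as average precision.

    labels: 1 for effectors, 0 for non-effectors.
    scores: classifier scores, higher meaning more likely to be an effector.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_positive = sum(1 for _, label in ranked if label)
    if n_positive == 0:
        raise ValueError("at least one positive label is required")
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at the rank of each true effector
    return total / n_positive


# Toy test set: three effectors, two non-effectors.
labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.1]
auprc = average_precision(labels, scores)
print(round(auprc, 3))
```

In this toy case one non-effector is ranked above an effector, so the value falls below 1; a classifier that ranks every effector above every non-effector would reach exactly 1.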
Effectidor has one obligatory input: an ORFs file.
This is a FASTA file including all the genome ORFs. See instructions for downloading this file here.
Some of the ORFs in this file (effectors and non-effectors) are used to train the machine-learning algorithms, and the trained classifier then produces the main output: a prediction for each ORF.
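As a rough illustration of what an ORFs FASTA file contains, the sketch below reads FASTA records into a dictionary keyed by the first word of each header line. The record names and sequences are invented, and the function is a generic FASTA reader written for this example, not part of Effectidor.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into {record name: sequence}.

    The record name is the first whitespace-delimited word after ">".
    """
    records = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequences may span several lines
    if name is not None:
        records[name] = "".join(chunks)
    return records


# Invented two-record example; a real ORFs file lists every ORF in the genome.
EXAMPLE_FASTA = """>orf_001 hypothetical protein
ATGGCTAAAGGT
CTGTAA
>orf_002
ATGCCGTGA
"""

orfs = read_fasta(io.StringIO(EXAMPLE_FASTA))
print(len(orfs), sorted(orfs))
```

A quick check like this, for example counting the records or looking for duplicate names, can catch a malformed file before a job is submitted.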
In addition, it is recommended to supply a file of known effectors, with the sequences as they appear in the ORFs file. If you do not supply one, a homology search against an internal effectors dataset is performed to define the "known effectors" ORFs for the learning process.
In the advanced options you can supply data that will result in additional features to feed the machine-learning and improve the results. These data include:
The running time depends on many factors, including the input you supply and the load on our servers.
It can take from a few minutes to several hours. If you choose to supply your email address, we will email you a link to the results upon submission.
The results will be saved on our servers for 3 months.
After 3 months they will be permanently deleted.
If you submit your email address, we will use it to send you a link to your results.
In addition, in case of a bug, we can contact you once it is fixed.
As of today, Effectidor only works for one genome at a time.
You can submit a different job for each genome, but you cannot run a common analysis for more than one genome.
This may change in the future; stay tuned.