Description
The goal of the web tool "Gene Identification" is to identify all
transcription factor binding sites (TFBS) of preselected transcription factors
(TFs) in all Arabidopsis thaliana genes. To use this tool, the user selects a
specific TF (Factor-Name) from a list of all annotated TFs. To simplify this
choice, the TF family can be selected first, which restricts the list of
selectable factors to members of that family. By default, the region searched
in every gene extends from -500 to +50 bp relative to either the transcription
start site or the translation start site, depending on the annotation. These
limits can be changed: the largest selectable window is 6000 bp, from 2000 bp
upstream to 4000 bp downstream of either start site. For TFs whose binding
sites were determined with positional weight matrices, the minimum score
threshold can be raised so that only genes with highly conserved TFBS are
detected.
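
The window arithmetic can be sketched in Python. This is a minimal illustration, not the tool's implementation. It assumes 1-based genomic coordinates, an inclusive interval, upstream and downstream distances given as positive numbers of base pairs, and no special handling of a position 0. The names `search_window` and `Window` are hypothetical.

```python
from dataclasses import dataclass

# Limits from the description: default -500/+50 bp, maximum 2000 bp upstream
# and 4000 bp downstream of the start site.
DEFAULT_UPSTREAM = 500
DEFAULT_DOWNSTREAM = 50
MAX_UPSTREAM = 2000
MAX_DOWNSTREAM = 4000


@dataclass(frozen=True)
class Window:
    start: int  # lowest genomic coordinate (inclusive)
    end: int    # highest genomic coordinate (inclusive)


def search_window(start_site: int, strand: str,
                  upstream: int = DEFAULT_UPSTREAM,
                  downstream: int = DEFAULT_DOWNSTREAM) -> Window:
    """Return the genomic interval searched around one gene's start site.

    start_site is the genomic coordinate of the transcription or translation
    start site (assumed 1-based); strand is '+' or '-'.
    """
    if not 0 <= upstream <= MAX_UPSTREAM:
        raise ValueError(f"upstream must be between 0 and {MAX_UPSTREAM} bp")
    if not 0 <= downstream <= MAX_DOWNSTREAM:
        raise ValueError(f"downstream must be between 0 and {MAX_DOWNSTREAM} bp")
    if strand == "+":
        # Forward strand: upstream lies at lower coordinates.
        return Window(start_site - upstream, start_site + downstream)
    if strand == "-":
        # Reverse strand: upstream lies at higher coordinates.
        return Window(start_site - downstream, start_site + upstream)
    raise ValueError("strand must be '+' or '-'")
```

For example, with the default settings a reverse-strand gene whose start site is at coordinate 10,000 would be searched from 9,950 to 10,500, because upstream lies at higher coordinates on that strand.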

The results can be displayed in two sort modes. "Gene", the default mode,
lists the results by Arabidopsis Genome Initiative (AGI) gene identifier;
"Distance" sorts them by the distance of each TFBS from the start site of the
gene. The results appear in tables below the user interface. For each gene,
these tables give the positions of the TFBS, their orientation relative to the
start site of the gene and, where applicable, the individual score of each
TFBS. The tables also provide links to the gene and to the genomic positions of
the sites.

For some TFs, the number of sites to be searched had to be restricted. This
applies to thirteen TFs with more than 200,000 putative binding sites each. For
these TFs, the score threshold used for screening is listed in a "table of
restriction scores", which can be accessed through a link on the user
interface. Because the TFBS of TBP (TATA box) and CBF (CAAT box) are also
positionally defined, the score restrictions are not applied to these two
factors. Likewise, a score restriction cannot be applied to sites determined by
pattern search or to combinatorial elements.
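
The score threshold and the two sort modes can be sketched as a filter over precomputed hits. This Python sketch is hypothetical. `Site` and `filter_and_sort` are illustrative names. A score of `None` stands for sites found by pattern search or for combinatorial elements, which no threshold removes. "Distance" is assumed to mean the absolute distance from the start site.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Site:
    agi: str                # gene identifier, e.g. "AT1G01010"
    position: int           # bp relative to the start site; negative = upstream
    orientation: str        # "+" or "-" relative to the gene
    score: Optional[float]  # None for pattern-search or combinatorial sites


def filter_and_sort(sites: Iterable[Site], min_score: Optional[float] = None,
                    sort_mode: str = "gene") -> List[Site]:
    """Drop scored sites below min_score, then sort for display."""
    kept = [
        s for s in sites
        # Sites without a score are never removed by the threshold.
        if min_score is None or s.score is None or s.score >= min_score
    ]
    if sort_mode == "gene":
        # "Gene" mode: by AGI, then by position within each gene.
        return sorted(kept, key=lambda s: (s.agi, s.position))
    if sort_mode == "distance":
        # "Distance" mode: sites closest to the start site first
        # (assumed to use the absolute distance).
        return sorted(kept, key=lambda s: (abs(s.position), s.agi))
    raise ValueError("sort_mode must be 'gene' or 'distance'")
```

For one of the thirteen restricted TFs, `min_score` could be set to the value listed in the table of restriction scores.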

For further processing, the binding sites detected around annotated genes can
be downloaded as a file containing all sites detected for the selected TF
between 2000 bp upstream and 4000 bp downstream of each gene. On the result
page, genes potentially regulated by small RNA are shown in italics and genes
potentially regulated by miRNA in bold. Selecting "exclude genes putatively
regulated by smallRNA" or "exclude genes putatively regulated by miRNA" removes
the corresponding genes from the analysis.
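
Further processing of a downloaded file might look like the following Python sketch. It narrows the -2000/+4000 bp sites to a smaller window and drops genes in a user-supplied exclusion set. The record layout `(AGI, relative position)`, the function name `postprocess`, and the exclusion set are assumptions for illustration. The actual column layout of the download file and the tool's small RNA and miRNA gene lists are not reproduced here.

```python
from typing import AbstractSet, Iterable, List, Tuple

# Assumed record layout: (AGI, position relative to the start site in bp).
SiteRecord = Tuple[str, int]

MAX_UPSTREAM = 2000    # the download covers 2000 bp upstream ...
MAX_DOWNSTREAM = 4000  # ... and 4000 bp downstream of each gene


def postprocess(sites: Iterable[SiteRecord], upstream: int, downstream: int,
                exclude: AbstractSet[str] = frozenset()) -> List[SiteRecord]:
    """Keep sites within -upstream..+downstream bp whose gene is not excluded."""
    if not (0 <= upstream <= MAX_UPSTREAM and 0 <= downstream <= MAX_DOWNSTREAM):
        raise ValueError("window must lie within -2000..+4000 bp")
    return [
        (agi, pos) for agi, pos in sites
        if -upstream <= pos <= downstream and agi not in exclude
    ]
```

Passing the union of a small RNA gene set and a miRNA gene set as `exclude` would correspond to selecting both exclusion options.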