Analyze EST data
Introduction
Biozon uses multiple elements to map EST sequences to their
corresponding protein products. We use UniGene clusters, substring
analysis, information about protein coding regions in existing DNA
sequences, and protein database searches to detect protein products
related to a query EST sequence. Gene Ontology terms, SwissProt
keywords, and protein similarity data are used to detect ESTs that
are associated with specific functional descriptors.
Mapping
We say EST s directly maps to protein p if:
1. EST s encodes p
2. s is a substring of DNA s' near an encoding region of s' which encodes for p
3. s is in a UniGene cluster to which NCBI assigns p
4. s is in a UniGene cluster with s' and s' encodes p
5. s is in a UniGene cluster with s' and s' is a substring of DNA s'' near an encoding region of s'' which encodes for p
In mapping modes 2 and 5, "s is a substring of s' near an encoding
region of s' which encodes for p" currently means that s appears as a
substring of s' and is no more than 50 base pairs away from
overlapping the region of s' that encodes p.
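The proximity rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Biozon's implementation: the function names, the half-open coordinate convention, and the reading of "50 base pairs away from overlapping" as "at most 50 bases lie between s and the coding region" are all assumptions, and the sketch looks at the given strand only.

```python
MAX_GAP = 50  # threshold taken from the text above


def find_all(haystack, needle):
    """Yield every start index at which needle occurs in haystack."""
    start = haystack.find(needle)
    while start != -1:
        yield start
        start = haystack.find(needle, start + 1)


def gap_to_region(start, end, cds_start, cds_end):
    """Bases separating [start, end) from [cds_start, cds_end); 0 if they touch or overlap."""
    if end <= cds_start:
        return cds_start - end
    if cds_end <= start:
        return start - cds_end
    return 0


def is_near_coding_region(est, dna, cds_start, cds_end, max_gap=MAX_GAP):
    """True if est occurs in dna within max_gap bases of the coding region."""
    if not est:
        return False
    return any(
        gap_to_region(i, i + len(est), cds_start, cds_end) <= max_gap
        for i in find_all(dna, est)
    )
```

Every occurrence of s in s' is checked, so a repeated EST sequence qualifies if any one copy is close enough to the coding region.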
Mapping modes 4 and 5 serve to complete the information provided by
NCBI for UniGene clusters (in some cases a member of a UniGene cluster
directly encodes for a protein, but is not documented as such by the
NCBI team).
Similarity data
We say an EST s maps to protein p if s directly maps to p, or if s
directly maps to p' and p' is similar to p (such relations are
marked clearly in the output). We use the Biozon similarity data with
0.1 as the E-value threshold.
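The extension from direct mapping to mapping can be sketched as follows. This is a minimal illustration, not Biozon's code: the dictionary layout, the function name, the relation labels, and the choice to treat the threshold as inclusive are assumptions made for the example.

```python
EVALUE_THRESHOLD = 0.1  # threshold taken from the text above


def map_est(est, direct, similar, threshold=EVALUE_THRESHOLD):
    """Return {protein: relation} for one EST.

    direct  : dict mapping an EST id to the proteins it directly maps to
    similar : dict mapping a protein id to (other protein, E-value) pairs
    """
    # Direct relations come first, so a protein reached both ways is
    # reported as direct.
    result = {p: "direct" for p in direct.get(est, ())}
    for p_prime in direct.get(est, ()):
        for p, evalue in similar.get(p_prime, ()):
            # Assumption: an E-value equal to the threshold is accepted.
            if evalue <= threshold and p not in result:
                result[p] = f"similar to {p_prime}"
    return result
```

Keeping the relation alongside each protein mirrors the statement that similarity-based relations are marked in the output.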
Input file
To analyze your EST sequences, upload a list of GenBank or RefSeq
accession numbers (ACs or GIs), one per line (see example file). If you
have a short list (up to 10 ESTs) you can also paste it into the text box.
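Before submitting, a list can be checked locally against the format described above. The sketch below is a hypothetical helper, not part of Biozon: the function names are invented, the identifiers in the usage example are made-up placeholders, and dropping blank lines and duplicates is an assumption about what a clean list should look like.

```python
PASTE_LIMIT = 10  # "up to 10 ESTs" may be pasted, per the text above


def read_identifiers(text):
    """Return the identifiers in text, one per line, without blanks or repeats."""
    seen = set()
    ids = []
    for line in text.splitlines():
        ident = line.strip()
        if ident and ident not in seen:
            seen.add(ident)
            ids.append(ident)
    return ids


def can_paste(ids, limit=PASTE_LIMIT):
    """True if the list is short enough for the text box instead of a file upload."""
    return len(ids) <= limit
```

For example, `read_identifiers("ACC0001\n\n 12345 \nACC0001\n")` yields the two placeholder identifiers `ACC0001` and `12345` in input order.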
The ESTs will be analyzed in search of protein products. The first
page displays a summary table in which each entry corresponds to
one EST on the list. The information displayed is a high-level
summary of all proteins that can be linked to that EST (such as a set
of definitions and descriptors). For the detailed list of proteins
and how they are linked to the EST, click on 'View more'. For more
information on the output format see here.
Target proteins
We also provide an option to check whether the EST can be mapped to a
list of target proteins. The target proteins are characterized by a set
of GO terms and keywords that are provided by the user. This function
is not fully supported yet. The initial form currently allows you to
choose neuro-related proteins. If an EST can be mapped to a target
protein, you will see 'yes' in the corresponding column (Is Target?)
of the result table.
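One plausible reading of the target check is a set intersection between the user's descriptors and the annotations of the proteins an EST maps to. The sketch below assumes that reading; the function name, the data layout, and the descriptor strings are hypothetical, and the actual Biozon criterion may differ.

```python
def is_target(mapped_proteins, annotations, target_descriptors):
    """True if any mapped protein carries at least one target descriptor.

    mapped_proteins    : proteins the EST maps to
    annotations        : dict mapping a protein id to its GO terms and keywords
    target_descriptors : GO terms and keywords supplied by the user
    """
    targets = set(target_descriptors)
    return any(targets & set(annotations.get(p, ())) for p in mapped_proteins)


# Placeholder descriptors with no biological meaning intended.
annotations = {"P1": ["go-term-a", "keyword-x"], "P2": ["go-term-b"]}
print(is_target(["P1", "P2"], annotations, ["keyword-x"]))
```

An EST for which this check succeeds would correspond to a 'yes' in the Is Target? column.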