
EMMA : Multiple alignment program - interface to ClustalW program (EMBOSS)



your e-mail


input Section


advanced Section

Insist that the sequence type is changed to protein (-insist)
Cut-off to delay the alignment of the most divergent sequences (-maxdiv)

output Section


input Section


inseqs -- gapany [sequences] (-inseqs) : please enter
either :
  1. the name of a file:
  2. or the actual data here:

(sequence format)


[Return to the main part with your favorite browser's Back function]


advanced Section


dendsection Section


slowsection Section


protsection Section


fastsection Section


matrixsection Section


gapsection Section




dendsection Section

Do you want to produce only the dendrogram file? (-onlydend)
Do you want to use an old dendrogram file? (-dend)
What is the name of the old dendrogram file? (-dendfile)



slowsection Section

Please select one -- Do you want to carry out slow or fast pairwise alignment (-slowfast) ? slow fast
Input value for gap open penalty (-pwgapc)
Input value for gap extension penalty (-pwgapv)



protsection Section

Do not change this value (-prot)

Select matrix -- Protein pairwise alignment matrix options (-pwmatrix)
Select matrix -- DNA pairwise alignment matrix options (-pwdnamatrix) ? iub clustalw own
Input the filename of your pairwise matrix (-pairwisedata)



fastsection Section

Fast pairwise alignment: similarity scores: K-Tuple size (-ktup)
Fast pairwise alignment: similarity scores: gap penalty (-gapw)
Fast pairwise alignment: similarity scores: number of diagonals to be considered (-topdiags)
Fast pairwise alignment: similarity scores: diagonal window size (-window)
Fast pairwise alignment: similarity scores: suppresses percentage score (-nopercent)



matrixsection Section


Select matrix -- Protein multiple alignment matrix options (-matrix)
Select matrix -- Nucleotide multiple alignment matrix options (-dnamatrix) ? iub clustalw own
Input the filename of your alignment matrix (-mamatrix)



gapsection Section

Enter gap penalty (-gapc)
Enter variable gap penalty (-gapv)
Transitions are unweighted (-unweighted)
Use end gap separation penalty (-endgaps)
Gap separation distance (-gapdist)
No residue specific gaps (-norgap)
List of hydrophilic residues (-hgapres)
No hydrophilic gaps (-nohgap)



output Section

The sequence alignment output filename (-outseq)

Output format for: The sequence alignment output filename
The dendrogram output filename (-dendoutfile)
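
The option names shown in this form can also be collected into a command line. The following is only a sketch: the flag names are copied from the form labels, while the executable name `emma`, the helper `build_emma_command`, the file names, and the parameter values are illustrative assumptions. Flag spelling may differ between EMBOSS versions, so check the installed program's own help before running anything.

```python
def build_emma_command(inseqs, outseq, dendoutfile, **options):
    """Build an argument list from the option names shown in the form.

    Hypothetical helper: it only assembles strings and does not run anything.
    """
    cmd = ["emma",
           "-inseqs", inseqs,
           "-outseq", outseq,
           "-dendoutfile", dendoutfile]
    # Extra options are emitted in sorted order so the result is reproducible.
    for name, value in sorted(options.items()):
        cmd += ["-" + name, str(value)]
    return cmd


# Illustrative file names and values only.
command = build_emma_command("in.fasta", "out.aln", "out.dnd",
                             maxdiv=30, gapc=10.0)
print(" ".join(command))
```

The list form can be handed to `subprocess.run` without shell quoting, once the flags have been checked against the local installation.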




Some explanations about the options


Main parameters
Cut-off to delay the alignment of the most divergent sequences (-maxdiv)
This switch delays the alignment of the most distantly related sequences until after the most closely related sequences have been aligned. The setting is the percent identity level below which the addition of a sequence is delayed; sequences that are less identical than this level to any other sequence will be aligned later. Allowed values: integer from 0 to 100
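
One possible reading of this rule, sketched in Python: a sequence is delayed when its best percent identity to every other sequence falls below the cut-off. The function name `delayed_sequences` and the identity matrix are hypothetical, and this is not taken from the ClustalW source.

```python
def delayed_sequences(identity, cutoff):
    """Return indices of sequences whose best identity to any other
    sequence is below the cut-off (assumed reading of -maxdiv)."""
    if not 0 <= cutoff <= 100:
        raise ValueError("cutoff must be between 0 and 100")
    delayed = []
    for i, row in enumerate(identity):
        # Ignore the diagonal: a sequence is always 100% identical to itself.
        others = [pid for j, pid in enumerate(row) if j != i]
        if others and max(others) < cutoff:
            delayed.append(i)
    return delayed


# Made-up percent identities for three sequences.
identity = [
    [100, 85, 20],
    [85, 100, 25],
    [20, 25, 100],
]
print(delayed_sequences(identity, 30))
```

With these made-up numbers only the third sequence (index 2) is delayed, because its closest relative is just 25% identical.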

gapsection Section
Enter gap penalty (-gapc)
The penalty for opening a gap in the alignment. Increasing the gap opening penalty will make gaps less frequent. Allowed values: Positive floating point number
Enter variable gap penalty (-gapv)
The penalty for extending a gap by 1 residue. Increasing the gap extension penalty will make gaps shorter. Terminal gaps are not penalised. Allowed values: Positive floating point number
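
How the two penalties interact can be sketched with a generic affine gap cost. This assumes one common convention (the opening penalty once, plus the extension penalty for each residue in the gap); the exact formula ClustalW applies may differ, and the function name and numeric values are illustrative only.

```python
def gap_cost(length, gap_open, gap_extend, terminal=False):
    """Affine cost of one gap of the given length (assumed convention)."""
    if length < 0:
        raise ValueError("length must not be negative")
    if length == 0 or terminal:
        # No gap, or a terminal gap, which is not penalised.
        return 0.0
    return gap_open + gap_extend * length


# Illustrative penalties, not program defaults.
one_long_gap = gap_cost(4, gap_open=10.0, gap_extend=0.5)
two_short_gaps = 2 * gap_cost(2, gap_open=10.0, gap_extend=0.5)
print(one_long_gap, two_short_gaps)
```

Under this convention one gap of four residues costs 12.0 while two gaps of two residues cost 22.0, which is why a higher opening penalty makes gaps rarer and a higher extension penalty makes them shorter.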
Transitions are unweighted (-unweighted)
The 'Transition weight' gives transitions (A <--> G or C <--> T i.e. purine-purine or pyrimidine-pyrimidine substitutions) a weight between 0 and 1; a weight of zero means that the transitions are scored as mismatches, while a weight of 1 gives the transitions the match score. For distantly related DNA sequences, the weight should be near to zero; for closely related sequences it can be useful to assign a higher score.
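
The effect of the weight can be sketched as follows. The linear interpolation between the mismatch and match scores is an assumption chosen to satisfy the two end points described above (0 gives the mismatch score, 1 gives the match score); the function names and score values are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(a, b):
    """True for A<->G or C<->T substitutions."""
    pair = {a.upper(), b.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def pair_score(a, b, weight, match=1.0, mismatch=0.0):
    """Score two nucleotides with a transition weight between 0 and 1."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    if a.upper() == b.upper():
        return match
    if is_transition(a, b):
        # Assumed interpolation between mismatch (weight 0) and match (weight 1).
        return mismatch + weight * (match - mismatch)
    return mismatch


print(pair_score("A", "G", 0.5), pair_score("A", "C", 0.5))
```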
Use end gap separation penalty (-endgaps)
'End gap separation' treats end gaps just like internal gaps for the purposes of avoiding gaps that are too close (set by 'gap separation distance'). If you turn this off, end gaps will be ignored for this purpose. This is useful when you wish to align fragments where the end gaps are not biologically meaningful.
Gap separation distance (-gapdist)
'Gap separation distance' tries to decrease the chances of gaps being too close to each other. Gaps that are less than this distance apart are penalised more than other gaps. This does not prevent close gaps; it makes them less frequent, promoting a block-like appearance of the alignment. Allowed values: Positive integer
No residue specific gaps (-norgap)
'Residue specific penalties' are amino acid specific gap penalties that reduce or increase the gap opening penalties at each position in the alignment or sequence. As an example, positions that are rich in glycine are more likely to have an adjacent gap than positions that are rich in valine.
List of hydrophilic residues (-hgapres)
This is the set of residues considered to be hydrophilic. It is used when applying hydrophilic gap penalties.
No hydrophilic gaps (-nohgap)
'Hydrophilic gap penalties' are used to increase the chances of a gap within a run (5 or more residues) of hydrophilic amino acids; these are likely to be loop or random coil regions where gaps are more common. The residues that are 'considered' to be hydrophilic are set by '-hgapres'.
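
Locating such runs can be sketched as below. The residue list `GPSNDQEKR` is assumed here as the usual ClustalW default and should be replaced by whatever is entered for '-hgapres'; the function name and the example sequence are made up.

```python
def hydrophilic_runs(seq, residues="GPSNDQEKR", min_len=5):
    """Return (start, end) index pairs (end exclusive) of runs of at least
    min_len consecutive hydrophilic residues."""
    hydrophilic = set(residues.upper())
    runs = []
    start = None
    # The trailing sentinel "#" closes a run that reaches the sequence end.
    for i, aa in enumerate(seq.upper() + "#"):
        if aa in hydrophilic:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


print(hydrophilic_runs("MVLAGSNDKEAVLIGPSAV"))
```

In the example, the stretch GSNDKE (six residues) qualifies, while the later GPS (three residues) is too short.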

protsection Section
Select matrix -- Protein pairwise alignment matrix options (-pwmatrix)
The scoring table which describes the similarity of each amino acid to each other.
There are three 'in-built' series of weight matrices offered. Each consists of several matrices which work differently at different evolutionary distances. To see the exact details, read the documentation. Crudely, we store several matrices in memory, spanning the full range of amino acid distance (from almost identical sequences to highly divergent ones). For very similar sequences, it is best to use a strict weight matrix which only gives a high score to identities and the most favoured conservative substitutions. For more divergent sequences, it is appropriate to use 'softer' matrices which give a high score to many other frequent substitutions.
1) BLOSUM (Henikoff). These matrices appear to be the best available for carrying out database similarity (homology) searches. The matrices used are: Blosum80, 62, 45 and 30.
2) PAM (Dayhoff). These have been extremely widely used since the late '70s. We use the PAM 120, 160, 250 and 350 matrices.
3) GONNET . These matrices were derived using almost the same procedure as the Dayhoff one (above) but are much more up to date and are based on a far larger data set. They appear to be more sensitive than the Dayhoff series. We use the GONNET 40, 80, 120, 160, 250 and 350 matrices.
We also supply an identity matrix which gives a score of 1.0 to two identical amino acids and a score of zero otherwise. This matrix is not very useful.
Select matrix -- DNA pairwise alignment matrix options (-pwdnamatrix)
The scoring table which describes the scores assigned to matches and mismatches (including IUB ambiguity codes).

matrixsection Section
Select matrix -- Protein multiple alignment matrix options (-matrix)
This gives a menu where you are offered a choice of weight matrices. The default for proteins is the PAM series derived by Gonnet and colleagues. Note, a series is used! The actual matrix that is used depends on how similar the sequences to be aligned at this alignment step are. Different matrices work differently at each evolutionary distance.
There are three 'in-built' series of weight matrices offered. Each consists of several matrices which work differently at different evolutionary distances. To see the exact details, read the documentation. Crudely, we store several matrices in memory, spanning the full range of amino acid distance (from almost identical sequences to highly divergent ones). For very similar sequences, it is best to use a strict weight matrix which only gives a high score to identities and the most favoured conservative substitutions. For more divergent sequences, it is appropriate to use 'softer' matrices which give a high score to many other frequent substitutions.
1) BLOSUM (Henikoff). These matrices appear to be the best available for carrying out database similarity (homology) searches. The matrices used are: Blosum80, 62, 45 and 30.
2) PAM (Dayhoff). These have been extremely widely used since the late '70s. We use the PAM 120, 160, 250 and 350 matrices.
3) GONNET . These matrices were derived using almost the same procedure as the Dayhoff one (above) but are much more up to date and are based on a far larger data set. They appear to be more sensitive than the Dayhoff series. We use the GONNET 40, 80, 120, 160, 250 and 350 matrices.
We also supply an identity matrix which gives a score of 1.0 to two identical amino acids and a score of zero otherwise. This matrix is not very useful. Alternatively, you can read in your own (just one matrix, not a series).
Select matrix -- Nucleotide multiple alignment matrix options (-dnamatrix)
This gives a menu where a single matrix (not a series) can be selected.

input Section
Enter either the name of a file or the actual data, but not both.
If you are using Netscape 2.x or later, you can select a file by typing its name or, better, by selecting it with the Netscape file browser (Browse button).
Alternatively, you can type your data in the text area, or cut and paste it from another application.


fastsection Section
Fast pairwise alignment: similarity scores: K-Tuple size (-ktup)
This is the size of the exactly matching fragment that is used. INCREASE for speed (max = 2 for proteins; 4 for DNA), DECREASE for sensitivity. For longer sequences (e.g. >1000 residues) you may need to increase the default. Allowed values: integer from 0 to 4
Fast pairwise alignment: similarity scores: gap penalty (-gapw)
This is a penalty for each gap in the fast alignments. It has little effect on the speed or sensitivity except for extreme values. Allowed values: Positive integer
Fast pairwise alignment: similarity scores: number of diagonals to be considered (-topdiags)
The number of k-tuple matches on each diagonal (in an imaginary dot-matrix plot) is calculated. Only the best ones (with most matches) are used in the alignment. This parameter specifies how many. Decrease for speed; increase for sensitivity. Allowed values: Positive integer
Fast pairwise alignment: similarity scores: diagonal window size (-window)
This is the number of diagonals around each of the 'best' diagonals that will be used. Decrease for speed; increase for sensitivity. Allowed values: Positive integer
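
The k-tuple and diagonal parameters above can be illustrated with a small sketch that counts exact k-tuple matches per diagonal and keeps the best ones. It only illustrates the idea behind '-ktup', '-topdiags' and '-window'; it is not the Wilbur and Lipman implementation used by the program, and all names are hypothetical.

```python
from collections import Counter, defaultdict


def top_diagonals(seq_a, seq_b, ktup=2, topdiags=3):
    """Count k-tuple matches on each diagonal (i - j) and return the
    topdiags diagonals with the most matches as (diagonal, count) pairs."""
    if ktup < 1:
        raise ValueError("ktup must be at least 1")
    # Index every k-tuple of seq_b by its start position.
    index = defaultdict(list)
    for j in range(len(seq_b) - ktup + 1):
        index[seq_b[j:j + ktup]].append(j)
    counts = Counter()
    for i in range(len(seq_a) - ktup + 1):
        for j in index.get(seq_a[i:i + ktup], ()):
            counts[i - j] += 1
    return counts.most_common(topdiags)


def diagonals_in_window(best, window):
    """Diagonals within `window` of any of the best diagonals."""
    return {d + offset
            for d, _count in best
            for offset in range(-window, window + 1)}


best = top_diagonals("ACGTACGT", "ACGTACGT", ktup=2, topdiags=1)
print(best, sorted(diagonals_in_window(best, window=1)))
```

A larger `ktup` yields fewer, more specific matches (faster, less sensitive), while larger `topdiags` and `window` values keep more of the dot-matrix in play (slower, more sensitive).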

slowsection Section
Please select one -- Do you want to carry out slow or fast pairwise alignment (-slowfast)
A distance is calculated between every pair of sequences and these are used to construct the dendrogram which guides the final multiple alignment. The scores are calculated from separate pairwise alignments. These can be calculated using 2 methods: dynamic programming (slow but accurate) or by the method of Wilbur and Lipman (extremely fast but approximate).
The slow-accurate method is fine for short sequences but will be VERY SLOW for many (e.g. >100) long (e.g. >1000 residue) sequences.
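
A rough feel for why the slow method scales badly: every pair of sequences is aligned, and a dynamic programming alignment fills roughly one table cell per pair of residues. The sketch below only does this arithmetic; it assumes equal-length sequences and a full quadratic table, and says nothing about actual run times.

```python
def pairwise_workload(n_seqs, seq_len):
    """Return (number of sequence pairs, approximate DP cells to fill)."""
    pairs = n_seqs * (n_seqs - 1) // 2
    cells = pairs * seq_len * seq_len
    return pairs, cells


pairs, cells = pairwise_workload(100, 1000)
print(pairs, cells)
```

For 100 sequences of 1000 residues this gives 4950 pairwise alignments and about 4.95 billion table cells.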
Input value for gap open penalty (-pwgapc)
The penalty for opening a gap in the pairwise alignments.
Input value for gap extension penalty (-pwgapv)
The penalty for extending a gap by 1 residue in the pairwise alignments.
Sequence format
The sequence will be automatically converted to the format needed by the program,
provided you enter it either
in plain (raw) sequence format or in one of the following known formats:
IG, GenBank, NBRF, EMBL, GCG, DNAStrider, Fitch, fasta, Phylip, PIR, MSF, ASN, PAUP, CLUSTALW
You may also enter a database entry code or an accession number in the text area, in this form:
database:entry_name
or:
database:accession.
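
Telling such a reference apart from pasted sequence data could look like the sketch below. The accepted character set, the function name, and the example database name and accession are assumptions for illustration; the form itself does not document how it makes this distinction.

```python
import re

# Assumed shape: one word, a colon, one identifier, and nothing else.
_DB_REF = re.compile(r"([A-Za-z0-9_]+):([A-Za-z0-9_.]+)")


def parse_database_reference(text):
    """Return (database, identifier) if text looks like database:entry,
    otherwise None."""
    match = _DB_REF.fullmatch(text.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_database_reference("swissprot:P12345"))  # hypothetical entry
print(parse_database_reference(">seq1\nACGTACGT"))
```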

Pise form generator version: 5.a (16 Dec 2002 11:53)