
COMPSEQ : Counts the composition of dimer/trimer/etc words in a sequence (EMBOSS)



your e-mail


input Section


sequence [sequences] (-sequence) : please enter either :
  1. the name of a file:
  2. or the actual data here:

(sequence format)




required Section

Word size to consider (e.g. 2=dimer) (-word)



advanced Section


'compseq' file to use for expected word frequencies (-infile) : please enter either :
  1. the name of a file:
  2. or the actual data here:


Frame of word to look at (0=all frames) (-frame)
Ignore the amino acids B and Z and just count them as 'Other' (-ignorebz)
Count words in the forward and reverse sense (-reverse)



output Section

outfile (-outfile)
Display the words that have a frequency of zero (-zerocount)




Some explanations about the options


input Section
Enter either the name of a file or the actual data, but not both.
If you are using Netscape 2.x or later, you can select a file by typing its name or, better, by choosing it with the Netscape file browser (Browse button).
Alternatively, you can type your data in the next area, or cut and paste it from another application.

advanced Section
'compseq' file to use for expected word frequencies (-infile)
This is a file previously produced by 'compseq' that can be used to set the expected frequencies of words in this analysis.
The word size in the current run must be the same as the one in this results file. You should use a file produced from protein sequences if you are counting protein sequence word frequencies, and you must use one made from nucleotide frequencies if you are analysing a nucleotide sequence.
Frame of word to look at (0=all frames) (-frame)
The normal behaviour of 'compseq' is to count the frequencies of all words that occur by moving a window of length 'word' up by one each time.
This option allows you to move the window up by the length of the word each time, skipping over the intervening words.
You can count only those words that occur in a single frame of the word by setting this value to a number other than zero.
If you set it to 1 it will only count the words in frame 1, 2 will only count the words in frame 2 and so on.
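The windowing described above can be sketched in Python. This is only an illustration, not compseq's own code: the function name `count_words_in_frame` is hypothetical, and it assumes that frame k means windows start at offset k-1 and then advance by the word length.

```python
from collections import Counter


def count_words_in_frame(seq, word, frame=0):
    """Count words of length `word` in `seq`.

    frame == 0: slide the window one position at a time (all frames).
    frame == k: assumed here to mean start at offset k-1 and step by `word`.
    """
    seq = seq.upper()
    last_start = len(seq) - word + 1  # one past the last valid window start
    if frame == 0:
        starts = range(0, last_start)
    else:
        starts = range(frame - 1, last_start, word)
    return Counter(seq[i:i + word] for i in starts)


all_frames = count_words_in_frame("ATGGCCTAA", 3)       # 7 overlapping windows
frame_one = count_words_in_frame("ATGGCCTAA", 3, 1)     # ATG, GCC, TAA
frame_two = count_words_in_frame("ATGGCCTAA", 3, 2)     # TGG, CCT
```

With a word size of 3 and frame 1, only the three in-frame codons are counted, whereas frame 0 also counts the overlapping words in between.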
Ignore the amino acids B and Z and just count them as 'Other' (-ignorebz)
The amino acid code B represents Asparagine or Aspartic acid and the code Z represents Glutamine or Glutamic acid.
These are not commonly used codes and you may wish not to count words containing them, just noting them in the count of 'Other' words.
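As a rough illustration of what this option does, the sketch below lumps every word containing B or Z into a single 'Other' bucket. The function name `count_protein_words` and the exact bucketing are assumptions made for the example; compseq's real handling of ambiguity codes may differ in detail.

```python
from collections import Counter


def count_protein_words(seq, word, ignorebz=True):
    """Count protein words, optionally pooling words with B or Z as 'Other'."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - word + 1):
        w = seq[i:i + word]
        if ignorebz and ("B" in w or "Z" in w):
            counts["Other"] += 1   # ambiguous residue: not counted as its own word
        else:
            counts[w] += 1
    return counts


pooled = count_protein_words("ABZA", 1)                     # B and Z go to 'Other'
separate = count_protein_words("ABZA", 1, ignorebz=False)   # B and Z counted as words
```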
Count words in the forward and reverse sense (-reverse)
Set this to true if you also wish to count words in the reverse complement of a nucleic acid sequence.
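A minimal sketch of counting in both senses is shown below. The helper names `reverse_complement` and `count_both_strands` are hypothetical, and the complement table covers only the unambiguous bases A, C, G and T; it is meant to show the idea rather than reproduce compseq's implementation.

```python
from collections import Counter

# Complement table for unambiguous DNA bases only (an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_both_strands(seq, word):
    """Count words on the forward strand and on its reverse complement."""
    forward = seq.upper()
    counts = Counter()
    for strand in (forward, reverse_complement(forward)):
        counts.update(strand[i:i + word] for i in range(len(strand) - word + 1))
    return counts


both = count_both_strands("AAC", 2)   # forward: AA, AC; reverse complement GTT: GT, TT
```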

required Section
Word size to consider (e.g. 2=dimer) (-word)
This is the size of word (n-mer) to count.
Thus if you want to count codon frequencies, you should enter 3 here.
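To make the word size concrete, here is a small Python sketch of a sliding-window n-mer count. The function name `count_words` is made up for the example, and the code illustrates the general idea only, not compseq's actual implementation or output format.

```python
from collections import Counter


def count_words(seq, word):
    """Count every overlapping word of length `word` in `seq`."""
    seq = seq.upper()
    return Counter(seq[i:i + word] for i in range(len(seq) - word + 1))


dimers = count_words("ACGTAC", 2)   # AC occurs twice; CG, GT and TA once each
```

A sequence of length n contains n - word + 1 overlapping words, so a word size longer than the sequence yields no counts at all.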

output Section
outfile (-outfile)
This is the results file.
Display the words that have a frequency of zero (-zerocount)
You can make the output results file much smaller if you do not display the words with a zero count.
Sequence format
The sequence will be automatically converted to the format needed by the program,
provided you enter it either in plain (raw) sequence format or in one of the following known formats:
IG, GenBank, NBRF, EMBL, GCG, DNAStrider, Fitch, fasta, Phylip, PIR, MSF, ASN, PAUP, CLUSTALW
You may enter in the text area a database entry code, or an accession number, in this form:
database:entry_name
or:
database:accession.

Pise form generator version: 5.a (16 Dec 2002 11:53)