RATT: Transformation of input / output formats

RATT currently works exclusively with the EMBL format, but annotation is often held in a different format. The most common formats are EMBL, GenBank and GFF. Here we briefly describe how to convert an annotation file to and from the EMBL format.

Easy: Smaller genomes

If you have just one genome, the easiest way is to open the file in Artemis, which automatically recognises the input format. If your sequence is in a separate file, as can be the case with GFF, load the sequence file first and then the annotation file.

To convert the file, simply save it in the required format (see figure). The available formats are EMBL, GENBANK, SEQIN and GFF.

More Chromosomes or more genomes: Bioperl

If more than one chromosome has to be converted, we normally use scripts. Below are examples of how to do the conversion with BioPerl. BioPerl, or at least the modules used here (Bio::SeqIO), must be installed. This is a quicker way to convert many files, although it requires a bit more setting up.

Genbank 2 EMBL

#!/usr/local/bin/perl -w
use strict;
use Bio::SeqIO;

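# Expect exactly two arguments: the input GenBank file and the output EMBL file.
if (@ARGV != 2) { die "USAGE: gb2embl.pl <input.gb> <output.embl>\n"; }

# Open the reader and the writer; the leading ">" opens the output for writing.
my $seqio  = Bio::SeqIO->new(-format => 'genbank', -file => $ARGV[0]);
my $seqout = Bio::SeqIO->new(-format => 'embl',    -file => ">$ARGV[1]");

# Copy every record (e.g. each chromosome) to the output file.
while (my $seq = $seqio->next_seq) {
  $seqout->write_seq($seq);
}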

EMBL 2 Genbank

#!/usr/local/bin/perl -w
use strict;
use Bio::SeqIO;

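# Expect exactly two arguments: the input EMBL file and the output GenBank file.
if (@ARGV != 2) { die "USAGE: embl2gb.pl <input.embl> <output.gb>\n"; }

# Open the reader and the writer; the leading ">" opens the output for writing.
my $seqio  = Bio::SeqIO->new(-format => 'embl',    -file => $ARGV[0]);
my $seqout = Bio::SeqIO->new(-format => 'genbank', -file => ">$ARGV[1]");

# Copy every record (e.g. each chromosome) to the output file.
while (my $seq = $seqio->next_seq) {
  $seqout->write_seq($seq);
}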
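
The two scripts above differ only in the format names, so when many chromosomes have to be converted it can be convenient to wrap the conversion in a loop. The following is only a minimal sketch and not part of RATT: the script name gb2embl_batch.pl, the helper names output_name and convert_file, and the choice to write each output next to its input with an .embl extension are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace the input's extension (if any) with $ext,
# e.g. chr1.gb -> chr1.embl. Dots in directory names are left alone.
sub output_name {
  my ($input, $ext) = @_;
  (my $base = $input) =~ s/\.[^.\/]+$//;
  return "$base.$ext";
}

# Hypothetical helper: convert one file between two Bio::SeqIO formats
# and return the number of records written.
sub convert_file {
  my ($infile, $informat, $outfile, $outformat) = @_;
  require Bio::SeqIO;    # loaded on first call; BioPerl must be installed
  my $in  = Bio::SeqIO->new(-file => $infile,     -format => $informat);
  my $out = Bio::SeqIO->new(-file => ">$outfile", -format => $outformat);
  my $count = 0;
  while (my $seq = $in->next_seq) {
    $out->write_seq($seq);
    $count++;
  }
  return $count;
}

# Convert every GenBank file named on the command line to EMBL.
for my $infile (@ARGV) {
  my $outfile = output_name($infile, 'embl');
  die "Refusing to overwrite input file $infile\n" if $outfile eq $infile;
  my $n = convert_file($infile, 'genbank', $outfile, 'embl');
  print STDERR "$infile -> $outfile ($n record(s))\n";
}
```

Assuming the sketch is saved as gb2embl_batch.pl, it could be run as: perl gb2embl_batch.pl chr*.gb. Swapping the two format names (and the extension) in the final loop gives the EMBL to GenBank direction.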

GFF 2 EMBL

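From the AUGUSTUS gene prediction pipeline: gff2gbSmallDNA.pl. As its name suggests, this script writes GenBank rather than EMBL, so its output can then be converted with the Genbank 2 EMBL script above.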

EMBL 2 GFF

 
#!/usr/local/bin/perl -w
use strict;
use Bio::SeqIO;

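# Expect one argument: the input EMBL file. The GFF is printed to standard output.
if (@ARGV != 1) { die "USAGE: embl2gff.pl <input.embl> > outputfile\n"; }

my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'EMBL');
while (my $seq = $in->next_seq) {
  # get_SeqFeatures returns the top-level features of the record.
  for my $feat ($seq->get_SeqFeatures) {
    print $feat->gff_string, "\n";
  }
}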