More bits and pieces
Adapters
Sugar provides adapters to convert sequence objects to the corresponding sequence objects in the Biopython and Biotite libraries, and vice versa.
>>> from sugar import BioBasket, read
>>> seqs = read()
>>> print(seqs)
2 seqs in basket
AB047639 9678 ACCTGCCCCTAATAGGGGCGACACTCCGCCATGAATCACTCCCCTGTGA... GC:58.26%
AB677533 9471 GCCCGCCCCCTGATGGGGGCGACACTCCGCCATGAATCACTCCCCTGTG... GC:57.46%
customize output with BioBasket.tostr() method
>>> bios = seqs.tobiopython()
>>> print(bios[0])
ID: AB047639
Name: <unknown name>
Description: <unknown description>
Number of features: 2
Seq('ACCTGCCCCTAATAGGGGCGACACTCCGCCATGAATCACTCCCCTGTGAGGAAC...TGT')
>>> print(BioBasket.frombiopython(bios))
2 seqs in basket
AB047639 9678 ACCTGCCCCTAATAGGGGCGACACTCCGCCATGAATCACTCCCCTGTGA... GC:58.26%
AB677533 9471 GCCCGCCCCCTGATGGGGGCGACACTCCGCCATGAATCACTCCCCTGTG... GC:57.46%
customize output with BioBasket.tostr() method
>>> tites = seqs.tobiotite()
>>> print(repr(tites[0])[:80])
NucleotideSequence("ACCTGCCCCTAATAGGGGCGACACTCCGCCATGAATCACTCCCCTGTGAGGAACTACTGT
>>> print(BioBasket.frombiotite(tites)) # Biotite does not use ids
2 seqs in basket
9678 ACCTGCCCCTAATAGGGGCGACACTCCGCCATGAATCACTCCCCTGTGAGGAACTACTG... GC:58.26%
9471 GCCCGCCCCCTGATGGGGGCGACACTCCGCCATGAATCACTCCCCTGTGAGGAACTACT... GC:57.46%
customize output with BioBasket.tostr() method
Indexing of FASTA files
sugar provides an indexing tool for quickly retrieving
sequences or subsequences from large FASTA files.
It is used via the FastaIndex class or the sugar index command.
The following example creates the index with the sugar index create
and sugar index add commands, then queries it from Python.
Both tasks can be done
either from the command line or from Python code.
sugar index create index.db
sugar index add *.fasta
>>> from sugar import FastaIndex
>>> index = FastaIndex('index.db')
>>> print(index) # display information about index
>>> seq = index.get_seq('NC_081844.1') # Alternatively, use the .get_basket() method
The FASTA index is stored either as a binary search file or
as a database via Python’s dbm module.
Which option is preferable depends on the use case;
usually you want the binary search file.
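To illustrate why a binary search file allows quick lookups in a large index, here is a minimal sketch of the general technique. It is not sugar's actual file format: the record layout (sorted fixed-width ids paired with byte offsets) and the names build_index and lookup are assumptions made for illustration only.

```python
import io
import struct

# Hypothetical record layout: 20-byte id (NUL-padded) + 8-byte file offset.
# Ids longer than 20 bytes would be truncated; a real format must handle that.
RECORD = struct.Struct('>20sQ')


def build_index(entries):
    """Pack (seqid, offset) pairs into sorted fixed-width records."""
    records = sorted((seqid.encode(), offset) for seqid, offset in entries)
    return b''.join(RECORD.pack(key, offset) for key, offset in records)


def lookup(fileobj, seqid):
    """Binary search for seqid; return its offset, or None if absent."""
    key = seqid.encode()
    fileobj.seek(0, io.SEEK_END)
    lo, hi = 0, fileobj.tell() // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        # Fixed-width records let us jump straight to record number mid
        fileobj.seek(mid * RECORD.size)
        raw_key, offset = RECORD.unpack(fileobj.read(RECORD.size))
        found = raw_key.rstrip(b'\0')
        if found == key:
            return offset
        if found < key:
            lo = mid + 1
        else:
            hi = mid
    return None


# In-memory stand-in for an index file on disk
index = io.BytesIO(build_index([
    ('NC_081844.1', 1024), ('AB047639', 0), ('AB677533', 9800),
]))
print(lookup(index, 'NC_081844.1'))
```

Because every record has the same size, each lookup reads only about log2(n) records instead of scanning the whole file, which is what makes this approach attractive for large FASTA collections.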
Downloading sequences from NCBI
The Entrez class can be used to fetch sequences
from the NCBI online database:
>>> from sugar.web import Entrez
>>> client = Entrez()
>>> seqs = client.get_basket(['AF522874', 'NC_077015.1'])
>>> print(seqs)
2 seqs in basket
AF522874 19k CGGACACACAAAAAGAAAAAAGGTTTTTTAAGACTTTTTGTGTGCGAG... GC:40.63%
NC_077015 12k TGCATAACCCTGATTGTAATTGGCTGGGTTATGCATGTGAGAACGCAA... GC:43.03%
customize output with BioBasket.tostr() method
Use the ENTREZ_PATH environment variable or
the path option of Entrez or its methods
to cache sequence files on disk.
An Entrez API key can be used via the ENTREZ_API_KEY environment variable
or the api_key option of Entrez.
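The two configuration routes described above could be combined as in the following sketch. The cache directory name and the key value are placeholders, and the exact moment at which sugar reads the environment variables is an assumption here, which is why they are set before the import. Running fetch() requires sugar and network access.

```python
import os

# Placeholders: replace with your own cache directory and NCBI API key.
# Set before sugar is imported, in case the library reads them early (assumption).
os.environ['ENTREZ_PATH'] = 'entrez_cache'
os.environ['ENTREZ_API_KEY'] = 'your-api-key'


def fetch(seqids, path=None, api_key=None):
    """Fetch sequences from NCBI, optionally overriding the environment."""
    from sugar.web import Entrez  # imported lazily, after the environment is set

    options = {}
    if path is not None:
        options['path'] = path  # path option of Entrez, as described above
    if api_key is not None:
        options['api_key'] = api_key  # api_key option of Entrez
    client = Entrez(**options)
    return client.get_basket(seqids)
```

Calling fetch(['AF522874', 'NC_077015.1']) would then rely on the environment variables, while passing path or api_key explicitly uses the corresponding Entrez options instead.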
Miscellaneous
- For command line junkies
  The command line interface of sugar can be used for common tasks. Call sugar -h for an overview of available commands.
- The sugar.data package
  sugar also provides access to codon translation tables and substitution matrices (e.g. BLOSUM62) within the sugar.data package.
- Find open reading frames
  The find_orfs() method can be used to find open reading frames, see the advanced example in the Sequences Tutorial.
- Attributes behave like dictionaries
  Metadata and attributes in sugar are instances of Attr, so attributes can also be accessed like dictionary keys, e.g. seq.meta.id and seq.meta['id'] have the same effect.
- Shortcuts are convenient
  Using the shortcuts seq.fts, seq.id, ft.seqid, etc. has the advantage that additional checks are performed. For example, assigning seq.fts = my_fts automatically converts my_fts to a FeatureList, and accessing seq.id returns an empty string if no id attribute is present in the metadata. ft.loc is a shortcut for ft.locs[0].
- Overloading of operators
  Adding two sequences, seq1 + seq2, concatenates their data. Adding two sequence lists, seqs1 + seqs2, concatenates the two lists. To concatenate the sequences within a list, use the merge() method. Adding two feature lists, fts1 + fts2, also works as expected.
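The idea of attributes that behave like dictionaries can be illustrated with a minimal sketch of an attribute dictionary. This is not sugar's Attr implementation; the class name AttrDict is hypothetical and only shows how attribute access and key access can refer to the same data.

```python
class AttrDict(dict):
    """Hypothetical dict subclass that exposes its keys as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Store attributes as dictionary items
        self[name] = value


meta = AttrDict(id='AB047639')
meta.comment = 'example'
print(meta.id == meta['id'])
print(meta['comment'])
```

Both access styles read and write the same underlying mapping, which is the behavior described for seq.meta.id and seq.meta['id'].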