More bits and pieces

Adapters

Sugar provides adapters that convert its sequence objects to the corresponding objects of the Biopython and Biotite libraries, and vice versa.

>>> from sugar import BioBasket, read
>>> seqs = read()
>>> print(seqs)
2 seqs in basket
AB047639  9678  ACCTGCCCCTAATAGGGGCGACACTCCGCCATGAATCACTCCCCTGTGA...  GC:58.26%
AB677533  9471  GCCCGCCCCCTGATGGGGGCGACACTCCGCCATGAATCACTCCCCTGTG...  GC:57.46%
  customize output with BioBasket.tostr() method
>>> bios = seqs.tobiopython()
>>> print(bios[0])
ID: AB047639
Name: <unknown name>
Description: <unknown description>
Number of features: 2
Seq('ACCTGCCCCTAATAGGGGCGACACTCCGCCATGAATCACTCCCCTGTGAGGAAC...TGT')
>>> print(BioBasket.frombiopython(bios))
2 seqs in basket
AB047639  9678  ACCTGCCCCTAATAGGGGCGACACTCCGCCATGAATCACTCCCCTGTGA...  GC:58.26%
AB677533  9471  GCCCGCCCCCTGATGGGGGCGACACTCCGCCATGAATCACTCCCCTGTG...  GC:57.46%
  customize output with BioBasket.tostr() method

>>> tites = seqs.tobiotite()
>>> print(repr(tites[0])[:80])
NucleotideSequence("ACCTGCCCCTAATAGGGGCGACACTCCGCCATGAATCACTCCCCTGTGAGGAACTACTGT
>>> print(BioBasket.frombiotite(tites))  # Biotite does not use ids
2 seqs in basket
9678  ACCTGCCCCTAATAGGGGCGACACTCCGCCATGAATCACTCCCCTGTGAGGAACTACTG...  GC:58.26%
9471  GCCCGCCCCCTGATGGGGGCGACACTCCGCCATGAATCACTCCCCTGTGAGGAACTACT...  GC:57.46%
  customize output with BioBasket.tostr() method

Indexing of FASTA files

sugar provides an indexing tool for quickly retrieving sequences or subsequences from large FASTA files. It can be used via the FastaIndex class or the sugar index command.

The following example uses the sugar index create and sugar index add commands to create the index, and then queries the index from Python. Both tasks can be done either from the command line or from Python code.

sugar index create index.db
sugar index add *.fasta

>>> from sugar import FastaIndex
>>> index = FastaIndex('index.db')
>>> print(index)  # display information about index
>>> seq = index.get_seq('NC_081844.1')  # Alternatively, use the .get_basket() method

The FASTA index is stored either in a binary search file or in a database via Python’s dbm module. Which option is preferable depends on the use case; usually the binary search file is the better choice.
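To illustrate why a binary search file makes lookups fast, here is a minimal, self-contained sketch of the underlying idea: record the byte offset of every FASTA header, sort the (id, offset) pairs, find an id by binary search, then seek directly to the record. This is a conceptual sketch only; sugar’s actual on-disk format is not described here, and build_offset_index, lookup and read_record are hypothetical helpers. A dbm-based index would store the same id-to-offset mapping in a dbm database instead of a sorted list.

```python
import bisect
import io


def build_offset_index(handle):
    """Return a sorted list of (seqid, byte offset) pairs for a binary FASTA handle."""
    entries = []
    offset = 0
    for line in handle:
        if line.startswith(b'>'):
            # the id is the first whitespace-separated word of the header
            seqid = line[1:].split()[0].decode()
            entries.append((seqid, offset))
        offset += len(line)
    entries.sort()  # sorting enables the binary search in lookup()
    return entries


def lookup(entries, seqid):
    """Binary search for seqid; return its byte offset or None."""
    i = bisect.bisect_left(entries, (seqid,))
    if i < len(entries) and entries[i][0] == seqid:
        return entries[i][1]
    return None


def read_record(handle, offset):
    """Seek to offset and return (header, sequence) of one record."""
    handle.seek(offset)
    header = handle.readline().decode().strip()
    parts = []
    for line in handle:
        if line.startswith(b'>'):
            break  # the next record starts here
        parts.append(line.decode().strip())
    return header[1:], ''.join(parts)


fasta = io.BytesIO(b'>seq2 desc\nACGT\nTT\n>seq1\nGGCC\n')
index = build_offset_index(fasta)
print(read_record(fasta, lookup(index, 'seq2')))  # ('seq2 desc', 'ACGTTT')
```

Only the sorted index needs to be searched; the FASTA file itself is never scanned during a query, which is what keeps retrieval fast for large files. This sketch returns whole records; retrieving subsequences would additionally require knowing the line width to compute byte positions inside a record.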

Downloading sequences from NCBI

The Entrez class can be used to fetch sequences from the NCBI online database:

>>> from sugar.web import Entrez
>>> client = Entrez()
>>> seqs = client.get_basket(['AF522874', 'NC_077015.1'])
>>> print(seqs)
2 seqs in basket
AF522874    19k  CGGACACACAAAAAGAAAAAAGGTTTTTTAAGACTTTTTGTGTGCGAG...  GC:40.63%
NC_077015   12k  TGCATAACCCTGATTGTAATTGGCTGGGTTATGCATGTGAGAACGCAA...  GC:43.03%
  customize output with BioBasket.tostr() method

Use the ENTREZ_PATH environment variable, or the path option of Entrez or its methods, to cache sequence files on disk. An Entrez API key can be provided via the ENTREZ_API_KEY environment variable or the api_key option of Entrez.
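The benefit of a cache directory is that each record is downloaded only once. The following stdlib-only sketch shows the general pattern; cached_fetch and fake_fetch are hypothetical, the file naming is an assumption, and this is not how sugar’s Entrez class is implemented internally. The network call is replaced by a fake function so that the example runs offline.

```python
import os
import tempfile
from pathlib import Path


def cached_fetch(accession, fetch, cache_dir=None):
    """Return the record text for accession, downloading it only once.

    fetch: hypothetical callable accession -> str (e.g. an HTTP request).
    cache_dir: directory for cached files; falls back to ENTREZ_PATH if set.
    """
    cache_dir = cache_dir or os.environ.get('ENTREZ_PATH')
    if not cache_dir:
        return fetch(accession)  # caching disabled
    path = Path(cache_dir) / f'{accession}.txt'  # assumed file naming
    if path.exists():
        return path.read_text()  # cache hit: no network access
    text = fetch(accession)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)
    return text


calls = []

def fake_fetch(accession):
    calls.append(accession)  # record each simulated download
    return f'record {accession}'

with tempfile.TemporaryDirectory() as tmp:
    cached_fetch('AF522874', fake_fetch, cache_dir=tmp)
    cached_fetch('AF522874', fake_fetch, cache_dir=tmp)
print(calls)  # ['AF522874']
```

The second call is served from disk, so the simulated download happens only once.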

Miscellaneous

For command line junkies: The command line interface of sugar can be used for common tasks. Call sugar -h for an overview of available commands.
The sugar.data package: sugar also provides access to codon translation tables and substitution matrices (e.g. BLOSUM62) within the sugar.data package.
Find open reading frames: The find_orfs() method can be used to find open reading frames; see the advanced example in the Sequences Tutorial.
Attributes behave like dictionaries: Metadata and attributes in sugar are instances of Attr, so dictionary keys and attributes can be used interchangeably, e.g. seq.meta.id and seq.meta['id'] have the same effect.
Shortcuts are convenient: Using the shortcuts seq.fts, seq.id, ft.seqid, etc. has the advantage that additional checks are performed. For example, assigning seq.fts = my_fts automatically converts my_fts to a FeatureList, and accessing seq.id returns an empty string if no id attribute is present in the metadata. ft.loc is a shortcut for ft.locs[0].
Overloading of operators: Adding two sequences (seq1 + seq2) concatenates their data. Adding two sequence lists (seqs1 + seqs2) concatenates the two lists. To concatenate the sequences within a list, use the merge() method. Adding two feature lists (fts1 + fts2) also works as expected.
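The dictionary/attribute equivalence, the empty-string fallback of the id shortcut, and sequence concatenation with + can be mimicked in a few lines of plain Python. AttrDict and ToySeq below are hypothetical stand-ins written for illustration, not sugar’s Attr and sequence classes; they only reproduce the behaviour described in the list above. Keeping the left operand’s metadata on concatenation is a choice of this sketch, not documented sugar behaviour.

```python
class AttrDict(dict):
    """Hypothetical dict whose keys can also be used as attributes."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name) from None


class ToySeq:
    """Hypothetical minimal sequence: a data string plus AttrDict metadata."""

    def __init__(self, data, **meta):
        self.data = data
        self.meta = AttrDict(meta)

    @property
    def id(self):
        # shortcut: return '' instead of raising if no id is set
        return self.meta.get('id', '')

    @id.setter
    def id(self, value):
        self.meta.id = value

    def __add__(self, other):
        # concatenate the data; this sketch keeps the left operand's metadata
        return ToySeq(self.data + other.data, **self.meta)


seq = ToySeq('ACGT', id='seq1')
print(seq.meta.id == seq.meta['id'])  # True
print(repr(ToySeq('GG').id))          # ''
print((seq + ToySeq('TT')).data)      # ACGTTT
```

Overriding __getattr__ (rather than __getattribute__) means ordinary dict methods such as get() keep working, and missing keys raise AttributeError, so hasattr() behaves as expected.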