sugar._io.main module

Main IO functions: read(), iter_(), write(), read_fts(), and write_fts()

sugar._io.main.detect(fname, what='seqs', *, encoding=None, **kw)

Try to detect file format from contents

Parameters:

  • what – 'seqs' or 'fts'
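
As a rough illustration of what content-based detection can look like, the sketch below guesses a format from the first non-empty line of a text stream. It is not sugar's implementation: the function name guess_format_from_content and the SIGNATURES table are assumptions made for this example, based only on the leading markers these formats conventionally start with.

```python
import io

# Hypothetical signature table: leading marker of the first non-empty line -> format name.
# The markers follow the usual format conventions, not sugar's source code.
SIGNATURES = [
    ('>', 'fasta'),
    ('LOCUS', 'genbank'),
    ('# STOCKHOLM', 'stockholm'),
    ('CLUSTAL', 'clustal'),
    ('##gff-version', 'gff'),
]


def guess_format_from_content(fileobj):
    """Return a format name guessed from the first non-empty line, or None."""
    for line in fileobj:
        if not line.strip():
            continue  # skip leading blank lines
        for marker, fmt in SIGNATURES:
            if line.startswith(marker):
                return fmt
        return None  # first non-empty line matched no known marker
    return None  # empty input


print(guess_format_from_content(io.StringIO('>seq1\nACGT\n')))      # fasta
print(guess_format_from_content(io.StringIO('# STOCKHOLM 1.0\n')))  # stockholm
```

Only the first non-empty line is inspected, so the stream is never read in full.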

sugar._io.main.detect_ext(fname, what='seqs')

Try to detect file format for writing from extension

Parameters:

  • what – 'seqs' or 'fts'
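
Extension-based detection can be pictured as a simple lookup. The following sketch is only an assumption about how such a lookup might work: the function name guess_format_from_ext, the EXT2FMT table, and the handling of a trailing '.gz' suffix are all hypothetical, and sugar's actual mapping may differ.

```python
from pathlib import Path

# Hypothetical extension table for illustration; sugar's actual mapping may differ.
EXT2FMT = {
    '.fasta': 'fasta',
    '.fa': 'fasta',
    '.gb': 'genbank',
    '.gff': 'gff',
    '.gtf': 'gtf',
    '.sjson': 'sjson',
}
COMPRESSION_EXTS = {'.gz'}  # assumption: compression suffixes are skipped


def guess_format_from_ext(fname):
    """Return a format name guessed from the file extension, or None."""
    suffixes = [s.lower() for s in Path(fname).suffixes]
    # Ignore a trailing compression suffix such as '.gz'
    while suffixes and suffixes[-1] in COMPRESSION_EXTS:
        suffixes.pop()
    if not suffixes:
        return None
    return EXT2FMT.get(suffixes[-1])


print(guess_format_from_ext('crazy_virus.fasta'))  # fasta
print(guess_format_from_ext('annotation.gff.gz'))  # gff
```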

sugar._io.main.iter_(fname, fmt=None, *, mode='r', encoding=None, **kw)

Iterate over a file and yield a BioSeq object for each sequence

See read() function.

Example:

>>> from sugar import iter_
>>> for seq in iter_():  # use the example file
...     print(f'GC content of seq {seq.id} is {100*seq.gc:.0f}%.')
GC content of seq AB047639 is 58%.
GC content of seq AB677533 is 57%.

Note

Calling iter_() without the fname argument returns an iterator over the example sequences.
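
The point of iterating instead of reading everything at once is that only one sequence needs to be held in memory at a time. The sketch below shows that idea with a plain generator over FASTA text. It is not sugar's parser; iter_fasta and gc_fraction are hypothetical helper names, and records are plain (id, sequence) tuples rather than BioSeq objects.

```python
import io


def iter_fasta(fileobj):
    """Lazily yield (id, sequence) tuples from a FASTA-formatted text stream."""
    seq_id, chunks = None, []
    for line in fileobj:
        line = line.strip()
        if line.startswith('>'):
            if seq_id is not None:
                yield seq_id, ''.join(chunks)
            # The ID is the first whitespace-delimited token of the header
            header = line[1:].split()
            seq_id, chunks = (header[0] if header else ''), []
        elif line and seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, ''.join(chunks)


def gc_fraction(seq):
    """Return the fraction of G and C letters in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count('G') + seq.count('C')) / len(seq) if seq else 0.0


data = io.StringIO('>s1 first\nGGCC\nAATT\n>s2\nGCGC\n')
for seq_id, seq in iter_fasta(data):
    print(f'GC content of seq {seq_id} is {100 * gc_fraction(seq):.0f}%.')
# GC content of seq s1 is 50%.
# GC content of seq s2 is 100%.
```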

sugar._io.main.read(fname, fmt=None, *, mode='r', encoding=None, **kw)

Read a file or file-like object with sequences into a BioBasket

Parameters:
  • fname – filename, can also be a glob expression, a web resource, an archive, gzipped file, or a file-like object (e.g. BytesIO, StringIO).

  • fmt – format of the file (default: auto-detect from content)

  • mode – mode for opening the file; change this only if you know what you are doing

  • encoding – encoding of the file

  • archive – Explicitly request reading an archive, type may be specified (default: auto-detect from file extension)

All other kwargs are passed to the underlying reader routine.

The following formats are supported. For the kwargs supported by each format, see the documentation of the listed module.

format     module (read)        description
fasta      sugar._io.fasta      FASTA IO
genbank    sugar._io.genbank    GenBank reader
embl       sugar._io.embl       EMBL flat file reader for ENA and UniProt
clustal    sugar._io.clustal    Clustal IO
stockholm  sugar._io.stockholm  Stockholm IO
gff        sugar._io.gff        Generic feature format (GFF) and Gene transfer format (GTF) IO
sjson      sugar._io.sjson      SJson IO, custom lossless sugar format

Example:

>>> from sugar import read
>>> seqs = read('crazy_virus.fasta', 'fasta')  # read a local file; fmt is optional
>>> seqs = read()  # load example file
>>> print(seqs)
2 seqs in basket
AB047639  9678  ACCTGCCCCTAATAGGGGCGACACTCCGCCATGAATCACTCCCCTGTGA...  GC:58.26%
AB677533  9471  GCCCGCCCCCTGATGGGGGCGACACTCCGCCATGAATCACTCCCCTGTG...  GC:57.46%
  customize output with BioBasket.tostr() method
>>> url = 'https://raw.githubusercontent.com/rnajena/sugar/master/sugar/tests/data/io_test.zip'
>>> seqs = read(url)  # load an archive from the web
>>> print(seqs)
5 seqs in basket
MCHU         150  MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEA...
AAD44166.1   284  LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQM...
BTBSCRYR     620  TGCACCAAACATGTCTAAAGCTGGAACCAAAATTACTTTCTTTGAAG...  GC:52.58%
AB047639    9678  ACCTGCCCCTAATAGGGGCGACACTCCGCCATGAATCACTCCCCTGT...  GC:58.26%
AB677533    9471  GCCCGCCCCCTGATGGGGGCGACACTCCGCCATGAATCACTCCCCTG...  GC:57.46%
  customize output with BioBasket.tostr() method

Note

Calling read() without the fname argument returns a BioBasket with example sequences.
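
The fname parameter accepts gzipped files as well as plain ones. One common way to support both transparently is to check for the gzip magic bytes before opening the file. The sketch below shows that technique with the standard library only. Whether sugar does it this way is an assumption, and open_maybe_gzip is a hypothetical helper name.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of every gzip stream


def open_maybe_gzip(path, encoding='utf-8'):
    """Open path as text, decompressing on the fly if it is gzipped."""
    with open(path, 'rb') as f:
        is_gzip = f.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, 'rt', encoding=encoding)
    return open(path, 'rt', encoding=encoding)


# Write the same FASTA record once plain and once gzipped, then read both back
with tempfile.TemporaryDirectory() as tmpdir:
    plain = os.path.join(tmpdir, 'seqs.fasta')
    zipped = os.path.join(tmpdir, 'seqs.fasta.gz')
    with open(plain, 'w', encoding='utf-8') as f:
        f.write('>s1\nACGT\n')
    with gzip.open(zipped, 'wt', encoding='utf-8') as f:
        f.write('>s1\nACGT\n')
    contents = []
    for path in (plain, zipped):
        with open_maybe_gzip(path) as f:
            contents.append(f.read())

print(contents[0] == contents[1])  # True
```

Checking the content rather than the file name means a gzipped file without a '.gz' suffix is still handled.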

sugar._io.main.read_fts(fname, fmt=None, *, mode='r', encoding=None, **kw)

Read a file or file-like object with features into a FeatureList

Parameters:
  • fname – Filename, can also be a glob expression, a web resource, an archive, gzipped file, or a file-like object (e.g. BytesIO, StringIO)

  • fmt – format of the file (default: auto-detect from content)

  • mode – mode for opening the file; change this only if you know what you are doing

  • encoding – encoding of the file

  • archive – Explicitly request reading an archive, type may be specified (default: auto-detect from file extension)

All other kwargs are passed to the underlying reader routine.

The following formats are supported. For the kwargs supported by each format, see the documentation of the listed module.

format    module (read)           description
gff       sugar._io.gff           Generic feature format (GFF) and Gene transfer format (GTF) IO
gtf       sugar._io.gff           Generic feature format (GFF) and Gene transfer format (GTF) IO
genbank   sugar._io.genbank       GenBank reader
embl      sugar._io.embl          EMBL flat file reader for ENA and UniProt
infernal  sugar._io.tab.infernal  Infernal reader for output generated with tblout fmt 1, 2, 3
mmseqs    sugar._io.tab.mmseqs    MMseqs2 reader for output generated with option fmtmode 4 (preferred) or 0
meme_txt  sugar._io.meme          Read support for file formats associated with the MEME Suite
blast     sugar._io.tab.blast     BLAST reader for output generated with outfmt 7 (preferred), 6 or 10
tsv       sugar._io.tab.xsv       TSV, CSV and XSV file formats IO
csv       sugar._io.tab.xsv       TSV, CSV and XSV file formats IO
sjson     sugar._io.sjson         SJson IO, custom lossless sugar format

Note

Calling read_fts() without the fname argument returns a FeatureList with example features.
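
To give an idea of what a feature reader has to extract, the sketch below parses a single GFF3 line into a dictionary. It follows the nine tab-separated columns of the GFF3 format but is not sugar's reader: parse_gff_line is a hypothetical helper, the input line is made-up example data, and sugar returns Feature objects in a FeatureList rather than dicts.

```python
from urllib.parse import unquote

GFF_COLUMNS = ('seqid', 'source', 'type', 'start', 'end',
               'score', 'strand', 'phase', 'attributes')


def parse_gff_line(line):
    """Parse one GFF3 feature line into a dict; return None for comments and blank lines."""
    line = line.rstrip('\n')
    if not line or line.startswith('#'):
        return None
    values = line.split('\t')
    if len(values) != len(GFF_COLUMNS):
        raise ValueError(f'expected 9 tab-separated columns, got {len(values)}')
    ft = dict(zip(GFF_COLUMNS, values))
    # GFF coordinates are 1-based and inclusive
    ft['start'], ft['end'] = int(ft['start']), int(ft['end'])
    # Attributes are ';'-separated key=value pairs with URL-style escaping
    attrs = {}
    for item in ft['attributes'].split(';'):
        if '=' in item:
            key, value = item.split('=', 1)
            attrs[unquote(key)] = unquote(value)
    ft['attributes'] = attrs
    return ft


ft = parse_gff_line('seq1\texample\tCDS\t10\t90\t.\t+\t0\tID=cds1;Name=my%20gene')
print(ft['type'], ft['start'], ft['end'], ft['attributes']['Name'])
# CDS 10 90 my gene
```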

sugar._io.main.write(seqs, fname, fmt=None, *, mode='w', encoding=None, **kw)

Write sequences to a file; use it via BioBasket.write() or BioSeq.write()

Parameters:
  • seqs – BioBasket object

  • fname – filename or file-like object

  • fmt – format of the file (default: auto-detect from file extension)

  • mode – mode for opening the file; change this only if you know what you are doing. You may use mode='a' to append to an existing file, but this only works with compatible formats (e.g. FASTA)

  • encoding – encoding of the file

  • archive – Explicitly request writing an archive, type may be specified (default: auto-detect from file extension)

All other kwargs are passed to the underlying writer routine.

The following formats are supported. For the kwargs supported by each format, see the documentation of the listed module.

format     module (write)       description
fasta      sugar._io.fasta      FASTA IO
clustal    sugar._io.clustal    Clustal IO
stockholm  sugar._io.stockholm  Stockholm IO
gff        sugar._io.gff        Generic feature format (GFF) and Gene transfer format (GTF) IO
sjson      sugar._io.sjson      SJson IO, custom lossless sugar format
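
Appending with mode='a' works for FASTA because every record starts with its own header line and the file has no file-level header or footer, so concatenated output is still a valid file. A JSON document, by contrast, cannot be extended by appending a second document. The sketch below demonstrates the FASTA case with the standard library only. The helper append_fasta is hypothetical and does not call sugar.

```python
import os
import tempfile


def append_fasta(path, records):
    """Append (id, sequence) records to a FASTA file, creating it if necessary."""
    with open(path, 'a', encoding='utf-8') as f:
        for seq_id, seq in records:
            f.write(f'>{seq_id}\n{seq}\n')


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'out.fasta')
    append_fasta(path, [('s1', 'ACGT')])
    append_fasta(path, [('s2', 'GGCC')])  # second call adds to the same file
    with open(path, encoding='utf-8') as f:
        text = f.read()

print(text.count('>'))  # 2
```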

sugar._io.main.write_fts(fts, fname=None, fmt=None, *, mode='w', **kw)

Write features to a file; use it via FeatureList.write() or Feature.write()

Parameters:
  • fts – FeatureList object

  • fname – filename or file-like object

  • fmt – format of the file (default: auto-detect from file extension)

  • mode – mode for opening the file; change this only if you know what you are doing

  • encoding – encoding of the file

  • archive – Explicitly request writing an archive, type may be specified (default: auto-detect from file extension)

All other kwargs are passed to the underlying writer routine.

The following formats are supported. For the kwargs supported by each format, see the documentation of the listed module.

format  module (write)     description
gff     sugar._io.gff      Generic feature format (GFF) and Gene transfer format (GTF) IO
gtf     sugar._io.gff      Generic feature format (GFF) and Gene transfer format (GTF) IO
tsv     sugar._io.tab.xsv  TSV, CSV and XSV file formats IO
csv     sugar._io.tab.xsv  TSV, CSV and XSV file formats IO
sjson   sugar._io.sjson    SJson IO, custom lossless sugar format
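
Because fname may be a file-like object, a writer only needs something with a write() method. The sketch below illustrates this by writing features as tab-separated rows into an in-memory buffer using the standard csv module. It is not sugar's TSV writer: write_features_tsv, the FIELDS column choice, and the feature dicts are assumptions made for this example.

```python
import csv
import io

FIELDS = ('seqid', 'type', 'start', 'stop', 'strand')  # hypothetical column choice


def write_features_tsv(features, fileobj):
    """Write feature dicts as tab-separated rows with a header line."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS, delimiter='\t',
                            lineterminator='\n', extrasaction='ignore')
    writer.writeheader()
    writer.writerows(features)


features = [
    {'seqid': 'seq1', 'type': 'CDS', 'start': 10, 'stop': 90, 'strand': '+'},
    {'seqid': 'seq1', 'type': 'gene', 'start': 1, 'stop': 100, 'strand': '+'},
]
buffer = io.StringIO()  # any file-like object with a write() method works here
write_features_tsv(features, buffer)
print(buffer.getvalue())
```

The same function works unchanged with a file opened in text mode, since it never touches the file system itself.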