sugar._io.main module

Main IO functions: read(), iter_(), write(), read_fts() and write_fts()

sugar._io.main.detect(fname, what='seqs', *, encoding=None, **kw)

Try to detect the file format from the file contents.

Parameters: what – 'seqs' or 'fts'
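sugar's actual detection logic is not shown on this page. As a rough illustration only, here is a minimal sketch assuming that the first non-empty line of a file is enough to tell the sequence formats apart, using each format's conventional opening marker. The names FIRST_LINE_MARKERS and guess_seq_format are hypothetical.

```python
# (marker, format name): hypothetical table of conventional first-line markers
FIRST_LINE_MARKERS = [
    ('>', 'fasta'),
    ('LOCUS', 'genbank'),
    ('ID   ', 'embl'),
    ('CLUSTAL', 'clustal'),
    ('# STOCKHOLM', 'stockholm'),
    ('##gff-version', 'gff'),
]

def guess_seq_format(text):
    """Guess a format from the first non-empty line of text; return None if unknown."""
    for line in text.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        for marker, fmt in FIRST_LINE_MARKERS:
            if line.startswith(marker):
                return fmt
        return None  # only the first non-empty line is inspected
    return None
```

A real detector will usually need more than one line (e.g. GTF files have no mandatory header), so treat this as a simplification.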

sugar._io.main.detect_ext(fname, what='seqs')

Try to detect the file format for writing from the file extension.

Parameters: what – 'seqs' or 'fts'
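Extension-based detection amounts to a lookup table from file suffix to format name. The sketch below assumes such a table and also skips a trailing compression suffix (e.g. seqs.fasta.gz); the mapping EXT_TO_FMT and the function guess_format_from_ext are hypothetical, and sugar's own extension list may differ.

```python
from pathlib import PurePath

# Hypothetical extension table for illustration; sugar's own mapping may differ
EXT_TO_FMT = {
    '.fasta': 'fasta', '.fa': 'fasta', '.fna': 'fasta', '.faa': 'fasta',
    '.gb': 'genbank', '.gbk': 'genbank',
    '.embl': 'embl',
    '.aln': 'clustal',
    '.sto': 'stockholm', '.stk': 'stockholm',
    '.gff': 'gff', '.gff3': 'gff', '.gtf': 'gtf',
}

# Compression suffixes that are ignored before looking up the format
COMPRESSION_EXTS = {'.gz', '.bz2', '.xz'}

def guess_format_from_ext(fname):
    """Return the format name for fname's extension, or None if it is unknown."""
    suffixes = [s.lower() for s in PurePath(fname).suffixes]
    # Drop trailing compression suffixes, e.g. 'seqs.fasta.gz' -> ['.fasta']
    while suffixes and suffixes[-1] in COMPRESSION_EXTS:
        suffixes.pop()
    if not suffixes:
        return None
    return EXT_TO_FMT.get(suffixes[-1])
```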

sugar._io.main.iter_(fname, fmt=None, *, mode='r', encoding=None, **kw)

Iterate over a file and yield a BioSeq object for each sequence.

See the read() function for a description of the parameters.

Example:

>>> from sugar import iter_
>>> for seq in iter_():  # use the example file
...     print(f'GC content of seq {seq.id} is {100*seq.gc:.0f}%.')
GC content of seq AB047639 is 58%.
GC content of seq AB677533 is 57%.

Note

Calling iter_() without the fname argument returns an iterator over the example sequences.
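Unlike read(), iter_() hands out one sequence at a time, so the whole file never has to be held in memory. The following plain-Python sketch illustrates that streaming idea for FASTA only; it is not sugar's reader. iter_fasta and gc_fraction are hypothetical names, and they yield (id, sequence) tuples where sugar yields BioSeq objects with .id and .gc attributes.

```python
import io

def iter_fasta(handle):
    """Yield (id, sequence) pairs one record at a time from a FASTA text stream."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            if seq_id is not None:
                yield seq_id, ''.join(chunks)  # previous record is complete
            words = line[1:].split()
            seq_id, chunks = (words[0] if words else ''), []  # id = first word of header
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, ''.join(chunks)  # last record in the stream

def gc_fraction(seq):
    """Return the fraction of G and C characters (case-insensitive), 0.0 if empty."""
    seq = seq.upper()
    return (seq.count('G') + seq.count('C')) / len(seq) if seq else 0.0

# Usage: mirrors the iter_() example above, but on an in-memory FASTA text
fasta = io.StringIO('>s1 first\nGGCC\nAT\n>s2\nATAT\n')
for seq_id, seq in iter_fasta(fasta):
    print(f'GC content of seq {seq_id} is {100*gc_fraction(seq):.0f}%.')
# GC content of seq s1 is 67%.
# GC content of seq s2 is 0%.
```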

sugar._io.main.read(fname, fmt=None, *, mode='r', encoding=None, **kw)

Read a file or file-like object with sequences into a BioBasket.

Parameters:

fname – filename; can also be a glob expression, a web resource (URL), an archive, a gzipped file, or a file-like object (e.g. BytesIO, StringIO)
fmt – format of the file (default: auto-detect from content)
mode – mode for opening the file; change this only if you know what you are doing
encoding – encoding of the file
archive – explicitly request reading an archive; the archive type may be specified (default: auto-detect from the file extension)

All other kwargs are passed to the underlying reader routine.
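The fname parameter accepts several kinds of input. One plausible way to normalize a file-like object, a glob pattern and a gzipped file into text streams with only the standard library is sketched below. open_inputs is a hypothetical helper, not part of sugar; it detects gzip from the magic bytes (an assumption, sugar may rely on the extension), and web resources and archives are left out.

```python
import glob
import gzip
import io
import os
import tempfile

GZIP_MAGIC = b'\x1f\x8b'  # every gzip stream starts with these two bytes

def open_inputs(fname, encoding=None):
    """Yield one text stream per input (hypothetical helper, not part of sugar).

    fname may be a file-like object, a path, or a glob pattern; gzip compression
    is recognized from the magic bytes rather than from the file extension.
    """
    if hasattr(fname, 'read'):
        yield fname  # already a file-like object such as StringIO
        return
    # Expand glob patterns; fall back to the literal name so a missing file raises
    paths = sorted(glob.glob(fname)) or [fname]
    for path in paths:
        with open(path, 'rb') as f:
            is_gzip = f.read(2) == GZIP_MAGIC
        opener = gzip.open if is_gzip else open
        with opener(path, 'rt', encoding=encoding) as f:
            yield f

# Usage: a gzipped file matched by a glob pattern is decompressed transparently
tmpdir = tempfile.mkdtemp()
with gzip.open(os.path.join(tmpdir, 'seqs.fasta.gz'), 'wt') as f:
    f.write('>s1\nACGT\n')
texts = [f.read() for f in open_inputs(os.path.join(tmpdir, '*.gz'))]

# A file-like object is passed through unchanged
from_buffer = [f.read() for f in open_inputs(io.StringIO('>s2\nGG\n'))]
```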

The following formats are supported; for documentation of the supported kwargs, see the linked module of each format.

format	module	read	description
fasta	`sugar._io.fasta`	✅	FASTA IO
genbank	`sugar._io.genbank`	✅	GenBank reader
embl	`sugar._io.embl`	✅	EMBL flat file reader for ENA and UniProt
clustal	`sugar._io.clustal`	✅	Clustal IO
stockholm	`sugar._io.stockholm`	✅	Stockholm IO
gff	`sugar._io.gff`	✅	Generic feature format (GFF) and Gene transfer format (GTF) IO
sjson	`sugar._io.sjson`	✅	SJson IO, custom lossless sugar format
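The format table, together with the statement that extra kwargs go to the underlying reader, suggests a dispatch from format name to a reader routine. A minimal sketch of that pattern, assuming a plain dict registry: READERS, read_with, the toy reader read_fasta_ids and its prefix keyword are all hypothetical and only show how an extra keyword flows through to the reader.

```python
import io

def read_fasta_ids(handle, prefix=''):
    """Toy reader: return the header ids of a FASTA stream, optionally prefixed."""
    return [prefix + line[1:].split()[0] for line in handle if line.startswith('>')]

# Hypothetical registry mapping format names to reader callables
READERS = {'fasta': read_fasta_ids}

def read_with(handle, fmt, **kw):
    """Look up the reader for fmt and forward all extra keyword arguments to it."""
    try:
        reader = READERS[fmt]
    except KeyError:
        raise ValueError(f'unsupported format: {fmt!r}') from None
    return reader(handle, **kw)

# The 'prefix' keyword is not known to read_with; it is simply passed through
ids = read_with(io.StringIO('>a x\nAC\n>b\nGT\n'), 'fasta', prefix='seq_')
```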

Example:

>>> from sugar import read
>>> seqs = read('crazy_virus.fasta', 'fasta')  # read a local file; fmt is optional
>>> seqs = read()  # load example file
>>> print(seqs)
2 seqs in basket
AB047639  9678  ACCTGCCCCTAATAGGGGCGACACTCCGCCATGAATCACTCCCCTGTGA...  GC:58.26%
AB677533  9471  GCCCGCCCCCTGATGGGGGCGACACTCCGCCATGAATCACTCCCCTGTG...  GC:57.46%
  customize output with BioBasket.tostr() method

>>> url = 'https://raw.githubusercontent.com/rnajena/sugar/master/sugar/tests/data/io_test.zip'
>>> seqs = read(url)  # load an archive from the web
>>> print(seqs)
5 seqs in basket
MCHU         150  MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEA...
AAD44166.1   284  LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQM...
BTBSCRYR     620  TGCACCAAACATGTCTAAAGCTGGAACCAAAATTACTTTCTTTGAAG...  GC:52.58%
AB047639    9678  ACCTGCCCCTAATAGGGGCGACACTCCGCCATGAATCACTCCCCTGT...  GC:58.26%
AB677533    9471  GCCCGCCCCCTGATGGGGGCGACACTCCGCCATGAATCACTCCCCTG...  GC:57.46%
  customize output with BioBasket.tostr() method
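Conceptually, reading an archive such as io_test.zip means reading every member file and collecting the results. The standard-library sketch below covers only the first step, assuming a zip archive already available in memory as bytes (in practice the bytes could come from urllib.request.urlopen(url).read()). iter_archive_texts is a hypothetical name; sugar's own archive handling and per-member format detection are not shown.

```python
import io
import zipfile

def iter_archive_texts(data):
    """Yield (member name, decoded text) for every file in a zip archive given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directories carry no sequence data
            yield info.filename, zf.read(info).decode('utf-8')

# Build a small in-memory archive with two FASTA members for demonstration
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('a.fasta', '>s1\nACGT\n')
    zf.writestr('b.fasta', '>s2\nGGCC\n')

members = list(iter_archive_texts(buf.getvalue()))
```

Each member text could then be handed to a format detector and the matching reader.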

Note

Calling read() without the fname argument returns a BioBasket with the example sequences.

sugar._io.main.read_fts(fname, fmt=None, *, mode='r', encoding=None, **kw)

Read a file or file-like object with features into a FeatureList.

Parameters:

fname – filename; can also be a glob expression, a web resource (URL), an archive, a gzipped file, or a file-like object (e.g. BytesIO, StringIO)
fmt – format of the file (default: auto-detect from content)
mode – mode for opening the file; change this only if you know what you are doing
encoding – encoding of the file
archive – explicitly request reading an archive; the archive type may be specified (default: auto-detect from the file extension)

All other kwargs are passed to the underlying reader routine.

The following formats are supported; for documentation of the supported kwargs, see the linked module of each format.

format	module	read	description
gff	`sugar._io.gff`	✅	Generic feature format (GFF) and Gene transfer format (GTF) IO
gtf	`sugar._io.gff`	✅	Generic feature format (GFF) and Gene transfer format (GTF) IO
genbank	`sugar._io.genbank`	✅	GenBank reader
embl	`sugar._io.embl`	✅	EMBL flat file reader for ENA and UniProt
infernal	`sugar._io.tab.infernal`	✅	Infernal reader for output generated with tblout fmt 1, 2, 3
mmseqs	`sugar._io.tab.mmseqs`	✅	MMseqs2 reader for output generated with option fmtmode 4 (preferred) or 0
meme_txt	`sugar._io.meme`	✅	Read support for file formats associated with the MEME Suite
blast	`sugar._io.tab.blast`	✅	BLAST reader for output generated with outfmt 7 (preferred), 6 or 10
tsv	`sugar._io.tab.xsv`	✅	TSV, CSV and XSV file formats IO
csv	`sugar._io.tab.xsv`	✅	TSV, CSV and XSV file formats IO
sjson	`sugar._io.sjson`	✅	SJson IO, custom lossless sugar format
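The blast, mmseqs and infernal readers consume tabular hit lists. To illustrate what the blast reader works with, here is a sketch that parses the twelve default columns of BLAST+ tabular output (-outfmt 6; -outfmt 7 adds '#' comment lines). It does not handle custom column lists or the comma-separated -outfmt 10. parse_blast_tab is a hypothetical name, the hit line is made-up data, and sugar's reader turns hits into Feature objects rather than dicts.

```python
# Default column order of BLAST+ tabular output (-outfmt 6 / 7)
BLAST_COLUMNS = ['qseqid', 'sseqid', 'pident', 'length', 'mismatch', 'gapopen',
                 'qstart', 'qend', 'sstart', 'send', 'evalue', 'bitscore']
INT_COLUMNS = {'length', 'mismatch', 'gapopen', 'qstart', 'qend', 'sstart', 'send'}
FLOAT_COLUMNS = {'pident', 'evalue', 'bitscore'}

def parse_blast_tab(lines):
    """Parse default-column BLAST tabular lines into dicts, skipping '#' comment lines."""
    hits = []
    for line in lines:
        line = line.rstrip('\n')
        if not line or line.startswith('#'):
            continue  # blank line or outfmt 7 comment
        hit = {}
        for name, value in zip(BLAST_COLUMNS, line.split('\t')):
            if name in INT_COLUMNS:
                hit[name] = int(value)
            elif name in FLOAT_COLUMNS:
                hit[name] = float(value)
            else:
                hit[name] = value
        hits.append(hit)
    return hits

# Made-up example: one comment line and one hit
example = [
    '# BLASTN\n',
    'q1\tAB047639\t98.50\t200\t3\t0\t1\t200\t501\t700\t1e-95\t350\n',
]
hits = parse_blast_tab(example)
```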

Note

Calling read_fts() without the fname argument returns a FeatureList with example features.

sugar._io.main.write(seqs, fname, fmt=None, *, mode='w', encoding=None, **kw)

Write sequences to a file. Use it via the BioBasket.write() or BioSeq.write() methods.

Parameters:

seqs – BioBasket object
fname – filename or file-like object
fmt – format of the file (default: auto-detect from file extension)
mode – mode for opening the file; change this only if you know what you are doing. You may use mode='a' to append to an existing file, but this only works with formats that allow it (i.e. FASTA)
encoding – encoding of the file
archive – explicitly request writing an archive; the archive type may be specified (default: auto-detect from the file extension)

All other kwargs are passed to the underlying writer routine.
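Appending with mode='a' works for FASTA because a FASTA file is just a concatenation of independent records with no file-level header; a format with a single file header, such as Clustal, cannot be extended this way. A stdlib sketch of that property, assuming records given as (id, sequence) pairs; write_fasta and its width parameter are hypothetical and not sugar's writer.

```python
import os
import tempfile

def write_fasta(records, path, mode='w', width=60):
    """Write (id, sequence) pairs as FASTA, wrapping sequence lines at `width` characters."""
    with open(path, mode) as f:
        for seq_id, seq in records:
            f.write(f'>{seq_id}\n')
            for i in range(0, len(seq), width):
                f.write(seq[i:i + width] + '\n')

path = os.path.join(tempfile.mkdtemp(), 'out.fasta')
write_fasta([('s1', 'ACGT' * 20)], path)       # mode='w' creates or truncates the file
write_fasta([('s2', 'GGCC')], path, mode='a')  # mode='a' appends one more record

with open(path) as f:
    headers = [line[1:].strip() for line in f if line.startswith('>')]
```

After both calls the file holds two valid records, s1 and s2.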

The following formats are supported; for documentation of the supported kwargs, see the linked module of each format.

format	module	write	description
fasta	`sugar._io.fasta`	✅	FASTA IO
clustal	`sugar._io.clustal`	✅	Clustal IO
stockholm	`sugar._io.stockholm`	✅	Stockholm IO
gff	`sugar._io.gff`	✅	Generic feature format (GFF) and Gene transfer format (GTF) IO
sjson	`sugar._io.sjson`	✅	SJson IO, custom lossless sugar format

sugar._io.main.write_fts(fts, fname=None, fmt=None, *, mode='w', **kw)

Write features to a file. Use it via the FeatureList.write() or Feature.write() methods.

Parameters:

fts – FeatureList object
fname – filename or file-like object
fmt – format of the file (default: auto-detect from file extension)
mode – mode for opening the file; change this only if you know what you are doing
encoding – encoding of the file
archive – explicitly request writing an archive; the archive type may be specified (default: auto-detect from the file extension)

All other kwargs are passed to the underlying writer routine.

The following formats are supported; for documentation of the supported kwargs, see the linked module of each format.

format	module	write	description
gff	`sugar._io.gff`	✅	Generic feature format (GFF) and Gene transfer format (GTF) IO
gtf	`sugar._io.gff`	✅	Generic feature format (GFF) and Gene transfer format (GTF) IO
tsv	`sugar._io.tab.xsv`	✅	TSV, CSV and XSV file formats IO
csv	`sugar._io.tab.xsv`	✅	TSV, CSV and XSV file formats IO
sjson	`sugar._io.sjson`	✅	SJson IO, custom lossless sugar format