sugar._io.main module
Main IO functions, read(), iter_(), write(), read_fts(), write_fts()
- sugar._io.main.detect(fname, what='seqs', *, encoding=None, **kw)
Try to detect file format from contents
- Parameters:
what – 'seqs' or 'fts'
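To illustrate what detection from contents can look like, here is a minimal sketch using only the standard library. It is not sugar's implementation: the function name `sniff_format` and the `SIGNATURES` table are hypothetical, and the prefixes are merely the conventional first-line markers of each format.

```python
import io

# Hypothetical table: prefix of the first non-empty line -> format name.
# These are common conventions of each format, not sugar's actual rules.
SIGNATURES = [
    ('>', 'fasta'),
    ('LOCUS', 'genbank'),
    ('ID   ', 'embl'),
    ('CLUSTAL', 'clustal'),
    ('# STOCKHOLM', 'stockholm'),
    ('##gff-version', 'gff'),
]


def sniff_format(handle):
    """Guess a format name from the first non-empty line, or return None."""
    for line in handle:
        if not line.strip():
            continue  # skip leading blank lines
        for prefix, fmt in SIGNATURES:
            if line.startswith(prefix):
                return fmt
        return None  # first non-empty line matched no known signature
    return None  # empty input


print(sniff_format(io.StringIO('>seq1\nACGT\n')))      # fasta
print(sniff_format(io.StringIO('##gff-version 3\n')))  # gff
```

A real detector would likely need more than the first line, since several tabular formats share no distinctive header.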
- sugar._io.main.detect_ext(fname, what='seqs')
Try to detect file format for writing from extension
- Parameters:
what – 'seqs' or 'fts'
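Detection from the extension can be pictured as a simple lookup. The sketch below is an assumption-laden illustration, not sugar's code: `guess_format_from_ext`, `EXT_TO_FMT` and `COMPRESSION_EXTS` are hypothetical names, and the extension table is only a plausible guess at which suffixes map to which format.

```python
from pathlib import Path

# Hypothetical extension table; sugar's real mapping may differ.
EXT_TO_FMT = {
    '.fasta': 'fasta',
    '.fa': 'fasta',
    '.gb': 'genbank',
    '.embl': 'embl',
    '.aln': 'clustal',
    '.sto': 'stockholm',
    '.gff': 'gff',
    '.gtf': 'gtf',
    '.sjson': 'sjson',
}
COMPRESSION_EXTS = {'.gz', '.bz2'}


def guess_format_from_ext(fname):
    """Return a format name based on the file extension, or None if unknown."""
    suffixes = [s.lower() for s in Path(fname).suffixes]
    # Ignore trailing compression suffixes such as '.gz'
    while suffixes and suffixes[-1] in COMPRESSION_EXTS:
        suffixes.pop()
    if not suffixes:
        return None
    return EXT_TO_FMT.get(suffixes[-1])


print(guess_format_from_ext('seqs.fasta.gz'))  # fasta
print(guess_format_from_ext('features.GFF'))   # gff
```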
- sugar._io.main.iter_(fname, fmt=None, *, mode='r', encoding=None, **kw)
Iterate over a file and yield BioSeq objects of each sequence.
See the read() function.
Example:
>>> from sugar import iter_
>>> for seq in iter_():  # use the example file
...     print(f'GC content of seq {seq.id} is {100*seq.gc:.0f}%.')
GC content of seq AB047639 is 58%.
GC content of seq AB677533 is 57%.
Note
Calling iter_() without the fname argument returns an iterator over example sequences.
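The benefit of iterating instead of reading everything at once is that only one record is held in memory at a time. The following stdlib-only sketch shows the idea for FASTA input; `iter_fasta` and `gc_fraction` are hypothetical helpers written for this illustration and yield plain tuples rather than BioSeq objects.

```python
import io


def iter_fasta(handle):
    """Yield (id, sequence) tuples one record at a time."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith('>'):
            if seq_id is not None:
                yield seq_id, ''.join(chunks)  # emit the finished record
            # The id is the first whitespace-delimited token of the header
            header = line[1:].split()
            seq_id = header[0] if header else ''
            chunks = []
        elif line and seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, ''.join(chunks)  # emit the last record


def gc_fraction(seq):
    """Fraction of G and C characters in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count('G') + seq.count('C')) / len(seq) if seq else 0.0


example = io.StringIO('>a first record\nGGCC\nAATT\n>b\nGCGC\n')
for seq_id, seq in iter_fasta(example):
    print(f'GC content of seq {seq_id} is {100 * gc_fraction(seq):.0f}%.')
# GC content of seq a is 50%.
# GC content of seq b is 100%.
```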
- sugar._io.main.read(fname, fmt=None, *, mode='r', encoding=None, **kw)
Read a file or file-like object with sequences into BioBasket
- Parameters:
fname – filename, can also be a glob expression, a web resource, an archive, a gzipped file, or a file-like object (e.g. BytesIO, StringIO)
fmt – format of the file (default: auto-detect from content)
mode – mode for opening the file; change this only if you know what you are doing
encoding – encoding of the file
archive – explicitly request reading an archive; the type may be specified (default: auto-detect from file extension)
All other kwargs are passed to the underlying reader routine.
The following formats are supported, for documentation of supported kwargs follow the provided links.
format      description
fasta       FASTA IO
genbank     GenBank reader
embl        EMBL flat file reader for ENA and UniProt
clustal     Clustal IO
stockholm   Stockholm IO
gff         Generic feature format (GFF) and Gene transfer format (GTF) IO
sjson       SJson IO, custom lossless sugar format
Example:
>>> from sugar import read
>>> seqs = read('crazy_virus.fasta', 'fasta')  # read a local file, the fmt is optional
>>> seqs = read()  # load example file
>>> print(seqs)
2 seqs in basket
AB047639  9678  ACCTGCCCCTAATAGGGGCGACACTCCGCCATGAATCACTCCCCTGTGA...  GC:58.26%
AB677533  9471  GCCCGCCCCCTGATGGGGGCGACACTCCGCCATGAATCACTCCCCTGTG...  GC:57.46%
  customize output with BioBasket.tostr() method
>>> url = 'https://raw.githubusercontent.com/rnajena/sugar/master/sugar/tests/data/io_test.zip'
>>> seqs = read(url)  # load an archive from the web
>>> print(seqs)
5 seqs in basket
MCHU         150  MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEA...
AAD44166.1   284  LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQM...
BTBSCRYR     620  TGCACCAAACATGTCTAAAGCTGGAACCAAAATTACTTTCTTTGAAG...  GC:52.58%
AB047639    9678  ACCTGCCCCTAATAGGGGCGACACTCCGCCATGAATCACTCCCCTGT...  GC:58.26%
AB677533    9471  GCCCGCCCCCTGATGGGGGCGACACTCCGCCATGAATCACTCCCCTG...  GC:57.46%
  customize output with BioBasket.tostr() method
Note
Calling read() without the fname argument returns an example sequences object.
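Accepting a filename, a gzipped file, or a file-like object through the same argument comes down to normalizing the input into a text handle before parsing. The sketch below shows one way to do that with the standard library. It is an assumption about the general technique, not a description of how sugar handles its inputs, and `open_text` is a hypothetical helper.

```python
import gzip
import io
import os
import tempfile


def open_text(source, encoding='utf-8'):
    """Return a text handle for a path, a gzipped path, or a file-like object."""
    if hasattr(source, 'read'):
        # File-like object: wrap binary streams so callers always get text
        if isinstance(source, (io.RawIOBase, io.BufferedIOBase)):
            return io.TextIOWrapper(source, encoding=encoding)
        return source
    with open(source, 'rb') as f:
        magic = f.read(2)
    if magic == b'\x1f\x8b':  # gzip magic number
        return gzip.open(source, 'rt', encoding=encoding)
    return open(source, 'r', encoding=encoding)


# Usage: the same call works for a gzipped file on disk and for BytesIO
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.fasta.gz')
    with gzip.open(path, 'wt', encoding='utf-8') as f:
        f.write('>seq1\nACGT\n')
    with open_text(path) as handle:
        text = handle.read()

print(text == open_text(io.BytesIO(b'>seq1\nACGT\n')).read())  # True
```

Checking the magic bytes rather than the extension means a gzipped file is recognized even when it is named without `.gz`.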
- sugar._io.main.read_fts(fname, fmt=None, *, mode='r', encoding=None, **kw)
Read a file or file-like object with features into FeatureList
- Parameters:
fname – filename, can also be a glob expression, a web resource, an archive, a gzipped file, or a file-like object (e.g. BytesIO, StringIO)
fmt – format of the file (default: auto-detect from content)
mode – mode for opening the file; change this only if you know what you are doing
encoding – encoding of the file
archive – explicitly request reading an archive; the type may be specified (default: auto-detect from file extension)
All other kwargs are passed to the underlying reader routine.
The following formats are supported, for documentation of supported kwargs follow the provided links.
format      description
gff         Generic feature format (GFF) and Gene transfer format (GTF) IO
gtf         Generic feature format (GFF) and Gene transfer format (GTF) IO
genbank     GenBank reader
embl        EMBL flat file reader for ENA and UniProt
infernal    Infernal reader for output generated with tblout fmt 1, 2, 3
mmseqs      MMseqs2 reader for output generated with option fmtmode 4 (preferred) or 0
meme_txt    Read support for file formats associated with the MEME Suite
blast       BLAST reader for output generated with outfmt 7 (preferred), 6 or 10
tsv         TSV, CSV and XSV file formats IO
csv         TSV, CSV and XSV file formats IO
sjson       SJson IO, custom lossless sugar format
Note
Calling read_fts() without the fname argument returns an example features object.
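As an illustration of what a feature reader has to extract, the sketch below parses a single GFF3 line into a dictionary using only the standard library. It follows the nine tab-separated columns of the GFF3 format, but `parse_gff3_line` is a hypothetical helper and the resulting dict is not sugar's Feature object.

```python
from urllib.parse import unquote


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict; return None for comments or blanks."""
    line = line.rstrip('\n')
    if not line or line.startswith('#'):
        return None
    cols = line.split('\t')
    if len(cols) != 9:
        raise ValueError(f'expected 9 tab-separated columns, got {len(cols)}')
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes = {}
    if attrs != '.':
        for item in attrs.split(';'):
            if item:
                key, _, value = item.partition('=')
                attributes[key] = unquote(value)  # GFF3 uses URL-style escapes
    return {
        'seqid': seqid,
        'source': source,
        'type': ftype,
        'start': int(start),  # 1-based, inclusive
        'end': int(end),
        'score': None if score == '.' else float(score),
        'strand': strand,
        'phase': None if phase == '.' else int(phase),
        'attributes': attributes,
    }


ft = parse_gff3_line('chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=gene1;Name=abc%20def')
print(ft['type'], ft['start'], ft['end'], ft['attributes'])
# gene 100 200 {'ID': 'gene1', 'Name': 'abc def'}
```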
- sugar._io.main.write(seqs, fname, fmt=None, *, mode='w', encoding=None, **kw)
Write sequences to file, use it via BioBasket.write() or BioSeq.write()
- Parameters:
seqs – BioBasket object
fname – filename or file-like object
fmt – format of the file (default: auto-detect from file extension)
mode – mode for opening the file; change this only if you know what you are doing. You may use mode='a' for appending to an existing file, but this will only work with compatible formats (i.e. FASTA)
encoding – encoding of the file
archive – explicitly request writing an archive; the type may be specified (default: auto-detect from file extension)
All other kwargs are passed to the underlying writer routine.
The following formats are supported, for documentation of supported kwargs follow the provided links.
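The note on mode='a' is easier to see with a concrete case: FASTA records are self-contained, so appending new records to an existing file still yields a valid file, whereas formats with a global header or footer would not survive this. The sketch below demonstrates the effect with a hypothetical stdlib-only writer, `write_fasta`, which is not sugar's writer.

```python
import os
import tempfile


def write_fasta(records, fname, mode='w', width=60):
    """Write (id, sequence) tuples as FASTA, wrapping sequence lines at width."""
    with open(fname, mode, encoding='utf-8') as f:
        for seq_id, seq in records:
            f.write(f'>{seq_id}\n')
            for i in range(0, len(seq), width):
                f.write(seq[i:i + width] + '\n')


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'out.fasta')
    write_fasta([('seq1', 'ACGT')], path)            # create the file
    write_fasta([('seq2', 'GGCC')], path, mode='a')  # append a second record
    with open(path, encoding='utf-8') as f:
        content = f.read()

print(content)
# >seq1
# ACGT
# >seq2
# GGCC
```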
- sugar._io.main.write_fts(fts, fname=None, fmt=None, *, mode='w', **kw)
Write features to file, use it via FeatureList.write() or Feature.write()
- Parameters:
fts – FeatureList object
fname – filename or file-like object
fmt – format of the file (default: auto-detect from file extension)
mode – mode for opening the file; change this only if you know what you are doing
encoding – encoding of the file
archive – explicitly request writing an archive; the type may be specified (default: auto-detect from file extension)
All other kwargs are passed to the underlying writer routine.
The following formats are supported, for documentation of supported kwargs follow the provided links.