How to create a new IO plugin
Create a sequence plugin in the sugar package
Let’s assume you want to create an IO plugin for the sequence format fancy.
First, take a look at other plugins in the sugar/_io folder.
Fork the repository to your account and create a new branch
Create a new module or package inside the sugar/_io folder, e.g. fancymodule.py.
Use the following template to provide read and/or write functionality for your format:

"""
My Fancy Seq Plugin

This is an example sequence file format.
The layout is as follows:

#MyFancySeqFormat
seq1 AAATTGGGCCC
seq2 ATGGCT
"""

# To add read support you must define either an iter_fancy or read_fancy function, or both.
# To add write support you must define either an append_fancy or write_fancy function, or both.

from sugar import BioBasket, BioSeq
from sugar._io.util import _add_fmt_doc


# Use the following flag to indicate, that your file format is binary rather than text-based,
# the passed file handlers will be opened in binary mode.
#binary_fmt = True

# optional, filename extensions for automatic detection of file format
# when writing
filename_extensions = ['fancy']


def is_fancy(f, **kw):
    """
    Function is optional, used for auto-detection of format when reading

    It should return True if the format is detected,
    otherwise it may raise any exception or return False.
    """
    content = f.read(50)
    return content.strip().lower().startswith('#myfancyseqformat')


# The function decorators are used to automatically add a warning
# to the docstring, that this function should be called via the main
# iter_ or read functions.

@_add_fmt_doc('read')
def iter_fancy(f, optional_argument=None):
    """
    The iter_fancy function expects a file handler and has to yield BioSeq objects.

    You can define optional arguments.
    """
    for line in f:
        if line.strip() != '' and not line.startswith('#'):
            seqid, data = line.split()
            yield BioSeq(data, id=seqid)


@_add_fmt_doc('read')
def read_fancy(f, **kw):
    """
    The read_fancy function expects a file handler and has to return a BioBasket object
    """
    # We are lazy here and reuse iter_fancy
    return BioBasket(list(iter_fancy(f, **kw)))


@_add_fmt_doc('write')
def append_fancy(seq, f, **kw):
    """
    Write a single seq to file handler
    """
    f.write(f'{seq.id} {seq.data}\n')


@_add_fmt_doc('write')
def write_fancy(seqs, f, **kw):
    """
    Write a BioBasket object to file handler
    """
    f.write('#MyFancySeqFormat 3.14159\n')
    for seq in seqs:
        # be lazy again
        append_fancy(seq, f, **kw)

Add your plugin to the FMTS list in sugar/_io/util.py.
Register the plugin in the pyproject.toml file:

[project.entry-points."sugar.io"]
fancy = "sugar._io.fancymodule"

Write some tests in a new file sugar/tests/test_io_fancy.py.
Reinstall your branch of sugar and check that everything is working:

from sugar import read
seqs = read('example.fancy')
print(seqs)

Run your tests.
Create a pull request to get your plugin into the main repository.
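The detection, parsing, and writing logic of the template in step 3 can be tried out before touching the package. The following is only a minimal sketch under stated assumptions: it deliberately does not import sugar, plain (id, data) tuples stand in for BioSeq objects, and the names is_fancy_demo, iter_fancy_demo, and write_fancy_demo are hypothetical. The same kind of round trip is a reasonable starting point for the tests in step 6.

```python
import io

# Sample content in the layout described in the template's module docstring.
SAMPLE = "#MyFancySeqFormat\nseq1 AAATTGGGCCC\nseq2 ATGGCT\n"


def is_fancy_demo(f):
    """Same check as is_fancy in the template: inspect the first 50 characters."""
    content = f.read(50)
    return content.strip().lower().startswith('#myfancyseqformat')


def iter_fancy_demo(f):
    """Yield (id, data) tuples; the real plugin yields BioSeq objects instead."""
    for line in f:
        if line.strip() != '' and not line.startswith('#'):
            seqid, data = line.split()
            yield seqid, data


def write_fancy_demo(records, f):
    """Write the header line, then one 'id data' line per record."""
    f.write('#MyFancySeqFormat 3.14159\n')
    for seqid, data in records:
        f.write(f'{seqid} {data}\n')


detected = is_fancy_demo(io.StringIO(SAMPLE))
records = list(iter_fancy_demo(io.StringIO(SAMPLE)))

# Round trip: write the records, then parse the written text again.
out = io.StringIO()
write_fancy_demo(records, out)
written = out.getvalue()
roundtrip = list(iter_fancy_demo(io.StringIO(written)))

print(detected, records, roundtrip == records)
```

Because the header line starts with #, the parser skips it, so the version suffix written by write_fancy does not disturb reading or detection.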
Create a sequence plugin that can be used with sugar, but is in an external package
Create your own package by following only step 3 above, i.e. create the plugin module from the template. Then register the plugin in the pyproject.toml of your own project:
[project.entry-points."sugar.io"]
fancy = "myfancypackage.fancymodule"
Once your package is installed, sequence files in the fancy format can be read with the same commands as in step 7 above.
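To check that the entry point is visible after installation, the installed package metadata can be inspected with the standard library. This is only a sketch: how sugar itself discovers and loads plugins is not described here, and the helper name list_plugins is hypothetical. It merely lists what is registered under the sugar.io group.

```python
from importlib.metadata import entry_points


def list_plugins(group='sugar.io'):
    """Return {entry point name: module path} for all installed entry points in group."""
    eps = entry_points()
    if hasattr(eps, 'select'):
        # Python 3.10 and newer
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9 return a plain dict keyed by group name
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


plugins = list_plugins()
print(plugins)
```

With the pyproject.toml entry above and the package installed, the dictionary should contain the key fancy with the value myfancypackage.fancymodule. An entry point only appears after the package has been installed or reinstalled, since it is read from the installed metadata.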
Create a features plugin in the sugar package or in an external package
A features plugin is analogous to the sequence plugin and can even be located in the same file.
Use the following variables and function definitions in place of those from the sequence template:
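from sugar.core.fts import FeatureList, Location, Feature
from sugar._io.util import _add_fmt_doc

# optional flag and extension list, analogous to binary_fmt and filename_extensions
#binary_fmt_fts = True
filename_extensions_fts = [...]

def is_fts_fancy(f, **kw):
    ...

@_add_fmt_doc('read_fts')
def read_fts_fancy(f, **kw):
    # build fts, a list of Feature objects, from the file handler here
    ...
    return FeatureList(fts)

@_add_fmt_doc('write_fts')
def write_fts_fancy(fts, f, **kw):
    f.write(...)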
Format test files
Test files for IO plugins can be placed in the src/sugar/tests/data folder.
These files can be read using the !data magic.
For example, read('!data/example.fasta') will read the file example.fasta in the above folder.
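The idea behind such a prefix can be illustrated with a few lines of path handling. This is a hypothetical sketch and not sugar's implementation: the helper resolve_data_magic and the DATA_DIR constant are assumptions made for illustration, and sugar resolves the real folder internally.

```python
from pathlib import Path

# Assumed location of the test data folder, relative to the repository root.
DATA_DIR = Path('src/sugar/tests/data')


def resolve_data_magic(fname, data_dir=DATA_DIR):
    """Map a '!data/...' name onto data_dir and leave other names unchanged."""
    prefix = '!data/'
    if fname.startswith(prefix):
        return data_dir / fname[len(prefix):]
    return Path(fname)


resolved = resolve_data_magic('!data/example.fasta')
print(resolved)
```

File names without the prefix pass through unchanged, so ordinary paths keep working alongside the shortcut.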