sugar.core.seq module
Sequence-related classes: BioSeq and BioBasket
- class sugar.core.seq.BioSeq(data, id='', meta=None, type=None)
Bases: object
Class holding sequence data and metadata, exposing bioinformatics methods.
Most methods work in place by default but also return the BioSeq object itself, so method calls can be chained.
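The convention of working in place while returning the object can be illustrated with a small stand-in class. `ToySeq` is hypothetical and not part of sugar, and its complement only covers the letters ACGT (an assumption made for brevity). Each method mutates `data` and returns `self`, so calls chain, and copying first keeps the original intact:

```python
import copy


class ToySeq:
    """Hypothetical stand-in mimicking the in-place, chainable style of BioSeq."""

    # Assumption: plain DNA alphabet only, no IUPAC codes or RNA
    _COMPLEMENT = str.maketrans('ACGTacgt', 'TGCAtgca')

    def __init__(self, data):
        self.data = data

    def copy(self):
        # `copy` here is the stdlib module; class attributes are not in method scope
        return copy.deepcopy(self)

    def reverse(self):
        self.data = self.data[::-1]  # modify in place ...
        return self                  # ... and return self to allow chaining

    def complement(self):
        self.data = self.data.translate(self._COMPLEMENT)
        return self


seq = ToySeq('ATGCC')
rc = seq.copy().reverse().complement()  # the copy leaves `seq` untouched
print(rc.data, seq.data)  # GGCAT ATGCC
```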
- classmethod frombiopython(obj)
Create a BioSeq object from a Biopython SeqRecord or Seq object.
- Parameters:
obj – The object to convert.
Note
Biopython features in the SeqRecord.features attribute are converted automatically.
- classmethod frombiotite(obj)
Create a BioSeq object from a biotite sequence object.
- Parameters:
obj – The object to convert.
- add_fts(fts)
Add features to the feature list.
To replace all features at once, set the BioSeq.fts attribute instead.
- Parameters:
fts – Features to add.
- complement()
Complementary sequence, i.e. transcription
Note
This function works in place and modifies the data. If you want to keep the original data, use the copy() method first.
- find_orfs(*args, **kw)
Find ORFs in the sequence, see find_orfs()
- matchall(*args, **kw)
Search for a regular expression and return a BioMatchList with all matches, see match()
- plot_ftsviewer(*args, **kw)
Plot features of the sequence using DNAFeaturesViewer, see plot_ftsviewer()
Note
Using BioSeq.plot_ftsviewer() or BioBasket.plot_ftsviewer() instead of FeatureList.plot_ftsviewer() has the advantage that sequence lengths are used automatically.
- reverse()
Reverse the sequence
Note
This function works in place and modifies the data. If you want to keep the original data, use the copy() method first.
- sl(**kw)
Method that allows you to slice the BioSeq object with non-default options.
If you want to use the default options, you can slice the BioSeq object directly. For non-default options, slice the sliceable object returned by this method.
- Parameters:
Slicing options:
The slice specifies which part of the sequence is returned and is defined inside the square brackets []. The following types are supported.
- int, slice
The location is specified by an int or a slice.
- Location
Specified by a location.
- Feature
Specified by a feature.
- str
Location of the first feature of the given type, e.g. 'cds' returns the part of the sequence covered by the first coding sequence feature.
Example:
>>> from sugar import read
>>> seq = read()[0]
>>> print(seq[:5])  # use direct slicing for default options
ACCTG
>>> print(seq[4])
G
>>> print(seq['cds'][:3])
ATG
>>> print(seq.sl(inplace=True, gap='-')[:5:2])  # non-default options
ACG
>>> print(seq)  # has been modified in place
ACG
- slindex(gap=None)
Method that translates an index to account for gaps
Example:
>>> from sugar import BioSeq
>>> seq = BioSeq('ATG---GGA')
>>> print(seq)
ATG---GGA
>>> print(seq[1:5])
TG--
>>> print(seq.sl(gap='-')[1:5])
TG---GG
>>> print(seq.slindex(gap='-')[1:5])
slice(1, 8, None)
>>> print(seq[seq.slindex(gap='-')[1:5]])
TG---GG
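The translation shown in this example can be sketched in plain Python: collect the positions of all non-gap characters, apply the ungapped slice to that position list, and map the first and last selected positions back onto the gapped string. `gapped_slice` is a hypothetical helper, not sugar's implementation, and only handles slices with step 1:

```python
def gapped_slice(seq, sl, gap='-'):
    """Map a slice over the ungapped sequence to a slice over the gapped one."""
    # Indices of all non-gap characters in the gapped string
    pos = [i for i, char in enumerate(seq) if char not in gap]
    start, stop, step = sl.indices(len(pos))
    if step != 1:
        raise ValueError('this sketch only supports step 1')
    if start >= stop:
        return slice(0, 0)  # empty selection
    # From the first selected residue up to and including the last one
    return slice(pos[start], pos[stop - 1] + 1)


seq = 'ATG---GGA'
sl = gapped_slice(seq, slice(1, 5))
print(sl, seq[sl])  # slice(1, 8, None) TG---GG
```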
- tobiopython()
Convert BioSeq to a Biopython SeqRecord instance
Attached BioSeq.fts features are converted automatically.
- tobiotite(**kw)
Convert BioSeq to a biotite NucleotideSequence or ProteinSequence instance
- Parameters:
type (str) – 'nt' creates a NucleotideSequence instance, 'aa' creates a ProteinSequence instance; by default, the class is inferred from the sequence itself.
gap (str) – Gap characters that must be removed from the sequence string.
warn (bool) – Whether to warn if gap characters have been removed, default is True.
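The `gap` and `warn` parameters describe a preprocessing step that might look roughly like this. `strip_gaps` is a hypothetical helper, and the default gap characters and the warning text are assumptions rather than sugar's actual values:

```python
import warnings


def strip_gaps(data, gap='-', warn=True):
    """Remove every character contained in `gap` from the sequence string."""
    stripped = ''.join(char for char in data if char not in gap)
    if warn and len(stripped) != len(data):
        # Assumed message; only signals that something was dropped
        warnings.warn(f'{len(data) - len(stripped)} gap character(s) removed')
    return stripped


print(strip_gaps('AT-G..C', gap='-.'))  # ATGC, and a UserWarning is emitted
```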
- toftsviewer(**kw)
Convert features of this sequence to a DNAFeaturesViewer GraphicRecord
- tostr(**kw)
Return a formatted string, see BioBasket.tostr()
- translate(*args, update_fts=False, **kw)
Translate the nucleotide sequence to an amino acid sequence, see translate().
The original translate method of the str class can be used via BioSeq.str.translate().
Note
This function works in place and modifies the data. If you want to keep the original data, use the copy() method first.
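Translation itself is a codon lookup. The sketch below assumes the standard genetic code (NCBI table 1), built from the usual TCAG-ordered amino acid string. `translate_dna` is hypothetical: it ignores any extra options sugar's translate() accepts, writes stop codons as '*', maps unknown codons to 'X' and drops a trailing partial codon:

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODON_TABLE = {''.join(codon): aa
               for codon, aa in zip(product('TCAG', repeat=3), AMINO_ACIDS)}


def translate_dna(seq):
    """Translate all complete codons of a DNA or RNA string."""
    seq = seq.upper().replace('U', 'T')  # treat RNA like DNA
    end = len(seq) - len(seq) % 3        # ignore an incomplete last codon
    return ''.join(CODON_TABLE.get(seq[i:i + 3], 'X') for i in range(0, end, 3))


print(translate_dna('ATGGCCTAA'))  # MA*
```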
- data
Property holding the data string
- property fts
Alias for BioSeq.meta.fts
The fts object holds all feature metadata. It is an instance of FeatureList.
- property gc
GC content of the sequence
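GC content is the share of G and C among the letters of the sequence. A minimal sketch, assuming case-insensitive counting and gap characters excluded from the denominator (how sugar treats gaps and ambiguous letters is not stated here); `gc_content` is hypothetical:

```python
def gc_content(data, gap='-.'):
    """Fraction of G/C among non-gap characters (0.0 for an empty sequence)."""
    letters = [char for char in data.upper() if char not in gap]
    if not letters:
        return 0.0
    return sum(char in 'GC' for char in letters) / len(letters)


print(gc_content('ATGC--'))  # 0.5
```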
- property id
Alias for BioSeq.meta.id
- meta
Property holding metadata
- property str
Namespace holding all available string methods, see _BioSeqStr for available methods and str for documentation of the methods
Example:
>>> from sugar import read
>>> seq = read()[0]
>>> seq.str.find('ATG')  # use string method
30
- type
Type of the sequence, either 'nt' or 'aa'
- class sugar.core.seq.BioBasket(data=None, meta=None)
Bases: UserList
Class holding a list of BioSeq objects
The BioBasket object can be used like a list and has useful bioinformatics methods attached to it.
The list itself is stored in the data property. The BioBasket object may also carry metadata in its meta attribute.
- classmethod frombiopython(obj)
Create a BioBasket object from a list of Biopython SeqRecord or Seq objects.
- Parameters:
obj – The object to convert, can also be a MultipleSeqAlignment object.
Note
Biopython features in the SeqRecord.features attribute are converted automatically.
- classmethod frombiotite(obj)
Create a BioBasket object from a list of biotite sequence objects.
- Parameters:
obj – The object to convert, can also be a biotite Alignment object.
- add_fts(fts)
Add features to the feature lists of the corresponding sequences.
To replace all features at once, set the BioBasket.fts attribute instead.
- Parameters:
fts – Features to add.
- complement()
Complementary sequences, i.e. transcription
Note
This function works in place and modifies the data. If you want to keep the original data, use the copy() method first.
- countall(rtype='counter', k=1)
Count letters in sequences
This method may undergo disruptive changes, or it may be removed in a later release.
- Parameters:
rtype –
'counter' Return a Counter object
'prob' Return a dictionary with normalized counts
'df' Return a pandas DataFrame object with count, prob and tprob (total prob) fields
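Assuming `k` is the word length (the `word` column used by countplot() suggests this), counting overlapping words for `rtype='counter'` and normalizing them for `rtype='prob'` can be sketched with collections.Counter. `count_words` is hypothetical and pools all sequences into one count, which may differ from how sugar aggregates counts:

```python
from collections import Counter


def count_words(seqs, k=1, rtype='counter'):
    """Count overlapping words of length k over all sequence strings."""
    counts = Counter()
    for seq in seqs:
        counts.update(seq[i:i + k] for i in range(len(seq) - k + 1))
    if rtype == 'prob':
        total = sum(counts.values())
        return {word: n / total for word, n in counts.items()}
    return counts


print(count_words(['ACGT', 'AAC'], k=2)['AC'])  # 2
```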
- countplot(y='word', x='count', hue='id', order=None, plot='show', figsize=None, ax=None, savefigkw={}, **kw)
Create a plot of letter counts
This method may undergo disruptive changes, or it may be removed in a later release.
Under the hood, this method uses the pandas and seaborn libraries. For help on most of the arguments, see seaborn.barplot().
- find_orfs(*args, **kw)
Find ORFs in sequences, see find_orfs()
- groupby(keys=('id',), flatten=False)
Group sequences
- Parameters:
keys – Tuple of meta keys or functions to use for grouping. Can also be a single string or a callable. By default, the method groups only by id.
- Returns:
Nested dict structure
Example:
>>> from sugar import read
>>> seqs = read()
>>> grouped = seqs.groupby()
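The nested dict structure can be pictured with plain dictionaries standing in for sequences: the first key splits the records into groups, and each further key splits every group again. `group_records` and the `organism` field are hypothetical, and sugar stores BioBasket objects rather than lists at the leaves:

```python
def group_records(records, keys=('id',)):
    """Group records into a nested dict, one nesting level per key."""
    if isinstance(keys, str) or callable(keys):
        keys = (keys,)  # a single key is accepted as well
    key, *rest = keys
    getkey = key if callable(key) else (lambda rec: rec[key])
    groups = {}
    for rec in records:
        groups.setdefault(getkey(rec), []).append(rec)
    if rest:
        # Split every group again by the remaining keys
        return {k: group_records(v, tuple(rest)) for k, v in groups.items()}
    return groups


records = [{'id': 'a', 'organism': 'x'},
           {'id': 'b', 'organism': 'x'},
           {'id': 'c', 'organism': 'y'}]
grouped = group_records(records, ('organism', 'id'))
print(sorted(grouped), sorted(grouped['x']))  # ['x', 'y'] ['a', 'b']
```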
- match(*args, **kw)
Search for a regular expression and return a BioMatchList of matches, see match()
- matchall(*args, **kw)
Search for a regular expression and return a BioMatchList of all matches, see match()
- plot_alignment(*args, **kw)
Plot an alignment, see plot_alignment()
- plot_ftsviewer(*args, **kw)
Plot features of the sequences using DNAFeaturesViewer, see plot_ftsviewer()
Note
Using BioSeq.plot_ftsviewer() or BioBasket.plot_ftsviewer() instead of FeatureList.plot_ftsviewer() has the advantage that sequence lengths are used automatically.
- rc(**kw)
Reverse complement, alias for BioBasket.reverse().complement()
Note
This function works in place and modifies the data. If you want to keep the original data, use the copy() method first.
- reverse(*args, **kw)
Reverse sequences
Note
This function works in place and modifies the data. If you want to keep the original data, use the copy() method first.
- select(inplace=False, **kw)
Select sequences
- Parameters:
**kw – All kwargs must be of the form key_op=value, where op is one of the operators from the operator module. Additionally, the operator 'in' (membership) is supported. Multiple conditions are combined with logical and. If you need logical or, call select twice and combine the results with the | operator, e.g. seqs.select(...) | seqs.select(...)
inplace – Whether to modify the original object (default: False)
- Returns:
Selected sequences
Example:
>>> from sugar import read
>>> seqs = read()
>>> seqs2 = seqs.select(len_gt=9500)  # select sequences with length > 9500
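The `key_op=value` convention maps onto the operator module: `len_gt=9500` splits into the key `len` and `operator.gt`. Below is a minimal sketch with dictionaries standing in for sequences; `select_records` is hypothetical, treating `len` as an ordinary field is a simplification, and all conditions are combined with logical and:

```python
import operator


def _contains(value, container):
    """Membership test used for the extra 'in' operator."""
    return value in container


def select_records(records, **kw):
    """Keep records that satisfy every key_op=value condition."""
    conditions = []
    for arg, value in kw.items():
        key, opname = arg.rsplit('_', 1)  # e.g. 'len_gt' -> ('len', 'gt')
        test = _contains if opname == 'in' else getattr(operator, opname)
        conditions.append((key, test, value))
    return [rec for rec in records
            if all(test(rec[key], value) for key, test, value in conditions)]


records = [{'id': 'a', 'len': 100}, {'id': 'b', 'len': 10000}]
print(select_records(records, len_gt=9500))  # [{'id': 'b', 'len': 10000}]
```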
- sl(**kw)
Method that allows you to slice the BioBasket object with non-default options.
If you want to use the default options, you can slice the BioBasket object directly. For non-default options, slice the sliceable object returned by this method.
- Parameters:
**kw – All kwargs are documented in BioSeq.sl().
Slice options:
The slice specifies which part of the sequence(s) is returned and is defined inside the square brackets []. The following options are supported.
- int
Returns a BioSeq from the basket
- slice
Returns a new BioBasket object with a subset of the sequences
- str, feature, location
Returns a new BioBasket object with updated sequences inside, see BioSeq.sl()
- (int, object)
Returns a BioSeq from the basket and slices it with the object, see BioSeq.sl()
- (slice, object)
Returns a new BioBasket object with a subset of the sequences, which are replaced by subsequences according to BioSeq.sl()
Example:
>>> from sugar import read
>>> seqs = read()
>>> print(seqs[:2, 5:10])
2 seqs in basket
AB047639 5 CCCCT ...
AB677533 5 CCCCC ...
>>> print(seqs[:2, 'cds'][:, 0:3])
2 seqs in basket
AB047639 3 ATG ...
AB677533 3 ATG ...
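How the integer, slice and tuple forms could be dispatched is sketched below with a list subclass of plain strings. `ToyBasket` is hypothetical and covers only the int, slice, (int, object) and (slice, object) cases; string, feature and location indices are left out:

```python
class ToyBasket(list):
    """Hypothetical list of plain sequence strings with tuple indexing."""

    def __getitem__(self, index):
        if isinstance(index, tuple):
            outer, inner = index
            selected = super().__getitem__(outer)
            if isinstance(outer, int):
                return selected[inner]  # (int, object): slice one sequence
            # (slice, object): subset of sequences, each one sliced
            return ToyBasket(seq[inner] for seq in selected)
        result = super().__getitem__(index)
        if isinstance(index, slice):
            return ToyBasket(result)  # slice: subset of the sequences
        return result  # int: a single sequence


seqs = ToyBasket(['ACCCCTG', 'GGCCCCC', 'TTT'])
print(seqs[:2, 1:3])  # ['CC', 'GC']
print(seqs[0, 0:3])   # ACC
```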
- sort(keys=('id',), reverse=False)
Sort sequences in place
- Parameters:
keys – Tuple of meta keys or functions to use for sorting. Can also be a single string or a callable. Defaults to sorting by id.
reverse – Use reverse order (default: False)
- Returns:
Sorted sequences
Example:
>>> from sugar import read
>>> seqs = read()
>>> seqs.sort(len)
Note
This function works in place and modifies the data. If you want to keep the original data, use the copy() method first.
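Sorting by a tuple of keys, each either a metadata name or a callable, amounts to building a composite sort key. `sort_records` is a hypothetical sketch using dictionaries in place of sequences; like the method above, it sorts in place and returns the sorted list:

```python
def sort_records(records, keys=('id',), reverse=False):
    """Sort records in place by a tuple of field names and/or callables."""
    if isinstance(keys, str) or callable(keys):
        keys = (keys,)  # a single key is accepted as well
    getters = [key if callable(key) else (lambda rec, name=key: rec[name])
               for key in keys]
    records.sort(key=lambda rec: tuple(get(rec) for get in getters),
                 reverse=reverse)
    return records


recs = [{'id': 'b', 'seq': 'AAAA'}, {'id': 'a', 'seq': 'AA'}, {'id': 'c', 'seq': 'AA'}]
sort_records(recs, (lambda rec: len(rec['seq']), 'id'))  # length first, then id
print([rec['id'] for rec in recs])  # ['a', 'c', 'b']
```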
- tobiopython(*, msa=False)
Convert the BioBasket to a list of Biopython SeqRecord objects
- Parameters:
msa (bool) – Return a Biopython MultipleSeqAlignment object instead of a list.
Attached BioSeq.fts features are not converted.
- tobiotite(**kw)
Convert BioBasket to a list of biotite NucleotideSequence or ProteinSequence instances
- Parameters:
type (str) – 'nt' creates NucleotideSequence instances, 'aa' creates ProteinSequence instances; by default, the class is inferred from the sequences themselves.
msa (bool) – Return a biotite Alignment object instead of a list, default is False
gap (str) – Gap characters that must be removed from the sequence strings.
warn (bool) – Whether to warn if gap characters have been removed, default is True, not used with msa=True
- todict()
Return a dictionary with sequence ids as keys and sequences as values
Note
This method is different from the BioBasket.groupby() method. Each value of the dict returned by todict() is a sequence, while each value of the dict returned by groupby() is a BioBasket.
- tostr(h=19, w=80, wid=19, wlen=4, showgc=True, add_hint=False, raw=False, add_header=True)
Return a string with information about the sequences, used by the __str__() method
- translate(*args, **kw)
Translate nucleotide sequences to amino acid sequences, see translate().
The original translate method of the str class can be used via BioBasket.str.translate().
Note
This function works in place and modifies the data. If you want to keep the original data, use the copy() method first.
- property d
Alias for BioBasket.todict()
- data
Property holding the list of sequences
- property fts
FeatureList containing the features of all sequences
Can also be used as a setter, e.g. seqs.fts = new_fts.
- property ids
List of sequence ids
- meta
Property holding metadata
- property str
Namespace holding all available string methods.
The BioBasket.str methods call the corresponding BioSeq.str methods under the hood and return either the modified BioBasket object or a list of results. See _BioSeqStr for available methods and str for method documentation.
Example:
>>> from sugar import read
>>> seqs = read()
>>> seqs.str.find('ATG')  # use string method
[30, 12]
- class sugar.core.seq._BioBasketStr(parent)
Bases: object
Helper class to move all string methods into the BioBasket.str namespace
It calls the corresponding BioSeq.str method under the hood and returns either the modified BioBasket object or a list of results.
- class sugar.core.seq._BioSeqStr(parent)
Bases: object
Helper class to hold all string methods in the BioSeq.str namespace.
The methods modify the data in place, if applicable, which differs from the behavior of the original string methods.
See str for documentation of the methods.
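The forwarding behavior of both helper classes can be sketched with `__getattr__`: look up the method on str, call it on the parent's data, store string results back in place, and pass any other result through. All class names below are hypothetical, and sugar's helpers may decide differently which methods count as modifying:

```python
class StrNamespace:
    """Forward str methods to parent.data; string results are stored in place."""

    def __init__(self, parent):
        self._parent = parent

    def __getattr__(self, name):
        method = getattr(str, name)  # AttributeError for names str lacks

        def wrapper(*args, **kw):
            result = method(self._parent.data, *args, **kw)
            if isinstance(result, str):
                self._parent.data = result  # in-place update, e.g. upper()
                return self._parent
            return result  # e.g. find() returns an int
        return wrapper


class BasketStrNamespace:
    """Call the per-sequence namespace for every sequence in the basket."""

    def __init__(self, parent):
        self._parent = parent

    def __getattr__(self, name):
        def wrapper(*args, **kw):
            results = [getattr(seq.str, name)(*args, **kw) for seq in self._parent]
            if all(res is seq for res, seq in zip(results, self._parent)):
                return self._parent  # every sequence was modified in place
            return results  # otherwise a list of results
        return wrapper


class SeqLike:
    def __init__(self, data):
        self.data = data
        self.str = StrNamespace(self)


class BasketLike(list):
    def __init__(self, seqs=()):
        super().__init__(seqs)
        self.str = BasketStrNamespace(self)


seqs = BasketLike([SeqLike('ccATG'), SeqLike('ATGcc')])
print(seqs.str.find('ATG'))        # [2, 0]
seqs.str.upper()                   # modifies both sequences in place
print([seq.data for seq in seqs])  # ['CCATG', 'ATGCC']
```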