sugar.core.fts module
Feature-related classes: Feature, FeatureList, Location, LocationTuple, Strand and Defect
- class sugar.core.fts.Defect(*values)
Bases: IntFlag
Types of location defects
A location has a defect when the feature is not located exactly between the start and stop base.
- BETWEEN_CONSECUTIVE = 64
The position is between two consecutive bases
- BEYOND_LEFT = 4
The feature starts at an unknown position before the start base
- BEYOND_RIGHT = 8
The feature stops at an unknown position after or at the stop base
- MISS_LEFT = 1
Part of the feature has been truncated before the start base (e.g. by slicing with FeatureList.slice())
- MISS_RIGHT = 2
Part of the feature has been truncated after or at the stop base (e.g. by slicing with FeatureList.slice())
- NONE = 0
No location defect
- UNKNOWN_LEFT = 16
The feature starts at an unknown position
- UNKNOWN_RIGHT = 32
The feature stops at an unknown position
- UNKNOWN_SINGLE_BETWEEN = 128
The exact position is unknown, but it is at a single base between the start and stop residue
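Because Defect is an IntFlag, several defects can be combined with | and tested with in. The sketch below illustrates this with a local mirror of the documented values; the mirror is an assumption made so the snippet runs without sugar installed (in practice the class lives in sugar.core.fts).

```python
from enum import IntFlag


# Local mirror of the documented Defect values, for illustration only
class Defect(IntFlag):
    NONE = 0
    MISS_LEFT = 1
    MISS_RIGHT = 2
    BEYOND_LEFT = 4
    BEYOND_RIGHT = 8
    UNKNOWN_LEFT = 16
    UNKNOWN_RIGHT = 32
    BETWEEN_CONSECUTIVE = 64
    UNKNOWN_SINGLE_BETWEEN = 128


# A feature truncated on the left whose end also lies beyond the stop base
defect = Defect.MISS_LEFT | Defect.BEYOND_RIGHT

print(int(defect))                 # 9
print(Defect.MISS_LEFT in defect)  # True
print(Defect.MISS_RIGHT in defect) # False
print(bool(Defect.NONE))           # False: "no defect" is falsy
```

Since NONE is 0, a plain truth test such as `if loc.defect:` is enough to detect any defect.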
- class sugar.core.fts.Feature(type=None, locs=None, meta=None, **kw)
Bases: object
A single feature/annotation
- Parameters:
type (str) – The name of the feature class, e.g. gene or CDS
locs (list) – A list of feature locations. In most cases, this list contains only one location, but multiple locations are possible, for example in virus genomes due to frameshifts.
start, stop, strand – Instead of specifying locs, a single location can be given by the start and stop indices and, optionally, the strand.
meta (dict) – The metadata describing the feature.
Note
The following metadata attributes are directly accessible as attributes of Feature: type, name, id and seqid. For example, the feature id can be obtained both by Feature.id and by Feature.meta.id.
- classmethod frombiopython(obj)
Create a Feature object from a biopython SeqFeature object.
- Parameters:
obj – The object to convert.
Location defects are ignored.
- distance(other, **kw)
Distance to another location or location tuple, see LocationTuple.distance()
- rc(seqlen=0)
Reverse-complement the feature in place.
After the operation, the feature is described relative to the reverse-complement strand.
- Parameters:
seqlen (int) – The sequence length; the default of 0 results in negative location indices.
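The coordinate arithmetic of an in-place reverse complement can be sketched as follows, assuming zero-based, half-open [start, stop) locations. Loc is a hypothetical stand-in for a single location, not sugar's Location class. Under this assumption, the negative indices produced by the default seqlen=0 still address the same bases through Python's negative indexing.

```python
from dataclasses import dataclass

# Strand flip for directed strands; '.' and '?' have no direction
FLIP = {'+': '-', '-': '+'}


@dataclass
class Loc:  # hypothetical stand-in for a single location
    start: int
    stop: int
    strand: str = '+'

    def rc(self, seqlen=0):
        # Base i moves to seqlen - 1 - i, so the half-open interval
        # [start, stop) becomes [seqlen - stop, seqlen - start)
        self.start, self.stop = seqlen - self.stop, seqlen - self.start
        self.strand = FLIP.get(self.strand, self.strand)
        return self


print(Loc(2, 5).rc(seqlen=10))  # Loc(start=5, stop=8, strand='-')
print(Loc(2, 5).rc())           # Loc(start=-5, stop=-2, strand='-')
```

For a sequence of length 10, `seq[-5:-2]` and `seq[5:8]` select the same bases, and applying rc twice with the same seqlen restores the original location.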
- tobiopython()
Convert the Feature to a biopython SeqFeature instance
- toftsviewer(*, label='default', **kw)
Convert the feature to a DNAFeaturesViewer GraphicFeature
- Parameters:
label – The label of the feature. This may be a str key of the meta dictionary, a function taking the feature and returning the label, or the str label itself. Defaults to 'name' and, if that is not present in the metadata, to 'type'.
**kw – All other kwargs are passed to GraphicFeature.
Instead of passing label, color and hatch to this function, corresponding values can also be passed via the Feature.meta attribute with the keys '_ftsviewer_label', '_ftsviewer_color' and '_ftsviewer_hatch'.
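The documented rules for the label argument can be summarized in a small resolver. This is a hypothetical helper working on a plain metadata dict; the order of the checks is an assumption, and the '_ftsviewer_label' meta key is not modelled.

```python
# Hypothetical resolver for the `label` argument; not sugar's code
def resolve_label(meta, label='default', feature=None):
    if label == 'default':
        # Prefer the feature name, fall back to the feature type
        return meta.get('name', meta.get('type'))
    if callable(label):
        # Functions receive the feature and return the label
        return label(feature)
    if label in meta:
        # A str key of the metadata dictionary
        return meta[label]
    # Otherwise the string itself is the label
    return label


meta = {'type': 'CDS', 'name': 'gag', 'id': 'cds1'}
print(resolve_label(meta))                 # gag
print(resolve_label({'type': 'CDS'}))      # CDS
print(resolve_label(meta, 'id'))           # cds1
print(resolve_label(meta, 'my label'))     # my label
print(resolve_label(meta, lambda ft: ft['type'].lower(), feature=meta))  # cds
```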
- write(fname=None, fmt=None, **kw)
Write the feature to a file, see write_fts()
- property id
Alias for Feature.meta.id
- property loc
Access the first location
- property locs
LocationTuple of feature locations
- property name
Alias for Feature.meta.name
- property seqid
Alias for Feature.meta.seqid
- property type
Alias for Feature.meta.type
- class sugar.core.fts.FeatureList(data=None)
Bases: UserList
- classmethod frombiopython(obj)
Create a FeatureList object from a list of biopython SeqFeature objects.
- Parameters:
obj – The object to convert.
Location defects are ignored.
- classmethod frompandas(df, ftype=None, one_based=False)
Convert a pandas.DataFrame object to a FeatureList
- Parameters:
df – DataFrame with at least start and stop columns. The following columns are recognized: type, start, stop, len, strand, defect. Other columns are stored as metadata.
ftype – If the DataFrame has no type column, the column named by ftype is used instead; if that column does not exist either, ftype is used directly as the type.
one_based – Whether the data uses one-based numbering. It will be converted to the zero-based numbering used by sugar.
- Returns:
The created FeatureList instance
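The one_based option converts positions to sugar's zero-based numbering. The sketch below shows the usual conversion from one-based, closed intervals (as in GFF files) to zero-based, half-open intervals. That frompandas follows exactly this convention is an assumption (it is consistent with the examples in this section, where len equals stop - start), and plain dict rows stand in for DataFrame rows to avoid a pandas dependency.

```python
def to_zero_based(rows):
    """Convert one-based, closed (start, stop) to zero-based, half-open."""
    converted = []
    for row in rows:
        row = dict(row)     # leave the input rows untouched
        row['start'] -= 1   # first base: 1 -> 0
        # An inclusive one-based stop equals the exclusive zero-based stop,
        # so 'stop' is kept as it is
        converted.append(row)
    return converted


rows = [{'type': 'CDS', 'start': 1, 'stop': 9, 'strand': '+'}]
(ft,) = to_zero_based(rows)
print(ft['start'], ft['stop'], ft['stop'] - ft['start'])  # 0 9 9
```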
- get(type)
Return the first feature of the given feature type, e.g. 'cds'
- Parameters:
type – String or list of multiple strings
- groupby(keys=('seqid',), flatten=False)
Group features
- Parameters:
keys – Tuple of meta keys or functions to use for grouping. Can also be a single string or a callable. By default, the method groups by seqid only.
- Returns:
Nested dict structure
Example:
>>> from sugar import read_fts
>>> fts = read_fts()
>>> fts.groupby('type')
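The nesting of the returned dict can be illustrated with a simplified re-implementation on plain dicts: each key adds one level of nesting, and the leaves hold the grouped features (plain lists here, FeatureList objects in sugar). This is a sketch under these assumptions, not sugar's code; the flatten parameter is not modelled.

```python
from collections import defaultdict


# Simplified grouping on plain dicts; keys may be metadata names or callables
def groupby(features, keys=('seqid',)):
    if isinstance(keys, str) or callable(keys):
        keys = (keys,)
    key, *rest = keys
    groups = defaultdict(list)
    for ft in features:
        value = key(ft) if callable(key) else ft[key]
        groups[value].append(ft)
    if rest:
        # One more level of nesting for each remaining key
        return {k: groupby(v, rest) for k, v in groups.items()}
    return dict(groups)


fts = [
    {'seqid': 'chr1', 'type': 'gene', 'name': 'a'},
    {'seqid': 'chr1', 'type': 'CDS', 'name': 'b'},
    {'seqid': 'chr2', 'type': 'gene', 'name': 'c'},
]
nested = groupby(fts, ('seqid', 'type'))
print([ft['name'] for ft in nested['chr1']['gene']])  # ['a']
print(sorted(nested))                                  # ['chr1', 'chr2']
```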
- plot_ftsviewer(*args, **kw)
Plot features using DNAFeaturesViewer, see plot_ftsviewer()
- rc(seqlen=0)
Reverse-complement all features, see Feature.rc()
- Parameters:
seqlen (int) – The sequence length; the default of 0 results in negative location positions.
Note
This method works in place and modifies the data. If you want to keep the original data, use the copy() method first.
- remove_overlapping()
Remove overlapping features
Features at earlier positions in the list are preferred. For example, to keep longer features, sort the list beforehand with fts.sort(len, reverse=True).
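The preference rule can be sketched as a single pass that keeps a feature only if it does not overlap any feature kept so far. Each feature is reduced to one zero-based, half-open (start, stop) pair; whether sugar treats touching locations as overlapping is an assumption here (they are kept), and this quadratic loop illustrates the rule rather than sugar's implementation.

```python
def remove_overlapping(locs):
    """Keep each (start, stop) unless it overlaps an already kept one."""
    kept = []
    for start, stop in locs:
        if all(stop <= s or start >= e for s, e in kept):
            kept.append((start, stop))
    return kept


locs = [(0, 10), (5, 50), (60, 70)]
print(remove_overlapping(locs))    # [(0, 10), (60, 70)]
# Sorting by length first prefers the longer feature,
# analogous to fts.sort(len, reverse=True)
by_len = sorted(locs, key=lambda loc: loc[1] - loc[0], reverse=True)
print(remove_overlapping(by_len))  # [(5, 50), (60, 70)]
```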
- select(type=None, *, inplace=False, strand=None, **kw)
Select features
Two different modes of operation can be used, separately or combined. Use the type argument to select features of one type (use a string) or of different types (use a list).
All other kwargs must be of the form key_op=value, where op is one of the operators from the operator module. Additionally, the operator 'in' (membership) is supported. The different selection criteria are combined with a logical and. If you need a logical or, call select twice and combine the results with the | operator, e.g. fts.select(...) | fts.select(...)
- Parameters:
type – String or list of multiple strings
inplace – Whether to modify the original object (default: False)
**kw – Selection criteria
- Returns:
Selected features
Example:
>>> from sugar import read_fts
>>> fts = read_fts()
>>> fts2 = fts.select('CDS')  # select all CDS fts
>>> fts3 = fts.select(len_gt=100_000)  # select all fts with length > 100 kb
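The key_op=value convention can be sketched on plain dicts as follows. The parsing rule (the part after the last underscore names the operator) and treating len as stop - start are assumptions made for this hypothetical re-implementation; it is not sugar's code.

```python
import operator


def matches(ft, **kw):
    for arg, value in kw.items():
        key, op = arg.rsplit('_', 1)   # e.g. 'len_gt' -> ('len', 'gt')
        actual = ft['stop'] - ft['start'] if key == 'len' else ft[key]
        if op == 'in':
            ok = actual in value       # membership test
        else:
            ok = getattr(operator, op)(actual, value)  # operator.gt etc.
        if not ok:                     # all criteria are combined with "and"
            return False
    return True


def select(fts, **kw):
    return [ft for ft in fts if matches(ft, **kw)]


fts = [
    {'type': 'CDS', 'start': 0, 'stop': 150_000},
    {'type': 'gene', 'start': 0, 'stop': 2_000},
    {'type': 'tRNA', 'start': 5_000, 'stop': 5_070},
]
print([ft['type'] for ft in select(fts, len_gt=100_000)])  # ['CDS']
print([ft['type'] for ft in select(fts, type_in=('gene', 'tRNA'), start_ge=1)])  # ['tRNA']
```

Splitting at the last underscore keeps metadata keys that themselves contain underscores usable.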
- slice(start, stop, *, rel=0)
Return the features, or the parts of features, located between start and stop
- Parameters:
start, stop – Start and stop locations
rel (int) – Subtracts the value rel from each location position.
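How slicing relates to the MISS_LEFT and MISS_RIGHT defects can be sketched for a single zero-based, half-open location: parts outside the window are cut off, each cut is recorded as a defect, and rel is subtracted afterwards. The function slice_loc and the local Defect subset are hypothetical illustrations under these assumptions, not sugar's implementation.

```python
from enum import IntFlag


class Defect(IntFlag):  # subset of the documented values, mirrored locally
    NONE = 0
    MISS_LEFT = 1
    MISS_RIGHT = 2


def slice_loc(lstart, lstop, start, stop, rel=0):
    """Cut the location [lstart, lstop) to the window [start, stop)."""
    if lstop <= start or lstart >= stop:
        return None  # no part of the location lies inside the window
    defect = Defect.NONE
    if lstart < start:  # truncated before the start base
        lstart, defect = start, defect | Defect.MISS_LEFT
    if lstop > stop:    # truncated at or after the stop base
        lstop, defect = stop, defect | Defect.MISS_RIGHT
    return lstart - rel, lstop - rel, defect


start, stop, defect = slice_loc(10, 50, 20, 100, rel=20)
print(start, stop, int(defect))  # 0 30 1
```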
- sort(keys=None, reverse=False)
Sort features in place
- Parameters:
keys – Tuple of meta keys or functions to use for sorting. None can be used as a single value or in the tuple to apply the default sorting by position. Can also be a single string or a callable.
reverse – Use reverse order (default: False)
- Returns:
Sorted features
Example:
>>> from sugar import read_fts
>>> fts = read_fts()
>>> fts.sort(('type', len))
Note
This method works in place and modifies the data. If you want to keep the original data, use the copy() method first.
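A tuple of sort keys can be turned into one composite key function: strings look up metadata, callables are applied to the feature, and None stands for the default position ordering (assumed here to be (start, stop)). The helpers below are hypothetical and operate on plain dicts; length is a stand-in for len(feature), since len() of a dict would count its keys.

```python
def make_sort_key(keys=None):
    if keys is None or isinstance(keys, str) or callable(keys):
        keys = (keys,)

    def key(ft):
        parts = []
        for k in keys:
            if k is None:          # default ordering by position
                parts.append((ft['start'], ft['stop']))
            elif callable(k):      # function applied to the feature
                parts.append(k(ft))
            else:                  # metadata key
                parts.append(ft[k])
        return tuple(parts)
    return key


def length(ft):  # stand-in for len(feature)
    return ft['stop'] - ft['start']


fts = [
    {'type': 'gene', 'start': 100, 'stop': 400},
    {'type': 'CDS', 'start': 120, 'stop': 390},
    {'type': 'gene', 'start': 0, 'stop': 50},
]
fts.sort(key=make_sort_key(('type', length)))  # in place, like FeatureList.sort
print([(ft['type'], length(ft)) for ft in fts])
# [('CDS', 270), ('gene', 50), ('gene', 300)]
```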
- tobiopython()
Convert the FeatureList to a list of biopython SeqFeature objects
- todict()
Return a dictionary with feature ids as keys and features as values
Note
This method is different from the FeatureList.groupby() method. Each value of the dict returned by todict() is a feature, while each value of the dict returned by groupby() is a FeatureList.
- tofmtstr(fmt, **kw)
Write features to a string of the given format, see write_fts()
- toftsviewer(*, label='default', colorby='type', color=None, circular=False, seqlen=None, seq=None, first_index=0, **kw)
Convert features to a DNAFeaturesViewer GraphicRecord
- Parameters:
label – The label of the feature. This may be a str key of the meta dictionary, a function taking the feature and returning the label, or the str label itself. Defaults to 'name' and, if that is not present in the metadata, to 'type'.
colorby – How to define the color of the features. This may be any key in the metadata (defaults to 'type') or a function taking a Feature and returning an identifier.
color – The color of the features. This may be a constant color, a list of colors, a dictionary mapping the feature identifiers to colors, or None for the default matplotlib color cycle (the default).
circular – If True, return an instance of CircularGraphicRecord instead
seq – Sequence or sequence data
seqlen – Length of the sequence, defaults to the length of seq or the stop location of the last feature.
**kw – All other kwargs are passed to GraphicFeature, GraphicRecord or CircularGraphicRecord, respectively.
- tolists(keys='type start stop strand')
Return a generator yielding a list for each feature
- Parameters:
keys – Parameters from the metadata or location to return; 'len' is also allowed. Can be a string or tuple, defaults to 'type start stop strand'.
Example:
>>> from sugar import read_fts
>>> fts = read_fts().select('cDNA_match')
>>> for record in fts.tolists('type start strand len'):
...     print(*record)
cDNA_match 101888622 - 4245
cDNA_match 103140200 - 30745
cDNA_match 103944892 - 7136
cDNA_match 107859806 - 2392
- topandas(keys='type start stop strand', **kw)
Return a pandas.DataFrame of the features
- Parameters:
keys – Parameters from the metadata or location to return; 'len' is also allowed. Can be a string or tuple, defaults to 'type start stop strand'.
Example:
>>> from sugar import read_fts
>>> fts = read_fts().select('cDNA_match')
>>> df = fts.topandas()
>>> print(df)
         type      start       stop  strand
0  cDNA_match  101888622  101892867       -
1  cDNA_match  103140200  103170945       -
2  cDNA_match  103944892  103952028       -
3  cDNA_match  107859806  107862198       -
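The keys handling shared by tolists() and topandas() can be sketched as below: a space-separated string is split into keys, and 'len' is computed as stop - start, which reproduces the len values of the tolists() example from its start and stop values. The dicts are stand-ins for Feature objects, with values taken from the examples of this section; the function itself is a hypothetical re-implementation.

```python
def tolists(fts, keys='type start stop strand'):
    if isinstance(keys, str):
        keys = keys.split()
    for ft in fts:  # generator: one list per feature
        yield [ft['stop'] - ft['start'] if k == 'len' else ft[k] for k in keys]


fts = [
    {'type': 'cDNA_match', 'start': 101888622, 'stop': 101892867, 'strand': '-'},
    {'type': 'cDNA_match', 'start': 103140200, 'stop': 103170945, 'strand': '-'},
]
for record in tolists(fts, 'type start strand len'):
    print(*record)
# cDNA_match 101888622 - 4245
# cDNA_match 103140200 - 30745
```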
- tostr(raw=False, w=80, wt=12, wl=20, h=80, exclude_fts=())
Return a string with information about the features, used by the __str__() method
- write(fname=None, fmt=None, **kw)
Write features to a file, see write_fts()
- property d
Alias for FeatureList.todict()
- property loc_range
Get the range of locations over all features
- Returns:
tuple (start, stop) with start and stop locations (zero-based numbering)
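Since a feature may have several locations, the range over all features can be sketched as the smallest start and the largest stop of all their locations. Here each feature is a hypothetical list of (start, stop) pairs rather than a sugar Feature.

```python
def loc_range(fts):
    """Return (min start, max stop) over all locations of all features."""
    locs = [loc for ft_locs in fts for loc in ft_locs]
    return min(start for start, _ in locs), max(stop for _, stop in locs)


# The second feature has two locations
fts = [[(500, 800)], [(100, 200), (250, 300)], [(900, 950)]]
print(loc_range(fts))  # (100, 950)
```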
- class sugar.core.fts.Location(start, stop, strand='+', defect=0, meta=None)
Bases: object
Class describing the contiguous position of a feature
- distance(other, *, pos='inner', sign=False)
Distance to another location or location tuple
- Parameters:
pos (str) – 'inner' returns the shortest distance between the locations, 'middle' returns the distance between the middle positions of the locations
sign (bool) – If set to True, the returned distance is negative if the other location has a smaller position. Otherwise, the distance is always greater than or equal to zero.
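The two distance modes can be sketched for single, zero-based, half-open (start, stop) locations. Here 'inner' is taken as the gap between the intervals (0 if they touch or overlap), 'middle' as the distance between the mid positions, and "smaller position" is judged by the mid positions; these conventions are assumptions, and sugar's exact off-by-one handling may differ.

```python
def mid(loc):
    start, stop = loc
    return (start + stop) / 2


def distance(loc, other, pos='inner', sign=False):
    if pos == 'inner':
        dist = max(0, other[0] - loc[1], loc[0] - other[1])
    elif pos == 'middle':
        dist = abs(mid(other) - mid(loc))
    else:
        raise ValueError(f'unknown pos: {pos!r}')
    if sign and mid(other) < mid(loc):
        dist = -dist  # the other location lies at a smaller position
    return dist


a, b = (100, 200), (250, 300)
print(distance(a, b))                # 50
print(distance(b, a, sign=True))     # -50
print(distance(a, b, pos='middle'))  # 125.0
print(distance(a, (150, 260)))       # 0
```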
- property defect
Defect of the location
- property meta
Optional metadata of the location
- property mid
Return the middle position of the location or location tuple
- property range
Get the range of the location or location tuple
- Returns:
tuple (start, stop) with start and stop location (zero-based numbering)
- start
Start location (zero-based numbering)
- stop
Stop location (zero-based numbering)
- property strand
Strand of the location
- class sugar.core.fts.LocationTuple(locs=None, start=None, stop=None, strand='+')
Bases: tuple
Tuple of contiguous locations, describing the position of a feature
- contains(other)
Whether the location range contains the other location range
- distance(other, *, pos='inner', sign=False)
Distance to another location or location tuple
- Parameters:
pos (str) – 'inner' returns the shortest distance between the locations, 'middle' returns the distance between the middle positions of the locations
sign (bool) – If set to True, the returned distance is negative if the other location has a smaller position. Otherwise, the distance is always greater than or equal to zero.
- overlaplen(other)
Return the overlap length with the other location or location tuple
- overlaps(other)
Whether the location(s) overlap with the other location(s)
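For two single, zero-based, half-open (start, stop) locations, the overlap tests reduce to simple interval arithmetic, sketched below. How sugar combines several locations of a LocationTuple is not modelled, and treating touching intervals as non-overlapping follows from the half-open assumption.

```python
def overlaplen(a, b):
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def overlaps(a, b):
    return overlaplen(a, b) > 0


def contains(a, b):
    # The range of a contains the range of b
    return a[0] <= b[0] and b[1] <= a[1]


print(overlaplen((0, 100), (80, 150)))  # 20
print(overlaps((0, 100), (100, 150)))   # False: the intervals only touch
print(contains((0, 100), (10, 90)))     # True
```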
- shift(offset)
Shift the locations by the given offset in place
- Parameters:
offset (int) – The offset to shift the locations by
- Returns:
The shifted location tuple
- Return type:
LocationTuple
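A tuple itself is immutable, but the location objects it holds are not, so an in-place shift can presumably update each location and return the same tuple, which allows chaining. The sketch below shows this idea with a hypothetical Loc stand-in; it is not sugar's code.

```python
from dataclasses import dataclass


@dataclass
class Loc:  # hypothetical stand-in for a Location
    start: int
    stop: int


def shift(locs, offset):
    for loc in locs:  # mutate the contained objects, not the tuple
        loc.start += offset
        loc.stop += offset
    return locs


locs = (Loc(10, 20), Loc(30, 45))
same = shift(locs, -10)
print(same is locs)  # True
print(locs)          # (Loc(start=0, stop=10), Loc(start=20, stop=35))
```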
- property mid
Return the middle position of the location or location tuple
- property range
Get the range of the location or location tuple
- Returns:
tuple (start, stop) with start and stop location (zero-based numbering)
- property start
Get the start position of the location tuple
- property stop
Get the stop position of the location tuple
- class sugar.core.fts.Strand(*values)
Bases: StrEnum
Types of strand of a feature location
- FORWARD = '+'
The feature is located on the forward strand
- NONE = '.'
The feature is not associated with any strand
- REVERSE = '-'
The feature is located on the reverse strand
- UNKNOWN = '?'
The strandedness of the feature is unknown
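Because Strand is a string enum, its members compare equal to the plain strand characters and can be looked up by value. The sketch mirrors the documented values locally; it subclasses (str, Enum) rather than StrEnum (Python 3.11+) so that it also runs on older Python versions, which is a substitution made for this example.

```python
from enum import Enum


# Local mirror of the documented Strand values
class Strand(str, Enum):
    FORWARD = '+'
    NONE = '.'
    REVERSE = '-'
    UNKNOWN = '?'


print(Strand('-') is Strand.REVERSE)  # True: lookup by value
print(Strand.FORWARD == '+')          # True: members compare equal to str
print([s.value for s in Strand])      # ['+', '.', '-', '?']
```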