sugar.core.fts module

Feature-related classes: Feature, FeatureList, Location, LocationTuple, Strand, Defect

class sugar.core.fts.Defect(*values)[source]

Bases: IntFlag

Types of location defects

A location has a defect when the feature is not located exactly between the start and stop base.

BETWEEN_CONSECUTIVE = 64

The position is between two consecutive bases

BEYOND_LEFT = 4

The feature starts at an unknown position before the start base

BEYOND_RIGHT = 8

The feature stops at an unknown position after or at the stop base

MISS_LEFT = 1

Part of the feature has been truncated before the start base (e.g. by slicing with FeatureList.slice())

MISS_RIGHT = 2

Part of the feature has been truncated after or at the stop base (e.g. by slicing with FeatureList.slice())

NONE = 0

No location defect

UNKNOWN_LEFT = 16

The feature starts at an unknown position

UNKNOWN_RIGHT = 32

The feature stops at an unknown position

UNKNOWN_SINGLE_BETWEEN = 128

The exact position is unknown, but it is at a single base between the start and stop residue
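Because Defect is an IntFlag, several defects can be combined on one location with bitwise operators. The following is a minimal sketch using a stand-in class that only mirrors the values documented above; it does not import sugar, and the variable names are illustrative.

```python
from enum import IntFlag


class Defect(IntFlag):
    """Stand-in that copies the documented flag values (not the sugar class)."""
    NONE = 0
    MISS_LEFT = 1
    MISS_RIGHT = 2
    BEYOND_LEFT = 4
    BEYOND_RIGHT = 8
    UNKNOWN_LEFT = 16
    UNKNOWN_RIGHT = 32
    BETWEEN_CONSECUTIVE = 64
    UNKNOWN_SINGLE_BETWEEN = 128


# Combine two defects, e.g. a feature truncated on both sides
truncated = Defect.MISS_LEFT | Defect.MISS_RIGHT

# Test for a single flag with bitwise and
missing_left = bool(truncated & Defect.MISS_LEFT)
beyond_left = bool(truncated & Defect.BEYOND_LEFT)

# An integer can be converted back to the combined flag
restored = Defect(12)
```

Since each value is a distinct power of two, any combination of defects maps to a unique integer.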

class sugar.core.fts.Feature(type=None, locs=None, meta=None, **kw)[source]

Bases: object

A single feature/annotation

Parameters:
  • type (str) – The name of the feature class, e.g. gene or CDS

  • locs (list) – A list of feature locations. In most cases this list will contain only one location but multiple locations are possible, for example in virus genomes (due to frame shifts).

  • start,stop,strand – Instead of specifying the locations, a single location can be given by start and stop indices and optionally strand.

  • meta (dict) – The metadata describing the feature.

Note

The following metadata attributes are directly accessible as attributes of Feature: type, name, id and seqid. For example, the feature id can be obtained by both Feature.id and Feature.meta.id.

classmethod frombiopython(obj)[source]

Create a Feature object from a biopython SeqFeature object.

Parameters:

obj – The object to convert.

Location defects are ignored.

contains(other)[source]

Whether the feature location range contains other

distance(other, **kw)[source]

Distance to other location or location tuple, see LocationTuple.distance()

overlaplen(other)[source]

Return overlap length with the other location or location tuple

overlaps(other)[source]

Whether the feature location overlaps with the other

rc(seqlen=0)[source]

Reverse complement the feature.

After the in-place operation, the feature will be described by the reverse complement strand.

Parameters:

seqlen (int) – The sequence length. The default of 0 results in negative location indices.
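The effect of seqlen can be illustrated with plain coordinate arithmetic. This is a sketch under the assumption that locations are zero-based, half-open intervals; rc_interval is a hypothetical helper, not part of sugar, and the actual implementation may differ in detail.

```python
def rc_interval(start, stop, strand, seqlen=0):
    """Map a half-open interval onto the reverse complement strand.

    Hypothetical helper for illustration only (assumes zero-based,
    half-open coordinates).
    """
    # Swap forward and reverse; leave other strand symbols untouched
    flipped = {'+': '-', '-': '+'}.get(strand, strand)
    # The old stop becomes the new start, counted from the other end
    return seqlen - stop, seqlen - start, flipped


with_length = rc_interval(2, 5, '+', seqlen=10)
without_length = rc_interval(2, 5, '+')
```

With seqlen=0 the resulting indices are negative, which matches the behavior described for the default value.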

tobiopython()[source]

Convert Feature to biopython SeqFeature instance

toftsviewer(*, label='default', **kw)[source]

Convert feature to DNAFeaturesViewer GraphicFeature

Parameters:
  • label – The label of the feature. May be a str key of the meta dictionary, a function taking the feature and returning the label, or the label string itself. Defaults to 'name' or, if that is not present in the metadata, 'type'.

  • **kw – All other kwargs are passed to GraphicFeature.

Instead of passing label, color and hatch to this function, corresponding values can also be passed via the Feature.meta attribute with the keys '_ftsviewer_label', '_ftsviewer_color' and '_ftsviewer_hatch'.
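The rules for the label argument can be summarized in a short sketch. The function resolve_label is hypothetical and works on a plain dict standing in for the feature metadata; it does not model the '_ftsviewer_label' key, whose precedence is not specified here.

```python
def resolve_label(meta, label='default'):
    """Illustrative label lookup on a plain metadata dict (not sugar code)."""
    if callable(label):
        # A function receives the feature (here: its metadata stand-in)
        return label(meta)
    if label == 'default':
        # Prefer 'name', fall back to 'type'
        return meta.get('name', meta.get('type'))
    # A metadata key if present, otherwise the string is the label itself
    return meta.get(label, label)


named = resolve_label({'name': 'abc', 'type': 'gene'})
unnamed = resolve_label({'type': 'gene'})
literal = resolve_label({'type': 'gene'}, 'my label')
computed = resolve_label({'type': 'gene'}, lambda m: m['type'].upper())
```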

write(fname=None, fmt=None, **kw)[source]

Write feature to file, see write_fts()

property id

Alias for Feature.meta.id

property loc

Access first location

property locs

LocationTuple of feature locations

property name

Alias for Feature.meta.name

property seqid

Alias for Feature.meta.seqid

property type

Alias for Feature.meta.type

class sugar.core.fts.FeatureList(data=None)[source]

Bases: UserList

classmethod frombiopython(obj)[source]

Create a FeatureList object from a list of biopython SeqFeature objects.

Parameters:

obj – The object to convert.

Location defects are ignored.

classmethod frompandas(df, ftype=None, one_based=False)[source]

Convert pandas.DataFrame object to FeatureList

Parameters:
  • df – Dataframe with at least start and stop columns. The following columns can be used: type, start, stop, len, strand, defect. Other columns are stored as metadata.

  • ftype – If the dataframe has no type column, ftype is interpreted as the name of the column to use instead. If no such column exists, ftype itself is used as the type.

  • one_based – Whether the data uses one-based numbering. It will be converted to the zero-based numbering used by sugar.

Returns:

created FeatureList instance

copy()[source]

Return a deep copy of the object

get(type)[source]

Return the first feature of the given feature type, e.g. 'cds'

Parameters:

type – String or list of multiple strings

groupby(keys=('seqid',), flatten=False)[source]

Group features

Parameters:

keys – Tuple of meta keys or functions to use for grouping. Can also be a single string or a callable. By default, the method groups by seqid only.

Returns:

Nested dict structure
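To make the shape of the nested dict concrete, here is a sketch that groups plain dicts standing in for feature metadata. The function below is a simplified illustration, not the sugar implementation, and it ignores the flatten option.

```python
def groupby(records, keys=('seqid',)):
    """Group records into nested dicts, one nesting level per key."""
    if isinstance(keys, str) or callable(keys):
        keys = (keys,)  # a single key is treated like a 1-tuple

    def value_of(rec, key):
        return key(rec) if callable(key) else rec.get(key)

    groups = {}
    for rec in records:
        node = groups
        # Walk or create one dict level for every key except the last
        for key in keys[:-1]:
            node = node.setdefault(value_of(rec, key), {})
        # The innermost level collects the records in a list
        node.setdefault(value_of(rec, keys[-1]), []).append(rec)
    return groups


records = [
    {'seqid': 'chr1', 'type': 'gene'},
    {'seqid': 'chr1', 'type': 'CDS'},
    {'seqid': 'chr2', 'type': 'gene'},
]
by_seqid = groupby(records)
by_seqid_and_type = groupby(records, ('seqid', 'type'))
by_type = groupby(records, 'type')
```

In this sketch the innermost values are plain lists; in sugar they are FeatureList objects.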

Example:

>>> from sugar import read_fts
>>> fts = read_fts()
>>> fts.groupby('type')

plot_ftsviewer(*args, **kw)[source]

Plot features using DNAFeaturesViewer, see plot_ftsviewer()

rc(seqlen=0)[source]

Reverse complement all features, see Feature.rc()

Parameters:

seqlen (int) – The sequence length. The default of 0 results in negative location positions.

Note

This function works in place and modifies the data. If you want to keep the original data, use the copy() method first.

remove_duplicates()[source]

Remove duplicate features

remove_nested()[source]

Remove nested features, i.e. features contained within others

remove_overlapping()[source]

Remove overlapping features

Features on earlier positions in the list are preferred. For example, to keep longer features, sort the list beforehand with fts.sort(len, reverse=True).
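The preference for earlier list entries can be sketched as a greedy filter over half-open intervals. This is an illustration with (start, stop) tuples, assuming that a feature is dropped when it overlaps any feature already kept; the actual comparison in sugar may differ.

```python
def overlaps(a, b):
    """True if two half-open (start, stop) intervals share a base."""
    return a[0] < b[1] and b[0] < a[1]


def remove_overlapping(intervals):
    """Keep an interval only if it overlaps none of the ones kept so far."""
    kept = []
    for interval in intervals:
        if not any(overlaps(interval, other) for other in kept):
            kept.append(interval)
    return kept


intervals = [(0, 5), (3, 20), (18, 25)]

# Original order: (0, 5) wins over (3, 20)
in_list_order = remove_overlapping(intervals)

# Longest first: (3, 20) wins and removes both neighbors
longest_first = remove_overlapping(
    sorted(intervals, key=lambda iv: iv[1] - iv[0], reverse=True)
)
```

The two results differ, which is why sorting beforehand determines which features survive.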

select(type=None, *, inplace=False, strand=None, **kw)[source]

Select features

Two selection modes can be used, separately or together. Use the type argument to select features of one type (use a string) or of different types (use a list).

All other kwargs must be of the form key_op=value, where op is one of the operators from the operator module. Additionally, the operator 'in' (membership) is supported. The selection criteria are combined with the and operator. If you need or, call select twice and combine the results with the | operator, e.g. fts.select(...) | fts.select(...).

Parameters:
  • type – String or list of multiple strings

  • inplace – Whether to modify the original object (default: False)

  • **kw – Selection criteria

Returns:

Selected features
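The key_op=value convention can be illustrated with the standard operator module. The helpers parse_criterion and matches are hypothetical and operate on plain dicts; they are meant as a sketch of the idea, not as the code used by select.

```python
import operator


def parse_criterion(name):
    """Split 'key_op' at the last underscore and look up the operator."""
    key, sep, opname = name.rpartition('_')
    if not sep:
        raise ValueError(f'criterion {name!r} is not of the form key_op')
    if opname == 'in':
        # Membership is not a two-argument function named 'in' in operator
        return key, lambda value, container: value in container
    return key, getattr(operator, opname)


def matches(record, **criteria):
    """True if the record fulfills all criteria (combined with 'and')."""
    for name, expected in criteria.items():
        key, op = parse_criterion(name)
        if not op(record[key], expected):
            return False
    return True


long_cds = {'type': 'CDS', 'len': 150_000}
short_cds = {'type': 'CDS', 'len': 50}

is_long = matches(long_cds, len_gt=100_000)
is_short_gene = matches(short_cds, len_lt=100, type_eq='gene')
is_known_type = matches(short_cds, type_in=('CDS', 'gene'))
```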

Example:

>>> from sugar import read_fts
>>> fts = read_fts()
>>> fts2 = fts.select('CDS')  # select all CDS fts
>>> fts3 = fts.select(len_gt=100_000)  # select all fts with length > 100 kb

slice(start, stop, *, rel=0)[source]

Return a sub-feature between start and stop

Parameters:
  • start,stop – start and stop locations

  • rel (int) – Subtracts the value rel from each location position.
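A rough sketch of slicing for a single forward-strand interval follows. It assumes half-open coordinates, that truncated sides are flagged with the MISS_LEFT (1) and MISS_RIGHT (2) defects, and that locations outside the window are dropped; slice_interval is a hypothetical helper, and strand handling is deliberately left out.

```python
MISS_LEFT = 1   # value documented for Defect.MISS_LEFT
MISS_RIGHT = 2  # value documented for Defect.MISS_RIGHT


def slice_interval(loc_start, loc_stop, start, stop, rel=0):
    """Clip one interval to [start, stop) and shift it by -rel.

    Returns (new_start, new_stop, defect) or None if nothing is left.
    """
    new_start = max(loc_start, start)
    new_stop = min(loc_stop, stop)
    if new_start >= new_stop:
        return None  # assumption: locations outside the window vanish
    defect = 0
    if new_start > loc_start:
        defect |= MISS_LEFT
    if new_stop < loc_stop:
        defect |= MISS_RIGHT
    return new_start - rel, new_stop - rel, defect


both_sides = slice_interval(10, 50, 20, 40)
right_only = slice_interval(10, 50, 0, 30)
relative = slice_interval(10, 50, 20, 40, rel=20)
outside = slice_interval(10, 20, 30, 40)
```

Setting rel to the slice start yields positions relative to the beginning of the window.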

sort(keys=None, reverse=False)[source]

Sort features in-place

Parameters:
  • keys – Tuple of meta keys or functions to use for sorting. None can be used as a single value or in the tuple to apply the default sorting by position. Can also be a single string or a callable.

  • reverse – Use reverse order (default: False)

Returns:

Sorted features

Example:

>>> from sugar import read_fts
>>> fts = read_fts()
>>> fts.sort(('type', len))

Note

This function works in place and modifies the data. If you want to keep the original data, use the copy() method first.

tobiopython()[source]

Convert the FeatureList to a list of biopython SeqFeature objects

todict()[source]

Return a dictionary with feature ids as keys and features as values

Note

This method is different from the FeatureList.groupby() method. Each value of the dict returned by todict() is a feature, while each value of the dict returned by groupby() is a FeatureList.

tofmtstr(fmt, **kw)[source]

Write features to a string of the given format, see write_fts()

toftsviewer(*, label='default', colorby='type', color=None, circular=False, seqlen=None, seq=None, first_index=0, **kw)[source]

Convert features to DNAFeaturesViewer GraphicRecord

Parameters:
  • label – The label of the feature. May be a str key of the meta dictionary, a function taking the feature and returning the label, or the label string itself. Defaults to 'name' or, if that is not present in the metadata, 'type'.

  • colorby – How to define the color of the features, might be any key in the metadata, defaults to 'type', but can also be a function taking a Feature and returning an identifier

  • color – The color of the features, this might be a constant color, a list of colors, or None for the default matplotlib color cycle (the default), or a dictionary mapping the feature identifiers to colors.

  • circular – If True return an instance of CircularGraphicRecord instead

  • seq – sequence or sequence data

  • seqlen – length of sequence, defaults to the length of seq or the stop location of the last feature.

  • **kw – All other kwargs are passed to GraphicFeature or GraphicRecord or CircularGraphicRecord, respectively.

tolists(keys='type start stop strand')[source]

Return a generator yielding a list for each feature

Parameters:

keys – Parameters from the metadata or location to return, 'len' is also allowed, can be a string or tuple, defaults to 'type start stop strand'

Example:

>>> from sugar import read_fts
>>> fts = read_fts().select('cDNA_match')
>>> for record in fts.tolists('type start strand len'):
...     print(*record)
cDNA_match 101888622 - 4245
cDNA_match 103140200 - 30745
cDNA_match 103944892 - 7136
cDNA_match 107859806 - 2392

topandas(keys='type start stop strand', **kw)[source]

Return a pandas.DataFrame of the features

Parameters:

keys – Parameters from the metadata or location to return, 'len' is also allowed, can be a string or tuple, defaults to 'type start stop strand'.

Example:

>>> from sugar import read_fts
>>> fts = read_fts().select('cDNA_match')
>>> df = fts.topandas()
>>> print(df)
        type      start      stop   strand
0  cDNA_match  101888622  101892867      -
1  cDNA_match  103140200  103170945      -
2  cDNA_match  103944892  103952028      -
3  cDNA_match  107859806  107862198      -

tostr(raw=False, w=80, wt=12, wl=20, h=80, exclude_fts=())[source]

Return string with information about features, used by __str__() method

write(fname=None, fmt=None, **kw)[source]

Write features to file, see write_fts()

property d

Alias for FeatureList.todict()

property loc_range

Get the range of locations over all features

Returns:

tuple start, stop with start and stop locations (zero-based numbering)

class sugar.core.fts.Location(start, stop, strand='+', defect=0, meta=None)[source]

Bases: object

Class describing the contiguous position of a feature

contains(other)[source]

Whether the location range contains the other location range

distance(other, *, pos='inner', sign=False)[source]

Distance to other location or location tuple

Parameters:
  • pos (str) – 'inner' returns the shortest distance between the locations, 'middle' returns the distance between the mid locations

  • sign (bool) – If set to True, the returned distance will have a negative sign if the other location has a smaller position. Otherwise the distance will always be greater than or equal to zero.
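The two pos modes and the sign option can be sketched for plain half-open (start, stop) tuples. The function below is an illustration only; in particular, measuring the inner distance as the gap between stop and start and computing the middle as a float are assumptions.

```python
def distance(a, b, pos='inner', sign=False):
    """Distance from interval a to interval b (illustrative sketch)."""
    if pos == 'inner':
        if b[0] >= a[1]:
            d = b[0] - a[1]      # b lies to the right of a
        elif a[0] >= b[1]:
            d = -(a[0] - b[1])   # b lies to the left of a
        else:
            d = 0                # overlapping intervals
    elif pos == 'middle':
        d = (b[0] + b[1]) / 2 - (a[0] + a[1]) / 2
    else:
        raise ValueError(f'unknown pos {pos!r}')
    return d if sign else abs(d)


gap = distance((0, 10), (15, 20))
unsigned = distance((15, 20), (0, 10))
signed = distance((15, 20), (0, 10), sign=True)
between_mids = distance((0, 10), (10, 20), pos='middle')
```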

overlaplen(other)[source]

Return overlap length with the other location or location tuple

overlaps(other)[source]

Whether the location/locations overlap with the other location/locations
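For two single locations, contains, overlaplen and overlaps reduce to simple interval arithmetic. The sketch below uses half-open (start, stop) tuples and hypothetical free functions; it does not cover location tuples with several parts.

```python
def overlaplen(a, b):
    """Number of positions shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def overlaps(a, b):
    """True if the intervals share at least one position."""
    return overlaplen(a, b) > 0


def contains(a, b):
    """True if interval b lies completely inside interval a."""
    return a[0] <= b[0] and b[1] <= a[1]


shared = overlaplen((0, 10), (5, 20))
touching = overlaps((0, 10), (10, 20))
nested = contains((0, 10), (2, 5))
```

Under the half-open assumption, two intervals that merely touch do not overlap.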

shift(offset)[source]

Shift the location by the given offset in-place

Parameters:

offset (int) – The offset to shift the location

Returns:

The shifted location

Return type:

Location

property defect

Defect of the location

property meta

Location can optionally have metadata

property mid

Return the middle position of the location or location tuple

property range

Get the range of the location or location tuple

Returns:

tuple start, stop with start and stop location (zero-based numbering)

start

Start location (zero-based numbering)

stop

Stop location (zero-based numbering)

property strand

Strand of the location

class sugar.core.fts.LocationTuple(locs=None, start=None, stop=None, strand='+')[source]

Bases: tuple

Tuple of contiguous locations, describing the position of a feature

contains(other)

Whether the location range contains the other location range

distance(other, *, pos='inner', sign=False)

Distance to other location or location tuple

Parameters:
  • pos (str) – 'inner' returns the shortest distance between the locations, 'middle' returns the distance between the mid locations

  • sign (bool) – If set to True, the returned distance will have a negative sign if the other location has a smaller position. Otherwise the distance will always be greater than or equal to zero.

overlaplen(other)

Return overlap length with the other location or location tuple

overlaps(other)

Whether the location/locations overlap with the other location/locations

shift(offset)[source]

Shift the locations by the given offset in-place

Parameters:

offset (int) – The offset to shift the locations

Returns:

The shifted location tuple

Return type:

LocationTuple

property mid

Return the middle position of the location or location tuple

property range

Get the range of the location or location tuple

Returns:

tuple start, stop with start and stop location (zero-based numbering)

property start

Get the start position of location tuple

property stop

Get the stop position of location tuple

class sugar.core.fts.Strand(*values)[source]

Bases: StrEnum

Types of strand of feature location

FORWARD = '+'

The feature is located on the forward strand

NONE = '.'

The feature is not associated with any strand

REVERSE = '-'

The feature is located on the reverse strand

UNKNOWN = '?'

The strandness of the feature is unknown
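Since Strand is a string enum, its members compare equal to the corresponding one-character strings. The sketch below uses a stand-in class built from str and Enum, so it also runs on Python versions without StrEnum; it mirrors the documented values and is not the sugar class.

```python
from enum import Enum


class Strand(str, Enum):
    """Stand-in that copies the documented strand symbols."""
    FORWARD = '+'
    REVERSE = '-'
    NONE = '.'
    UNKNOWN = '?'


# Look up a member from its symbol
reverse = Strand('-')

# Members behave like their string value in comparisons
equals_plus = Strand.FORWARD == '+'

# Membership tests against plain strings work as well
stranded = Strand('+') in ('+', '-')
```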