sugar.web._entrez module

Entrez client class

Warning

This module is still experimental.

class sugar.web._entrez.Entrez(path=None, api_key=None)[source]

Bases: object

Entrez client

Parameters:
  • path – The path for persistence of downloaded files. By default, files are not persisted. Alternatively, the path can be set with the ENTREZ_PATH environment variable.

  • api_key – An optional API key that allows you to make more requests per second than without one. Alternatively, the API key can be set with the ENTREZ_API_KEY environment variable.

Without an API key you can make 3 requests per second; with an API key you can make 10 requests per second. The client makes sure that you do not exceed this quota. If the path is set (by parameter or environment variable), repeated requests for the same id do not count against the quota.
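The quota handling described above can be pictured with a small throttle. This is only a sketch of the general technique (spacing requests at a minimum interval), not the client's actual implementation; the names MinIntervalThrottle and rate_for are hypothetical and do not exist in sugar.

```python
import time


def rate_for(api_key=None):
    """Requests per second as stated above: 3 without an API key, 10 with one."""
    return 10 if api_key else 3


class MinIntervalThrottle:
    """Hypothetical helper: space calls at least 1/rate seconds apart."""

    def __init__(self, requests_per_second, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / requests_per_second
        self._clock = clock      # injectable, so the logic can be checked without waiting
        self._sleep = sleep
        self._next_allowed = None  # earliest time at which the next request may start

    def wait(self):
        """Block until the next request is allowed and return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.interval
        return delay
```

Calling wait() before every request keeps a loop below the quota. A persisted file that is found on disk would skip the request, and therefore the wait, entirely.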

Example:

>>> from sugar.web import Entrez
>>> client = Entrez()
>>> seq = client.get_seq('AF522874')  # fetch multiple seqs with client.get_basket()

fetch_basket(seqids, **kw)[source]

Fetch multiple sequences using the client

Parameters:
  • seqids – A list of ids to fetch

  • **kw – All other kwargs are passed to fetch_seq()

Returns:

List of filenames or StringIO objects

fetch_seq(seqid, *, rettype='gb', db='nuccore', ext=None, retmode='text', overwrite=False, path=None)[source]

Fetch a sequence using the client

Parameters:
  • seqid – Id of the sequence to be fetched

  • path (str) – An alternative path for persistence, which might be different from the initialized path.

  • overwrite (bool) – If True, download the sequence again even if it already exists in the path.

  • ext (str) – The file extension, defaults to rettype parameter.

  • **kw – Other kwargs (rettype, db and retmode in the signature above) are used to construct the request URL; values other than the defaults are untested.

Returns:

The filename of the downloaded content. If path is not set, the content is returned as a StringIO instance.
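To make the roles of these parameters concrete, the following sketch shows how a request URL for the public NCBI E-utilities efetch endpoint could be assembled from them, and how ext and overwrite might interact with a persistence path. It is an illustration under assumptions, not the module's code: the helper names and the <seqid>.<ext> file naming are hypothetical, and the way fetch_seq() actually builds its URL and filenames may differ.

```python
from pathlib import Path
from urllib.parse import urlencode

# Public NCBI E-utilities efetch endpoint
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(seqid, rettype="gb", db="nuccore", retmode="text", api_key=None):
    """Assemble an efetch URL using the same defaults as the signature above."""
    params = {"db": db, "id": seqid, "rettype": rettype, "retmode": retmode}
    if api_key:
        params["api_key"] = api_key  # E-utilities accept the key as a query parameter
    return f"{EFETCH_URL}?{urlencode(params)}"


def cache_file(path, seqid, rettype="gb", ext=None):
    """Assumed naming scheme: <path>/<seqid>.<ext>, with ext defaulting to rettype."""
    return Path(path) / f"{seqid}.{ext or rettype}"


def needs_download(target, overwrite=False):
    """Download if forced via overwrite or if the file is not persisted yet."""
    return overwrite or not Path(target).exists()
```

With the defaults, build_efetch_url('AF522874') requests the GenBank text record from the nuccore database, and cache_file('cache', 'AF522874') points to cache/AF522874.gb.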

get_basket(seqids, *, read_kw=None, **kw)[source]

Fetch multiple sequences and return them

Parameters:
  • seqids – A list of ids to fetch

  • read_kw (dict) – Dictionary of reading options passed to read()

  • **kw – All other kwargs are passed to fetch_basket()

Returns:

Fetched BioBasket object

get_seq(seqid, *, read_kw=None, **kw)[source]

Fetch a sequence and return it

Parameters:
  • seqid – Id of the sequence to be fetched

  • read_kw (dict) – Dictionary of read options passed to read()

  • **kw – All other kwargs are passed to fetch_seq()

Returns:

Fetched BioSeq object
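As a usage sketch, the documented methods can be combined in a small helper. It uses only calls described on this page (Entrez(path=..., api_key=...), get_seq() and get_basket()); the helper name load_sequences is hypothetical, and calling it requires sugar to be installed and network access to NCBI.

```python
def load_sequences(seqids, path=None, api_key=None):
    """Return (first_seq, basket) for a list of ids using the documented client methods."""
    # Imported lazily, so defining this helper needs neither sugar nor network access
    from sugar.web import Entrez

    client = Entrez(path=path, api_key=api_key)
    first = client.get_seq(seqids[0])   # a single BioSeq object
    basket = client.get_basket(seqids)  # a BioBasket with all requested sequences
    return first, basket
```

If path is given, the first id is requested twice here, but according to the class description above the repeated request should not count against the quota.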