module

biopipen.utils.gene

Do gene name conversion

Classes
  • QueryGenesNotFound When genes cannot be found
Functions
  • gene_name_conversion(genes, infmt, outfmt, dup, species, notfound, suppress_messages) Convert gene names using MyGeneInfo
class

biopipen.utils.gene.QueryGenesNotFound()

Bases
ValueError, Exception, BaseException

When genes cannot be found

function

biopipen.utils.gene.gene_name_conversion(genes, infmt, outfmt, dup='first', species='human', notfound='na', suppress_messages=False)

Convert gene names using MyGeneInfo

Parameters
  • genes (list) A list of gene names or IDs (strings or integers)
  • infmt (str | list[str]) A character vector of input gene name formats. See the available scopes at https://docs.mygene.info/en/latest/doc/data.html#available-fields. You can use ensg as a shortcut for ensembl.gene.
  • outfmt (str) A character vector of output gene name formats
  • dup (str, optional) How to deal with duplicate gene names found.
      • first: keep the first one (default), sorted by score in descending order
      • last: keep the last one, sorted by score in descending order
      • all: keep all of them; each will be a separate row
      • X (any other string): combine them into a single string, separated by X
  • species (str, optional) A character vector of species names
  • notfound (str, optional) How to deal with gene names that are not found.
      • error: stop with an error message
      • use-query: use the query gene name as the converted gene name
      • skip: skip the gene names that are not found
      • ignore: same as "skip"
      • na: use NA as the converted gene name (default)
  • suppress_messages (bool, optional) Suppress the messages while querying
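
The dup options can be illustrated with a small standalone sketch. This is not the library's implementation: the helper name `resolve_duplicates` and the `(name, score)` tuple shape are hypothetical, and treating any value other than first, last, or all as a separator is an assumption based on the parameter description.

```python
def resolve_duplicates(hits, dup="first"):
    """Reduce several hits for one query to the name(s) to keep.

    `hits` is a list of (name, score) tuples; this shape is an assumption
    made for illustration, not the structure used by biopipen.
    """
    # Rank the hits by score, highest first, as the description states.
    ranked = [name for name, _ in sorted(hits, key=lambda h: h[1], reverse=True)]
    if not ranked:
        return []
    if dup == "first":
        return ranked[:1]   # highest-scoring hit
    if dup == "last":
        return ranked[-1:]  # lowest-scoring hit
    if dup == "all":
        return ranked       # one entry (row) per hit
    # Assumption: any other value is used as the separator string.
    return [dup.join(ranked)]


# Made-up hits for a single query, deliberately not sorted by score.
hits = [("TP53BP1", 9.4), ("TP53", 17.2), ("TP53I3", 3.1)]
for policy in ("first", "last", "all", ";"):
    print(policy, resolve_duplicates(hits, policy))
```

With dup="all" each kept name would become its own row in the result, whereas a separator value yields a single row per query.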
Returns

A dataframe with the query gene names and the converted gene names.
With the default notfound='na', the converted name of a gene that is not found will be "NA".
With the default dup='first', when duplicate gene names are found, the one with the highest score will be kept.
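
How the notfound setting shapes the returned rows can be sketched without calling the library. Everything here is a hedged illustration: `apply_notfound` and `GenesNotFound` are hypothetical stand-ins (the latter for QueryGenesNotFound, which derives from ValueError), rows are plain tuples rather than a dataframe, and `None` stands in for NA.

```python
class GenesNotFound(ValueError):
    """Local stand-in for QueryGenesNotFound (hypothetical, for illustration)."""


def apply_notfound(queries, converted, notfound="na"):
    """Build (query, converted) rows according to the notfound policy.

    `converted` maps each query that was found to its converted name;
    this dict-based shape is an assumption, not biopipen's internals.
    """
    missing = [q for q in queries if q not in converted]
    if missing and notfound == "error":
        raise GenesNotFound(f"Genes not found: {missing}")

    rows = []
    for query in queries:
        if query in converted:
            rows.append((query, converted[query]))
        elif notfound == "use-query":
            rows.append((query, query))  # fall back to the query itself
        elif notfound in ("skip", "ignore"):
            continue                     # drop the row entirely
        elif notfound == "na":
            rows.append((query, None))   # None stands in for NA
        else:
            raise ValueError(f"Unknown notfound value: {notfound!r}")
    return rows


queries = ["TP53", "NOT_A_GENE"]
converted = {"TP53": "ENSG00000141510"}
for policy in ("na", "use-query", "skip"):
    print(policy, apply_notfound(queries, converted, policy))
```

Note that skip and ignore change the number of rows, so the result can be shorter than the input list; the other policies keep one row per query.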