module biopipen.utils.gene
Do gene name conversion
Classes
QueryGenesNotFound — When genes cannot be found
Functions
gene_name_conversion(genes, infmt, outfmt, dup, species, notfound, suppress_messages) — Convert gene names using MyGeneInfo
class biopipen.utils.gene.QueryGenesNotFound()
Bases
ValueError
Exception
BaseException
When genes cannot be found
function biopipen.utils.gene.gene_name_conversion(genes, infmt, outfmt, dup='first', species='human', notfound='na', suppress_messages=False)
Convert gene names using MyGeneInfo
Parameters
genes
(list) — A list of gene names/IDs (strings or integers)
infmt
(str | list[str]) — The input gene name format(s)
  See the available scopes at https://docs.mygene.info/en/latest/doc/data.html#available-fields
  You can use ensg as a shortcut for ensembl.gene
outfmt
(str) — The output gene name format
dup
(str, optional) — How to deal with duplicate gene names found
  first: keep the first one (default), after sorting by score in descending order
  last: keep the last one, after sorting by score in descending order
  all: keep all of them, each as a separate row
  <X>: combine them into a single string, separated by X
species
(str, optional) — The species name
notfound
(str, optional) — How to deal with gene names that are not found
  error: stop with an error message
  use-query: use the query gene name as the converted gene name
  skip: skip the gene names that are not found
  ignore: same as "skip"
  na: use NA as the converted gene name (default)
suppress_messages
(bool, optional) — Suppress the messages while querying
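The dup and notfound options are easier to follow with a concrete case. The sketch below is not biopipen's implementation. It is a small standalone illustration of the policies as described above, applied to the hits for a single query. The helper name resolve_hits, the (name, score) tuple shape, and the use of None to stand in for NA are all assumptions made for this example.

```python
def resolve_hits(query, hits, dup="first", notfound="na"):
    """Reduce the hits for one query gene to a list of converted names.

    `hits` is a list of (converted_name, score) tuples. This shape is
    hypothetical and chosen only to illustrate the documented policies.
    """
    if not hits:
        if notfound == "error":
            raise ValueError(f"Gene not found: {query}")
        if notfound == "use-query":
            return [query]
        if notfound in ("skip", "ignore"):
            return []
        if notfound == "na":
            return [None]  # None stands in for NA here (an assumption)
        raise ValueError(f"Unknown notfound policy: {notfound}")

    # Sort by score, highest first, as the dup descriptions state.
    ranked = [name for name, _ in sorted(hits, key=lambda h: h[1], reverse=True)]
    if dup == "first":
        return ranked[:1]
    if dup == "last":
        return ranked[-1:]
    if dup == "all":
        return ranked  # one entry per row
    # Any other value is treated as the separator X.
    return [dup.join(ranked)]
```

For example, with hits [("A", 1.0), ("B", 2.0)], dup="first" keeps "B", dup="all" keeps both in score order, and dup=";" joins them into a single string.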
Returns
A dataframe with the query gene names and the converted gene names
  With the default notfound='na', the converted name of a gene that is not found will be "NA"
  With the default dup='first', when duplicate gene names are found, the one with the highest score will be kept
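A call might look like the following sketch, which uses only the parameters listed in the signature above. The wrapper name ensembl_to_symbols is hypothetical, and "symbol" is assumed to be an acceptable outfmt because it is one of the MyGene.info fields. Catching ValueError relies on QueryGenesNotFound deriving from it. That this exception is what notfound="error" raises is an assumption, not something stated in this page. Running the function requires biopipen to be installed and network access to MyGene.info.

```python
def ensembl_to_symbols(ensembl_ids):
    """Convert Ensembl gene IDs to gene symbols (hypothetical wrapper)."""
    # Imported inside the function so this sketch can be defined without
    # biopipen installed; calling it needs biopipen and network access.
    from biopipen.utils.gene import gene_name_conversion

    try:
        return gene_name_conversion(
            genes=ensembl_ids,
            infmt="ensg",            # documented shortcut for ensembl.gene
            outfmt="symbol",         # assumed: a MyGene.info field name
            dup="first",             # keep the highest-scoring hit
            species="human",
            notfound="error",        # fail instead of filling in NA
            suppress_messages=True,
        )
    except ValueError as exc:
        # QueryGenesNotFound is a ValueError subclass, so it is caught here.
        print(f"Conversion failed: {exc}")
        return None
```

It could then be called as ensembl_to_symbols(["ENSG00000141510", "ENSG00000012048"]), which should return the dataframe described under Returns.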