biopipen.ns.delim
Tools to deal with csv/tsv files
- RowsBinder (Proc) — Bind rows of input files
- SampleInfo (Proc) — List sample information and perform statistics
biopipen.ns.delim.RowsBinder(*args, **kwds) → Proc
  Bind rows of input files
- cache — Should we detect whether the jobs are cached?
- desc — The description of the process. Will use the summary from the docstring by default.
- dirsig — When checking the signature for caching, whether we should walk through the contents of the directory. This can be time-consuming if the directory is big.
- envs — The arguments that are job-independent, useful for common options across jobs.
- envs_depth — How deep to update the envs when subclassed.
- error_strategy — How to deal with the errors: retry, ignore, or halt.
  - halt: halt the whole pipeline, submitting no new jobs
  - terminate: just terminate the job itself
- export — When True, the results will be exported to `<pipeline.outdir>`. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
- forks — How many jobs to run simultaneously?
- input — The keys for the input channel
- input_data — The input data (will be computed for dependent processes)
- lang — The language for the script to run. Should be the path to the interpreter if `lang` is not in `$PATH`.
- name — The name of the process. Will use the class name by default.
- nexts — Computed from `requires` to build the process relationships
- num_retries — How many times to retry the jobs once an error occurs
- order — The execution order for this process. The bigger the number is, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by `Pipen.set_starts()`.
- output — The output keys for the output channel (the data will be computed)
- output_data — The output data (to pass to the next processes)
- plugin_opts — Options for process-level plugins
- requires — The dependency processes
- scheduler — The scheduler to run the jobs
- scheduler_opts — The options for the scheduler
- script — The script template for the process
- submission_batch — How many jobs to be submitted simultaneously
- template — Define the template engine to use. This could be either a template engine or a dict with key `engine` indicating the template engine and the rest the arguments passed to the constructor of the `pipen.template.Template` object. The template engine could be either the name of the engine (currently jinja2 and liquidpy are supported) or a subclass of `pipen.template.Template`. You can subclass `pipen.template.Template` to use your own template engine.
- infiles — The input files to bind. The input files should have the same number of columns and the same delimiter.
- outfile — The output file with rows bound
- filenames — Whether to add the filename as the last column. Either a string of an R function that starts with `function`, or a list of names (or a comma-separated string) to add for each input file. The R function takes the path of the input file as the only argument and should return a string. The string will be added as the last column of the output file.
- filenames_col — The column name for the `filenames` column
- header (flag) — Whether the input files have a header
- sep — The separator of the input files
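As a hedged sketch of how these options fit together: the file names, labels, and pipeline name below are hypothetical, not part of the documented API.

```python
# A minimal, hypothetical RowsBinder pipeline.
from pipen import Pipen
from biopipen.ns.delim import RowsBinder

class BindSamples(RowsBinder):
    """Bind two TSV files, tagging each row with its source."""
    input_data = [(["a.tsv", "b.tsv"],)]  # one job binding two files
    envs = {
        "sep": "\t",
        "header": True,
        "filenames": ["sampleA", "sampleB"],  # one label per input file
        "filenames_col": "Source",
    }

if __name__ == "__main__":
    Pipen("BindRows").set_starts(BindSamples).run()
```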
- __init_subclass__() — Do the requirements inferring since we need them to build up the process relationship
- from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself
- gc() — GC the process to save memory after it's done
- init() — Init all other properties and jobs
- log(level, msg, *args, logger) — Log a message for the process
- run() — Run the process
pipen.proc.ProcMeta(name, bases, namespace, **kwargs)
  Meta class for Proc
- __call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons
- __instancecheck__(cls, instance) — Override for isinstance(instance, cls).
- __repr__(cls) (str) — Representation for the Proc subclasses
- __subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).
- register(cls, subclass) — Register a virtual subclass of an ABC.
register(cls, subclass)
  Register a virtual subclass of an ABC. Returns the subclass, to allow usage as a class decorator.
__instancecheck__(cls, instance)
  Override for isinstance(instance, cls).
__subclasscheck__(cls, subclass)
  Override for issubclass(subclass, cls).
__repr__(cls) → str
  Representation for the Proc subclasses
__call__(cls, *args, **kwds)
  Make sure Proc subclasses are singletons
- *args (Any) — Arguments for the constructor
- **kwds (Any) — Keyword arguments for the constructor
Returns: The Proc instance
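A quick illustration of the singleton contract stated above; this is a sketch, assuming only that `__call__` returns the same instance on repeated instantiation.

```python
from biopipen.ns.delim import RowsBinder

# ProcMeta.__call__ caches and returns a single instance per Proc subclass.
assert RowsBinder() is RowsBinder()
```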
from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)
  Create a subclass of Proc using another Proc subclass or Proc itself
- proc(Type) — The Proc subclass
- name(str, optional) — The new name of the process
- desc(str, optional) — The new description of the process
- envs (Mapping, optional) — The arguments of the process; will overwrite the parent's. Unspecified items will be inherited.
- envs_depth (int, optional) — How deep to update the envs when subclassed.
- cache (bool, optional) — Whether we should check the cache for the jobs
- export (bool, optional) — When True, the results will be exported to `<pipeline.outdir>`. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
- error_strategy (str, optional) — How to deal with the errors: retry, ignore, or halt.
  - halt: halt the whole pipeline, submitting no new jobs
  - terminate: just terminate the job itself
- num_retries (int, optional) — How many times to retry the jobs once an error occurs
- forks (int, optional) — New forks for the new process
- input_data (Any, optional) — The input data for the process. Only effective when this process is a start process.
- order (int, optional) — The order to execute the new process
- plugin_opts (Mapping, optional) — The new plugin options; unspecified items will be inherited.
- requires (Sequence, optional) — The required processes for the new process
- scheduler (str, optional) — The new scheduler to run the new process
- scheduler_opts (Mapping, optional) — The new scheduler options; unspecified items will be inherited.
- submission_batch (int, optional) — How many jobs to be submitted simultaneously
Returns: The new process class
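A hedged sketch of deriving a variant process with from_proc; the new name and the comma separator are illustrative choices, not defaults.

```python
from pipen import Proc
from biopipen.ns.delim import RowsBinder

# Only envs.sep is overridden; all other options are inherited from RowsBinder.
RowsBinderCSV = Proc.from_proc(RowsBinder, name="RowsBinderCSV", envs={"sep": ","})
```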
__init_subclass__()
  Do the requirements inferring since we need them to build up the process relationship
init()
  Init all other properties and jobs
gc()
  GC process for the process to save memory after it's done
log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)
  Log message for the process
- level(int | str) — The log level of the record
- msg(str) — The message to log
- *args— The arguments to format the message
- logger(LoggerAdapter, optional) — The logging logger
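For example, a hedged usage sketch (the message and count are hypothetical):

```python
from biopipen.ns.delim import RowsBinder

proc = RowsBinder()  # singleton instance (see ProcMeta above)
# *args are %-formatted into msg; level may be an int or a string.
proc.log("info", "Binding %s input files", 2)
```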
run()
  Run the process
biopipen.ns.delim.SampleInfo(*args, **kwds) → Proc
  List sample information and perform statistics
- cache — Should we detect whether the jobs are cached?
- desc — The description of the process. Will use the summary from the docstring by default.
- dirsig — When checking the signature for caching, whether we should walk through the contents of the directory. This can be time-consuming if the directory is big.
- envs — The arguments that are job-independent, useful for common options across jobs.
- envs_depth — How deep to update the envs when subclassed.
- error_strategy — How to deal with the errors: retry, ignore, or halt.
  - halt: halt the whole pipeline, submitting no new jobs
  - terminate: just terminate the job itself
- export — When True, the results will be exported to `<pipeline.outdir>`. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
- forks — How many jobs to run simultaneously?
- input — The keys for the input channel
- input_data — The input data (will be computed for dependent processes)
- lang — The language for the script to run. Should be the path to the interpreter if `lang` is not in `$PATH`.
- name — The name of the process. Will use the class name by default.
- nexts — Computed from `requires` to build the process relationships
- num_retries — How many times to retry the jobs once an error occurs
- order — The execution order for this process. The bigger the number is, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by `Pipen.set_starts()`.
- output — The output keys for the output channel (the data will be computed)
- output_data — The output data (to pass to the next processes)
- plugin_opts — Options for process-level plugins
- requires — The dependency processes
- scheduler — The scheduler to run the jobs
- scheduler_opts — The options for the scheduler
- script — The script template for the process
- submission_batch — How many jobs to be submitted simultaneously
- template — Define the template engine to use. This could be either a template engine or a dict with key `engine` indicating the template engine and the rest the arguments passed to the constructor of the `pipen.template.Template` object. The template engine could be either the name of the engine (currently jinja2 and liquidpy are supported) or a subclass of `pipen.template.Template`. You can subclass `pipen.template.Template` to use your own template engine.
- infile — The input file to list sample information. The input file should be a csv/tsv file with a header.
- outfile — The output file with sample information, with mutated columns if `envs.save_mutated` is True. The basename of the output file will be the same as the input file. The file name of each plot will be slugified from the case name. Each plot has 3 formats: pdf, png and code.zip, which contains the data and R code to reproduce the plot.
- defaults (ns) — The default parameters for `envs.stats` (see the sketch after this list).
  - plot_type: The type of the plot. See the supported plot types here:
    https://pwwang.github.io/plotthis/reference/index.html
    The plot_type should be in lower case and maps to the plot function used in
    `plotthis`, e.g. `bar` -> `BarPlot`, `box` -> `BoxPlot`, etc.
  - more_formats (list): The additional formats to save the plot in. By default,
    the plot will be saved in png, which is also used for display in the report.
    You can add more formats, for example, `more_formats = ["pdf", "svg"]`.
  - save_code (flag): Whether to save the R code to reproduce the plot.
    The data used to plot will also be saved.
  - subset: An expression to subset the data frame before plotting. The
    expression should be a string of an R expression that will be passed to
    `dplyr::filter`, for example, `subset = "Sample == 'A'"`.
  - section: The section name in the report, in case you want to group the
    plots in the report.
  - devpars (ns): The device parameters for the plot.
    - width (type=int): The width of the plot.
    - height (type=int): The height of the plot.
    - res (type=int): The resolution of the plot.
  - descr: The description of the plot, shown in the report.
  - You can add more parameters to the defaults. These parameters will be
    expanded into `envs.stats` for each case and passed to the individual plot
    functions.
- exclude_cols (auto) — The columns to exclude from the table in the report. Could be a list or a comma-separated string.
- mutaters (type=json) — A dict of mutaters to mutate the data frame. The key is the column name and the value is the R expression to mutate the column. The dict will be transformed to a list in R and passed to `dplyr::mutate`. You may also use `paired()` to identify paired samples (see the sketch after this list). The function takes the following arguments:
  - df: The data frame. Use `.` if the function is called in a dplyr pipe.
  - id_col: The column name in `df` for the ids to be returned in the final output.
  - compare_col: The column name in `df` to compare the values for each id in `id_col`.
  - idents: The values in `compare_col` to compare. It could be either an integer or a vector. If it is an integer, the number of values in `compare_col` must equal the integer for the id to be regarded as paired. If it is a vector, the values in `compare_col` must be the same as the values in `idents` for the id to be regarded as paired.
  - uniq: Whether to return unique ids or not. Default is `TRUE`. If `FALSE`, you can mutate the meta data frame with the returned ids. Non-paired ids will be `NA`.
- save_mutated (flag) — Whether to save the mutated columns.
- sep — The separator of the input file.
- stats (type=json) — The statistics to perform. The keys are the case names and the values are the parameters inherited from `envs.defaults`.
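Putting mutaters, defaults, and stats together, here is a hedged configuration sketch. The file path, the column names (Subject, Timepoint, Sex, Age), and the per-case plot parameters (e.g. `x`) are assumptions for illustration, not part of the documented API.

```python
from biopipen.ns.delim import SampleInfo

class MySampleInfo(SampleInfo):
    input_data = ["samples.tsv"]  # hypothetical csv/tsv file with a header
    envs = {
        "sep": "\t",
        # Values are R expressions (strings) passed to dplyr::mutate;
        # "Subject" and "Timepoint" are hypothetical columns.
        "mutaters": {
            "Paired": (
                "paired(., id_col = 'Subject', compare_col = 'Timepoint', "
                "idents = c('pre', 'post'), uniq = FALSE)"
            ),
        },
        "save_mutated": True,
        "defaults": {
            "plot_type": "bar",
            "devpars": {"width": 800, "height": 600, "res": 100},
        },
        "stats": {
            # Case name -> parameters; unspecified items come from envs.defaults.
            # Extra keys (e.g. "x") are forwarded to the plotthis function.
            "Samples per Sex": {"x": "Sex"},
            "Age distribution": {"plot_type": "density", "x": "Age"},
        },
    }
```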
- __init_subclass__() — Do the requirements inferring since we need them to build up the process relationship
- from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself
- gc() — GC the process to save memory after it's done
- init() — Init all other properties and jobs
- log(level, msg, *args, logger) — Log a message for the process
- run() — Run the process