SampleInfo

List sample information and perform statistics

Input

  • infile: The input file containing the sample information. It should be a csv/tsv file with a header.

Output

  • outfile: Default: {{in.infile | basename}}.
    The output file with sample information, with mutated columns if envs.save_mutated is True.
    The basename of the output file will be the same as the input file.
    The file name of each plot will be slugified from the case name.
    Each plot has 3 formats: pdf, png and code.zip, which contains the data and R code to reproduce the plot.

Environment Variables

  • sep: Default: .
    The separator of the input file.
  • mutaters (type=json): Default: {}.
    A dict of mutaters to mutate the data frame.
    The key is the column name and the value is the R expression to mutate the column. The dict will be transformed to a list in R and passed to dplyr::mutate.
    You may also use paired() to identify paired samples. The function takes the following arguments:
    • df: The data frame. Use . if the function is called in a dplyr pipe.
    • id_col: The column name in df for the ids to be returned in the final output.
    • compare_col: The column name in df to compare the values for each id in id_col.
    • idents: The values in compare_col to compare. It can be either an integer or a vector. If it is an integer, the number of values in compare_col must equal that integer for the id to be regarded as paired. If it is a vector, the values in compare_col must be the same as the values in idents for the id to be regarded as paired.
    • uniq: Whether to return unique ids or not. Default is TRUE.
      If FALSE, you can mutate the meta data frame with the returned ids. Non-paired ids will be NA.
  • save_mutated (flag): Default: False.
    Whether to save the mutated columns.
  • exclude_cols (auto): Default: TCRData,BCRData,RNAData.
    The columns to exclude in the table in the report.
    Could be a list or a string separated by comma.
  • defaults (ns): The default parameters for envs.stats.
    • plot_type: Default: bar.
      The type of the plot.
      See the supported plot types here:
      https://pwwang.github.io/plotthis/reference/index.html
      The plot_type should be lower case and must correspond to a plot function in plotthis. The mapping from plot_type to the plot function follows the pattern bar -> BarPlot, box -> BoxPlot, etc.
    • more_formats (list): Default: [].
      The additional formats to save the plot.
      By default, the plot will be saved in png, which is also used to display in the report. You can add more formats to save the plot.
      For example, more_formats = ["pdf", "svg"].
    • save_code (flag): Default: False.
      Whether to save the R code to reproduce the plot.
      The data used to plot will also be saved.
    • subset: An expression to subset the data frame before plotting.
      The expression should be an R expression given as a string; it will be passed to dplyr::filter. For example, subset = "Sample == 'A'".
    • section: The section name in the report.
      In case you want to group the plots in the report.
    • devpars (ns): The device parameters for the plot.
      • width (type=int): The width of the plot.
      • height (type=int): The height of the plot.
      • res (type=int): Default: 100.
        The resolution of the plot.
    • descr: The description of the plot, shown in the report.
    • <more>: You can add more parameters to the defaults.
      These parameters will be expanded to the envs.stats for each case, and passed to individual plot functions.
  • stats (type=json): Default: {}.
    The statistics to perform.
    The keys are the case names and the values are the parameters inherited from envs.defaults.
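
The pairing rules listed for paired() under mutaters can be restated as a small Python sketch. This is not the R implementation; it only mirrors the documented behaviour. It assumes that distinct values of compare_col are counted per id, which the text above does not state explicitly. The sample rows and the column names Patient and Timepoint are hypothetical.

```python
from collections import defaultdict


def paired(rows, id_col, compare_col, idents, uniq=True):
    """Illustrative restatement of the documented paired() rules."""
    # Collect the distinct compare_col values seen for each id
    # (assumption: duplicates of the same value count once).
    seen = defaultdict(set)
    for row in rows:
        seen[row[id_col]].add(row[compare_col])

    def is_paired(values):
        if isinstance(idents, int):
            # Integer: the number of values must equal idents.
            return len(values) == idents
        # Vector: the values must be the same as those in idents.
        return values == set(idents)

    paired_ids = {i for i, values in seen.items() if is_paired(values)}
    if uniq:
        # Unique ids, in order of first appearance.
        return [i for i in seen if i in paired_ids]
    # One entry per row; None stands in for R's NA on non-paired ids.
    return [r[id_col] if r[id_col] in paired_ids else None for r in rows]


# Hypothetical sample sheet: P2 has no "post" sample.
rows = [
    {"Patient": "P1", "Timepoint": "pre"},
    {"Patient": "P1", "Timepoint": "post"},
    {"Patient": "P2", "Timepoint": "pre"},
    {"Patient": "P3", "Timepoint": "pre"},
    {"Patient": "P3", "Timepoint": "post"},
]
unique_ids = paired(rows, "Patient", "Timepoint", 2)
per_row_ids = paired(rows, "Patient", "Timepoint", ["pre", "post"], uniq=False)
```

With uniq left at its default, the result is a list of ids; with uniq=False, the result has one entry per row and can be stored as a new column, which matches the note about mutating the meta data frame.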
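
The relationship between envs.defaults and envs.stats can be pictured as a per-case merge. The sketch below is only an illustration of that idea in Python, not the code the process runs. It assumes that nested namespaces such as devpars are merged key by key rather than replaced wholesale. The case names and the subset expression are hypothetical; the default values are the ones documented above.

```python
import copy


def expand_cases(defaults, stats):
    """Fill every case in `stats` with the values from `defaults`."""
    expanded = {}
    for name, case in stats.items():
        merged = copy.deepcopy(defaults)
        for key, value in case.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Assumption: nested namespaces (e.g. devpars) merge per key.
                merged[key] = {**merged[key], **value}
            else:
                # Case-level values override the defaults.
                merged[key] = value
        expanded[name] = merged
    return expanded


# Documented defaults; width/height have no documented default.
defaults = {
    "plot_type": "bar",
    "more_formats": [],
    "save_code": False,
    "devpars": {"width": None, "height": None, "res": 100},
}

# Hypothetical cases keyed by case name.
stats = {
    "Samples per diagnosis": {"section": "Overview", "devpars": {"width": 800}},
    "Age by sex": {"plot_type": "box", "subset": "Age > 18"},
}

cases = expand_cases(defaults, stats)
```

In this sketch the first case keeps plot_type = "bar" and res = 100 while overriding only the width, and the second case switches to a box plot without touching devpars.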
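
Since exclude_cols accepts either a list or a comma-separated string, both forms can be reduced to one list before use. The helper below is a hypothetical sketch of that normalization; trimming whitespace around each name is an assumption, not documented behaviour.

```python
def normalize_exclude_cols(exclude_cols):
    """Return the excluded column names as a list."""
    if isinstance(exclude_cols, str):
        # Split on commas; trimming whitespace is an assumption.
        return [c.strip() for c in exclude_cols.split(",") if c.strip()]
    return list(exclude_cols)


from_string = normalize_exclude_cols("TCRData,BCRData,RNAData")
from_list = normalize_exclude_cols(["TCRData", "BCRData", "RNAData"])
```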