
Helper functions to select clones based on various criteria
Source:R/clonalutils.R
clone_selectors.Rd
These helper functions allow for the selection of clones based on various criteria such as size, group comparison, and existence in specific groups.
Usage
top(
n,
groups = NULL,
data = NULL,
order = NULL,
id = NULL,
in_form = NULL,
return_ids = NULL
)
sel(
expr,
groups = NULL,
data = NULL,
id = NULL,
in_form = NULL,
return_ids = NULL
)
uniq(
group1,
group2,
...,
groups = NULL,
data = NULL,
id = NULL,
in_form = NULL,
return_ids = NULL
)
shared(
group1,
group2,
...,
groups = NULL,
data = NULL,
id = NULL,
in_form = NULL,
return_ids = NULL
)
gt(
group1,
group2,
include_zeros = TRUE,
groups = NULL,
data = NULL,
id = NULL,
in_form = NULL,
return_ids = NULL
)
ge(
group1,
group2,
include_zeros = TRUE,
groups = NULL,
data = NULL,
id = NULL,
in_form = NULL,
return_ids = NULL
)
lt(
group1,
group2,
include_zeros = TRUE,
groups = NULL,
data = NULL,
id = NULL,
in_form = NULL,
return_ids = NULL
)
le(
group1,
group2,
include_zeros = TRUE,
groups = NULL,
data = NULL,
id = NULL,
in_form = NULL,
return_ids = NULL
)
eq(
group1,
group2,
groups = NULL,
data = NULL,
id = NULL,
in_form = NULL,
return_ids = NULL
)
ne(
group1,
group2,
include_zeros = TRUE,
groups = NULL,
data = NULL,
id = NULL,
in_form = NULL,
return_ids = NULL
)
and(x, y)
or(x, y)
Arguments
- n
The number of top clones to select or the threshold size.
- groups
The column names in the meta data to group the cells. By default, it is assumed
facet_by
andsplit_by
to be in the parent frame if used in scplotter functions. When used in dplyr verbs, it should be a character vector of the grouping columns, where the first column is used to extract the values (count) forgroup1
,group2
, and...
and the rest are used to keep the groupings.- data
The data frame containing clone information. Default is NULL. If NULL, when used in scplotter functions, it will get data from parent.frame. A typical
data
should have a column namedCloneID
and other columns for the groupings. Supposingly it should be a grouped data frame with the grouping columns. Under each grouping column, the value should be the size of the clone. By default, the data is assumed to be in the parent frame. When used in dplyr verbs, it should be the parent data frame passed to the dplyr verb.- order
The order of the clones to select. It can be an expression to order the clones by a specific column. Only used in
top()
.- id
The column name that contains the clone ID. Default is "CTaa".
- in_form
The format of the input data. It can be "long" or "wide". If "long", the data should be in a long format with a column for the clone IDs and a column for the size. If "wide", the data should be in a wide format with columns for the clone IDs and the size for each group. When used in dplyr verbs, it should be "long" by default. If used in scplotter functions, it should be "wide" by default.#'
- return_ids
If TRUE, the function returns a vector with the same length as the data, with CTaa values for selected clones and NA for others. If FALSE, it returns a subset data frame with only the selected clones. Default is NULL, which will be determined based on the data. If the function is used in a context of dplyr verbs, it defaults to TRUE. Otherwise, it defaults to FALSE.
- expr
The expression (in characters) to filter the clones (e.g. "group1 > group2" to select clones where group1 is larger than group2).
- group1
The first group to compare.
- group2
The second group to compare.
- ...
More groups to compare.
- include_zeros
Whether to include clones with zero size in the comparison. If TRUE, in a comparison (s1 > s2) for a clone to be selected, both s1 and s2 must be greater than 0. If FALSE, only the first group must be greater than the second group.
- x
The first vector to compare in logical operations (and/or).
- y
The second vector to compare in logical operations (and/or).
Details
These helper functions are designed to be used in a dplyr pipeline or used internally in other scplotter functions to select clones based on various criteria.
When used in a dplyr pipeline, they will return a vector with the same length as the input data, with the selected clones' CTaa values (clone IDs) and NA for others. It is useful for adding a new column to the data frame. For the functions that need
group1
,group2
, and/or...
,groups
should be provided to specify the grouping columns. Thengroup1
,group2
, and...
can be the values in the grouping column. To include more grouping columns, just usec(grouping1, grouping2, ...)
, wheregrouping1
is used for values ofgroup1
,group2
and...
;grouping2
and so on will be kept as the groupings where the clones are selected in each combination of the grouping values.When used in a scplotter function, they will return a subset of the data frame with only the selected clones. This is useful for filtering the data frame to only include the clones that meet the criteria. It is used internally in some other scplotter functions, such as
ClonalStatPlot
, to select clones. The groupings are also applied, and defaulting tofacet_by
andsplit_by
in the parent frame.When used independently, you should pass the arguments explicitly, such as
groups
andreturn_ids
, to control the behavior and the output of the function.
Examples
data <- data.frame(
CTaa = c("AA1", "AA2", "AA3", "AA4", "AA5", "AA6", "AA7", "AA8", "AA9", "AA10"),
group1 = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
group2 = c(7, 3, 8, 2, 1, 5, 9, 4, 6, 0),
groups = c("A", "A", "A", "A", "B", "B", "B", "B", "B", "B")
)
data <- data[order(data$group1 + data$group2, decreasing = TRUE), ]
top(3)
#> CTaa group1 group2 groups
#> 1 AA7 6 9 B
#> 2 AA9 8 6 B
#> 3 AA8 7 4 B
top(3, groups = "groups")
#> # A tibble: 6 × 4
#> # Groups: groups [2]
#> CTaa group1 group2 groups
#> <chr> <dbl> <dbl> <chr>
#> 1 AA7 6 9 B
#> 2 AA9 8 6 B
#> 3 AA8 7 4 B
#> 4 AA3 2 8 A
#> 5 AA1 0 7 A
#> 6 AA4 3 2 A
sel(group1 == 0 | group2 == 0)
#> CTaa group1 group2 groups
#> 1 AA10 9 0 B
#> 2 AA1 0 7 A
uniq(group1, group2)
#> CTaa group1 group2 groups
#> 1 AA10 9 0 B
shared(group1, group2)
#> CTaa group1 group2 groups
#> 1 AA7 6 9 B
#> 2 AA9 8 6 B
#> 3 AA8 7 4 B
#> 4 AA3 2 8 A
#> 5 AA6 5 5 B
#> 6 AA4 3 2 A
#> 7 AA5 4 1 B
#> 8 AA2 1 3 A
gt(group1, group2)
#> CTaa group1 group2 groups
#> 1 AA9 8 6 B
#> 2 AA8 7 4 B
#> 3 AA10 9 0 B
#> 4 AA4 3 2 A
#> 5 AA5 4 1 B
lt(group1, group2)
#> CTaa group1 group2 groups
#> 1 AA7 6 9 B
#> 2 AA3 2 8 A
#> 3 AA1 0 7 A
#> 4 AA2 1 3 A
le(group1, group2)
#> CTaa group1 group2 groups
#> 1 AA7 6 9 B
#> 2 AA3 2 8 A
#> 3 AA6 5 5 B
#> 4 AA1 0 7 A
#> 5 AA2 1 3 A
lt(group1, group2, include_zeros = FALSE)
#> CTaa group1 group2 groups
#> 1 AA7 6 9 B
#> 2 AA3 2 8 A
#> 3 AA2 1 3 A
eq(group1, group2)
#> CTaa group1 group2 groups
#> 1 AA6 5 5 B
ne(group1, group2)
#> CTaa group1 group2 groups
#> 1 AA7 6 9 B
#> 2 AA9 8 6 B
#> 3 AA8 7 4 B
#> 4 AA3 2 8 A
#> 5 AA10 9 0 B
#> 6 AA1 0 7 A
#> 7 AA4 3 2 A
#> 8 AA5 4 1 B
#> 9 AA2 1 3 A
# Use them in a dplyr pipeline
data <- tidyr::pivot_longer(data, cols = c("group1", "group2"),
names_to = "group", values_to = "value")
data <- tidyr::uncount(data, !!rlang::sym("value"))
unique(dplyr::mutate(data, Top3 = top(3))$Top3)
#> [1] NA "AA10" "AA1" "AA2"
unique(dplyr::mutate(data, Top3 = top(3, groups = "groups"))$Top3)
#> [1] NA "AA3" "AA6" "AA10" "AA1" "AA5" "AA2"
unique(dplyr::mutate(data, Unique = sel(group1 == 0 | group2 == 0, groups = "group"))$Unique)
#> [1] NA "AA10" "AA1"
unique(dplyr::mutate(data, UniqueInG1 = uniq(group1, group2, groups = "group"))$UniqueInG1)
#> [1] NA "AA10"
unique(dplyr::mutate(data, Shared = shared(group1, group2, groups = "group"))$Shared)
#> [1] "AA7" "AA9" "AA8" "AA3" "AA6" NA "AA4" "AA5" "AA2"
unique(dplyr::mutate(data, Greater = gt(group1, group2, groups = "group"))$Greater)
#> [1] NA "AA9" "AA8" "AA10" "AA4" "AA5"
unique(dplyr::mutate(data, Less = lt(group1, group2, groups = "group"))$Less)
#> [1] "AA7" NA "AA3" "AA1" "AA2"
unique(dplyr::mutate(data, LessEqual = le(group1, group2, groups = "group"))$LessEqual)
#> [1] "AA7" NA "AA3" "AA6" "AA1" "AA2"
unique(dplyr::mutate(data, GreaterEqual = ge(group1, group2, groups = "group"))$GreaterEqual)
#> [1] NA "AA9" "AA8" "AA6" "AA10" "AA4" "AA5"
unique(dplyr::mutate(data, Equal = eq(group1, group2, groups = "group"))$Equal)
#> [1] NA "AA6"
unique(dplyr::mutate(data, NotEqual = ne(group1, group2, groups = "group"))$NotEqual)
#> [1] "AA7" "AA9" "AA8" "AA3" NA "AA10" "AA1" "AA4" "AA5" "AA2"
# Compond expressions
unique(
dplyr::mutate(data,
Top3OrEqual = or(top(3), eq(group1, group2, groups = "group")))$Top3OrEqual
)
#> [1] NA "AA6" "AA10" "AA1" "AA2"
unique(
dplyr::mutate(data,
SharedAndGreater = and(
shared(group1, group2, groups = "group"),
gt(group1, group2, groups = "group")
))$SharedAndGreater
)
#> [1] NA "AA9" "AA8" "AA4" "AA5"