Get top entities from a data frame based on the number of entities in each group
mutate-helper-3.Rd
Usage
top(
  df = ".",
  id,
  n = 10,
  compare = ".n",
  subset = NULL,
  with_ties = FALSE,
  split_by = NULL,
  return_type = c("uids", "ids", "subdf", "df", "interdf")
)
Arguments
- df
The data frame. Use . if the function is called in a dplyr pipe.
- id
The column name in df for the groups.
- n
The number of top entities to return. If n < 1, it is treated as a fraction of the total number of entities in each group (after subsetting or each applied). Specify 0 to return all entities.
- compare
The column name in df to compare the values for each group. It can be either a numeric column or .n to compare the number of entities in each group. If a column is passed, its values must be numeric and identical within each group. This is not checked.
- subset
An expression to subset the entities; it is passed to dplyr::filter(). Default is TRUE (no filtering).
- with_ties
Whether to return all entities with the same size as the last entity in the top list. Default is FALSE.
- split_by
A column name (without quotes) in metadata to split the cells.
- return_type
The type of the returned value. Default is uids. It can be one of:
  - uids: return the unique ids of the selected entities
  - ids: return the ids of all entities in the same order as in df, where the non-selected ids will be NA
  - subdf: return a subset of df with the selected entities
  - df: return the original df with a new logical column .selected to mark the selected entities
  - interdf: return the intermediate data frame with the id column, <compare>, predicate and the split_by column if provided.
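The interaction of n, compare = ".n" and with_ties described above can be approximated in base R. This is only a minimal sketch, not the package's implementation: sketch_top_ids() is a hypothetical helper, and the rounding of a fractional n (rounded up here) is an assumption that the documentation does not specify.

```r
# Hypothetical helper (not part of the package): approximates the documented
# selection rule for compare = ".n" using base R only.
sketch_top_ids <- function(ids, n = 10, with_ties = FALSE) {
  groups <- unique(ids)                                     # groups in order of first appearance
  sizes <- as.integer(table(factor(ids, levels = groups)))  # number of entities per group
  if (n == 0) return(groups)                                # 0 means "return everything"
  # Assumption: a fractional n is rounded up to a whole number of groups.
  k <- if (n < 1) ceiling(n * length(groups)) else min(n, length(groups))
  ranked <- order(sizes, decreasing = TRUE)                 # largest groups first
  cutoff <- sizes[ranked[k]]                                # size of the last group in the top list
  keep <- if (with_ties) {
    sizes >= cutoff                                         # also keep groups tied with the last one
  } else {
    seq_along(groups) %in% ranked[seq_len(k)]
  }
  groups[keep]                                              # reported in order of first appearance
}

ids <- c("A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D", "D")
sketch_top_ids(ids, n = 2, with_ties = TRUE)
#> [1] "B" "C" "D"
```

With with_ties = TRUE, groups B and C share the size of the last selected group, so both are kept and three ids are returned for n = 2.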
Examples
df <- data.frame(
  id = c("A", "B", "C", "D", "E", "F", "G", "H"),
  value = c(10, 20, 30, 40, 50, 60, 80, 80)
)
top(df, id, n = 1, compare = value, with_ties = TRUE, return_type = "uids")
#> [1] "G" "H"
top(df, "id", n = 2, compare = "value", return_type = "subdf")
#>   id value
#> 1  G    80
#> 2  H    80
top(df, "id", n = 2, compare = "value", return_type = "df")
#>   id value .selected
#> 1  A    10     FALSE
#> 2  B    20     FALSE
#> 3  C    30     FALSE
#> 4  D    40     FALSE
#> 5  E    50     FALSE
#> 6  F    60     FALSE
#> 7  G    80      TRUE
#> 8  H    80      TRUE
top(df, "id", n = 2, compare = "value", return_type = "interdf")
#>   id value predicate
#> 1  A    10     FALSE
#> 2  B    20     FALSE
#> 3  C    30     FALSE
#> 4  D    40     FALSE
#> 5  E    50     FALSE
#> 6  F    60     FALSE
#> 7  G    80      TRUE
#> 8  H    80      TRUE
top(df, id, n = 0.25, compare = value, return_type = "uids")
#> [1] "G" "H"
top(df, id, n = 0, compare = value, return_type = "uids")
#> [1] "A" "B" "C" "D" "E" "F" "G" "H"
df <- data.frame(id = c("A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D", "D"))
top(df, id, n = 2, compare = ".n", return_type = "uids", with_ties = TRUE)
#> [1] "B" "C" "D"
dplyr::mutate(df, selected = top(id = id, n = 2, compare = ".n", return_type = "ids",
  with_ties = TRUE))
#>    id selected
#> 1   A     <NA>
#> 2   A     <NA>
#> 3   B        B
#> 4   B        B
#> 5   B        B
#> 6   C        C
#> 7   C        C
#> 8   C        C
#> 9   D        D
#> 10  D        D
#> 11  D        D
#> 12  D        D
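
The return types differ only in how the same selection is presented. As a rough illustration in base R (a sketch that does not call top(); the variable names are hypothetical and the selected ids are copied from the "uids" example above), the other shapes can be derived from the unique ids like this:

```r
# Same grouping column as in the last example.
df <- data.frame(
  id = c("A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D", "D"),
  stringsAsFactors = FALSE
)

# Unique ids of the selected entities, as returned with return_type = "uids".
selected <- c("B", "C", "D")

# "ids"-like shape: one value per row of df, NA for non-selected entities.
ids_like <- ifelse(df$id %in% selected, df$id, NA)

# "df"-like shape: a logical marker for the selected rows.
marker <- df$id %in% selected

# "subdf"-like shape: only the rows of the selected entities.
subdf_like <- df[marker, , drop = FALSE]
```

Here ids_like has one element per row of df, which is why the "ids" return type can be used directly inside dplyr::mutate().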