chop

In [1]:

%run nb_helpers.py
from datar.all import *

nb_header(chop, unchop)

Try this notebook on binder.

★ chop

Makes a data frame shorter by converting rows within each group into list-columns.

Args:

data: A data frame.
cols: Columns to chop.

Returns:

A data frame with the selected columns chopped.
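To make the grouping concrete, here is a minimal pure-Python sketch of the chopping idea, assuming rows are grouped by the values of every column that is not chopped. The helper `chop_rows` is hypothetical and written only for illustration; it is not part of datar and does not reflect how datar implements `chop`.

```python
def chop_rows(rows, cols):
    """Collapse rows that share the same non-chopped values into one row.

    rows: list of dicts, one dict per row.
    cols: names of the columns to chop into lists.
    Hypothetical helper for illustration only, not a datar function.
    Non-chopped values must be hashable.
    """
    groups = {}  # group key -> {chopped column: collected values}
    for row in rows:
        # The group key is made of every column that is NOT chopped.
        key = tuple((k, v) for k, v in row.items() if k not in cols)
        chopped = groups.setdefault(key, {c: [] for c in cols})
        for c in cols:
            chopped[c].append(row[c])
    # One output row per group, in first-seen order.
    return [{**dict(key), **lists} for key, lists in groups.items()]


rows = [
    {"x": 1, "y": 1, "z": 6},
    {"x": 1, "y": 2, "z": 5},
    {"x": 1, "y": 3, "z": 4},
    {"x": 2, "y": 4, "z": 3},
    {"x": 2, "y": 5, "z": 2},
    {"x": 3, "y": 6, "z": 1},
]
chopped = chop_rows(rows, ["y", "z"])
```

With this input, the six rows collapse to three, one per distinct value of `x`, and `y` and `z` each hold a list per row.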

★ unchop

Makes df longer by expanding list-columns so that each element of the list-column gets its own row in the output.

See https://tidyr.tidyverse.org/reference/chop.html

Recycling of size-1 elements may differ from tidyr:
>>> df = tibble(x=[1, [2,3]], y=[[2,3], 1])
>>> df >> unchop([f.x, f.y])
>>> # tibble(x=[1,2,3], y=[2,3,1])
>>> # instead of the following in tidyr:
>>> # tibble(x=[1,1,2,3], y=[2,3,1,1])
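The two results in the note can be reproduced with plain Python lists. The sketch below contrasts row-wise recycling (the tidyr result quoted above) with flattening each column independently, which happens to yield the datar result quoted above. Both helpers are hypothetical, and reading datar's behaviour as column-wise flattening is an assumption made for illustration, not a statement about its implementation.

```python
def as_list(value):
    # Treat a scalar cell as a size-1 list.
    return value if isinstance(value, list) else [value]


def unchop_rowwise(x, y):
    """Recycle size-1 cells to the size of the other cell in the same row.

    Hypothetical helper; cells of different sizes greater than 1 are not handled.
    """
    out_x, out_y = [], []
    for cell_x, cell_y in zip(x, y):
        cell_x, cell_y = as_list(cell_x), as_list(cell_y)
        size = max(len(cell_x), len(cell_y))
        out_x += cell_x * size if len(cell_x) == 1 else cell_x
        out_y += cell_y * size if len(cell_y) == 1 else cell_y
    return out_x, out_y


def unchop_columnwise(x, y):
    """Flatten each column on its own, ignoring row alignment (assumed reading)."""
    flat_x = [v for cell in x for v in as_list(cell)]
    flat_y = [v for cell in y for v in as_list(cell)]
    return flat_x, flat_y


x = [1, [2, 3]]
y = [[2, 3], 1]
rowwise = unchop_rowwise(x, y)        # ([1, 1, 2, 3], [2, 3, 1, 1])
columnwise = unchop_columnwise(x, y)  # ([1, 2, 3], [2, 3, 1])
```

The practical consequence is that values which shared a row before unchopping may no longer share one afterwards when cells have different sizes.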

Args:

data: A data frame.
cols: Columns to unchop.
keep_empty: By default, you get one row of output for each element of the list you're unchopping/unnesting. This means that if there's a size-0 element (like NULL or an empty data frame), that entire row will be dropped from the output. If you want to preserve all rows, use keep_empty=True to replace size-0 elements with a single row of missing values.
dtypes: The dtypes for the output columns. Either a single dtype, which is applied to all columns, or a dictionary mapping column names to dtypes. For nested data frames, use col$a as the key to cast a single nested column. If col is used as the key, all columns of the nested data frame are cast to that dtype.

Returns:

A data frame with selected columns unchopped.
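The effect of `keep_empty` as described above can be sketched in plain Python. The helper `unchop_rows` is hypothetical, works on a list of dicts rather than a data frame, and uses `None` to stand in for a missing value; it illustrates the documented semantics only and is not datar's implementation.

```python
def unchop_rows(rows, col, keep_empty=False):
    """Give every element of the list in `col` its own output row.

    Hypothetical helper for illustration only, not a datar function.
    """
    out = []
    for row in rows:
        values = row[col]
        if len(values) == 0:
            # A size-0 element contributes no rows unless keep_empty is set,
            # in which case it becomes a single row with a missing value.
            if keep_empty:
                out.append({**row, col: None})
            continue
        for value in values:
            out.append({**row, col: value})
    return out


rows = [
    {"x": 1, "y": []},
    {"x": 2, "y": [1]},
    {"x": 3, "y": [1, 2]},
]
dropped = unchop_rows(rows, "y")
kept = unchop_rows(rows, "y", keep_empty=True)
```

Here `dropped` has no row for `x == 1`, while `kept` starts with a row for `x == 1` whose `y` is missing.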

In [2]:

df = tibble(x = c(1, 1, 1, 2, 2, 3), y = c[1:6:1], z = c[6:1:-1])
df >> nest(data = c(f.y, f.z))

Out[2]:

	x	data
	<int64>	<object>
0	1	<DF 3x2>
1	2	<DF 2x2>
2	3	<DF 1x2>

In [3]:

df >> chop(c(f.y, f.z))

Out[3]:

	x	y	z
	<int64>	<object>	<object>
0	1	[1, 2, 3]	[6, 5, 4]
1	2	[4, 5]	[3, 2]
2	3	[6]	[1]

In [4]:

# Unchop
df = tibble(x = c[1:5], y = [[], [1], [1,2], [1,2,3]])
df >> unchop(f.y)

Out[4]:

	x	y
	<int64>	<object>
0	2	1.0
1	3	1.0
2	3	2.0
3	4	1.0
4	4	2.0
5	4	3.0

In [5]:

df >> unchop(f.y, keep_empty=True, dtypes=int)

Out[5]:

	x	y
	<int64>	<int64>
0	2	1
1	3	1
2	3	2
3	4	1
4	4	2
5	4	3

In [6]:

df = tibble(x = c[1:2], y = ["a", [1,2,3]])
df >> unchop(f.y)

Out[6]:

	x	y
	<int64>	<object>
0	1	a
1	1	1
2	1	2
3	1	3

In [7]:

with try_catch():
    df >> unchop(f.y, dtypes=int)

[ValueError] invalid literal for int() with base 10: 'a'
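The error above is the message Python's built-in `int()` raises for a non-numeric string. The sketch below reproduces it with an element-wise cast over the unchopped values; `cast_values` is a hypothetical helper, and whether datar casts element by element is an assumption made for illustration.

```python
def cast_values(values, dtype):
    # Apply the target type to every value; the first failure propagates.
    # Hypothetical helper for illustration only, not a datar function.
    return [dtype(v) for v in values]


values = ["a", 1, 2, 3]  # the unchopped y column from the previous cell
try:
    cast_values(values, int)
    message = None
except ValueError as err:
    message = str(err)

# Values that are all convertible cast cleanly.
ok = cast_values([1.0, 2.0, 3.0], int)
```

A cast succeeds only when every unchopped value is convertible to the requested dtype, so mixed columns such as this one need either no `dtypes` argument or a dtype that accepts all values (for example `str`).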

In [8]:

df = tibble(x = c[1:4], y = [NULL, tibble(x = 1), tibble(y = c[1:3])])
df >> unchop(f.y)

Out[8]:

	x	y$x	y$y
	<int64>	<float64>	<float64>
0	2	1.0	NaN
1	3	NaN	1.0
2	3	NaN	2.0

In [9]:

df >> unchop(f.y, keep_empty=True)

Out[9]:

	x	y$x	y$y
	<int64>	<float64>	<float64>
0	1	NaN	NaN
1	2	1.0	NaN
2	3	NaN	1.0
3	3	NaN	2.0
