Skip to content

Change Log

Change Log

0.15.6

  • deps: bump simplug to 0.4, datar-numpy to 0.3.4 and datar-pandas to 0.5.5
  • tests: adopt pytest v8
  • ci: use latest actions

0.15.5

  • deps: bump datar-numpy to 0.3.3

0.15.4

  • docs: fix typo in README.md (#197)
  • docs: change filter to filter_ in README.md
  • docs: fix typo in data.md
  • deps: bump datar-pandas to 0.5.4 (support pandas 2.2+)

0.15.3

  • ⬆️ Bump pipda to 0.13.1

0.15.2

  • ⬆️ Bump datar-pandas to 0.5.2 to fix pip install datar[pandas] not having numpy backend installed.

0.15.1

  • ⬆️ Bump datar-pandas to 0.5.1
    • Dismiss ast warning for if_else.
    • Make scipy and wcwidth optional deps
    • Set seed in tests
    • Dismiss warnings of fillna with method for pandas2.1

0.15.0

  • ✨ Add data who2, household, cms_patient_experience, and cms_patient_care
  • ⬆️ Bump datar-pandas to 0.5 to support pandas2 (#186)

0.14.0

  • ⬆️ Bump pipda to 0.13
  • 🍱 Support dplyr up to 1.1.3
  • 👽️ Align rows_*() verbs to align with dplyr 1.1.3 (#188)
  • 🔧 Update pyproject.toml to generate setup.py for poetry

0.13.1

  • 🎨 Allow datar.all.filter regardless of allow_conflict_names (#184)

0.13.0

  • 👷 Add scripts for codesandbox
  • 💥 Change the option for conflict names (#184)

    There is no more warning for conflict names (python reserved names). By default, those names are suffixed with _ (ie filter_ instead of filter). You can still use the original names by setting allow_conflict_names to True in datar.options().

    from datar import options
    options(allow_conflict_names=True)
    from datar.all import *
    filter  # <function datar.dplyr.filter_ at 0x7f62303c8940>
    

0.12.2

  • ➕ Add pyarrow backend
  • 🐛 Exclude coverage for multiline version in get_versions()
  • ⬆️ Bump up python-simpleconf to 0.6 so datar can be installed in Windows (#180)

0.12.1

  • ⬆️ Bump datar-numpy to ^0.2

0.12.0

  • 📝 Added import f to plotnine on README.md (#177)
  • ⬆️ Drop support for python3.7
  • ⬆️ Bump pipda to 0.12
  • 🍱 Update storms data to 2020 (tidyverse/dplyr#5899)

0.11.2

  • 📝 Add pypi downloads badge to README
  • 📝 Fix github workflow badges for README
  • 🐛 Add return type annotation to fix #173
  • ⬆️ Bump python-slugify to v8

0.11.1

  • 🐛 Fix get_versions() not showing plugin versions
  • 🐛 Fix plugins not loaded when loading datasets
  • 🚸 Add github issue templates

0.11.0

  • 📝 Add testimonials and backend badges in README.md
  • 🐛 Load entrypoint plugins only when APIs are called (#162)
  • 💥 Rename other module to misc

0.10.3

  • ⬆️ Bump simplug to 0.2.2
  • ✨ Add apis.other.array_ufunc to support numpy ufuncs
  • 💥 Change hook data_api to load_dataset
  • ✨ Allow backend for c[]
  • ✨ Add DatarOperator.with_backend() to select backend for operators
  • ✅ Add tests
  • 📝 Update docs for backend supports

0.10.2

  • 🚑 Fix false warning when importing from all

0.10.1

  • Pump simplug to 0.2

0.10.0

  • Detach backend support, so that more backends can be supported easier in the future
  • numpy backend: https://github.com/pwwang/datar-numpy
  • pandas backend: https://github.com/pwwang/datar-pandas
  • Adopt pipda 0.10 so that functions can be pipeable (#148)
  • Support pandas 1.5+ (#148), but v1.5.0 excluded (see pandas-dev/pandas#48645)

0.9.1

  • Pump pipda to 0.8.0 (fixes #149)

0.9.0

Fixes

  • Fix weighted_mean not handling group variables with NaN values (#137)
  • Fix weighted_mean on NA raising error instead of returning NA (#139)
  • Fix pandas .groupby() used internally not inheriting sort, dropna and observed (#138, #142)
  • Fix mutate/summarise not counting references inside function as used for _keep "used"/"unused"
  • Fix metadata _datar of nested TibbleGrouped not frozen

Breaking changes

  • Refactor core.factory.func_factory() (#140)
  • Use base.c[...] for range short cut, instead of f[...]
  • Use tibble.fibble() when constructing Tibble inside a verb, instead of tibble.tibble()
  • Make n a keyword-only argument for base.ntile

Deprecation

  • Deprecate verb_factory, use register_verb from pipda instead
  • Deprecate base.data_context

Dependences

  • Adopt pipda v0.7.1
  • Remove varname dependency
  • Install pdtypes by default

0.8.6

  • 🐛 Fix weighted_mean not working for grouped data (#133)
  • ✅ Add tests for weighted_mean on grouped data
  • ⚡️ Optimize distinct on existing columns (#128)

0.8.5

  • 🐛 Fix columns missing after Join by same columns using mapping (#122)

0.8.4

  • ➖ Add optional deps to extras so they aren't installed by default
  • 🎨 Giva better message when optional packages not installed

0.8.3

  • ⬆️ Upgrade pipda to v0.6
  • ⬆️️ Upgrade thon-simpleconf to 5.5

0.8.2

  • ♻️ Move glimpse to dplyr (as glimpse is a tidyverse-dplyr API)
  • 🐛 Fix glimpse() output not rendering in qtconsole (#117)
  • 🐛 Fix base.match() for pandas 1.3.0
  • 🐛 Allow base.match() to work with grouping data (#115)
  • 📌 Use rtoml (python-simpleconf) instead of toml (See https://github.com/pwwang/toml-bench)
  • 📌 Update dependencies

0.8.1

  • 🐛 Fix month_abb and month_name being truncated (#112)
  • 🐛 Fix unite() not keeping other columns (#111)

0.8.0

  • ✨ Support base.glimpse() (#107, machow/siuba#409)
  • 🐛 Register base.factor() and accept grouped data (#108)
  • ✨ Allow configuration file to save default options
  • 💥 Replace option warn_builtin_names with imiport_names_conflict (#73)
  • 🩹 Attach original __module__ to func_factory registed functions
  • ⬆️ Bump pipda to 0.5.9

0.7.2

  • ✨ Allow tidyr.unite() to unite multiple columns into a list, instead of join them (#105)
  • 🩹 Typos in argument names of tidyr.pivot_longer() (#104)
  • 🐛 Fix base.sprintf() not working with Series (#102)

0.7.1

  • 🐛 Fix settingwithcopywarning in tidyr.pivot_wider()
  • 📌 Pin deps for docs
  • 💚 Don't upload coverage in PR
  • 📝 Fix typos in docs (#99, #100) (Thanks to @pdwaggoner)

0.7.0

  • ✨ Support modin as backend
  • ✨ Add _return argument for datar.options()
  • 🐛 Fix tidyr.expand() when nesting(f.name) as argument

0.6.4

Breaking changes

  • 🩹 Make base.ntile() labels 1-based (#92)

Fixes

  • 🐛 Fix order_by argument for dplyr.lead-lag

Enhancements

  • 🚑 Allow base.paste/paste0() to work with grouped data
  • 🩹 Change dtypes of base.letters/LETTERS/month_abb/month_name

Housekeeping

  • 📝 Update and fix reference maps
  • 📝 Add environment.yml for binder to work
  • 📝 Update styles for docs
  • 📝 Update styles for API doc in notebooks
  • 📝 Update README for new description about the project and add examples from StackOverflow

0.6.3

  • ✨ Allow base.c() to handle groupby data
  • 🚑 Allow base.diff() to work with groupby data
  • ✨ Allow forcats.fct_inorder() to work with groupby data
  • ✨ Allow base.rep()'s arguments length and each to work with grouped data
  • ✨ Allow base.c() to work with grouped data
  • ✨ Allow base.paste()/base.paste0() to work with grouped data
  • 🐛 Force &/| operators to return boolean data
  • 🚑 Fix base.diff() not keep empty groups
  • 🐛 Fix recycling non-ordered grouped data
  • 🩹 Fix dplyr.count()/tally()'s warning about the new name
  • 🚑 Make dplyr.n() return groupoed data
  • 🐛 Make dplyr.slice() work better with rows/indices from grouped data
  • 🩹 Make dplyr.ntile() labels 1-based
  • ✨ Add datar.attrgetter(), datar.pd_str(), datar.pd_cat() and datar.pd_dt()

0.6.2

  • 🚑 Fix #87 boolean operator losing index
  • 🚑 Fix false alarm from rename()/relocate() for missing grouping variables (#89)
  • ✨ Add base.diff()
  • 📝 [doc] Update/Fix doc for case_when (#87)
  • 📝 [doc] Fix links in reference map
  • 📝 [doc] Update docs for dplyr.base

0.6.1

  • 🐛 Fix rep(df, n) producing a nested df
  • 🐛 Fix TibbleGrouped.__getitem__() not keeping grouping structures

0.6.0

General

  • Adopt pipda 0.5.7
  • Reimplement the split-apply-combine rule to solve all performance issues
  • Drop support for pandas v1.2, require pandas v1.3+
  • Remove all base0_ options and all indices are now 0-based, except base.seq(), ranks and their variants
  • Remove messy type annotations for now, will add them back in the future
  • Move implementation of data type display for frames in terminal and notebook to pdtypes package
  • Change all arguments end with "_" to arguments start with it to avoid confusion
  • Move module datar.stats to datar.base.stats
  • Default all na_rm arguments to True
  • Rename all ptype arguments for tidyr verbs into dtypes

Details

  • Introduct new API to register function datar.core.factory.func_factory()
  • Aliase register_verb and register_func as verb_factory and context_func_factory in datar.core.factory
  • Expose options, options_context, add_option and get_option in datar/__init__.py and remove them from datar.base
  • Attach pipda.options to datar.options
  • Move head and tail from datar.utils to datar.base
  • Remove redundant unique implentation from datar.base.seq
  • Add datar.core.factory.func_factory() for developers to register function that works with different types of data (NDFrame, GropuBy, etc)
  • Not ensure NAs after NA for base.cumxxx() families any more
  • Remove set_names from datar.stats, use names(df, <new names>) from datar.base instead
  • Optimize intersect, union, setdiff, append from datar.base
  • Keep grouping variables for intersect, union, setdiff and union_all when y is a grouped df, even when x is not
  • Remove drop_index from datar.datar, use datar.tibble.remove_rownames/remove_index/drop_index instead
  • Add assert_tibble_equal() in datar.testing to test whether 2 tibbles are equal
  • rep() now works with frames
  • c_across() now returns a rowwise df to work with functions that apply to df on axis=1
  • datar.dplyr.order_by() now only works like it does in r-dplyr and only in side a verb
  • datar.dplyr.group_by() detauls _sort to False for speed
  • Only raise error for duplicated column names when selected by column name instead of index
  • base.scale() returns a series rather than a frame when works with a series
  • Other fixes and optimizations

0.5.6

  • 🐛 Hotfix for types registered for base.proportions (#77)
  • 👽️ Fix for pandas 1.4

0.5.5

  • Fix #71: semi_join returns duplicated rows

0.5.4

  • Fix filter() restructures group_data incorrectly (#69)

0.5.3

  • ⚡️ Optimize dplyr.arrange when data are series from the df itself
  • 🐛 Fix sub-df order of apply for grouped df (#63)
  • 📝 Update doc for argument by for join functions (#62)
  • 🐛 Fix mean() with option na_rm=False does not work (#65)

0.5.2

More of a maintenance release.

  • 🔧 Add metadata for datasets
  • 🔊 Send logs to stderr, instead of stdout
  • 📌Pin dependency versions
  • 🚨 Switch linter to flake8
  • 📝 Update some docs to fit datar-cli

0.5.1

  • Add documentation about "blind" environment (#45, #54, #55)
  • Change base.as_date() to return pandas datetime types instead python datetime types (#56)
  • Add base.as_pd_date() to be an alias of pandas.to_datetime() (#56)
  • Expose trimws to datar.all (#58)

0.5.0

Added:

  • Added forcats (#51 )
  • Added base.is_ordered(), base.nlevels(), base.ordered(), base.rank(), base.order(), base.sort(), base.tabulate(), base.append(), base.prop_table() and base.proportions()
  • Added gss_cat dataset

Fixed:

  • Fixed an issue when Collection dealing with numpy.int_

Enhanced:

  • Added base0_ argument for datar.get()
  • Passed __calling_env to registered functions/verbs when used internally (this makes sure the library to be robust in different environments)

0.4.4

  • Adopt varname v0.8.0
  • Add base.make_names() and base.make_unique()

0.4.3

  • Adopt pipda 0.4.5
  • Make dataset names case-insensitive;
  • Add datasets: ToothGrowth, economics, economics_long, faithful, faithfuld, luv_colours, midwest, mpg, msleep, presidential, seals, and txhousing
  • Add base.complete_cases()
  • Change datasets.all_datasets() to datasets.list_datasets()
  • Make sure assume_all_piping mode works internally: #45

0.4.2

  • Adopt pipda 0.4.4
  • Add varname to dependency to close #30
  • Rename datar.datar_versions to datar.get_versions
  • Port a set of functions from r-base, incluing:
    • prod, sign, signif, trunc, exp, log, log2, log10, log1p,
    • is_finite, is_infinite, is_nan,
    • match,
    • startswith, endswith, strtoi, chartr, tolower, toupper,
    • max_col

0.4.1

  • Don't use piping syntax internally (>>=)
  • Add python, numpy and datar version to datar.datar_versions()
  • Fix #40: anti_join/semi_join not working when by is column mapping

0.4.0

  • Adopt pipda v0.4.2

Performance improved: - Refactor core.grouped to adopt pandas's groupby - Try to use DataFrame.agg()/DataFrameGroupBy.agg() when function applied on a single columns (Related issues: #27, #33, #37)

Fixed: - Fix when data or context as new column name for mutate() - Fix SettingwithCopyWarning in pivot_longer - Use regular calling internally to make sure it works in some cases that node cannot be detected (ie Gooey/%%timeit in jupyter)

Added: - datar.datar_versions() to show versions of related packages for bug reporting.

0.3.2

  • Adopt pipda v0.4.1 to fix getattr() failure for operater-connected expressions (#38)
  • Add str_dtype argument to as_character() to partially fix #36
  • Update license in core._frame_format_patch (#28)

0.3.1

  • Adopt pipda v0.4.0
  • Change argument _dtypes to dtypes_ for tibble-families

0.3.0

  • Adopt pipda v0.3.0

Breaking changes:

  • Rename argument dtypes of unchop and unnest back to ptype
  • Change all _base0 to base0_
  • Change argument how of tidyr.drop_na to how_

0.2.3

  • Fix compatibility with pandas v1.2.0~4 (#20, thanks to @antonio-yu)
  • Fix base.table when inputs are factors and exclude is NA;
  • Add base.scale/col_sums/row_sums/col_means/row_means/col_sds/row_sds/col_medians/row_medians

0.2.2

  • Use a better strategy warning for builtin name overriding.
  • Fix index of subdf not dropped for mutate on grouped data
  • Fix names_glue not working with single values_from for tidyr.pivot_wider
  • Fix base.paste not registered
  • Fix base.grep/grepl on NA values
  • Make base.sub/gsub return scalar when inputs are scalar strings

0.2.1

  • Use observed values for non-observsed value match for group_data instead of NAs, which might change the dtype.
  • Fix tibble recycling values too early

0.2.0

Added: - Add base.which, base.bessel, base.special, base.trig_hb and base.string modules - Add Support for duplicated keyword arguments for dplyr.mutate/summarise by using _ as suffix - Warn when import python builtin names directly; ; Remove modkit dependency

Fixed: - Fixed errors when usea_1 as names for "check_unique" name repairs - Fixed #14: f.a.mean() not applied to grouped data

Changed: - Don't allow from datar.datasets import * - Remove modkit dependency - Reset NaN to NA - Rename base.getOption to base.get_option - Rename stats.setNames to stats.set_names

0.1.1

  • Adopt pipda 0.2.8
  • Allow f.col1[f.col2==max(f.col2)] like expression
  • Add base.which/cov/var
  • Fix base.max
  • Add datasets.ChickWeight
  • Allow dplyr.across to have plain functions passed with default EVAL context.

0.1.0

Added: - pandas.NA as NaN - Dtypes display when printing a dataframe (string, html, notebook) - zibble to construct dataframes with names specified together, and values together.

Fixed: - base.diag() on dataframes - Data recycling when length is different from original data - datar.itemgetter() not public

Changed: - Behavior of group_by() with _drop=False. Invisible values will not mix with visible values of other columns

0.0.7

  • Add dplyr rows verbs
  • Allow mixed numbering (with c() and f[...]) for tibble construction
  • Allow slice (f[a:b]) to be expanded into sequence for EVAL context
  • Finish tidyr porting.

0.0.6

  • Add options, get_option and options_context to datar.base to allow set/get global options
  • Add options: dplyr.summarise.inform
  • Add base0_ argument to all related APIs
  • Add nycflights13 datasets
  • Support slice_head/slice_tail for grouped data

0.0.5

  • Add option index.base.0;
  • Refactor Collection families

0.0.4

  • Finish port of tibble.

0.0.3

  • Add stats.weighted_mean
  • Allow function to prefer recycling input or output for summarise

0.0.2

  • Port verbs and functions from tidyverse/dplyr and test them with original cases