Transformations

Introduction

Manipulation Sets

Manipulation sets are similar to chained "data manipulation verb" functions from the dplyr R package. A manipulation set is an ordered list of instructions applied to the parent dataset one at a time, with each instruction operating on the result of the previous one (i.e., function composition).
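The ordered, one-at-a-time application described above can be sketched in plain Python. This is a minimal illustration, not this tool's implementation. It assumes a dataset is a list of dicts (one dict per row), and `apply_manipulation_set`, `keep_adults` and `keep_name` are hypothetical names chosen for the example.

```python
from functools import reduce
from typing import Callable, Dict, List

Row = Dict[str, object]
Dataset = List[Row]
Manipulation = Callable[[Dataset], Dataset]


def apply_manipulation_set(data: Dataset, manipulations: List[Manipulation]) -> Dataset:
    """Apply manipulations in order; each one receives the previous one's output."""
    return reduce(lambda current, manipulation: manipulation(current), manipulations, data)


def keep_adults(rows: Dataset) -> Dataset:
    # Hypothetical stand-in for a filter step.
    return [row for row in rows if row["age"] >= 18]


def keep_name(rows: Dataset) -> Dataset:
    # Hypothetical stand-in for a select step.
    return [{"name": row["name"]} for row in rows]


people = [{"name": "Ana", "age": 34}, {"name": "Ben", "age": 12}]
result = apply_manipulation_set(people, [keep_adults, keep_name])
print(result)  # [{'name': 'Ana'}]
```

Order matters: swapping the two steps would raise a `KeyError`, because `keep_adults` would then look for an `age` column that `keep_name` has already dropped.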

Manipulation Types

Manipulation types correspond to common data manipulation tasks.

Filter

The filter manipulation subsets the rows of the dataset based on the provided conditions.

_images/filter.png
filter(conditions)

Returns only the rows that meet the given conditions.

Parameters: conditions – One or more conditional Manipulation Expressions, separated by commas.

Example:

_images/filter_ex_1.png
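A minimal Python sketch of the filter behavior, assuming each condition can be modeled as a function from a row to a boolean and that, as with dplyr's `filter()`, comma-separated conditions must all hold (logical AND). The source does not state how multiple conditions combine, and `filter_rows` is a hypothetical name.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Condition = Callable[[Row], bool]


def filter_rows(rows: List[Row], *conditions: Condition) -> List[Row]:
    """Keep only rows for which every condition is true (assumed AND semantics)."""
    return [row for row in rows if all(cond(row) for cond in conditions)]


cars = [
    {"model": "A", "mpg": 30, "cyl": 4},
    {"model": "B", "mpg": 18, "cyl": 8},
    {"model": "C", "mpg": 25, "cyl": 6},
]
efficient = filter_rows(cars, lambda r: r["mpg"] > 20, lambda r: r["cyl"] < 6)
print(efficient)  # [{'model': 'A', 'mpg': 30, 'cyl': 4}]
```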

Select

The select manipulation subsets the dataset, keeping only the given columns.

_images/select.png
select(columns)

Keeps only the selected columns.

Parameters: columns – A comma-separated list of column names.

Example:

_images/select_ex_1.png
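The select behavior can be sketched the same way, again assuming rows are dicts. `select_columns` is a hypothetical name, and returning the columns in the order they are listed is a choice made for this sketch, not documented behavior. Naming a column that does not exist raises a `KeyError` here.

```python
from typing import Dict, List

Row = Dict[str, object]


def select_columns(rows: List[Row], *columns: str) -> List[Row]:
    """Keep only the named columns, in the order they are listed."""
    return [{col: row[col] for col in columns} for row in rows]


cars = [{"model": "A", "mpg": 30, "cyl": 4}, {"model": "B", "mpg": 18, "cyl": 8}]
selected = select_columns(cars, "model", "mpg")
print(selected)  # [{'model': 'A', 'mpg': 30}, {'model': 'B', 'mpg': 18}]
```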

Create

The create manipulation allows new columns to be defined, or existing columns to be altered.

_images/create.png
create(column_name, column_definition)

Creates a new column, or alters (replaces) an existing column.

Parameters:
  • column_name – The name of the column to create/alter.
  • column_definition – Expression defining the column...

Example:

_images/create_ex_1.png
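A minimal sketch of create, assuming the column definition can be modeled as a function evaluated once per row. `create_column` is a hypothetical name; reusing an existing column name overwrites that column, mirroring the "alters/replaces" behavior described above. The input rows are not modified.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def create_column(rows: List[Row], column_name: str,
                  column_definition: Callable[[Row], object]) -> List[Row]:
    """Return new rows with column_name computed per row; an existing column is overwritten."""
    return [{**row, column_name: column_definition(row)} for row in rows]


cars = [{"model": "A", "mpg": 30}, {"model": "B", "mpg": 18}]
# Create a new column.
flagged = create_column(cars, "efficient", lambda r: r["mpg"] >= 25)
# Alter an existing column: the same name replaces the old values.
lowered = create_column(cars, "model", lambda r: r["model"].lower())
print(flagged)
print(lowered)
```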

Rename

The rename manipulation renames the given column.

_images/rename.png
rename(old_column_name, new_column_name)
Parameters:
  • old_column_name – The name of the existing column.
  • new_column_name – The new name for the column.

Example:

_images/rename_ex_1.png
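A minimal sketch of rename on dict-based rows. `rename_column` is a hypothetical name; keeping the renamed column in its original position is a choice of this sketch.

```python
from typing import Dict, List

Row = Dict[str, object]


def rename_column(rows: List[Row], old_column_name: str, new_column_name: str) -> List[Row]:
    """Return new rows with old_column_name renamed, keeping column order."""
    return [
        {(new_column_name if key == old_column_name else key): value
         for key, value in row.items()}
        for row in rows
    ]


cars = [{"model": "A", "mpg": 30}]
renamed = rename_column(cars, "mpg", "miles_per_gallon")
print(renamed)  # [{'model': 'A', 'miles_per_gallon': 30}]
```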

Slice

...

Group By

...

Join

...

Sort By

The sort by manipulation sorts the dataset by the given columns. Prefixing a column name with a minus sign (-) sorts on that column in descending order.

_images/sort.png
sort_by(columns)
Parameters: columns – A comma-separated list of column names, each optionally prefixed with a minus sign (-) for descending order.

Example:

_images/sort_ex_1.png
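A minimal Python sketch of sort_by, including the minus-sign convention. It assumes, as with dplyr's `arrange()`, that earlier columns take precedence over later ones; the source does not state this explicitly. `sort_by` here is a hypothetical Python function, not this tool's API.

```python
from typing import Dict, List

Row = Dict[str, object]


def sort_by(rows: List[Row], *columns: str) -> List[Row]:
    """Sort rows by several columns; a leading '-' makes that column descending.

    Python's sort is stable (also with reverse=True), so sorting by the last
    column first and the first column last gives a multi-column sort in which
    earlier columns take precedence.
    """
    result = list(rows)
    for spec in reversed(columns):
        descending = spec.startswith("-")
        name = spec[1:] if descending else spec
        result.sort(key=lambda row: row[name], reverse=descending)
    return result


cars = [
    {"model": "A", "cyl": 4, "mpg": 30},
    {"model": "B", "cyl": 8, "mpg": 18},
    {"model": "C", "cyl": 4, "mpg": 35},
    {"model": "D", "cyl": 8, "mpg": 15},
]
ordered = sort_by(cars, "cyl", "-mpg")
print([r["model"] for r in ordered])  # ['C', 'A', 'B', 'D']
```

Sorting once per column keeps the code short; a single `sort` with a composite key would also work but gets awkward for descending non-numeric columns.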

Wide to Long

...

SQL Queries

...