Getting started

How to use time-intelligence functions

Author

Alejandro Hagan

Introduction

Simply pass your data in either a tibble or a lazy DBI object to the time intelligence function and input the required arguments.

We will use mtd() function to calculate the month-to-date sum of Contoso’s margin from their sales dataset (contoso::sales).

Most time intelligence functions follow the same structure:

  • Specify the date column to index the time intelligence functions (e.g., order_date, deliver_date, etc.)
  • Specify the value column to aggregate (e.g., margin, net_margin, cogs, etc.)
  • If there is a period rollback / rollforward then clarify the number of periods to roll
  • Clarify if we are using a "standard" calendar or a retail calendar variation ("445", "454", or "544") – see the standard vs. non-standard calendar article for more information
  • If using a retail calendar, optionally specify fiscal_year_start (1–12) to control which month the fiscal year begins nearest to (default is January)

When you execute mtd(), your console will return a ti object displaying a custom print message:

  • A summary of the function’s actions
  • Details the calendar’s attributes used to support the calculation
  • Describes the main transformation steps and columns that are referenced
  • Lists out possible next actions
contoso::sales |> 
   ti::mtd(.date=order_date,.value = margin,calendar_type = "standard")
── Month-to-date ───────────────────────────────────────────────────────────────
Function: `mtd` was executed
── Description: ──
This creates a daily `cumsum()` of the current month margin from the start of
the standard calendar month to the end of the month
── Calendar: ──
• The calendar aggregated order_date to the day time unit
• A standard calendar is created with 0 groups
• Calendar ranges from 2021-05-18 to 2024-04-20222 days were missing and replaced with 0
• New date column date, year, quarter, month was created from order_date
── Actions: ──
Aggregate margin
Shift
Compare
Proportion of Total
Count Distinct
── Next Steps: ──
• Use `calculate()` to return the results
────────────────────────────────────────────────────────────────────────────────

To return the results to a lazy DBI object, pass the ti object through to calculate().

1contoso::sales |>
2   mtd(.date=order_date,.value = margin,calendar_type = "standard") |>
3   calculate()
1
Pass through your data source
2
Select the ti function, arguments and calendar_type
3
Use ‘calculate’ to return a lazy_tbl of results instead of a print statement

If you’re using a tibble, under the hood, ti converts your data to a DuckDB database.

If your data is in a database, the package will leverage dbplyr to execute all the calculations.

In either case, use dplyr::collect() to return your results to a local tibble.

contoso::sales |>                                                                
   mtd(.date=order_date,.value = margin,calendar_type = "standard") |>  
   calculate() |>                                                       
1   dplyr::collect()
1
convert any lazy_tbl object into a tibble with ‘dplyr::collect()’
date year month quarter week day margin mtd_margin missing_date_indicator
2021-05-18 2021 5 2 20 18 406.840 406.840 0
2021-05-19 2021 5 2 20 19 711.351 1118.191 0
2021-05-20 2021 5 2 20 20 1424.101 2542.292 0
2021-05-21 2021 5 2 21 21 11338.631 13880.923 0
2021-05-22 2021 5 2 21 22 5358.767 19239.690 0
2021-05-23 2021 5 2 21 23 0.000 19239.690 1
2021-05-24 2021 5 2 21 24 0.000 19239.690 1
2021-05-25 2021 5 2 21 25 792.933 20032.623 0
2021-05-26 2021 5 2 21 26 74.550 20107.173 0
2021-05-27 2021 5 2 21 27 1432.710 21539.883 0

What if you need the analysis at the group level?

Simply pass the groups that you want to dplyr::group_by() and time intelligence function will create a custom calendar for each group level.

This ensures that each group will have a complete calendar ensuring no group member has any missing dates.

contoso::sales |>   
1   dplyr::group_by(customer_key,store_key) |>
   yoy(.date=order_date,.value = margin,calendar_type = "standard") 
1
Select the dimensions you want to group and pass a grouped column to your ti function

Why do we need this package when we have lubridate?

Lubridate is an excellent package and is at the core of many of ti’s functions. The issue is not with lubridate itself, but rather the challenges in your dataset that prevent you from directly using lubridate functions.

The advantage of this package is that it will perform all of the required pre-processing steps for you.

  • Issue 1: Many datasets do not have continuous dates, especially if data is recorded only on business days or for active transactions

  • Issue 2: Imbalances between periods (e.g., the different number of days between February vs. January) can create misleading analysis or trends, or your analysis may require non-standard calendar types

  • Issue 3: Calculating time intelligence for groups can lead to larger-than-memory issues even with smaller datasets

Issue 1: Continuous Dates

Referencing the Table 1 below, if we were to use dplyr::lag() to compare Day-over-Day (DoD) margin, we would be missing 2024-01-02, 2024-01-04, and 2024-01-05 which would lead to incorrect answers or trends.

Table 1: Incomplete calendar table can lead to wrong conclusions or trends
Incomplete Calendar Data
Missing dates can lead to wrong conclusions
date margin
2024-01-01 1,200
2024-01-03 1,100
2024-01-06 1,300
2024-01-07 900
2024-01-08 1,200
2024-01-09 850
2024-01-11 1,450

To correct this, ti will automatically complete your calendar for each group for the missing periods to ensure there are no missing periods when calculating trends.

In Table 2 we see a complete calendar set and a new column, “missing_date_indicator” to indicate how many dates were missing.

Table 2: Original table now complete with missing dates and a missing date indicator
Complete Calendar with Missing Date Indicator
ti automatically fills missing dates
date year quarter month week day margin dod_margin missing_date_indicator
2024-01-01 2024 1 1 1 1 1,200 NA 0
2024-01-02 2024 1 1 1 2 0 1,200 1
2024-01-03 2024 1 1 1 3 1,100 0 0
2024-01-04 2024 1 1 1 4 0 1,100 1
2024-01-05 2024 1 1 1 5 0 0 1
2024-01-06 2024 1 1 1 6 1,300 0 0
2024-01-07 2024 1 1 1 7 900 1,300 0
2024-01-08 2024 1 1 2 8 1,200 900 0
2024-01-09 2024 1 1 2 9 850 1,200 0
2024-01-10 2024 1 1 2 10 0 850 1
2024-01-11 2024 1 1 2 11 1,450 0 0

Issue 2: Period imbalances

When comparing two performance periods with a standard calendar, you may end up comparing periods with an unequal number of days.

For example, if you want to compare January sales to February, you can get misleading conclusions due to the unequal number of weekends and days in those periods.

In practice you have two choices:

  • Compare periods with similar days (e.g., the 28th of February should only compare up to the 28th of January) and you omit three days of January sales altogether

  • Have an imbalanced comparison (e.g., the 28th of February compares to the 31st of January so that no days are lost)

This package uses the second option to ensure we don’t lose any of January’s sales, but to help flag the imbalance, ti will add a column to let you know how many periods (e.g., days) are in your comparison period to increase transparency.

To create this example, we will use the pmtd() function to calculate the prior month to date cumulative margin in the current month.

contoso::sales |>
   pmtd(order_date,margin,"standard",lag_n = 1)

When we pass the ti object through to calculate() and filter the results for February 2022, we would see the below Table 3.

On 2022-02-28, we see that is comparing 31 days of the previous period to the 28 days in the current period.

Table 3
Period Imbalance Example
February 2022 comparing to January (31 vs 28 days)
date year month pmtd_margin days_in_comparison_period
2022-02-26 2022 2 129,436 26
2022-02-27 2022 2 132,202 27
2022-02-28 2022 2 142,668 31

Issue 3: Larger-than-memory data

If your data isn’t already in a database, ti will leverage DuckDB to convert your data and enable larger-than-memory calculations.

This is necessary for time intelligence functions because, with grouped data, you need to complete the calendar for each group combination. For even modest datasets, this can quickly multiply and grow your data beyond available memory.

If your data is already in a database, ti will use dbplyr to convert the functions to SQL queries.

In either scenario, you can use dplyr::collect() to execute your SQL query and return a tibble to your local computer.

# quickly load contoso package to a database via the contoso package
db <- contoso::create_contoso_duckdb()

# pass through to the same function mtd()
db$sales |> mtd(order_date,margin,"standard")

# same syntax even though data source is now a tibble

contoso::sales |> mtd(order_date,margin,"standard")