Introduction
Simply pass your data in either a tibble or a lazy DBI object to the time intelligence function and input the required arguments.
We will use mtd() function to calculate the month-to-date sum of Contoso’s margin from their sales dataset (contoso::sales).
Most time intelligence functions follow the same structure:
- Specify the date column to index the time intelligence functions (eg.
order_date, deliver_date, etc)
- Specify the value column to aggregate (eg.
margin, net_margin, cogs, etc)
- If there is a period rollback / rollforward then clarify the number of periods to roll
- Clarify if we are using a
"standard" calendar or a retail calendar variation ("445", "454", or "544") – see the standard vs. non-standard calendar article for more information
- If using a retail calendar, optionally specify
fiscal_year_start (1–12) to control which month the fiscal year begins nearest to (default is January)
When you execute mtd(), your console will return a ti object displaying a custom print message:
- A summary of the function’s actions
- Details the calendar’s attributes used to support the calculation
- Describes the main transformation steps and columns that are referenced
- Lists out possible next actions
contoso::sales |>
ti::mtd(.date=order_date,.value = margin,calendar_type = "standard")
── Month-to-date ───────────────────────────────────────────────────────────────
Function: `mtd` was executed
── Description: ──
This creates a daily `cumsum()` of the current month margin from the start of
the standard calendar month to the end of the month
── Calendar: ──
• The calendar aggregated order_date to the day time unit
• A standard calendar is created with 0 groups
• Calendar ranges from 2021-05-18 to 2024-04-20
• 222 days were missing and replaced with 0
• New date column date, year, quarter, month was created from order_date
── Actions: ──
✔Aggregate margin
✖Shift
✖Compare
✖Proportion of Total
✖Count Distinct
── Next Steps: ──
• Use `calculate()` to return the results
────────────────────────────────────────────────────────────────────────────────
To return the results to a lazy DBI object, pass the ti object through to calculate().
contoso::sales |>
mtd(.date=order_date,.value = margin,calendar_type = "standard") |>
calculate()
If you’re using a tibble, under the hood, ti is converting your data to a duckdb database
If your data is in a database, the package will leverage dbplyr to execute all the calculations
Either case use dplyr::collect() to return your results to a local tibble
# A tibble: 10 × 9
date year month quarter week day margin mtd_margin
<date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2021-05-18 2021 5 2 20 18 407. 407.
2 2021-05-19 2021 5 2 20 19 711. 1118.
3 2021-05-20 2021 5 2 20 20 1424. 2542.
4 2021-05-21 2021 5 2 21 21 11339. 13881.
5 2021-05-22 2021 5 2 21 22 5359. 19240.
6 2021-05-23 2021 5 2 21 23 0 19240.
7 2021-05-24 2021 5 2 21 24 0 19240.
8 2021-05-25 2021 5 2 21 25 793. 20033.
9 2021-05-26 2021 5 2 21 26 74.6 20107.
10 2021-05-27 2021 5 2 21 27 1433. 21540.
# ℹ 1 more variable: missing_date_indicator <dbl>
What if you need the analysis at the group level?
Simply pass the groups that you want to dplyr::group_by() and time intelligence function will create a custom calendar for each group level.
This ensures that each group will have a complete calendar ensuring no group member has any missing dates.
contoso::sales |>
dplyr::group_by(customer_key,store_key) |>
yoy(.date=order_date,.value = margin,calendar_type = "standard")
Why do we need this package when we have lubridate?
Lubridate is an excellent package and is at the core of many of ti’s functions. The issue isn’t lubridate but rather the challenges and issues in your dataset that typically don’t allow you to directly use lubridate.
The advantage of this package is that it will perform all of the required pre-processing steps for you.
Issue 1: Many datasets do not have continuous dates, especially if data is recorded only on business days or for active transactions
Issue 2: Period imbalances between periods (Eg. the different number of days between February vs. January) can create misleading analysis/trends or your analysis requires non-standard calendar types
Issue 3: Calculating time intelligence for groups can lead to larger than memory issues even with smaller datasets
Issue 1: Continuous Dates
Referencing the Table 1 below, if we were to use dplyr::lag() to compare Day-over-Day (DoD) margin, we would be missing 2024-01-02, 2024-01-04, and 2024-01-05 which would lead to incorrect answers or trends.
To correct this, ti will automatically complete your calendar for each group for the missing periods to ensure there are no missing periods when calculating trends.
In Table 2 we see a complete calendar set and a new column, “missing_date_indicator” to indicate how many dates were missing.
Issue 2: Period imbalances
When comparing two performance periods with a standard calendar, you can often compare periods with unequal number of days or periods.
For example if you want to compare January sales to February you can get misleading conclusions due to the unequal number of weekends and days in those periods.
In practice you have two choices:
compare periods with similar days (eg. the 28th of February should only compare up to the 28th of January) and you omit three days of January sales altogether
compare have an imbalanced comparison (eg. the 28th of February compares to the 31st of January so that no days are lost).
This package does the second option to ensure we don’t lose any of January’s sales but to help flag for imbalance, ti will add a column to let you know how many periods (eg. days) are in your comparison period to increase transparency to this dynamic.
To create this example, we will use the pmtd() function to calculate the prior month to date cumulative margin in the current month.
contoso::sales |>
pmtd(order_date,margin,"standard",lag_n = 1)
When we pass the ti object through to calculate() and filter the results for February 2022, we would see the below Table 3.
On 2022-02-28, we see that is comparing 31 days of the previous period to the 28 days in the current period.
Issue 3: Larger than memory
If your data isn’t already in a database then ti will leverage duckdb to convert your data to enable larger than memory calculation.
This is necessary for time intelligence functions because when you have grouped data, you need to complete calendar for each group combination. For even modest datasets, this can quickly multiply and grow your data to be larger than memory.
If your data is already in database, then ti will use dbplyr and will convert the functions to SQL to write the queries.
In either scenario, you can use dplyr::collect() to execute your SQL query and return a tibble to your local computer.
# quickly load contoso package to a database via the contoso package
db <- contoso::create_contoso_duckdb()
# pass through to the same function mtd()
db$sales |> mtd(order_date,margin,"standard")
# same syntax even though data source is now a tibble
contoso::sales |> mtd(order_date,margin,"standard")