Getting Started with tidychain

Installation

From Codeberg (development)

remotes::install_git("https://codeberg.org/usrbinr/tidychain")

From GitHub (mirror)

pak::pak("usrbinr/tidychain")

Codeberg is the primary repository. GitHub is a read-only mirror.

Your First Analysis

The main entry point is parse_xlsx(), which returns an S7 xlsx_workbook object containing everything tidychain extracts from the file:

library(tidychain)

xlsx <- parse_xlsx("nist_financial_model.xlsx")
xlsx

── Excel Workbook Summary ──────────────────────────────────────────────────────
File: 'nist_financial_model.xlsx' (217.5 KB)
Created: "2000-12-08T14:09:03Z" by "(undefined Excel user)"
Modified: "2023-03-24T15:43:31Z" by "Parcell, John H. (Ctr)"

Sheets: 13 (1 hidden)
Cells: 19505 | Formulas: 4824

── Forensic Flags ──

! 1 hidden sheet(s)
! Hidden rows/columns detected
! 3 external link(s)
! 2847 late addition(s) in calc chain

Use `summary(xlsx@meta)`, `summary(xlsx@chain)`, etc. for details

The summary provides an immediate forensic overview: file metadata, structure counts, and automatic detection of suspicious patterns.

The xlsx_workbook Object

Every component is accessible via @ slots:

1xlsx@meta@creator
xlsx@meta@modified
2xlsx@sheets@sheets
3xlsx@chain@chain
4xlsx@cells@cells
5xlsx@formulas@formulas

1: File metadata: author, creation/modification dates
2: Sheet information: names, visibility state
3: Calculation chain: formula creation order (key for forensics)
4: All cell data: values, formulas, types
5: Parsed formulas with their cell references

Component Summaries

Each slot has its own summary() method:

summary(xlsx@meta)

── Metadata Summary ──
Creator: "(undefined Excel user)"
Last modified by: "Parcell, John H. (Ctr)"
Created: "2000-12-08T14:09:03Z"
Modified: "2023-03-24T15:43:31Z"
Application: "Microsoft Excel Online" "16.0300"

summary(xlsx@chain)

── Calc Chain Summary ──
Formula cells tracked: 6142
Sheets covered: 7
! Late additions: 2847 (46.4%)

46% Late Additions?

This doesn’t necessarily mean fraud. This file was created in 2000 and edited for 23 years. High anomaly counts are normal for working documents with long editing histories.

Available Slots

Core Slots

Slot	Description	Access Example
`@meta`	Author, dates, app info	`xlsx@meta@creator`
`@sheets`	Sheet names, hidden status	`xlsx@sheets@sheets`
`@chain`	Formula creation order	`xlsx@chain@chain`
`@cells`	All cell data	`xlsx@cells@cells`
`@formulas`	Formulas with references	`xlsx@formulas@formulas`
`@strings`	Shared strings	`xlsx@strings@strings`
`@styles`	Hidden rows/columns	`xlsx@styles@hidden_rows`
`@comments`	Cell comments	`xlsx@comments@comments`
`@links`	External links	`xlsx@links@links`
`@connections`	Database connections	`xlsx@connections@connections`

Extended Slots

Slot	Description
`@tables`	Named table definitions
`@pivots`	Pivot table definitions
`@charts`	Chart configurations
`@queries`	Query table definitions
`@slicers`	Slicer definitions
`@custom`	Custom XML parts

Extracting Formulas

Use extract_formulas() to get all formulas with their cell references:

formulas <- extract_formulas("simple_model.xlsx")
formulas

# A tibble: 3 × 6
  sheet cell      cell_local formula references has_references
  <chr> <chr>     <chr>      <chr>   <list>     <lgl>
1 Model Model!B7  B7         B2*B3   <chr [2]>  TRUE
2 Model Model!B8  B8         B7*B4   <chr [2]>  TRUE
3 Model Model!B11 B11        B7+B8   <chr [2]>  TRUE

The references column contains a list of all cells that each formula depends on—useful for building dependency graphs.

Parsing Individual Formulas

Extract cell references from any formula string:

parse_formula_refs("=SUM(A1:A10)+B5*C3")

[1] "A1" "A10" "B5" "C3"

parse_formula_refs("=VLOOKUP(A1,Sheet2!B:C,2,FALSE)")

[1] "A1"

Expanding Ranges

Convert range notation to individual cells:

expand_range("A1:C3")

[1] "A1" "B1" "C1" "A2" "B2" "C2" "A3" "B3" "C3"

Getting All Cells

Retrieve every cell with its type:

cells <- get_all_cells("simple_model.xlsx")
cells

# A tibble: 15 × 8
   sheet cell      cell_local   row col   value formula cell_type
   <chr> <chr>     <chr>      <int> <chr> <chr> <chr>   <chr>
 1 Model Model!A1  A1             1 A     0     <NA>    value
 2 Model Model!A2  A2             2 A     1     <NA>    value
 3 Model Model!B2  B2             2 B     100   <NA>    value
 4 Model Model!B7  B7             7 B     <NA>  B2*B3   formula
 ...

table(cells$cell_type)

formula   value
      3      12

Security Functions

Check for Encryption

Encrypted Excel files cannot be parsed:

is_encrypted_xlsx("protected_file.xlsx")

[1] TRUE

Encrypted Files

If is_encrypted_xlsx() returns TRUE, the file uses AES-256 encryption and cannot be opened without the password. This is different from sheet protection, which is merely cosmetic.

Sheet Protection

Sheet protection is easily removed (it’s just XML flags):

# Check which sheets are protected
check_sheet_protection(xlsx)

# A tibble: 3 × 2
  sheet_name    protected
  <chr>         <lgl>
1 Sheet1        TRUE
2 Sheet2        FALSE
3 HiddenSheet   TRUE

# Remove protection (creates a new file)
remove_sheet_protection("protected.xlsx", output = "unprotected.xlsx")

Sheet Protection is Weak

Sheet protection only prevents casual editing. It does not encrypt data and can be removed by anyone with basic tools. Never rely on sheet protection for security.

Next Steps

Forensic Analysis — Learn to detect spreadsheet manipulation
Understanding Anomalies — How the detection algorithm works
Excel Formula Translation — Convert Excel formulas to R