Excel Formula Translation
Overview
tidychain can extract formulas from Excel files and translate them to R code. This helps you understand Excel models and port them to R.
Formula translation covers direct 1:1 mappings only. Basic arithmetic, aggregation functions (SUM, AVERAGE, MIN, MAX), and logical functions (IF, AND, OR) translate automatically. Functions that require syntax changes or data structure context (VLOOKUP, POWER, MOD) are not translated.
Extracting Formulas
Use extract_formulas() to get all formulas with their dependencies:
library(tidychain)
formulas <- extract_formulas("simple_model.xlsx")
formulas# A tibble: 3 × 6
sheet cell cell_local formula references has_references
<chr> <chr> <chr> <chr> <list> <lgl>
1 Model Model!B7 B7 B2*B3 <chr [2]> TRUE
2 Model Model!B8 B8 B7*B4 <chr [2]> TRUE
3 Model Model!B11 B11 B7+B8 <chr [2]> TRUE
Each formula’s cell references are parsed into a list, revealing the dependency structure.
Parsing Formula References
Extract cell references from any formula string:
parse_formula_refs("=SUM(A1:A10)+B5*C3")[1] "A1" "A10" "B5" "C3"
parse_formula_refs("=VLOOKUP(A1,Sheet2!B:C,2,FALSE)")[1] "A1"
parse_formula_refs("=IF(AND(A1>0,B1<10),A1*2,0)")[1] "A1" "B1"
Extracts single cell references (A1, $C$3), range endpoints (A1:A10 → A1, A10), and cross-sheet references (sheet prefix stripped). Named ranges are not currently resolved.
Expanding Ranges
Convert range notation to individual cells:
expand_range("A1:C3")[1] "A1" "B1" "C1" "A2" "B2" "C2" "A3" "B3" "C3"
expand_range("B2:B5")[1] "B2" "B3" "B4" "B5"
expand_range("A1:A1") # Single cell[1] "A1"
Translating to R Code
Use translate_formula() to convert Excel formulas to R:
translate_formula("A1+B1")$r_code
[1] "a1+b1"
$success
[1] TRUE
$notes
[1] NA
translate_formula("=SUM(A1:A10)")$r_code
[1] "sum(a1:a10)"
$success
[1] FALSE
$notes
[1] "Contains ranges that may need manual conversion: a1:a10"
translate_formula("=IF(A1>0,A1*2,0)")$r_code
[1] "if_else(a1>0,a1*2,0)"
$success
[1] TRUE
$notes
[1] NA
The output includes r_code (translated R), success (TRUE if fully translated), and notes (warnings). When success = FALSE, the r_code is still provided but may need editing.
Supported Translations
These Excel functions have direct R equivalents and translate automatically:
Mathematical
| Excel | R |
|---|---|
SUM(range) |
sum(range) |
AVERAGE(range) |
mean(range) |
MIN(range) |
min(range) |
MAX(range) |
max(range) |
ABS(x) |
abs(x) |
SQRT(x) |
sqrt(x) |
ROUND(x, n) |
round(x, n) |
CEILING(x) |
ceiling(x) |
FLOOR(x) |
floor(x) |
EXP(x) |
exp(x) |
LN(x) |
log(x) |
LOG10(x) |
log10(x) |
Statistical
| Excel | R |
|---|---|
COUNT(range) |
length(range) |
STDEV(range) |
sd(range) |
VAR(range) |
var(range) |
MEDIAN(range) |
median(range) |
Logical
| Excel | R |
|---|---|
IF(test, true, false) |
if_else(test, true, false) |
AND(a, b, ...) |
all(a, b, ...) |
OR(a, b, ...) |
any(a, b, ...) |
NOT(x) |
!(x) |
Text
| Excel | R |
|---|---|
CONCATENATE(...) |
paste0(...) |
CONCAT(...) |
paste0(...) |
LEN(text) |
nchar(text) |
UPPER(text) |
toupper(text) |
LOWER(text) |
tolower(text) |
TRIM(text) |
trimws(text) |
Date
| Excel | R |
|---|---|
TODAY() |
Sys.Date() |
NOW() |
Sys.time() |
YEAR(date) |
lubridate::year(date) |
MONTH(date) |
lubridate::month(date) |
DAY(date) |
lubridate::day(date) |
Not Supported
These functions require manual translation—tidychain does not attempt to convert them:
| Function | Why | Manual Approach |
|---|---|---|
VLOOKUP |
Range-based lookup doesn’t map to R | Use left_join() or match() |
HLOOKUP |
Same as VLOOKUP | Use left_join() or match() |
INDEX/MATCH |
Requires data structure context | Use df[row, col] indexing |
POWER(x, n) |
Syntax change (x^n) |
Write x^n manually |
MOD(x, n) |
Syntax change (x %% n) |
Write x %% n manually |
COUNTA |
Complex NA handling | Use sum(!is.na(x)) |
LEFT/RIGHT/MID |
Different argument order | Use substr() with adjusted args |
IFERROR |
Requires try/catch logic | Use tryCatch() |
Using Cell Mapping
Provide meaningful variable names instead of cell references:
mapping <- list(
"A1" = "price",
"B1" = "quantity",
"C1" = "tax_rate"
)
translate_formula("=A1*B1*(1+C1)", cell_mapping = mapping)$r_code
[1] "price*quantity*(1+tax_rate)"
$success
[1] TRUE
$notes
[1] NA
Practical Examples
Simple Calculation Model
# Extract formulas
formulas <- extract_formulas("pricing_model.xlsx")
formulas# A tibble: 4 × 6
sheet cell cell_local formula references has_references
<chr> <chr> <chr> <chr> <list> <lgl>
1 Pricing Pricing!B5 B5 B2*B3 <chr [2]> TRUE
2 Pricing Pricing!B6 B6 B5*B4 <chr [2]> TRUE
3 Pricing Pricing!B7 B7 B5+B6 <chr [2]> TRUE
4 Pricing Pricing!B8 B8 IF(B7>1000,B7*0.9,B7) <chr [2]> TRUE
# Translate each formula
formulas |>
rowwise() |>
mutate(
r_code = translate_formula(formula)$r_code
) |>
select(cell_local, formula, r_code)# A tibble: 4 × 3
cell_local formula r_code
<chr> <chr> <chr>
1 B5 B2*B3 b2*b3
2 B6 B5*B4 b5*b4
3 B7 B5+B6 b5+b6
4 B8 IF(B7>1000,B7*0.9,B7) if_else(b7>1000,b7*0.9,b7)
With Cell Mapping
# Define meaningful names
mapping <- list(
"B2" = "price",
"B3" = "quantity",
"B4" = "tax_rate",
"B5" = "subtotal",
"B6" = "tax",
"B7" = "total"
)
# Translate with mapping
formulas |>
rowwise() |>
mutate(
r_code = translate_formula(formula, cell_mapping = mapping)$r_code
) |>
select(cell_local, formula, r_code)# A tibble: 4 × 3
cell_local formula r_code
<chr> <chr> <chr>
1 B5 B2*B3 price*quantity
2 B6 B5*B4 subtotal*tax_rate
3 B7 B5+B6 subtotal+tax
4 B8 IF(B7>1000,B7*0.9,B7) if_else(total>1000,total*0.9,total)
Limitations
Even supported functions may need manual adjustment:
- Range syntax (
A1:A10) — R uses vectors, not Excel ranges - Array formulas — Complex R vectorization needed
- Structured references —
Table1[Column]syntax not parsed - Named ranges — Require workbook context to resolve
- External references —
[Book.xlsx]Sheet!A1not supported - VBA/macros — User-defined functions cannot be translated
Best Practices
- Extract all formulas with
extract_formulas() - Identify inputs (cells with no references)
- Build cell mapping for meaningful names
- Translate iteratively starting with simple formulas
- Test each translation against Excel output
- Document manual changes for complex functions
Next Steps
- Getting Started — Basic package usage
- Forensic Analysis — Detect spreadsheet manipulation
- Understanding Anomalies — Detection algorithm details