In-Database Predictions with orbital

sqlm fits linear models entirely in the database using SQL aggregation — no data is pulled into R. But what about predictions? By default you would need to pull data into R, apply the coefficients, and push results back. The orbital package solves this by converting fitted models into lightweight prediction objects that work directly on database tables.

The entire workflow stays in the database: fit in SQL, predict in SQL.

Setup

library(sqlm)
library(orbital)
library(dplyr)
library(DBI)

con <- dbConnect(duckdb::duckdb())
dbWriteTable(con, "mtcars", mtcars |> mutate(cyl = as.character(cyl)))
mt_db <- tbl(con, "mtcars")

Basic predictions

Fit a model with lm_sql(), convert it to an orbital object, and use augment() to append predictions to the database table.

model <- lm_sql(mpg ~ disp + hp + wt, data = mt_db)

orb <- orbital(model)
orb

── orbital Object ──────────────────────────────────────────────────────────────
• .pred = 37.10551 + -0.0009370091 * (disp) + -0.03115655 * (hp) + -3.800 ...
────────────────────────────────────────────────────────────────────────────────
1 equations in total.

The orbital object is a single equation built from the fitted coefficients. Use augment() to add a .pred column to any database table with the required columns.

augment(orb, mt_db)
# Source:   SQL [?? x 12]
# Database: DuckDB 1.4.4 [hagan@Linux 6.17.9-76061709-generic:R 4.5.2/:memory:]
   .pred   mpg cyl    disp    hp  drat    wt  qsec    vs    am  gear  carb
   <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  23.6  21   6      160    110  3.9   2.62  16.5     0     1     4     4
 2  22.6  21   6      160    110  3.9   2.88  17.0     0     1     4     4
 3  25.3  22.8 4      108     93  3.85  2.32  18.6     1     1     4     1
 4  21.2  21.4 6      258    110  3.08  3.22  19.4     1     0     3     1
 5  18.2  18.7 8      360    175  3.15  3.44  17.0     0     0     3     2
 6  20.5  18.1 6      225    105  2.76  3.46  20.2     1     0     3     1
 7  15.6  14.3 8      360    245  3.21  3.57  15.8     0     0     3     4
 8  22.9  24.4 4      147.    62  3.69  3.19  20       1     0     4     2
 9  22.0  22.8 4      141.    95  3.92  3.15  22.9     1     0     4     2
10  20.0  19.2 6      168.   123  3.92  3.44  18.3     1     0     4     4
# ℹ more rows

The .pred column is computed entirely in SQL — the data never leaves the database. Use predict() if you only want the predictions without the original columns.

predict(orb, mt_db)
# Source:   SQL [?? x 1]
# Database: DuckDB 1.4.4 [hagan@Linux 6.17.9-76061709-generic:R 4.5.2/:memory:]
   .pred
   <dbl>
 1  23.6
 2  22.6
 3  25.3
 4  21.2
 5  18.2
 6  20.5
 7  15.6
 8  22.9
 9  22.0
10  20.0
# ℹ more rows

Categorical variables

Categorical predictors work automatically. sqlm encodes dummy variables during fitting and embeds the equivalent ifelse() logic into the orbital expression, which dbplyr translates to SQL CASE WHEN.

model_cat <- lm_sql(mpg ~ disp + cyl, data = mt_db)

orb_cat <- orbital(model_cat)
orb_cat

── orbital Object ──────────────────────────────────────────────────────────────
• .pred = 29.53477 + -0.02730864 * (disp) + -4.785846 * (ifelse(cyl == "6 ...
────────────────────────────────────────────────────────────────────────────────
1 equations in total.
augment(orb_cat, mt_db)
# Source:   SQL [?? x 12]
# Database: DuckDB 1.4.4 [hagan@Linux 6.17.9-76061709-generic:R 4.5.2/:memory:]
   .pred   mpg cyl    disp    hp  drat    wt  qsec    vs    am  gear  carb
   <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  20.4  21   6      160    110  3.9   2.62  16.5     0     1     4     4
 2  20.4  21   6      160    110  3.9   2.88  17.0     0     1     4     4
 3  26.6  22.8 4      108     93  3.85  2.32  18.6     1     1     4     1
 4  17.7  21.4 6      258    110  3.08  3.22  19.4     1     0     3     1
 5  14.9  18.7 8      360    175  3.15  3.44  17.0     0     0     3     2
 6  18.6  18.1 6      225    105  2.76  3.46  20.2     1     0     3     1
 7  14.9  14.3 8      360    245  3.21  3.57  15.8     0     0     3     4
 8  25.5  24.4 4      147.    62  3.69  3.19  20       1     0     4     2
 9  25.7  22.8 4      141.    95  3.92  3.15  22.9     1     0     4     2
10  20.2  19.2 6      168.   123  3.92  3.44  18.3     1     0     4     4
# ℹ more rows

Interactions

Interactions between numeric and categorical variables, or between two categorical variables, are fully supported.

model_int <- lm_sql(mpg ~ disp * cyl, data = mt_db)

orb_int <- orbital(model_int)
orb_int

── orbital Object ──────────────────────────────────────────────────────────────
• .pred = 40.87196 + -0.1351418 * (disp) + -21.78997 * (ifelse(cyl == "6" ...
────────────────────────────────────────────────────────────────────────────────
1 equations in total.
augment(orb_int, mt_db)
# Source:   SQL [?? x 12]
# Database: DuckDB 1.4.4 [hagan@Linux 6.17.9-76061709-generic:R 4.5.2/:memory:]
   .pred   mpg cyl    disp    hp  drat    wt  qsec    vs    am  gear  carb
   <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  19.7  21   6      160    110  3.9   2.62  16.5     0     1     4     4
 2  19.7  21   6      160    110  3.9   2.88  17.0     0     1     4     4
 3  26.3  22.8 4      108     93  3.85  2.32  18.6     1     1     4     1
 4  20.0  21.4 6      258    110  3.08  3.22  19.4     1     0     3     1
 5  15.0  18.7 8      360    175  3.15  3.44  17.0     0     0     3     2
 6  19.9  18.1 6      225    105  2.76  3.46  20.2     1     0     3     1
 7  15.0  14.3 8      360    245  3.21  3.57  15.8     0     0     3     4
 8  21.0  24.4 4      147.    62  3.69  3.19  20       1     0     4     2
 9  21.8  22.8 4      141.    95  3.92  3.15  22.9     1     0     4     2
10  19.7  19.2 6      168.   123  3.92  3.44  18.3     1     0     4     4
# ℹ more rows

Inspecting the generated SQL

Use orbital_sql() to see the exact SQL expression that will be executed against the database.

orbital_sql(orb, con)
<SQL> ((37.1055052690311 + -0.000937009081492857 * (disp)) + -0.0311565508299438 * (hp)) + -3.80089058263718 * (wt) AS .pred

Saving and loading orbital objects

Orbital objects can be serialized to JSON for storage or deployment, independent of the original model.

orbital_json_write(orb, "my_model.json")

# Later, in a different session or environment:
orb_loaded <- orbital_json_read("my_model.json")
augment(orb_loaded, mt_db)

Scoring new database tables

Because the orbital object is self-contained, you can score any database table that has the required columns — it does not need to be the same table used for fitting.

# Create a new table with different data
new_cars <- data.frame(disp = c(100, 200, 300), hp = c(90, 150, 200), wt = c(2.5, 3.0, 3.5))
dbWriteTable(con, "new_cars", new_cars)
new_cars_db <- tbl(con, "new_cars")

predict(orb, new_cars_db)
# Source:   SQL [?? x 1]
# Database: DuckDB 1.4.4 [hagan@Linux 6.17.9-76061709-generic:R 4.5.2/:memory:]
  .pred
  <dbl>
1  24.7
2  20.8
3  17.3