library(sqlm)
library(orbital)
library(dplyr)
library(DBI)
con <- dbConnect(duckdb::duckdb())
dbWriteTable(con, "mtcars", mtcars |> mutate(cyl = as.character(cyl)))
mt_db <- tbl(con, "mtcars")In-Database Predictions with orbital
sqlm fits linear models entirely in the database using SQL aggregation — no data is pulled into R. But what about predictions? By default you would need to pull data into R, apply the coefficients, and push results back. The orbital package solves this by converting fitted models into lightweight prediction objects that work directly on database tables.
The entire workflow stays in the database: fit in SQL, predict in SQL.
Setup
Basic predictions
Fit a model with lm_sql(), convert it to an orbital object, and use augment() to append predictions to the database table.
model <- lm_sql(mpg ~ disp + hp + wt, data = mt_db)
orb <- orbital(model)
orb
── orbital Object ──────────────────────────────────────────────────────────────
• .pred = 37.10551 + -0.0009370091 * (disp) + -0.03115655 * (hp) + -3.800 ...
────────────────────────────────────────────────────────────────────────────────
1 equations in total.
The orbital object is a single equation built from the fitted coefficients. Use augment() to add a .pred column to any database table with the required columns.
augment(orb, mt_db)# Source: SQL [?? x 12]
# Database: DuckDB 1.4.4 [hagan@Linux 6.17.9-76061709-generic:R 4.5.2/:memory:]
.pred mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 23.6 21 6 160 110 3.9 2.62 16.5 0 1 4 4
2 22.6 21 6 160 110 3.9 2.88 17.0 0 1 4 4
3 25.3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
4 21.2 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
5 18.2 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
6 20.5 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
7 15.6 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
8 22.9 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
9 22.0 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
10 20.0 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
# ℹ more rows
The .pred column is computed entirely in SQL — the data never leaves the database. Use predict() if you only want the predictions without the original columns.
predict(orb, mt_db)# Source: SQL [?? x 1]
# Database: DuckDB 1.4.4 [hagan@Linux 6.17.9-76061709-generic:R 4.5.2/:memory:]
.pred
<dbl>
1 23.6
2 22.6
3 25.3
4 21.2
5 18.2
6 20.5
7 15.6
8 22.9
9 22.0
10 20.0
# ℹ more rows
Categorical variables
Categorical predictors work automatically. sqlm encodes dummy variables during fitting and embeds the equivalent ifelse() logic into the orbital expression, which dbplyr translates to SQL CASE WHEN.
model_cat <- lm_sql(mpg ~ disp + cyl, data = mt_db)
orb_cat <- orbital(model_cat)
orb_cat
── orbital Object ──────────────────────────────────────────────────────────────
• .pred = 29.53477 + -0.02730864 * (disp) + -4.785846 * (ifelse(cyl == "6 ...
────────────────────────────────────────────────────────────────────────────────
1 equations in total.
augment(orb_cat, mt_db)# Source: SQL [?? x 12]
# Database: DuckDB 1.4.4 [hagan@Linux 6.17.9-76061709-generic:R 4.5.2/:memory:]
.pred mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 20.4 21 6 160 110 3.9 2.62 16.5 0 1 4 4
2 20.4 21 6 160 110 3.9 2.88 17.0 0 1 4 4
3 26.6 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
4 17.7 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
5 14.9 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
6 18.6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
7 14.9 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
8 25.5 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
9 25.7 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
10 20.2 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
# ℹ more rows
Interactions
Interactions between numeric and categorical variables, or between two categorical variables, are fully supported.
model_int <- lm_sql(mpg ~ disp * cyl, data = mt_db)
orb_int <- orbital(model_int)
orb_int
── orbital Object ──────────────────────────────────────────────────────────────
• .pred = 40.87196 + -0.1351418 * (disp) + -21.78997 * (ifelse(cyl == "6" ...
────────────────────────────────────────────────────────────────────────────────
1 equations in total.
augment(orb_int, mt_db)# Source: SQL [?? x 12]
# Database: DuckDB 1.4.4 [hagan@Linux 6.17.9-76061709-generic:R 4.5.2/:memory:]
.pred mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 19.7 21 6 160 110 3.9 2.62 16.5 0 1 4 4
2 19.7 21 6 160 110 3.9 2.88 17.0 0 1 4 4
3 26.3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
4 20.0 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
5 15.0 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
6 19.9 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
7 15.0 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
8 21.0 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
9 21.8 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
10 19.7 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
# ℹ more rows
Inspecting the generated SQL
Use orbital_sql() to see the exact SQL expression that will be executed against the database.
orbital_sql(orb, con)<SQL> ((37.1055052690311 + -0.000937009081492857 * (disp)) + -0.0311565508299438 * (hp)) + -3.80089058263718 * (wt) AS .pred
Saving and loading orbital objects
Orbital objects can be serialized to JSON for storage or deployment, independent of the original model.
orbital_json_write(orb, "my_model.json")
# Later, in a different session or environment:
orb_loaded <- orbital_json_read("my_model.json")
augment(orb_loaded, mt_db)Scoring new database tables
Because the orbital object is self-contained, you can score any database table that has the required columns — it does not need to be the same table used for fitting.
# Create a new table with different data
new_cars <- data.frame(disp = c(100, 200, 300), hp = c(90, 150, 200), wt = c(2.5, 3.0, 3.5))
dbWriteTable(con, "new_cars", new_cars)
new_cars_db <- tbl(con, "new_cars")
predict(orb, new_cars_db)# Source: SQL [?? x 1]
# Database: DuckDB 1.4.4 [hagan@Linux 6.17.9-76061709-generic:R 4.5.2/:memory:]
.pred
<dbl>
1 24.7
2 20.8
3 17.3