targets
University of Luxembourg
Thursday, the 6th of June, 2024
It started with Makefiles, when computing power was limited. Compile objects (*.o) only when needed, that is, when the source (*.c) was modified. The first release of make is April 1988.
# This is a comment line
CC=gcc
# CFLAGS will be the options passed to the compiler.
CFLAGS= -c -Wall
all: prog
prog: main.o factorial.o hello.o
	$(CC) main.o factorial.o hello.o -o prog
main.o: main.c
	$(CC) $(CFLAGS) main.c
factorial.o: factorial.c
	$(CC) $(CFLAGS) factorial.c
hello.o: hello.c
	$(CC) $(CFLAGS) hello.c
clean:
	rm -rf *.o
Compile with make (rule all)
targets
- to create the dependencies
- visnetwork
- {callr}
- qmd or Rmd docs: tar_render() / tar_quarto()
- renv. Snapshot your package environment
- _targets.R is the only mandatory file
- R sub-folder for functions, gets closer to a package
- Rmd/qmd file allows gathering results in a report
- git
- run.R allows using Build Tools in RStudio
This example is available at the target_demos repo
_targets_ds_fun1.R
library(targets)
library(tarchetypes)
source("R/plotting.R")
# load the tidyverse quietly for each target
# which each runs in a fresh R session
tar_option_set(packages = "tidyverse")
list(
  # track if distant file has changed
  tar_url(ds_file, "https://raw.githubusercontent.com/jumpingrivers/datasauRus/main/inst/extdata/DatasaurusDozen-Long.tsv"),
  tar_target(ds, read_tsv(ds_file, show_col_types = FALSE)),
  tar_target(all_facets, facet_ds(ds)),
  # animation is worth caching ~ 1 min
  tar_target(anim, anim_ds(ds),
             packages = c("ggplot2", "gganimate", "gifski")),
  tar_file(gif, {
    anim_save("ds.gif", animation = anim, title_frame = TRUE)
    # anim_save returns NULL, we need to get the file output path
    "ds.gif"},
    packages = c("gganimate")),
  tar_quarto(report, "ds1.qmd")
)
The animation code is wrapped in a function; the full code is in _targets_ds_1.R
Precise description of steps in a table
> tar_manifest()
# A tibble: 6 × 2
name command
<chr> <chr>
1 ds_file "\"https://raw.gi[...]n/inst/extdata/DatasaurusDozen-Long.tsv\""
2 ds "read_tsv(ds_file, show_col_types = FALSE)"
3 anim "anim_ds(ds)"
4 all_facets "facet_ds(ds)"
5 gif "{\n anim_save(\"ds.gif\", animation = anim, title_frame = TRUE)\n \"ds.gif\"\n }"
6 report "tarchetypes::tar_quarto_run(args = list(input = \"ds1.qmd\", \n execute = TRUE,
We recommend rendering the report within a target rather than using Target Markdown, which overloads the document.
Like the targets_demos repo, which has 4 projects
_targets.yaml
targets needs an R script and a store location
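A minimal sketch of how several projects could be registered in `_targets.yaml` from R. The project, script and store names (`ds_1`, `_targets_ds_1.R`, `_ds_1`, ...) are hypothetical, not the ones of the demo repo, and the code works in a temporary directory.

```r
library(targets)

# Work in a throwaway directory so no real project is modified.
project_dir <- file.path(tempdir(), "targets_demo_config")
dir.create(project_dir, showWarnings = FALSE)
old_wd <- setwd(project_dir)

# Register two projects; script and store names are hypothetical.
tar_config_set(script = "_targets_ds_1.R", store = "_ds_1", project = "ds_1")
tar_config_set(script = "_targets_ds_2.R", store = "_ds_2", project = "ds_2")

# Both entries are now written to _targets.yaml.
yaml_lines <- readLines("_targets.yaml")
writeLines(yaml_lines)

# Select the active project for this session, then query its settings.
Sys.setenv(TAR_PROJECT = "ds_2")
active_script <- tar_config_get("script")
active_store <- tar_config_get("store")

# Clean up the session state.
Sys.unsetenv("TAR_PROJECT")
setwd(old_wd)
```

With `TAR_PROJECT` set, calls such as `tar_make()` pick the script and store of that project without extra arguments.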
run.R:
Issue on Windows
It seems that a custom script is not working on Windows
Changes that only touch comments do not invalidate a target
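A small sketch to check this behaviour on a toy pipeline (the targets `numbers` and `total` are made up for the demo). It runs in a temporary directory and uses `callr_function = NULL` only to keep the demo in one session.

```r
library(targets)

pipeline_dir <- file.path(tempdir(), "targets_demo_comments")
dir.create(pipeline_dir, showWarnings = FALSE)
old_wd <- setwd(pipeline_dir)

# First version of the pipeline (hypothetical toy targets).
writeLines(c(
  "library(targets)",
  "list(",
  "  tar_target(numbers, c(1, 2, 3)),",
  "  tar_target(total, sum(numbers))",
  ")"
), "_targets.R")
tar_make(callr_function = NULL)

# Same commands, with comments and extra whitespace added.
writeLines(c(
  "library(targets)",
  "list(",
  "  # three toy values",
  "  tar_target(numbers, c(1, 2, 3)),",
  "  tar_target(total,   sum(numbers)) # their sum",
  ")"
), "_targets.R")
outdated_after_comment <- tar_outdated(callr_function = NULL)

# A real change to a command does invalidate the target.
writeLines(c(
  "library(targets)",
  "list(",
  "  tar_target(numbers, c(1, 2, 3)),",
  "  tar_target(total, sum(numbers) + 1)",
  ")"
), "_targets.R")
outdated_after_change <- tar_outdated(callr_function = NULL)

setwd(old_wd)
```

`tar_outdated()` lists the targets that would be re-run: none after the comment-only edit, only `total` after the command itself changed.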
✔ skipped target dset_files
[...]
✔ skipped branch dset_1357daeb5edc5b3b
▶ dispatched branch dset_376af7da24ddcfc7
● completed branch dset_376af7da24ddcfc7 [0.001 seconds]
✔ skipped branch dset_fc156975d3544187
[...]
✔ skipped branch ds_4bc1a3d4ea6fdf12
▶ dispatched branch ds_501bf242796ba6b2
● completed branch ds_501bf242796ba6b2 [0.892 seconds]
✔ skipped branch ds_c601ea8afad80c5f
● completed pattern ds
✔ skip branch summary_stat_ad2f392a
[...]
✔ skipped branch summary_stat_aad2733c0eca3cae
▶ dispatched branch summary_stat_0f7ac98a50809586
● completed branch summary_stat_0f7ac98a50809586 [0.02 seconds]
✔ skipped branch summary_stat_9cefee38f54d6115
[...]
✔ skipped branch plots_aad2733c0eca3cae
▶ dispatched branch plots_0f7ac98a50809586
● completed branch plots_0f7ac98a50809586 [0.031 seconds]
● completed pattern plots
▶ dispatched target report
● completed target report [13.378 seconds]
▶ ended pipeline [16.281 seconds]
Dynamic branch names are not meaningful, just hashes
tar_map() is from {tarchetypes}
| Dynamic | Static |
|---|---|
| Pipeline creates new targets at runtime. | All targets defined in advance. |
| Cryptic target names. | Friendly target names. |
| Scales to hundreds of branches. | Does not scale as easily for tar_visnetwork() etc. |
| No metaprogramming required. | Familiarity with metaprogramming is helpful. |
Static branching is most useful for a smaller number of heterogeneous targets.
Source: targets manual by William Landau
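To make the naming difference concrete, here is a minimal sketch on a toy pipeline (targets `label`, `sizes` and `doubled` are invented for the demo). Static targets get their names when the pipeline is defined, dynamic branches only appear in the metadata after a run.

```r
library(targets)
library(tarchetypes)

pipeline_dir <- file.path(tempdir(), "targets_demo_branching")
dir.create(pipeline_dir, showWarnings = FALSE)
old_wd <- setwd(pipeline_dir)

writeLines(c(
  "library(targets)",
  "library(tarchetypes)",
  "list(",
  "  # Static: one target per value, named when the pipeline is defined",
  "  tar_map(",
  "    values = list(shape = c(\"lines\", \"circles\")),",
  "    tar_target(label, toupper(shape))",
  "  ),",
  "  # Dynamic: branches are created at runtime and named with a hash",
  "  tar_target(sizes, c(10, 20)),",
  "  tar_target(doubled, sizes * 2, pattern = map(sizes))",
  ")"
), "_targets.R")

# Static names are known before anything runs.
manifest <- tar_manifest(callr_function = NULL)
print(manifest$name)

# Dynamic branch names only exist in the metadata after a run.
tar_make(callr_function = NULL)
meta <- tar_meta()
branch_names <- meta$name[meta$type == "branch"]
print(branch_names)

setwd(old_wd)
```

The manifest lists `label_lines` and `label_circles`, while the two branches of `doubled` carry a hash suffix.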
More difficult to write with tar_map() (see example)
But meaningful names and combine when needed:
Use tar_manifest() to display exactly the command to be run
Command used: tar_visnetwork(label = c("description", "branches"))
_targets_ds_3.R, static branches:
# Static branching with dynamic branching inside
values <- tibble(
  folders = c("lines", "circles", "others")
)
# tar_map() generates R expressions and substitutes the desired 'values'
mapped <- tar_map(
  values = values,
  names = "folders", # to avoid targets reporting "files_lines_lines"
  tar_target(filenames, fs::dir_ls(folders, glob = "*tsv")),
  # filenames is not of format "file", so no checksum is done
  # we need a dynamic pattern at this step to read them dynamically too
  tar_target(files, format = "file", filenames,
             pattern = map(filenames)),
  # Dynamic within static
  tar_target(ds, read_tsv(files, show_col_types = FALSE),
             pattern = map(files)),
  tar_target(summary_stat, summarise(ds, m_x = mean(x), m_y = mean(y)),
             pattern = map(ds)),
  tar_target(plots, ggplot(ds, aes(x, y)) +
               geom_point(),
             pattern = map(ds),
             iteration = "list"),
  # Patchwork each group into one plot
  tar_target(patch_plots,
             wrap_plots(plots) +
               # Title: the last bit of patch_plots_{circles,lines,others}
               plot_annotation(title = stringr::str_split_i(tar_name(), '_', -1)),
             packages = "patchwork")
)
# We want to combine the 3 tibbles of summary stats into one tibble
# Each of them is actually composed of 2, 4 and 7 tibbles
stat_combined <- tar_combine(
  stat_summaries,
  mapped[["summary_stat"]],
  # Force evaluation using triple bang (!!!)
  command = dplyr::bind_rows(!!!.x, .id = "ds_type")
)
# And the plots now, a patchwork of patchwork
plot_combined <- tar_combine(
  plots_agg,
  mapped[["patch_plots"]],
  # Force evaluation of all patchwork plots again with triple bang!
  command = {wrap_plots(list(!!!.x), ncol = 2) +
      plot_annotation(title = "Master Saurus")},
  packages = "patchwork"
)
# Wrap all targets in one list
list(mapped,
     stat_combined,
     plot_combined,
     tar_quarto(report, "ds3.qmd"))
!!! is the unquote-splice operator from {rlang}
tar_combine() is from {tarchetypes}
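A small sketch of what unquote-splicing does, using {rlang} alone. The list `upstream` stands in for the `.x` placeholder of `tar_combine()`; its two target names are only illustrative.

```r
library(rlang)

# Hypothetical upstream target names, as tar_map() would generate them.
target_names <- c("summary_stat_lines", "summary_stat_circles")
upstream <- syms(target_names)
names(upstream) <- target_names

# !!! splices each element of the list in as its own (named) argument.
# expr() only builds the call, nothing is evaluated here.
command <- expr(dplyr::bind_rows(!!!upstream, .id = "ds_type"))
print(command)
```

The resulting call has one named argument per upstream target, which is the shape of command that `tar_manifest()` reports for a combined target.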
tar_manifest() (paged version in ds3.qmd)
# A tibble: 21 × 4
name command pattern description
<chr> <chr> <chr> <chr>
1 filenames_circles "fs::dir_ls(\"circles\", glob = \"*tsv\")" NA circles
2 filenames_others "fs::dir_ls(\"others\", glob = \"*tsv\")" NA others
3 filenames_lines "fs::dir_ls(\"lines\", glob = \"*tsv\")" NA lines
4 files_circles "filenames_circles" map(fi… circles
5 files_others "filenames_others" map(fi… others
6 files_lines "filenames_lines" map(fi… lines
7 ds_circles "read_tsv(files_circles, show_col_types = FALSE)" map(fi… circles
8 ds_others "read_tsv(files_others, show_col_types = FALSE)" map(fi… others
9 ds_lines "read_tsv(files_lines, show_col_types = FALSE)" map(fi… lines
10 summary_stat_circles "summarise(ds_circles, m_x = mean(x), m_y = mean(y))" map(ds… circles
11 plots_circles "ggplot(ds_circles, aes(x, y)) + geom_point()" map(ds… circles
12 summary_stat_others "summarise(ds_others, m_x = mean(x), m_y = mean(y))" map(ds… others
13 plots_others "ggplot(ds_others, aes(x, y)) + geom_point()" map(ds… others
14 plots_lines "ggplot(ds_lines, aes(x, y)) + geom_point()" map(ds… lines
15 summary_stat_lines "summarise(ds_lines, m_x = mean(x), m_y = mean(y))" map(ds… lines
16 patch_plots_circles "wrap_plots(plots_circles) + plot_annotation(title = stringr::str_spl… NA circles
17 patch_plots_others "wrap_plots(plots_others) + plot_annotation(title = stringr::str_spli… NA others
18 patch_plots_lines "wrap_plots(plots_lines) + plot_annotation(title = stringr::str_split… NA lines
19 stat_summaries "dplyr::bind_rows(summary_stat_lines = summary_stat_lines, \n sum… NA NA
20 plots_agg "wrap_plots(list(patch_plots_lines = patch_plots_lines, \n … NA Key step t…
21 report "tarchetypes::tar_quarto_run(args = list(input = \"ds3.qmd\", \n… NA Rendering …
Descriptions are a recent addition, showing up in tar_manifest() and in the network
plot_combined <- tar_combine(
  plots_agg,
  mapped[["patch_plots"]],
  command = wrap_plots(list(!!!.x), ncol = 2) + plot_annotation(title = "Master Saurus"),
  packages = "patchwork",
  description = "Key step to wrap plots"
)
list(
  mapped,
  stat_combined,
  plot_combined,
  tar_quarto(report, "ds3.qmd", description = "Rendering quarto doc")
)
Dynamic branches still have cryptic names. What if we want to go fully static, where all steps are known upfront?
Nested tar_map(), toy example:
library(targets)
library(tarchetypes)
mapped <- tar_map(
  #unlist = FALSE, # Return a nested list from tar_map()
  values = list(model = c("mod_1", "mod_2")),
  tar_target(
    distrib,
    tar_name()
  ),
  # static in static
  tar_map(
    values = list(sim = c("A", "B")),
    tar_target(
      estim,
      paste(distrib, tar_name())
    )
  )
)
combined <- tar_combine(combi,
                        # select all estimations
                        tar_select_targets(mapped, starts_with("estim")),
                        command = paste(!!!.x))
list(mapped, combined)
No more square targets, no pattern = map(...)
_targets_ds_4.R
mapped <- tar_map(
  values = values,
  names = "names", # to avoid targets reporting "files_data.lines"
  # special pair of targets
  # readr is in charge of the aggregation (bind_rows())
  tar_file_read(files, fs::dir_ls(folders, glob = "*tsv"), read_tsv(file = !!.x, show_col_types = FALSE)),
  # nested tar_map
  tar_map(
    values = list(funs = c("mean", "sd")),
    tar_target(summary, summarise(files, x_sum = funs(x), y_sum = funs(y)))
  )
)
mcombined <- tar_combine(mean_combine,
                         # tarchetypes helper to select all averages
                         tar_select_targets(mapped, contains("_mean_")),
                         # .x is a placeholder for all matching targets
                         # !!! unquote-splice operator
                         command = bind_rows(!!!.x, .id = "set"))
scombined <- tar_combine(sd_combine,
                         # tarchetypes helper to select all standard deviations
                         tar_select_targets(mapped, contains("_sd_")),
                         # .x is a placeholder for all matching targets
                         # !!! unquote-splice operator
                         command = bind_rows(!!!.x, .id = "set"))
combi <- tar_combine(stats, mcombined, scombined)
list(mapped, mcombined, scombined, combi)
Thinking about what makes a good target helps the coding tremendously
- Are large enough to subtract a decent amount of runtime when skipped.
- Are small enough that some targets can be skipped even if others need to run.
- Invoke no side effects (tar_target(format = "file") can save files.)
- Return a single value that is:
  - Easy to understand and introspect.
  - Meaningful to the project […]
— William Landau
rds is the default, but quite slow
Watch out for malicious promises!
Relevant blog post: CVE-2024-27322 Should Never Have Been Assigned And R Data Files Are Still Super Risky Even In R 4.4.0 by Bob Rudis
Source: Konrad Rudolph about CVE-2024-27322
tar_option_set(error = "null")
tar_meta(fields = error, complete_only = TRUE)
tar_option_set(workspace_on_error = TRUE)
tar_workspaces()
tar_workspace(analysis_02de2921)
all objects and variables are visible interactively
tar_traceback(analysis_02de2921)
targets debug option.
tar_option_set(debug = "analysis_58_b59aa384")
Further reading: debugging chapter
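The workspace workflow can be sketched on a toy pipeline that fails on purpose. The targets `input` and `analysis` are invented for the demo, which runs in a temporary directory and in the current session.

```r
library(targets)

pipeline_dir <- file.path(tempdir(), "targets_demo_debug")
dir.create(pipeline_dir, showWarnings = FALSE)
old_wd <- setwd(pipeline_dir)

writeLines(c(
  "library(targets)",
  "tar_option_set(workspace_on_error = TRUE)",
  "list(",
  "  tar_target(input, \"not a number\"),",
  "  tar_target(analysis, log(input)) # fails on purpose",
  ")"
), "_targets.R")

# The pipeline stops at the failing target; try() keeps this script running.
run <- try(tar_make(callr_function = NULL), silent = TRUE)

# Which targets left a workspace behind, and with which error message?
saved <- tar_workspaces()
error_info <- tar_meta(fields = error, complete_only = TRUE)
print(error_info)

# Reload the dependencies of the failing target into a dedicated environment.
debug_env <- new.env()
tar_workspace(analysis, envir = debug_env)
print(ls(debug_env))

setwd(old_wd)
```

From there, `log(debug_env$input)` reproduces the error interactively, outside the pipeline.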
Remember that all code runs in a fresh session, so it needs to load its package dependencies.
To avoid it:
Remove {callr}: tar_make(callr_function = NULL)
Or the opposite, remove {targets}:
# What about just {callr} without {targets}?
callr::r( # same error
  func = function() {
    set.seed(-1012558151) # from tar_meta(name = dataset1, field = seed)
    library(targets)
    suppressMessages(tar_load_globals())
    data <- simulate_data(units = 100)
    analyze_data(data)
  },
  show = TRUE
)
Source: debug repo by William Landau
Highlights
targets, dependencies manager, re-runs what's needed
William Landau intro:
Further reading 📚
Acknowledgments 🙏 👏
Thank you for your attention!