| Title: | A Toolkit for Research Workflows |
|---|---|
| Description: | Provides utility functions to help researchers implement best practices for their coding projects. Includes tools for reading and cleaning data files, initializing R projects with a standard folder structure, creating 'Quarto' documents from reproducible templates with optional sample data and custom styling, detecting the execution context across interactive, 'Quarto', and script-based workflows, splitting data frames into group-level output files, and rendering syntactic tree diagrams as standalone PNG images via 'Typst'. |
| Authors: | Erwin Lares [aut, cre] (ORCID: <https://orcid.org/0000-0002-3284-828X>) |
| Maintainer: | Erwin Lares <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.4.0 |
| Built: | 2026-06-05 20:14:13 UTC |
| Source: | https://github.com/erwinlares/toolero |
Takes a syntactic tree and renders it using Quarto's Typst engine,
exporting the result as a PNG image. Supports two rendering backends
controlled by tree_notation:
arborize(
  tree,
  output = "syntactic-tree.png",
  dpi = 300,
  tree_notation = c("simple", "structured"),
  papersize = "a5",
  margin = "0.5cm",
  provenance = TRUE,
  overwrite = FALSE
)
tree |
A character string. For |
output |
A character string. Path to the output PNG file.
Defaults to |
dpi |
A numeric value. Resolution of the output PNG in dots per
inch. Defaults to |
tree_notation |
A character string. One of |
papersize |
A character string. Typst paper size for the
intermediate PDF. Defaults to |
margin |
A character string. Page margin for the intermediate
PDF. Defaults to |
provenance |
A logical. Whether to write a companion |
overwrite |
A logical. Whether to overwrite existing output files.
When |
"simple" uses @preview/syntree and accepts a bracket notation
string, e.g. "[S [NP [Det the] [N cat]] [VP [V sat]]]". This is
the most compact input format and suits basic linguistic trees.
"structured" uses @preview/lingotree and accepts a nested
tree() call string. This backend supports per-node styling,
movement arrows, and multi-dominant trees.
The function is useful for producing standalone tree figures that can be embedded in any document format – LaTeX, Word, HTML, or presentations – without requiring a full LaTeX installation.
arborize() performs the following steps:
1. Validates inputs and resolves the Typst package from tree_notation.
2. Builds a minimal .qmd document via .build_arborize_qmd().
3. Writes the document and renders it inside a self-cleaning temporary directory managed by withr::with_tempdir().
4. Calls quarto::quarto_render() to produce an intermediate PDF via Typst.
5. Converts the PDF to PNG using pdftools::pdf_convert().
6. Reads the PNG bytes into memory before the temporary directory is deleted, then writes them to the specified output path.
7. If provenance = TRUE, writes a companion .yaml file recording the tree string and all rendering arguments.
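The "read into memory, then write out" step can be sketched in base R. This is a minimal illustration, not the package's implementation: it uses on.exit() in place of withr::with_tempdir(), and fake_render() and render_and_export() are hypothetical names that stand in for the real render and convert steps (fake_render() only writes a few placeholder bytes, not a real PNG).

```r
# Hypothetical stand-in for the render + convert steps.
# It writes four placeholder bytes instead of a real PNG.
fake_render <- function(dir) {
  png_path <- file.path(dir, "tree.png")
  writeBin(as.raw(c(0x89, 0x50, 0x4e, 0x47)), png_path)
  png_path
}

# Hypothetical wrapper showing the temporary-directory pattern.
render_and_export <- function(output) {
  tmp <- tempfile("arborize-sketch-")
  dir.create(tmp)
  # Delete the temporary directory when the function exits, even on error
  on.exit(unlink(tmp, recursive = TRUE), add = TRUE)

  png_path <- fake_render(tmp)

  # Read the bytes into memory while the file still exists
  bytes <- readBin(png_path, what = "raw", n = file.size(png_path))

  # Write the bytes to the caller's path, outside the temporary directory
  dir.create(dirname(output), recursive = TRUE, showWarnings = FALSE)
  writeBin(bytes, output)
  invisible(output)
}

out <- render_and_export(file.path(tempdir(), "sketch-out", "tree.png"))
file.exists(out)
```

Holding the bytes in memory means the output survives the cleanup of the temporary directory without needing a file copy across directories.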
On first use, Typst will download the required package from the Typst package registry. This requires an internet connection. Subsequent renders use the locally cached package.
Requires Quarto 1.4 or later with Typst support, and the pdftools
package for PDF-to-PNG conversion. Install pdftools with
install.packages("pdftools").
Invisibly returns the path to the output PNG file.
syntree Typst package (v0.2.1): https://typst.app/universe/package/syntree
lingotree Typst package (v1.0.0): https://typst.app/universe/package/lingotree
## Not run:
# Simple bracket notation (default) -- also writes tree-1.yaml
arborize("[NP [Det the] [N cat]]", output = "my-trees/tree-1.png")

# Suppress provenance file
arborize("[NP [Det the] [N cat]]", provenance = FALSE)

# Wider tree with print-quality output
arborize(
  paste0(
    "[Aspectual Classes ",
    "[Statives [States]] ",
    "[Dynamic ",
    "[Atelic [Activities]] ",
    "[Telic ",
    "[Instantaneous [Achievements]] ",
    "[Durative [Accomplishments]]]]]"
  ),
  output = "aspectual-classes.png",
  dpi = 600,
  papersize = "a4"
)

# Structured notation using lingotree
arborize(
  "tree( tag: [VP], tree( tag: [DP], [every], [farmer] ), [smiled] )",
  tree_notation = "structured",
  output = "vp-tree.png"
)
## End(Not run)
check_project() audits a project directory and reports whether it follows
the structure and conventions that init_project() creates. It is useful
both for projects initialized with init_project() and for existing
projects that were created independently.
check_project(path = ".", error = TRUE)
path |
A character string with the path to the project directory.
Defaults to |
error |
Logical. If |
A tibble with columns check, status, and message. Returned
invisibly when error = TRUE, visibly when error = FALSE.
# Audit the current working directory
check_project()

# Audit a specific project directory
## Not run:
check_project(path = "path/to/project")
## End(Not run)
Creates a new Quarto document in the specified directory. Optionally copies a sample dataset and a worked analysis example, wires up custom CSS and header styling from a directory of assets, and scaffolds a post-render purl hook for extracting R code.
create_qmd(
  filename = NULL,
  path = ".",
  yaml_data = NULL,
  overwrite = FALSE,
  use_purl = TRUE,
  include_examples = TRUE,
  use_style = FALSE
)
filename |
A string or |
path |
A string. Path to the directory where the document will be
created. Defaults to |
yaml_data |
A string or |
overwrite |
A logical. Whether to overwrite existing files. Defaults
to |
use_purl |
Logical. If |
include_examples |
Logical. If |
use_style |
Logical or character. Controls whether custom CSS and header assets are wired into the YAML.
If the directory contains exactly one |
create_qmd() performs the following steps:
1. Validates that filename is supplied and path exists.
2. If include_examples = TRUE: creates data-raw/ under path and copies sample.csv there. Creates assets/ if needed and copies a placeholder logo.png. Uses the example template for the .qmd.
3. If include_examples = FALSE: uses the skeleton template for the .qmd. No sample data or logo is copied.
4. If use_style is TRUE or a directory path: scans the directory for .css and .html files and injects them into the YAML header.
5. If yaml_data is provided, reads the YAML file and substitutes values into the document header. This runs after style injection, so yaml_data can override any auto-generated YAML keys.
6. If use_purl = TRUE, writes _quarto.yml with a post-render hook and copies purl.R into path/R/.
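The override order between style injection and yaml_data can be illustrated with plain lists. This sketch assumes a simple "later values win" merge using utils::modifyList(); how create_qmd() actually merges the header is not stated here, and all keys and values below are hypothetical.

```r
# Keys a style-injection step might generate (hypothetical values)
auto_yaml <- list(
  title = "Untitled",
  css = "assets/styles.css"
)

# Keys a user might supply through a yaml_data file (hypothetical values)
user_yaml <- list(
  title = "Penguin analysis",
  author = "Your Name"
)

# Later values win: user keys replace auto-generated ones, others are kept
header <- utils::modifyList(auto_yaml, user_yaml)
str(header)
```

Because the user-supplied list is applied last, title is replaced while the auto-generated css entry is preserved.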
The sample dataset bundled with the template is a subset of the Palmer Penguins dataset. Citation: Horst AM, Hill AP, Gorman KB (2020). palmerpenguins: Palmer Archipelago (Antarctica) Penguin Data. R package version 0.1.0. doi:10.5281/zenodo.3960218
Note: although the signature lists filename = NULL, filename has no usable
default and must always be supplied explicitly. Use tempdir() for temporary
output during testing or exploration.
Invisibly returns path.
# Minimal blank document -- no examples, no styling
create_qmd(path = tempdir(), filename = "analysis.qmd", include_examples = FALSE)

# Full worked example with sample data and placeholder logo
create_qmd(path = tempdir(), filename = "analysis.qmd", overwrite = TRUE)

# Blank document wired to UW branding assets (assumes assets/ exists)
create_qmd(
  path = tempdir(),
  filename = "report.qmd",
  include_examples = FALSE,
  use_style = TRUE,
  overwrite = TRUE
)

# Blank document with custom branding from a different directory
create_qmd(
  path = tempdir(),
  filename = "report.qmd",
  include_examples = FALSE,
  use_style = "my-branding/",
  overwrite = TRUE,
  use_purl = FALSE
)

# Pre-populated YAML overrides
yaml_file <- tempfile(fileext = ".yml")
writeLines("author:\n - name: 'Your Name'", yaml_file)
create_qmd(
  path = tempdir(),
  filename = "analysis.qmd",
  yaml_data = yaml_file,
  overwrite = TRUE
)
Identifies which of three execution environments the code is currently
running in: an interactive R session, a quarto render call, or a
plain Rscript invocation. This is useful for writing code that behaves
correctly across all three contexts, such as resolving input file paths
in a portable way.
detect_execution_context(interactive_fn = interactive)
interactive_fn |
A function. Used to detect whether the session is
interactive. Defaults to |
Detection follows a priority order:
1. If interactive_fn() returns TRUE (by default, interactive()), returns "interactive".
2. If the environment variable QUARTO_DOCUMENT_PATH is set and non-empty, returns "quarto".
3. Otherwise, returns "rscript".
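The priority order above can be written as a short base R function. This is a sketch of the described logic, not the package source; sketch_detect_context() is a hypothetical name chosen to avoid confusion with the real function.

```r
# Hypothetical re-implementation of the documented priority order
sketch_detect_context <- function(interactive_fn = interactive) {
  # 1. An interactive session takes priority over everything else
  if (isTRUE(interactive_fn())) {
    return("interactive")
  }
  # 2. A set, non-empty QUARTO_DOCUMENT_PATH signals a Quarto render
  if (nzchar(Sys.getenv("QUARTO_DOCUMENT_PATH"))) {
    return("quarto")
  }
  # 3. Anything else is treated as a plain Rscript invocation
  "rscript"
}

# Simulate an interactive session by injecting a stand-in for interactive()
sketch_detect_context(interactive_fn = function() TRUE)
```

Passing the interactivity check as an argument makes each branch reachable in tests, which is presumably why the real function exposes interactive_fn.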
A character string, one of "interactive", "quarto", or
"rscript".
context <- detect_execution_context()

input_file <- switch(context,
  interactive = "data/sample.csv",
  quarto = params$input_file,
  rscript = commandArgs(trailingOnly = TRUE)[1]
)
Takes a Quarto document and produces an XML file that is directly
importable into a UW-Madison Knowledge Base (KB) article. The function
re-renders the .qmd with embed-resources: true so all visual assets
are self-contained, extracts the HTML body, and wraps it in the KB XML
structure along with metadata drawn from the document's YAML header.
generate_kb_xml(html_path, qmd_path = NULL, output_dir = NULL)
html_path |
A string. Path to the rendered HTML file. Used to infer
the output filename and, if |
qmd_path |
A string or |
output_dir |
A string or |
generate_kb_xml() performs the following steps:
1. Validates that html_path exists.
2. Infers qmd_path from html_path if not supplied, then validates it.
3. Extracts title, description, and categories from the .qmd YAML header and maps them to kb_title, kb_summary, and kb_keywords.
4. Re-renders the .qmd in an isolated temporary directory with embed-resources: true so all CSS, images, and JS are self-contained. The data/ and assets/ folders are copied alongside the .qmd to ensure the render succeeds.
5. Extracts the <body> from the embedded HTML.
6. Escapes HTML entities in the body for XML compatibility, as required by the UW-Madison KB import format.
7. Builds the XML structure with kb_title, kb_keywords, kb_summary, and kb_body nodes.
8. Writes the .xml file to output_dir.
Temporary files are managed via withr::local_tempdir() and are
automatically cleaned up when the function exits, even on error.
When importing the resulting XML into the KB, check the Decode HTML entity in body content option.
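The escaping step can be pictured with a few base R substitutions. This is only a sketch under the assumption that the three characters &, <, and > are the ones escaped; the exact set handled by generate_kb_xml() is not stated here, and escape_for_xml() is a hypothetical helper name.

```r
# Hypothetical helper: escape HTML markup so it can sit inside an XML node
escape_for_xml <- function(html) {
  # Escape ampersands first so the entities added below are not re-escaped
  html <- gsub("&", "&amp;", html, fixed = TRUE)
  html <- gsub("<", "&lt;", html, fixed = TRUE)
  html <- gsub(">", "&gt;", html, fixed = TRUE)
  html
}

escape_for_xml("<p>Fish & chips</p>")
# returns "&lt;p&gt;Fish &amp; chips&lt;/p&gt;"
```

The KB import option mentioned above reverses this transformation, turning the escaped text back into HTML markup on the KB side.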
Invisibly returns the path to the written .xml file.
# Infer qmd_path automatically, write XML alongside the HTML
# generate_kb_xml(html_path = "docs/analysis.html")

# Supply qmd_path explicitly and write to a specific output directory
# generate_kb_xml(
#   html_path = "docs/analysis.html",
#   qmd_path = "analysis.qmd",
#   output_dir = "exports"
# )
init_project() creates a new R project at the given path with an
opinionated folder structure suited for research workflows. It optionally
initializes renv for package management and git for version control.
init_project(
  path,
  use_renv = TRUE,
  use_git = TRUE,
  extra_folders = NULL,
  open = FALSE,
  uw_branding = FALSE
)
path |
A character string with the path and name of the new
project (e.g., |
use_renv |
Logical. If |
use_git |
Logical. If |
extra_folders |
A character vector of additional folder names to create
inside the project. Defaults to |
open |
Logical. If |
uw_branding |
Logical. If |
Called for its side effects. Does not return a value.
## Not run:
init_project(
  path = file.path(tempdir(), "project1"),
  use_renv = FALSE,
  use_git = FALSE
)

init_project(
  path = file.path(tempdir(), "project2"),
  uw_branding = TRUE,
  use_renv = FALSE,
  use_git = FALSE
)

init_project(
  path = file.path(tempdir(), "project3"),
  extra_folders = c("notebooks"),
  use_renv = FALSE,
  use_git = FALSE
)
## End(Not run)
qmd_to_r() extracts R code chunks from a .qmd file and writes them
to a standalone .R script using knitr::purl(). It works on any .qmd
file regardless of whether it was created with create_qmd().
qmd_to_r(input, output = NULL, documentation = 1L, quiet = TRUE)
input |
A character string with the path to the |
output |
A character string with the path to the output |
documentation |
An integer controlling how much documentation is
included in the extracted script. Passed to |
quiet |
Logical. If |
Invisibly returns the path to the output .R file.
# Extract R code from a qmd file
qmd <- tempfile(fileext = ".qmd")
writeLines(c(
  "---",
  "title: Analysis",
  "---",
  "",
  "```{r}",
  "x <- 1 + 1",
  "```"
), qmd)

# Default output path: same directory, .R extension
qmd_to_r(input = qmd)

# Explicit output path
out <- tempfile(fileext = ".R")
qmd_to_r(input = qmd, output = out)

# Strip all documentation
qmd_to_r(input = qmd, output = out, documentation = 0L)
read_clean_csv() reads a CSV file, standardizes column names, optionally
handles missing values, and optionally prints an ingest summary. It combines
readr::read_csv(), janitor::clean_names(), and tidyr::drop_na() into a
single, reproducibility-friendly step.
read_clean_csv(
  path,
  na = c("", "NA"),
  drop_na = FALSE,
  summary = FALSE,
  verbose = FALSE,
  ...
)
path |
A character string with the path to the CSV file. |
na |
A character vector of strings to treat as missing values. Passed
directly to |
drop_na |
Logical or character vector. If |
summary |
Logical. If |
verbose |
Logical. If |
... |
Additional arguments passed to |
A tibble with cleaned column names.
sample_path <- system.file("templates", "sample.csv", package = "toolero")

# Basic usage
data <- read_clean_csv(sample_path)

# Explicit missing-value codes
data <- read_clean_csv(sample_path, na = c("", "NA", "N/A", ".", "-999"))

# Drop rows missing in any column
data <- read_clean_csv(sample_path, drop_na = TRUE)

# Drop rows missing in specific columns
data <- read_clean_csv(sample_path, drop_na = c("bill_length_mm", "sex"))

# Print ingest summary
data <- read_clean_csv(sample_path, summary = TRUE)

# Combine arguments
data <- read_clean_csv(
  sample_path,
  na = c("", "NA", "N/A", "."),
  drop_na = TRUE,
  summary = TRUE
)
Splits a data frame by a single grouping column and writes each group to a separate CSV file. Optionally writes a manifest file listing the output files, their group values, and row counts.
write_by_group(data, group_col, output_dir = NULL, manifest = FALSE)
data |
A data frame or tibble to split and save. |
group_col |
A string. The name of the column to group by. |
output_dir |
A string or |
manifest |
A logical. Whether to write a |
Output filenames are derived from the group values of group_col.
Values are sanitized before use as filenames: converted to lowercase,
spaces and special characters replaced with -, consecutive dashes
collapsed, and leading/trailing dashes stripped.
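The sanitization rules above can be approximated with a few regular expressions. This is a sketch, not the package's code: sanitize_group_value() is a hypothetical name, and it assumes that "special characters" means anything other than lowercase letters and digits (so underscores and dots would also become dashes).

```r
# Hypothetical helper approximating the documented filename sanitization
sanitize_group_value <- function(x) {
  x <- tolower(x)                  # convert to lowercase
  x <- gsub("[^a-z0-9]", "-", x)   # spaces and special characters become "-"
  x <- gsub("-+", "-", x)          # collapse consecutive dashes
  gsub("^-|-$", "", x)             # strip leading and trailing dashes
}

sanitize_group_value("  Gentoo / Chinstrap ")
# returns "gentoo-chinstrap"
```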
If manifest = TRUE, a manifest.csv is written to output_dir
containing three columns: group_value, n_rows, and file_path.
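The split-and-manifest behavior can be sketched with base R alone. This is an illustration under stated assumptions rather than the package's implementation: it uses split() and write.csv() (the package may use other writers), lowercases group values as a simplified stand-in for full sanitization, and writes to a hypothetical subfolder of tempdir().

```r
data <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo"),
  mass = c(3750, 3800, 5000)
)

# Hypothetical output location for this sketch
output_dir <- file.path(tempdir(), "by-group-sketch")
dir.create(output_dir, showWarnings = FALSE)

# One data frame per distinct value of the grouping column
groups <- split(data, data$species)

# Write each group to its own CSV and collect one manifest row per file
manifest <- do.call(rbind, lapply(names(groups), function(value) {
  file_path <- file.path(output_dir, paste0(tolower(value), ".csv"))
  write.csv(groups[[value]], file_path, row.names = FALSE)
  data.frame(
    group_value = value,
    n_rows = nrow(groups[[value]]),
    file_path = file_path,
    stringsAsFactors = FALSE
  )
}))

write.csv(manifest, file.path(output_dir, "manifest.csv"), row.names = FALSE)
manifest
```

The manifest gives downstream scripts a single file to read when they need to locate every group-level output and check its row count.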
Note: although the signature lists output_dir = NULL, there is no default
output location. Always supply an explicit path to avoid writing files to
unexpected locations. Use tempdir() for temporary output during testing or
exploration.
Invisibly returns output_dir.
# Split a small data frame by group and write to a temp directory
data <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo"),
  mass = c(3750, 3800, 5000)
)
write_by_group(data, group_col = "species", output_dir = tempdir())

# Same but also write a manifest
write_by_group(data, group_col = "species", output_dir = tempdir(), manifest = TRUE)
write_clean_csv() writes a data frame to a CSV file using
readr::write_csv() and emits a cli confirmation message reporting
the number of rows and columns written. It is the natural counterpart
to read_clean_csv(), reinforcing the convention that data-raw/
holds original inputs and data/ holds cleaned, analysis-ready outputs.
write_clean_csv(data, path, overwrite = FALSE, ...)
data |
A data frame or tibble to write. |
path |
A character string with the path to the output CSV file. |
overwrite |
Logical. If |
... |
Additional arguments passed to |
If column names are not already clean, write_clean_csv() applies
janitor::clean_names() before writing and emits a warning listing
the affected columns.
Invisibly returns path.
sample_path <- system.file("templates", "sample.csv", package = "toolero")
data <- read_clean_csv(sample_path)

# Write to a temp file
out <- tempfile(fileext = ".csv")
write_clean_csv(data, out)

# Overwrite an existing file
write_clean_csv(data, out, overwrite = TRUE)

# Dirty names are cleaned automatically with a warning
dirty <- data.frame("First Name" = "Jane", "Last Name" = "Doe", check.names = FALSE)
write_clean_csv(dirty, tempfile(fileext = ".csv"))