Combining R and SAS with tidy functional programming for clinical trial design


James Otto (Alcon Laboratories)

Mary F. O. Rosenbloom (Alcon Laboratories)

8/5/2024

Classic Monte Carlo simulation in R

reps <- 300
n <- 20
mu <- rep(c(-3, 0, 3), each = 2 * reps)
sd <- rep(c(1, 5), times = 3 * reps)

X <- matrix(nrow = n, ncol = 6 * reps)

for (i in 1:reps) {
  X[,i] <- rnorm(n, mu[i], sd[i])
}
X[,1:6]
           [,1]        [,2]       [,3]        [,4]       [,5]         [,6]
 [1,] -2.630581  -6.6223312 -4.0858772   0.6304807 -3.4600831  -8.45455691
 [2,] -3.253151  -5.1379551 -1.8285950  -1.1349597 -3.8039305  -2.50818027
 [3,] -3.413220  -2.3445357 -4.4123656   1.0501275 -3.8017813  -5.17484317
 [4,] -3.373942  -3.4253004 -4.5985037   0.3389164 -2.9291188   2.23753171
 [5,] -2.893233  -4.0830990 -2.5066527  -8.4522716 -3.0181087   1.06348075
 [6,] -3.091435  -8.9086634 -1.8326823  -0.8023888 -3.2028669  -5.86899080
 [7,] -1.523937   4.3249815 -3.9273511  -4.3522138 -3.2081149  -1.09327033
 [8,] -3.798781   1.5388346 -2.7922290 -11.5349123 -3.0434794   3.61724759
 [9,] -1.836185  -3.3412674 -0.5236028   1.9671060 -1.9143249  -5.05029002
[10,] -2.022128  -7.2243503 -3.3329089  -2.4343393 -2.5033699  -3.83888126
[11,] -2.940798   1.9271931 -2.2702586  -9.6196008 -3.6022520  -0.92737994
[12,] -1.736110  -3.7674235 -2.1245368  -8.5225116 -0.6359025  -2.54168231
[13,] -4.594081   0.2417171 -2.3087239  -3.0650582 -4.9147425  -6.81605535
[14,] -2.963747   4.5167764 -2.9114918  -8.0295088 -3.9540232  -3.87364945
[15,] -1.921069 -10.3057989 -2.4413618   5.9513275 -4.2714828  -1.07880796
[16,] -1.993098  -3.5868047 -3.3860677  -3.3989308 -2.5624354   0.08149092
[17,] -4.990535   6.1776827 -4.7404026   5.3352953 -3.2405270  -9.30594713
[18,] -3.062359  -2.2536011 -1.1395307  -7.7023205 -2.8935073 -11.55408101
[19,] -2.353846  -1.8658461 -2.9196943   0.3577179 -3.7451593  -6.75104582
[20,] -2.449484  -6.7673725 -1.8759079 -11.0523401 -3.5855444  -2.92907763

Classic Monte Carlo simulation in R

  • Difficult to associate design points and data
  • Isolated lower-level R objects not suitable for modern tools (e.g. the pipe, {ggplot2}, {dplyr})
  • Not scaleable:
    • Difficult to add design points
    • Very complicated with high dimensional design space

Tidy organization of simulated data

Tidy Data (Wickham, 2014)

Current tools often require translation. You have to spend time munging the output from one tool so you can input it into another. Tidy datasets and tidy tools work hand in hand to make data analysis easier, allowing you to focus on the interesting domain problem, not on the uninteresting logistics of data.

Tidy Data (Wickham, 2014)

In tidy data:

  1. Each variable forms a column.
  2. Each observation forms a row.
  3. Each type of observational unit forms a table

Tidy Monte Carlo Simulation Studies

We propose an extension of Wickham’s tidy data sets to organize Monte Carlo simulation studies

In tidy Monte Carlo simulation data, each simulation forms a row

This architecture is achieved through the combination of tools from {dplyr} and {tidyr} along with helper functions which simulate data, fit models, and parse results

library("tidyverse")

params <- 
  tidyr::expand_grid(
    n = 20,
    mu = c(-3, 0, 3),
    sd = c(1, 5),
    k = 1:300
  ) 
params

library("tidyverse")

params <- 
  tidyr::expand_grid(
    n = 20,
    mu = c(-3, 0, 3),
    sd = c(1, 5),
    k = 1:300
  ) 

sim <- function(n, mu, sd) {
  rnorm(n, mu, sd)
}

df_sims <- 
  params |> 
  dplyr::rowwise() |> 
  dplyr::mutate(data = list(sim(n, mu, sd)))
df_sims
df_sims$data[1:10]
[[1]]
 [1] -1.356861 -2.565877 -1.495438 -3.319827 -3.735363 -2.025026 -3.772996
 [8] -3.219340 -1.354820 -3.743496 -2.789655 -2.778821 -2.697332 -3.626723
[15] -1.657015 -2.900189 -2.405777 -3.950710 -2.506110 -3.954489

[[2]]
 [1] -3.0198962 -3.9701626 -1.7374915 -3.5827451 -2.4224878 -3.3025775
 [7] -3.5390364 -1.5676727 -0.9474058 -2.5314241 -4.1736156 -3.8702924
[13] -2.4451568 -2.4604348 -4.2330628 -3.2837864 -3.6893732 -3.3935759
[19] -2.5159692 -4.1779564

[[3]]
 [1] -3.101322 -4.144713 -3.500959 -2.138049 -3.791242 -2.338755 -2.186104
 [8] -2.648083 -3.915646 -4.901542 -4.524950 -1.957104 -3.257168 -1.774349
[15] -1.165538 -2.436560 -3.905221 -3.389182 -2.219959 -3.009553

[[4]]
 [1] -4.434008 -3.440495 -2.377985 -2.635797 -1.605048 -2.964481 -4.975869
 [8] -3.932580 -2.573914 -1.774989 -2.962207 -4.628303 -3.987449 -3.974713
[15] -3.744804 -3.835609 -4.011704 -2.436765 -3.338366 -3.711277

[[5]]
 [1] -4.036552 -1.344426 -1.992312 -2.413752 -1.452406 -3.062954 -3.495262
 [8] -3.083995 -3.007698 -1.688222 -4.806057 -3.265555 -5.425740 -2.168278
[15] -2.049575 -1.506959 -3.334340 -3.255436 -2.666313 -3.656416

[[6]]
 [1] -2.913208 -2.659060 -3.287128 -2.026989 -2.411407 -1.916472 -2.750291
 [8] -3.278254 -2.527874 -3.861388 -4.235598 -4.220167 -3.350975 -3.257580
[15] -2.833432 -3.960884 -3.794033 -1.613150 -2.849459 -3.144635

[[7]]
 [1] -1.778523 -1.288731 -3.109331 -4.515338 -3.302832 -3.948305 -3.168170
 [8] -1.458474 -6.231576 -1.831384 -4.229080 -1.908134 -5.083474 -2.082276
[15] -1.903475 -2.479388 -3.309285 -2.388372 -1.630412 -4.309091

[[8]]
 [1] -2.41100235 -2.81894492 -3.19379275 -3.69951036 -4.42611324 -2.36482814
 [7] -1.24461062 -3.50052457 -3.18626541 -1.53515331 -1.77041600 -2.98593033
[13] -0.03002331 -4.11483669 -5.40199479 -4.54608307 -1.92459806 -4.31437113
[19] -3.06184226 -2.37931902

[[9]]
 [1] -4.2414200 -0.9455701 -3.1825786 -3.1717475 -2.0660216 -0.9403217
 [7] -3.1527525 -2.3176930 -2.9638501 -2.3209526 -3.3568487 -2.6264821
[13] -2.9697060 -3.0253529 -2.3261255 -2.9846937 -1.1479648 -1.6714178
[19] -2.9821157 -2.2557475

[[10]]
 [1] -4.005124 -0.501077 -2.766754 -3.069056 -2.760312 -2.402478 -4.313694
 [8] -2.075414 -0.914348 -3.683611 -5.013554 -1.714473 -2.127308 -2.636621
[15] -1.704653 -2.138793 -4.455611 -3.987638 -4.942974 -4.264543

library("tidyverse")

params <-
  expand_grid(
    n = 20,
    beta_0 = c(0, 1),
    beta_1 = c(0, 2, 4),
    sd = c(1, 10),
    k = 1:300
  )

sim <- function(n, beta_0, beta_1, sd) {
  tibble(x = runif(n, 0, 10)) |>
  mutate(y = beta_0 + beta_1 * x + rnorm(n, mean = 0, sd))
}

df_sims <- params |>
  rowwise() |>
  mutate(data = list(sim(n, beta_0, beta_1, sd)))
df_sims
df_sims$data[[1]]

fit_model <- function(data) {
  lm(y ~ x, data = data)
}

get_rsq <- function(fit) {
  summary(fit)$r.squared
}

df_sims <- df_sims |> 
  mutate(
    fit = list(fit_model(data)),
    rsq = get_rsq(fit)
  )
df_sims

ggplot(df_sims) +
  aes(
    x = beta_1, y = rsq, color = factor(beta_0), 
    group = interaction(beta_0, beta_1)
  ) +
  geom_boxplot(position = position_dodge2()) +
  facet_wrap("sd")

A Common Problem:

  • This works very well for designs that can be modeled in R
    • e.g. lm(), glm(), {mmrm}, {rjags}, {cmdstanr}
  • How to apply to designs that can only and/or must be modeled in SAS?

Pairing R and SAS for tidy simulation studies

Repeated Measures Design with Compound Symmetric Covariance

\[ \mathbf{y}_i^1 \sim \mathcal{N}_p\left(\boldsymbol{\mu}_1, \Sigma_1\right), \qquad \mathbf{y}_i^2 \sim \mathcal{N}_p\left(\boldsymbol{\mu}_2, \Sigma_2\right), \]

\[ \boldsymbol{\mu}_1 = \left\{\mu_1\right\}_{t=1}^p, \qquad \boldsymbol{\mu}_2 = \left\{\mu_2\right\}_{t=1}^p, \]

\[ \Sigma_1 = \begin{bmatrix} \sigma_1^2 & \rho \sigma_1^2 & \ldots & \rho \sigma_1^2 \\ \rho \sigma_1^2 & \sigma_1^2 & \ldots & \rho \sigma_1^2 \\ \vdots & \vdots & \ddots & \vdots \\ \rho \sigma_1^2 & \rho \sigma_1^2 &\ldots & \sigma_1^2 \end{bmatrix}, \qquad \Sigma_2 = \begin{bmatrix} \sigma_2^2 & \rho \sigma_2^2 & \ldots & \rho \sigma_2^2 \\ \rho \sigma_2^2 & \sigma_2^2 & \ldots & \rho \sigma_2^2 \\ \vdots & \vdots & \ddots & \vdots \\ \rho \sigma_2^2 & \rho \sigma_2^2 &\ldots & \sigma_2^2 \end{bmatrix}. \]

library("tidyverse")

params <-
  expand_grid(
    n = c(10, 20, 30),
    p = c(3, 6),
    var_1 = c(1, 2),
    var_2 = 1,
    rho = c(0, .5, .8),
    mu_1 = c(0, 1, 5),
    mu_2 = 0,
    k = 1:100
  )

sim <- function(n, p, var_1, var_2, rho, mu_1, mu_2, k) {
  
  cov_1 <- matrix(var_1 * rho, ncol = p, nrow = p)
  diag(cov_1) <- var_1
  
  cov_2 <- matrix(var_2 * rho, ncol = p, nrow = p)
  diag(cov_2) <- var_2
  
  X1 <- MASS::mvrnorm(n = n, mu = rep(mu_1, p), Sigma = cov_1)
  X2 <- MASS::mvrnorm(n = n, mu = rep(mu_2, p), Sigma = cov_2)
  
  bind_rows(
    as_tibble(X1),
    as_tibble(X2)
  ) |> 
    mutate(
      group = rep(c(1, 2), each = n),
      subject = 1:(n*2),
      .before = everything()
    ) |> 
    rename_with(\(names) str_remove(names, "V"), matches("V")) |> 
    pivot_longer(!c(subject, group), names_to = "t", values_to = "y")
}

df_sims <- params |> 
  rowwise() |> 
  mutate(data = list(sim(n, p, var_1, var_2, rho, mu_1, mu_2)))
df_sims
df_sims$data[[1]]

write_sims <- function(n, p, var_1, var_2, rho, mu_1, mu_2, data) {
  file_name <- glue::glue(
    "{n}n_{p}p_{var_1}var1_{var_2}var2_{rho}rho_{mu_1}mu1_{mu_2}mu2.csv"
  )
  write_csv(data, here::here("data/sims", file_name))
}

df_sims |> 
  unnest(data) |> 
  nest(data = c(k, group, subject, t, y)) |> 
  pwalk(write_sims)

.../data/sims/10n_3p_1var1_1var2_0.5rho_0mu1_0mu2.csv

%MACRO runsims(n=, p=, var_1=, var_2=, rho=, mu_1=, mu_2=);

* Create file name for reading and writing based on parameter values;
%LET file = &n.n_&p.p_&var_1.var1_&var_2.var2_&rho.rho_&mu_1.mu1_&mu_2.mu2.csv;

PROC IMPORT datafile = "W:\James_Otto\JSM-2024\data\sims\&file"
  out = work.data_temp
  replace
  dbms=CSV;
run;

PROC MIXED data=data_temp alpha=.05;

  by k;

  class group subject t;
  model y = group;
  random t / type = CS subject = subject group = group;

  lsmeans group;
  estimate 'mean_difference' group 1 -1 / CL;

  ods output estimates = tempestimates;

run;

data tempestimates;
  set tempestimates;
  n=&n;
  p=&p;
  var_1=&var_1;
  var_2=&var_2;
  rho=&rho;
  mu_1=&mu_1;
  mu_2=&mu_2;
run;

PROC EXPORT data=tempestimates
  outfile = "W:\James_Otto\JSM-2024\data\res\&file"
  dbms = csv
  replace;
run;

%MEND runsims;

data params;
  do n = 10, 20, 30;
  do p = 3, 6;
  do var_1 = 1, 2;
  do var_2 = 1;
  do rho = 0, .5, .8;
  do mu_1 = 0, 1, 5;
  do mu_2 = 0;
    output;
  end; end; end; end; end; end; end;
run;

proc sql noprint;
  select catt('%runsims(',
        'n=', n, ',',
        'p=', p, ',',
        'var_1=', var_1, ',', 
        'var_2=', var_2, ',',
        'rho=', rho, ',',
        'mu_1=', mu_1, ',',
        'mu_2=', mu_2,
      ')') 
    into :mac_calls separated by ' '
  from params;
quit;

&mac_calls;

%put _user_;
...
GLOBAL MAC_CALLS 
%runsims(n=10, p=3, var_1=1, var_2=1, rho=0, mu_1=0, mu_2=0) 
%runsims(n=10, p=3, var_1=1, var_2=1, rho=0, mu_1=1, mu_2=0)
%runsims(n=10, p=3, var_1=1, var_2=1, rho=0, mu_1=5, mu_2=0) 
%runsims(n=10, p=3, var_1=1, var_2=1, rho=0.5, mu_1=0, mu_2=0) 
%runsims(n=10, p=3, var_1=1, var_2=1, rho=0.5, mu_1=1, mu_2=0) 
%runsims(n=10, p=3, var_1=1, var_2=1, rho=0.5, mu_1=5, mu_2=0) 
%runsims(n=10, p=3, var_1=1, var_2=1, rho=0.8, mu_1=0, mu_2=0) 
%runsims(n=10, p=3, var_1=1, var_2=1, rho=0.8, mu_1=1, mu_2=0) 
%runsims(n=10, p=3, var_1=1, var_2=1, rho=0.8, mu_1=5, mu_2=0) 
%runsims(n=10, p=3, var_1=2, var_2=1, rho=0, mu_1=0, mu_2=0) 
%runsims(n=10, p=3, var_1=2, var_2=1, rho=0, mu_1=1, mu_2=0) 
%runsims(n=10, p=3, var_1=2, var_2=1, rho=0, mu_1=5, mu_2=0) 
%runsims(n=10, p=3, var_1=2, var_2=1, rho=0.5, mu_1=0, mu_2=0) 
%runsims(n=10, p=3, var_1=2, var_2=1, rho=0.5, mu_1=1, mu_2=0) 
%runsims(n=10, p=3, var_1=2, var_2=1, rho=0.5, mu_1=5, mu_2=0) 
%runsims(n=10, p=3, var_1=2, var_2=1, rho=0.8, mu_1=0, mu_2=0) 
%runsims(n=10, p=3, var_1=2, var_2=1, rho=0.8, mu_1=1, mu_2=0) 
%runsims(n=10, p=3, var_1=2, var_2=1, rho=0.8, mu_1=5, mu_2=0) 
%runsims(n=10, p=6, var_1=1, var_2=1, rho=0, mu_1=0, mu_2=0) 
%runsims(n=10, p=6, var_1=1, var_2=1, rho=0, mu_1=1, mu_2=0) 
%runsims(n=10, p=6, var_1=1, var_2=1, rho=0, mu_1=5, mu_2=0) 
%runsims(n=10, p=6, var_1=1, var_2=1, rho=0.5, mu_1=0, mu_2=0) 
%runsims(n=10, p=6, var_1=1, var_2=1, rho=0.5, mu_1=1, mu_2=0) 
%runsims(n=10, p=6, var_1=1, var_2=1, rho=0.5, mu_1=5, mu_2=0) 
%runsims(n=10, p=6, var_1=1, var_2=1, rho=0.8, mu_1=0, mu_2=0) 
%runsims(n=10, p=6, var_1=1, var_2=1, rho=0.8, mu_1=1, mu_2=0) 
%runsims(n=10, p=6, var_1=1, var_2=1, rho=0.8, mu_1=5, mu_2=0) 
%runsims(n=10, p=6, var_1=2, var_2=1, rho=0, mu_1=0, mu_2=0) 
%runsims(n=10, p=6, var_1=2, var_2=1, rho=0, mu_1=1, mu_2=0) 
%runsims(n=10, p=6, var_1=2, var_2=1, rho=0, mu_1=5, mu_2=0) 
%runsims(n=10, p=6, var_1=2, var_2=1, rho=0.5, mu_1=0, mu_2=0) 
%runsims(n=10, p=6, var_1=2, var_2=1, rho=0.5, mu_1=1, mu_2=0) 
%runsims(n=10, p=6, var_1=2, var_2=1, rho=0.5, mu_1=5, mu_2=0) 
%runsims(n=10, p=6, var_1=2, var_2=1, rho=0.8, mu_1=0, mu_2=0) 
%runsims(n=10, p=6, var_1=2, var_2=1, rho=0.8, mu_1=1, mu_2=0) 
%runsims(n=10, p=6, var_1=2, var_2=1, rho=0.8, mu_1=5, mu_2=0) 
%runsims(n=20, p=3, var_1=1, var_2=1, rho=0, mu_1=0, mu_2=0) 
%runsims(n=20, p=3, var_1=1, var_2=1, rho=0, mu_1=1, mu_2=0) 
%runsims(n=20, p=3, var_1=1, var_2=1, rho=0, mu_1=5, mu_2=0) 
%runsims(n=20, p=3, var_1=1, var_2=1, rho=0.5, mu_1=0, mu_2=0) 
%runsims(n=20, p=3, var_1=1, var_2=1, rho=0.5, mu_1=1, mu_2=0) 
%runsims(n=20, p=3, var_1=1, var_2=1, rho=0.5, mu_1=5, mu_2=0) 
%runsims(n=20, p=3, var_1=1, var_2=1, rho=0.8, mu_1=0, mu_2=0) 
%runsims(n=20, p=3, var_1=1, var_2=1, rho=0.8, mu_1=1, mu_2=0) 
%runsims(n=20, p=3, var_1=1, var_2=1, rho=0.8, mu_1=5, mu_2=0) 
%runsims(n=20, p=3, var_1=2, var_2=1, rho=0, mu_1=0, mu_2=0) 
%runsims(n=20, p=3, var_1=2, var_2=1, rho=0, mu_1=1, mu_2=0) 
%runsims(n=20, p=3, var_1=2, var_2=1, rho=0, mu_1=5, mu_2=0) 
%runsims(n=20, p=3, var_1=2, var_2=1, rho=0.5, mu_1=0, mu_2=0) 
%runsims(n=20, p=3, var_1=2, var_2=1, rho=0.5, mu_1=1, mu_2=0) 
%runsims(n=20, p=3, var_1=2, var_2=1, rho=0.5, mu_1=5, mu_2=0) 
%runsims(n=20, p=3, var_1=2, var_2=1, rho=0.8, mu_1=0, mu_2=0) 
%runsims(n=20, p=3, var_1=2, var_2=1, rho=0.8, mu_1=1, mu_2=0) 
%runsims(n=20, p=3, var_1=2, var_2=1, rho=0.8, mu_1=5, mu_2=0) 
%runsims(n=20, p=6, var_1=1, var_2=1, rho=0, mu_1=0, mu_2=0) 
%runsims(n=20, p=6, var_1=1, var_2=1, rho=0, mu_1=1, mu_2=0) 
%runsims(n=20, p=6, var_1=1, var_2=1, rho=0, mu_1=5, mu_2=0) 
%runsims(n=20, p=6, var_1=1, var_2=1, rho=0.5, mu_1=0, mu_2=0) 
%runsims(n=20, p=6, var_1=1, var_2=1, rho=0.5, mu_1=1, mu_2=0) 
%runsims(n=20, p=6, var_1=1, var_2=1, rho=0.5, mu_1=5, mu_2=0) 
%runsims(n=20, p=6, var_1=1, var_2=1, rho=0.8, mu_1=0, mu_2=0) 
%runsims(n=20, p=6, var_1=1, var_2=1, rho=0.8, mu_1=1, mu_2=0) 
%runsims(n=20, p=6, var_1=1, var_2=1, rho=0.8, mu_1=5, mu_2=0) 
%runsims(n=20, p=6, var_1=2, var_2=1, rho=0, mu_1=0, mu_2=0) 
%runsims(n=20, p=6, var_1=2, var_2=1, rho=0, mu_1=1, mu_2=0) 
%runsims(n=20, p=6, var_1=2, var_2=1, rho=0, mu_1=5, mu_2=0) 
%runsims(n=20, p=6, var_1=2, var_2=1, rho=0.5, mu_1=0, mu_2=0) 
%runsims(n=20, p=6, var_1=2, var_2=1, rho=0.5, mu_1=1, mu_2=0) 
%runsims(n=20, p=6, var_1=2, var_2=1, rho=0.5, mu_1=5, mu_2=0) 
%runsims(n=20, p=6, var_1=2, var_2=1, rho=0.8, mu_1=0, mu_2=0) 
%runsims(n=20, p=6, var_1=2, var_2=1, rho=0.8, mu_1=1, mu_2=0) 
%runsims(n=20, p=6, var_1=2, var_2=1, rho=0.8, mu_1=5, mu_2=0) 
%runsims(n=30, p=3, var_1=1, var_2=1, rho=0, mu_1=0, mu_2=0) 
%runsims(n=30, p=3, var_1=1, var_2=1, rho=0, mu_1=1, mu_2=0) 
%runsims(n=30, p=3, var_1=1, var_2=1, rho=0, mu_1=5, mu_2=0) 
%runsims(n=30, p=3, var_1=1, var_2=1, rho=0.5, mu_1=0, mu_2=0) 
%runsims(n=30, p=3, var_1=1, var_2=1, rho=0.5, mu_1=1, mu_2=0) 
%runsims(n=30, p=3, var_1=1, var_2=1, rho=0.5, mu_1=5, mu_2=0) 
%runsims(n=30, p=3, var_1=1, var_2=1, rho=0.8, mu_1=0, mu_2=0) 
%runsims(n=30, p=3, var_1=1, var_2=1, rho=0.8, mu_1=1, mu_2=0) 
%runsims(n=30, p=3, var_1=1, var_2=1, rho=0.8, mu_1=5, mu_2=0) 
%runsims(n=30, p=3, var_1=2, var_2=1, rho=0, mu_1=0, mu_2=0) 
%runsims(n=30, p=3, var_1=2, var_2=1, rho=0, mu_1=1, mu_2=0) 
%runsims(n=30, p=3, var_1=2, var_2=1, rho=0, mu_1=5, mu_2=0) 
%runsims(n=30, p=3, var_1=2, var_2=1, rho=0.5, mu_1=0, mu_2=0) 
%runsims(n=30, p=3, var_1=2, var_2=1, rho=0.5, mu_1=1, mu_2=0) 
%runsims(n=30, p=3, var_1=2, var_2=1, rho=0.5, mu_1=5, mu_2=0) 
%runsims(n=30, p=3, var_1=2, var_2=1, rho=0.8, mu_1=0, mu_2=0) 
%runsims(n=30, p=3, var_1=2, var_2=1, rho=0.8, mu_1=1, mu_2=0) 
%runsims(n=30, p=3, var_1=2, var_2=1, rho=0.8, mu_1=5, mu_2=0) 
%runsims(n=30, p=6, var_1=1, var_2=1, rho=0, mu_1=0, mu_2=0) 
%runsims(n=30, p=6, var_1=1, var_2=1, rho=0, mu_1=1, mu_2=0) 
%runsims(n=30, p=6, var_1=1, var_2=1, rho=0, mu_1=5, mu_2=0) 
%runsims(n=30, p=6, var_1=1, var_2=1, rho=0.5, mu_1=0, mu_2=0) 
%runsims(n=30, p=6, var_1=1, var_2=1, rho=0.5, mu_1=1, mu_2=0) 
%runsims(n=30, p=6, var_1=1, var_2=1, rho=0.5, mu_1=5, mu_2=0) 
%runsims(n=30, p=6, var_1=1, var_2=1, rho=0.8, mu_1=0, mu_2=0) 
%runsims(n=30, p=6, var_1=1, var_2=1, rho=0.8, mu_1=1, mu_2=0) 
%runsims(n=30, p=6, var_1=1, var_2=1, rho=0.8, mu_1=5, mu_2=0) 
%runsims(n=30, p=6, var_1=2, var_2=1, rho=0, mu_1=0, mu_2=0) 
%runsims(n=30, p=6, var_1=2, var_2=1, rho=0, mu_1=1, mu_2=0) 
%runsims(n=30, p=6, var_1=2, var_2=1, rho=0, mu_1=5, mu_2=0) 
%runsims(n=30, p=6, var_1=2, var_2=1, rho=0.5, mu_1=0, mu_2=0) 
%runsims(n=30, p=6, var_1=2, var_2=1, rho=0.5, mu_1=1, mu_2=0) 
%runsims(n=30, p=6, var_1=2, var_2=1, rho=0.5, mu_1=5, mu_2=0) 
%runsims(n=30, p=6, var_1=2, var_2=1, rho=0.8, mu_1=0, mu_2=0) 
%runsims(n=30, p=6, var_1=2, var_2=1, rho=0.8, mu_1=1, mu_2=0) 
%runsims(n=30, p=6, var_1=2, var_2=1, rho=0.8, mu_1=5, mu_2=0) 

df_sims <-
  fs::dir_ls(here::here("data/res")) |> 
  map_dfr(read_csv, col_types = cols(Probt = col_character())) |> 
  mutate(
    Probt = if_else(Probt == "<.0001", "0", Probt),
    Probt = as.numeric(Probt)
  ) |> 
  select(
    n, p, var_1, var_2, rho, mu_1, mu_2, k,
    Estimate, StdErr, Probt, Lower, Upper
  ) |> 
  left_join(x = df_sims)

df_sims |> 
  filter(var_2 == 1, mu_1 == 1, mu_2 == 0) |> 
  group_by(var_1, rho, p, n) |> 
  summarize(power = mean(Probt < .05)) |> 
  mutate( # Fixing names for ggplot labels
    var_1 = glue::glue("sigma[1]^2 == {var_1}"),
    p = glue::glue("p == {p}"),
    rho = factor(rho, c(0, .5, .8))
  ) |> 
  ggplot(aes(x = n, y = power, color = rho, group = rho)) +
  geom_hline(linetype = "dashed", yintercept = .9) + # Target power
  geom_line(linewidth = 2) +
  facet_grid(p ~ var_1, labeller = label_parsed) +
  scale_color_brewer(palette = 3) +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(
    title = expression("Comparing Empirical Power for two-sided test of" ~ 
      mu[1] - mu[2] != 0  ~ (alpha == .05)
    ),
    subtitle = expression("Fixing" ~
      sigma[2]^2 == 1 ~ "," ~
      mu[1] == 1 ~ "," ~ mu[2] == 0
    ),
    y = "Empirical Power",
    color = expression(rho)
  ) +
  theme_bw(18)

df_sims |> 
  filter(var_1 == 1, var_2 == 1, mu_1 == 5, mu_2 == 0) |> 
  mutate( # Fixing names for ggplot labels
    n = glue::glue("n == {n}"),
    p = glue::glue("p == {p}"),
    rho = factor(rho, c(0, .5, .8))
  ) |> 
  ggplot(aes(y = Lower, x = rho, color = rho)) +
  geom_hline(yintercept = 4, linetype = "dashed") + # Superiority margin
  geom_boxplot(show.legend = FALSE) +
  scale_color_brewer(palette = 3) +
  labs(
    title = "Comparing Lower 95% Confidence Limit to Superiority Margin of 4",
    subtitle = expression("Fixing" ~
      {sigma[1]^2 == sigma[2]^2} == 1 ~ "," ~
      mu[1] == 5 ~ "," ~ mu[2] == 0
    ),
    x = expression(rho),
    y = expression("Lower 95% Limit for" ~ mu[1] - mu[2])
  ) +
  facet_grid(p ~ n, labeller = label_parsed) +
  theme_bw(18)

Pairing R and SAS with analysis-specific packages

Export functions for:

  • Simulating data
  • Modeling data
  • Summarizing results

As R packages, these have a predictable structure, are portable, and are easy to collaborate on

Example from Alcon

Study-Specific Considerations

  • Evaluating new drug formulation, parameter estimates from legacy studies informing assumptions
  • Comparing multiple designs (contralateral vs bilateral)
  • Investigating power for multiple endpoints
  • Interested in both superiority and non-inferiority

PackageName/
├─── R/
|    ├─── sim-data.R
|    ├─── write-sims.R
|    ├─── read-res.R
|    └─── summarize-res.R
├─── data/
|    └─── *.rda
├─── data-raw/
|    ├─── *.sas7bdat
|    ├─── *.csv
|    ├─── *.R
|    └─── *.SAS
├─── inst/
|    └─── SAS/
|         ├─── runsims-bilateral.SAS
|         └─── runsims-contralateral.SAS
├─── man/
├─── tests/
├─── .Rbuildignore
├─── .gitignore
├─── DESCRIPTION
|     ...
└─── PackageName.Rproj

See also

References

Kuhn M, Wickham H. 2020. Tidymodels: A collection of packages for modeling and machine learning using tidyverse principles.
Linner S, Moreira Lara I, Lehmann K. 2022. tidyMC: Monte carlo simulations made easy and tidy.
Robinson D, Hayes A, Couch S. 2023. Broom: Convert statistical objects into tidy tibbles.
Rosenbloom MFO, Carpenter A. 2013. Macro quoting to the rescue: Passing special characters In: SAS global forum. https://support.sas.com/resources/papers/proceedings13/005-2013.pdf; SAS,.
Rosenbloom MFO, Carpenter A. 2015. Are you a control freak? Control your programs – don’t let them control you! In: SAS global forum. https://support.sas.com/resources/papers/proceedings15/2220-2015.pdf; SAS,.
Wickham H. 2014. Tidy Data. Journal of statistical software 59: 1–23.
Wickham H. 2019. Advanced R. Chapman; Hall/CRC.
Wickham H. 2023. Multidplyr: A multi-process ’dplyr’ backend.
Wickham H, Averick M, Bryan J, et al. 2019. Welcome to the tidyverse. Journal of Open Source Software 4: 1686.
Wickham H, Bryan J. 2023. R Packages. O’Reilly Media, Inc.

Thank you!


jamesotto852.github.io/JSM-2024