Package 'sprintr' reference manual

Title:	Sparse Reluctant Interaction Modeling
Description:	An implementation of a computationally efficient method to fit large-scale interaction models based on the reluctant interaction selection principle. The method and its properties are described in greater depth in Yu, G., Bien, J., and Tibshirani, R.J. (2019) "Reluctant interaction modeling", which is available at <arXiv:1907.08414>.
Authors:	Guo Yu [aut, cre]
Maintainer:	Guo Yu <[email protected]>
License:	GPL-3
Version:	0.9.1
Built:	2025-03-03 03:33:00 UTC
Source:	https://github.com/hugogogo/sprintr

Running sprinter with cross-validation

Description

The main cross-validation function to select the best sprinter fit for a path of tuning parameters.

Usage

cv.sprinter(
  x,
  y,
  square = FALSE,
  num_keep = NULL,
  lambda1 = NULL,
  lambda3 = NULL,
  cv_step1 = FALSE,
  nlam1 = 10,
  nlam3 = 100,
  lam_min_ratio = ifelse(nrow(x) < ncol(x), 0.01, 1e-04),
  nfold = 5,
  foldid = NULL,
  verbose = FALSE,
  ...
)
cv.sprinter(
  x,
  y,
  square = FALSE,
  num_keep = NULL,
  lambda1 = NULL,
  lambda3 = NULL,
  cv_step1 = FALSE,
  nlam1 = 10,
  nlam3 = 100,
  lam_min_ratio = ifelse(nrow(x) < ncol(x), 0.01, 1e-04),
  nfold = 5,
  foldid = NULL,
  verbose = FALSE,
  ...
)

Arguments

`x`	An `n` by `p` design matrix of main effects. Each row is an observation of `p` main effects.
`y`	A response vector of size `n`.
`square`	Indicator of whether squared effects should be fitted in Step 1. Default to be FALSE.
`num_keep`	A user specified number of candidate interactions to keep in Step 2. If `num_keep` is not specified (as default), it will be set to `round[n / log n]`.
`lambda1`	Tuning parameter values for Step 1. `lambda1` is a vector. Default to be NULL, and the program will compute its own `lambda1` based on `nlam1` and `lam_min_ratio`.
`lambda3`	Tuning parameter values for Step 3. `lambda3` is a matrix, where the k-th column is the list of tuning parameter in Step 3 corresponding to Step 1 using `lambda1[k]`. Default to be NULL, and the program will compute its own `lambda3` based on `nlam3` and `lam_min_ratio`.
`cv_step1`	Indicator of whether cross-validation of `lambda1` should be carried out in Step 1 before subsequent steps. Default is `FALSE`.
`nlam1`	the number of values in `lambda1`. If not specified, they will be all set to `10`.
`nlam3`	the number of values in each column of `lambda3`. If not specified, they will be all set to `100`.
`lam_min_ratio`	The ratio of the smallest and the largest values in `lambda1` and each column of `lambda2`. The largest value is usually the smallest value for which all coefficients are set to zero. Default to be `1e-2` in the `n` < `p` setting.
`nfold`	Number of folds in cross-validation. Default value is 5. If each fold gets too view observation, a warning is thrown and the minimal `nfold = 3` is used.
`foldid`	A vector of length `n` representing which fold each observation belongs to. Default to be `NULL`, and the program will generate its own randomly.
`verbose`	If `TRUE`, a progress bar shows the progress of the fitting.
`...`	other arguments to be passed to the `glmnet` calls, such as `alpha` or `penalty.factor`

Value

An object of S3 class "sprinter".

n: The sample size.
p: The number of main effects.
square: The square parameter passed into sprinter.
a0_step3: Estimate of intercept corresponding to the CV-selected model.
compact: A compact representation of the selected variables. compact has three columns, with the first two columns representing the indices of a selected variable (main effects with first index = 0), and the last column representing the estimate of coefficients.
fit: The whole glmnet fit object.
fitted: fitted value of response corresponding to the CV-selected model.
num_keep: The value of num_keep.
cvm: The averaged estimated prediction error on the test sets over K folds.
cvse: The standard error of the estimated prediction error on the test sets over K folds.
foldid: Fold assignment. A vector of length n.
i_lambda1_best: The index in lambda1 that is chosen by CV by minimizing cvm.
i_lambda3_best: The index in lambda3 that is chosen by CV by minimizing cvm.
lambda1_best: The value of lambda1 that is chosen by CV by minimizing cvm.
lambda3_best: The value of lambda3 that is chosen by CV by minimizing cvm.
call: Function call.

Examples

n <- 100
p <- 100
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] - 2 * x[, 2] + 3 * x[, 1] * x[, 3] - 4 * x[, 4] * x[, 5] + rnorm(n)
mod <- cv.sprinter(x = x, y = y)

n <- 100
p <- 100
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] - 2 * x[, 2] + 3 * x[, 1] * x[, 3] - 4 * x[, 4] * x[, 5] + rnorm(n)
mod <- cv.sprinter(x = x, y = y)

Two-stage hierarchical lasso

Description

An implementation of the two-stage lasso studied in Hao et, al (2018).

Usage

hier_lasso(
  x,
  y,
  lambda = NULL,
  nlam = 100,
  lam_choice = "min",
  lam_min_ratio = ifelse(nrow(x) < ncol(x), 0.01, 1e-04),
  nfold = 5,
  foldid = NULL,
  ...
)
hier_lasso(
  x,
  y,
  lambda = NULL,
  nlam = 100,
  lam_choice = "min",
  lam_min_ratio = ifelse(nrow(x) < ncol(x), 0.01, 1e-04),
  nfold = 5,
  foldid = NULL,
  ...
)

Arguments

`x`	An `n` by `p` design matrix of main effects. Each row is an observation of `p` main effects.
`y`	A response vector of size `n`.
`...`	other arguments to be passed to the `glmnet` calls, such as `alpha` or `penalty.factor`

Value

An object of S3 class "cv.hier".

n: The sample size.
p: The number of main effects.
fit: The whole cv.glmnet fit object.
compact: A compact representation of the selected variables. compact has three columns, with the first two columns representing the indices of a selected variable (main effects with first index = 0), and the last column representing the estimate of coefficients.

Examples

set.seed(123)
n <- 100
p <- 200
# dense input
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] - 2 * x[, 2] + 3 * x[, 1] * x[, 3] - 4 * x[, 4] * x[, 5] + rnorm(n)
mod <- hier_lasso(x = x, y = y)

set.seed(123)
n <- 100
p <- 200
# dense input
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] - 2 * x[, 2] + 3 * x[, 1] * x[, 3] - 4 * x[, 4] * x[, 5] + rnorm(n)
mod <- hier_lasso(x = x, y = y)

Plot function of cv.sprinter fit

Description

This function produces plots of cross-validation for cv.sprinter.

Usage

## S3 method for class 'cv.sprinter'
plot(fit)
## S3 method for class 'cv.sprinter'
plot(fit)

Arguments

fit

A "cv.sprinter" object.

Details

The orange pairs on the top of the plot shows the number of non-zero (main effects, interactions) selected by each value of lambda. Adopted from the function plot.cv.rgam from package relgam by Kenneth Tay and Robert Tibshirani.

Examples

set.seed(123)
n <- 100
p <- 200
# dense input
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] - 2 * x[, 2] + 3 * x[, 1] * x[, 3] - 4 * x[, 4] * x[, 5] + rnorm(n)
mod <- cv.sprinter(x = x, y = y)

plot(mod)
set.seed(123)
n <- 100
p <- 200
# dense input
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] - 2 * x[, 2] + 3 * x[, 1] * x[, 3] - 4 * x[, 4] * x[, 5] + rnorm(n)
mod <- cv.sprinter(x = x, y = y)

plot(mod)

Plot function of sprinter fit

Description

Produces a two-panel plot of the sprinter object showing coefficient paths for both main effects and interactions.

Usage

## S3 method for class 'sprinter'
plot(fit, which = 1, label = TRUE, index = NULL)
## S3 method for class 'sprinter'
plot(fit, which = 1, label = TRUE, index = NULL)

Arguments

`fit`	Fitted `sprinter` object.
`which`	The tuning parameter considered in Step 2.
`label`	If `TRUE` (default), annotate the plot with variable labels.
`index`	Lambda indices to plot

Details

A two panel plot is produced, that summarizes the main effects (left) and interaction (right) coefficients, as a function of lambda. Adopted from the function summary.rgam from package relgam by Kenneth Tay and Robert Tibshirani.

Examples

set.seed(123)
n <- 100
p <- 100
# dense input
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] - 2 * x[, 2] + 3 * x[, 1] * x[, 3] - 4 * x[, 4] * x[, 5] + rnorm(n)
fit <- sprinter(x = x, y = y)

plot(fit)

set.seed(123)
n <- 100
p <- 100
# dense input
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] - 2 * x[, 2] + 3 * x[, 1] * x[, 3] - 4 * x[, 4] * x[, 5] + rnorm(n)
fit <- sprinter(x = x, y = y)

plot(fit)

Calculate prediction from a `cv.sprinter` object.

Description

Calculate prediction from a cv.sprinter object.

Usage

## S3 method for class 'cv.sprinter'
predict(object, newdata, ...)
## S3 method for class 'cv.sprinter'
predict(object, newdata, ...)

Arguments

`object`	a fitted `cv.sprinter` object.
`newdata`	a design matrix of all the `p` main effects of some new observations of which predictions are to be made.
`...`	additional argument (not used here, only for S3 generic/method consistency)

Value

The prediction of newdata by the cv.sprinter fit object.

Examples

n <- 100
p <- 200
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] + 2 * x[, 2] - 3 * x[, 1] * x[, 2] + rnorm(n)
mod <- cv.sprinter(x = x, y = y)
fitted <- predict(mod, newdata = x)

n <- 100
p <- 200
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] + 2 * x[, 2] - 3 * x[, 1] * x[, 2] + rnorm(n)
mod <- cv.sprinter(x = x, y = y)
fitted <- predict(mod, newdata = x)

Calculate prediction from a `other` object.

Description

Calculate prediction from a other object.

Usage

## S3 method for class 'other'
predict(object, newdata, ...)
## S3 method for class 'other'
predict(object, newdata, ...)

Arguments

`object`	a fitted `other` object.
`newdata`	a design matrix of all the `p` main effects of some new observations of which predictions are to be made.
`...`	additional argument (not used here, only for S3 generic/method consistency)

Value

The prediction of newdata by the cv.sprinter fit object.

Examples

n <- 100
p <- 200
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] + 2 * x[, 2] - 3 * x[, 1] * x[, 2] + rnorm(n)
mod <- cv.sprinter(x = x, y = y)
fitted <- predict(mod, newdata = x)

n <- 100
p <- 200
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] + 2 * x[, 2] - 3 * x[, 1] * x[, 2] + rnorm(n)
mod <- cv.sprinter(x = x, y = y)
fitted <- predict(mod, newdata = x)

Calculate prediction from a `sprinter` object.

Description

Calculate prediction from a sprinter object.

Usage

## S3 method for class 'sprinter'
predict(object, newdata, ...)
## S3 method for class 'sprinter'
predict(object, newdata, ...)

Arguments

`object`	a fitted `sprinter` object.
`newdata`	a design matrix of all the `p` main effects of some new observations of which predictions are to be made.
`...`	additional argument (not used here, only for S3 generic/method consistency)

Value

The prediction of newdata by the sprinter fit object.

Examples

n <- 100
p <- 200
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] + 2 * x[, 2] - 3 * x[, 1] * x[, 2] + rnorm(n)
mod <- sprinter(x = x, y = y)
fitted <- predict(mod, newdata = x)

n <- 100
p <- 200
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] + 2 * x[, 2] - 3 * x[, 1] * x[, 2] + rnorm(n)
mod <- sprinter(x = x, y = y)
fitted <- predict(mod, newdata = x)

Print the cross validation information of cv.sprinter

Description

Print a summary of the cross-validation information for running cv.sprinter.

Usage

## S3 method for class 'cv.sprinter'
print(fit, digits = max(3, getOption("digits") - 3), ...)
## S3 method for class 'cv.sprinter'
print(fit, digits = max(3, getOption("digits") - 3), ...)

Arguments

`fit`	A fitted `cv.sprinter` object.
`digits`	Significant digits in printout.

Details

This function takes in a cv.sprinter object and produces summary of the cross-validation informationabout the tuning parameters (in Step 3) selected by lambda.min and lambda.1se. Adopted from the function print.cv.rgam from package relgam by Kenneth Tay and Robert Tibshirani.

Examples

set.seed(123)
n <- 100
p <- 100
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] - 2 * x[, 2] + 3 * x[, 1] * x[, 3] - 4 * x[, 4] * x[, 5] + rnorm(n)

fit.cv <- cv.sprinter(x = x, y = y)
print(fit.cv)

set.seed(123)
n <- 100
p <- 100
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] - 2 * x[, 2] + 3 * x[, 1] * x[, 3] - 4 * x[, 4] * x[, 5] + rnorm(n)

fit.cv <- cv.sprinter(x = x, y = y)
print(fit.cv)

Print a summary of the sprinter fit

Description

Print a summary of the sprinter fit at each step along the path of tuning parameters used in Step 3, for any given tuning parameter in Step 1.

Usage

## S3 method for class 'sprinter'
print(fit, which = 1, digits = max(3, getOption("digits") - 3), ...)
## S3 method for class 'sprinter'
print(fit, which = 1, digits = max(3, getOption("digits") - 3), ...)

Arguments

`fit`	A `sprinter` object.
`which`	Which tuning parameter of Step 1 to print. Default is 1.
`digits`	Significant digits in printout.
`...`	Additional print arguments.

Details

The function produces a three-column matrix with tuning parameter values (in Step 3), number of nonzero main effects, and the number of nonzero interactions. Adopted from the function print.rgam from package relgam by Kenneth Tay and Robert Tibshirani.

Examples

set.seed(123)
n <- 100
p <- 100
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] - 2 * x[, 2] + 3 * x[, 1] * x[, 3] - 4 * x[, 4] * x[, 5] + rnorm(n)
fit <- sprinter(x = x, y = y)

print(fit, which = 3)

set.seed(123)
n <- 100
p <- 100
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] - 2 * x[, 2] + 3 * x[, 1] * x[, 3] - 4 * x[, 4] * x[, 5] + rnorm(n)
fit <- sprinter(x = x, y = y)

print(fit, which = 3)

Sure Independence Screening in Step 2

Description

Sure Independence Screening in Step 2

Usage

screen_cpp(x, y, num_keep, square = FALSE, main_effect = FALSE)
screen_cpp(x, y, num_keep, square = FALSE, main_effect = FALSE)

Arguments

`x`	a n-by-p matrix of main effects, with i.i.d rows, and each row represents a vector of observations of p main-effects
`y`	a vector of length n. In sprinter, y is the residual from step 1
`num_keep`	the number of candidate interactions in Step 2. Default to be n / [log n]
`square`	An indicator of whether squared effects should be considered in Step 1 (NOT Step 2!). square == TRUE if squared effects have been considered in Step 1, i.e., squared effects will NOT be considered in Step 2.
`main_effect`	An indicator of whether main effects should also be screened. Default to be false. The functionality of main_effect = true is not used in sprinter, but for SIS_lasso.

Value

an matrix of 3 columns, representing the index pair of the selected interactions, and the corresponding absolute correlation with the residual.

Sure Independence Screening in Step 2 for sparse design matrix

Description

Sure Independence Screening in Step 2 for sparse design matrix

Usage

screen_sparse_cpp(x, y, num_keep, square = FALSE, main_effect = FALSE)
screen_sparse_cpp(x, y, num_keep, square = FALSE, main_effect = FALSE)

Arguments

`x`	a n-by-p sparse matrix of main effects
`y`	a vector of length n. In sprinter, y is the residual from step 1
`num_keep`	the number of candidate interactions in Step 2. Default to be n / [log n]
`square`	An indicator of whether squared effects should be considered in Step 1 (NOT Step 2!). square == TRUE if squared effects have been considered in Step 1, i.e., squared effects will NOT be considered in Step 2.
`main_effect`	An indicator of whether main effects should also be screened. Default to be false. The functionality of main_effect = true is not used in sprinter, but for SIS_lasso.

Value

an matrix of 3 columns, representing the index pair of the selected interactions, and the corresponding absolute correlation with the residual.

Sure independence screening followed by lasso

Description

Sure independence screening followed by lasso

Usage

sis_lasso(
  x,
  y,
  num_keep = NULL,
  lam_min_ratio = ifelse(nrow(x) < ncol(x), 0.01, 1e-04),
  nfold = 5,
  foldid = NULL,
  ...
)
sis_lasso(
  x,
  y,
  num_keep = NULL,
  lam_min_ratio = ifelse(nrow(x) < ncol(x), 0.01, 1e-04),
  nfold = 5,
  foldid = NULL,
  ...
)

Arguments

`x`	An `n` by `p` design matrix of main effects. Each row is an observation of `p` main effects.
`y`	A response vector of size `n`.
`num_keep`	Number of variables to keep in the screening phase
`...`	other arguments to be passed to the `glmnet` calls, such as `alpha` or `penalty.factor`

Value

An object of S3 class "cv.hier".

n: The sample size.
p: The number of main effects.
fit: The whole cv.glmnet fit object.
compact: A compact representation of the selected variables. compact has three columns, with the first two columns representing the indices of a selected variable (main effects with first index = 0), and the last column representing the estimate of coefficients.

Examples

set.seed(123)
n <- 100
p <- 200
# dense input
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] - 2 * x[, 2] + 3 * x[, 1] * x[, 3] - 4 * x[, 4] * x[, 5] + rnorm(n)
mod <- hier_lasso(x = x, y = y)

set.seed(123)
n <- 100
p <- 200
# dense input
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] - 2 * x[, 2] + 3 * x[, 1] * x[, 3] - 4 * x[, 4] * x[, 5] + rnorm(n)
mod <- hier_lasso(x = x, y = y)

Reluctant Interaction Modeling

Description

This is the main function that fits interaction models with a path of tuning parameters (for Step 3).

Usage

sprinter(
  x,
  y,
  square = FALSE,
  num_keep = NULL,
  lambda1 = NULL,
  lambda3 = NULL,
  cv_step1 = FALSE,
  nlam1 = 10,
  nlam3 = 100,
  lam_min_ratio = ifelse(nrow(x) < ncol(x), 0.01, 1e-04),
  ...
)
sprinter(
  x,
  y,
  square = FALSE,
  num_keep = NULL,
  lambda1 = NULL,
  lambda3 = NULL,
  cv_step1 = FALSE,
  nlam1 = 10,
  nlam3 = 100,
  lam_min_ratio = ifelse(nrow(x) < ncol(x), 0.01, 1e-04),
  ...
)

Arguments

`x`	An `n` by `p` design matrix of main effects. Each row is an observation of `p` main effects.
`y`	A response vector of size `n`.
`square`	Indicator of whether squared effects should be fitted in Step 1. Default to be FALSE.
`num_keep`	A user specified number of candidate interactions to keep in Step 2. If `num_keep` is not specified (as default), it will be set to `round[n / log n]`.
`lambda1`	Tuning parameter values for Step 1. `lambda1` is a vector. Default to be NULL, and the program will compute its own `lambda1` based on `nlam1` and `lam_min_ratio`.
`lambda3`	Tuning parameter values for Step 3. `lambda3` is a matrix, where the k-th column is the list of tuning parameter in Step 3 corresponding to Step 1 using `lambda1[k]`. Default to be NULL, and the program will compute its own `lambda3` based on `nlam3` and `lam_min_ratio`.
`cv_step1`	Indicator of whether cross-validation of `lambda1` should be carried out in Step 1 before subsequent steps. Default is `FALSE`.
`nlam1`	the number of values in `lambda1`. If not specified, they will be all set to `10`.
`nlam3`	the number of values in each column of `lambda3`. If not specified, they will be all set to `100`.
`lam_min_ratio`	The ratio of the smallest and the largest values in `lambda1` and each column of `lambda2`. The largest value is usually the smallest value for which all coefficients are set to zero. Default to be `1e-2` in the `n` < `p` setting.
`...`	other arguments to be passed to the `glmnet` calls, such as `alpha` or `penalty.factor`

Value

An object of S3 class "sprinter".

square: The square parameter passed into sprinter
n: The number of observations in the dataset
p: The number of main effects
step1: The output from fitting Step 1
lambda1: The path of tuning parameters passed into / computed for fitting Step 1
step2: The output from the screening Step 2
num_keep: The path of tuning parameters for Step 2
step3: The output from fitting Step 3
lambda3: The path of tuning parameters passed into / computed for fitting Step 3
main_center: Column centers of the input main effects
main_scale: Column scales of the input main effects
call: Function call.

Examples

set.seed(123)
n <- 100
p <- 100
# dense input
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] - 2 * x[, 2] + 3 * x[, 1] * x[, 3] - 4 * x[, 4] * x[, 5] + rnorm(n)
mod <- sprinter(x = x, y = y)

# sparse input
library(Matrix)
x <- Matrix::Matrix(0, n, p)
idx <- cbind(sample(seq(n), size = 10, replace = TRUE), sample(seq(p), size = 10, replace = TRUE))
x[idx] <- 1
y <- x[, 1] - 2 * x[, 2] + 3 * x[, 1] * x[, 3] - 4 * x[, 4] * x[, 5] + rnorm(n)
mod <- sprinter(x = x, y = y)

set.seed(123)
n <- 100
p <- 100
# dense input
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] - 2 * x[, 2] + 3 * x[, 1] * x[, 3] - 4 * x[, 4] * x[, 5] + rnorm(n)
mod <- sprinter(x = x, y = y)

# sparse input
library(Matrix)
x <- Matrix::Matrix(0, n, p)
idx <- cbind(sample(seq(n), size = 10, replace = TRUE), sample(seq(p), size = 10, replace = TRUE))
x[idx] <- 1
y <- x[, 1] - 2 * x[, 2] + 3 * x[, 1] * x[, 3] - 4 * x[, 4] * x[, 5] + rnorm(n)
mod <- sprinter(x = x, y = y)

Package 'sprintr'

Help Index

Running sprinter with cross-validation

Description

Usage

Arguments

Value

See Also

Examples

Two-stage hierarchical lasso

Description

Usage

Arguments

Value

Examples

Plot function of cv.sprinter fit

Description

Usage

Arguments

Details

See Also

Examples

Plot function of sprinter fit

Description

Usage

Arguments

Details

Examples

Calculate prediction from a cv.sprinter object.

Description

Usage

Arguments

Value

Examples

Calculate prediction from a other object.

Description

Usage

Arguments

Value

Examples

Calculate prediction from a sprinter object.

Description

Usage

Arguments

Value

Examples

Print the cross validation information of cv.sprinter

Description

Usage

Arguments

Details

See Also

Examples

Print a summary of the sprinter fit

Description

Usage

Arguments

Details

See Also

Examples

Sure Independence Screening in Step 2

Description

Usage

Arguments

Value

Sure Independence Screening in Step 2 for sparse design matrix

Description

Usage

Arguments

Value

Sure independence screening followed by lasso

Description

Usage

Arguments

Value

Examples

Reluctant Interaction Modeling

Description

Usage

Arguments

Calculate prediction from a `cv.sprinter` object.

Calculate prediction from a `other` object.

Calculate prediction from a `sprinter` object.