| Title: | Estimating the Error Variance in a High-Dimensional Linear Model |
|---|---|
| Description: | Implementation of the two error variance estimation methods in high-dimensional linear models of Yu, Bien (2017) <arXiv:1712.02412>. |
| Authors: | Guo Yu [aut, cre] |
| Maintainer: | Guo Yu <[email protected]> |
| License: | GPL-3 |
| Version: | 0.9.0 |
| Built: | 2026-05-08 05:56:54 UTC |
| Source: | https://github.com/hugogogo/natural |
Get the two (theoretical) values of lambdas used in the organic lasso
getLam_olasso(x)

| Argument | Description |
|---|---|
| x | design matrix |
Get the two (theoretical) values of lambdas used in scaled lasso
getLam_slasso(n, p)

| Argument | Description |
|---|---|
| n | number of observations |
| p | number of features |
Generate a design matrix and responses following a sparse linear model $y = X\beta + \varepsilon$, where beta is the true regression coefficient vector and the errors have standard deviation sigma.

make_sparse_model(n, p, alpha, rho, snr, nsim)

| Argument | Description |
|---|---|
| n | the sample size |
| p | the number of features |
| alpha | sparsity level of the true regression coefficient vector |
| rho | pairwise correlation among features |
| snr | signal-to-noise ratio |
| nsim | the number of simulations |
A list object containing:
x: The n by p design matrix
y: The n by nsim matrix of response vector, each column representing one replication of the simulation
beta: The true regression coefficient vector
sigma: The true error standard deviation
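To make the structure of the returned list concrete, here is a minimal base-R sketch of a simulation of this kind. It is not the package's implementation. The function name `simulate_sparse_model` is hypothetical, and the sketch assumes Gaussian features with an equicorrelated covariance, roughly n^alpha nonzero coefficients, and a signal-to-noise ratio defined as beta' Sigma beta / sigma^2.

```r
# Hypothetical base-R sketch; NOT the package's implementation.
simulate_sparse_model <- function(n, p, alpha, rho, snr, nsim) {
  # Assumption: equicorrelated covariance with unit variances.
  Sigma <- matrix(rho, p, p)
  diag(Sigma) <- 1
  # Rows of x are N(0, Sigma): with R = chol(Sigma), R'R = Sigma.
  x <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
  # Assumption: about n^alpha nonzero coefficients at random positions.
  k <- min(p, ceiling(n^alpha))
  beta <- numeric(p)
  beta[sample(p, k)] <- rnorm(k)
  # Assumption: snr = beta' Sigma beta / sigma^2, solved for sigma.
  sigma <- sqrt(drop(crossprod(beta, Sigma %*% beta)) / snr)
  # One column of responses per simulation replicate.
  y <- drop(x %*% beta) + sigma * matrix(rnorm(n * nsim), n, nsim)
  list(x = x, y = y, beta = beta, sigma = sigma)
}

set.seed(1)
sim_sketch <- simulate_sparse_model(n = 50, p = 200, alpha = 0.6,
                                    rho = 0.6, snr = 2, nsim = 3)
dim(sim_sketch$x)  # 50 200
dim(sim_sketch$y)  # 50 3
```

The list fields mirror those documented above (x, y, beta, sigma), so the sketch can stand in for make_sparse_model when experimenting without the package.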
The package contains implementation of the two methods introduced in Yu, Bien (2017) https://arxiv.org/abs/1712.02412.
The main functions are nlasso_cv, olasso_cv, and olasso.
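As a quick orientation, the following sketch collects the estimates returned by the three main functions so they can be compared with the true error standard deviation of a simulated model. It uses only functions and list fields documented on this page; the helper name `compare_sigma_estimates` is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): gathers the error
# standard deviation estimates from the three main functions.
compare_sigma_estimates <- function(x, y) {
  nl_cv <- natural::nlasso_cv(x = x, y = y)
  ol_cv <- natural::olasso_cv(x = x, y = y)
  ol <- natural::olasso(x = x, y = y)
  c(natural_cv = nl_cv$sig_obj,
    naive_cv = nl_cv$sig_naive,
    df_cv = nl_cv$sig_df,
    organic_cv = ol_cv$sig_obj,
    organic_lam1 = ol$sig_obj_1,
    organic_lam2 = ol$sig_obj_2)
}

# Run the comparison only when the package is installed.
if (requireNamespace("natural", quietly = TRUE)) {
  set.seed(123)
  sim <- natural::make_sparse_model(n = 50, p = 200, alpha = 0.6,
                                    rho = 0.6, snr = 2, nsim = 1)
  print(compare_sigma_estimates(sim$x, sim$y[, 1]))
  print(sim$sigma)  # true error standard deviation
}
```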
Provide the natural lasso estimate of the error standard deviation, using cross-validation to select the tuning parameter value. The output also includes the cross-validation results for the naive estimate and the degree-of-freedom adjusted estimate of the error standard deviation.

nlasso_cv(x, y, lambda = NULL, intercept = TRUE, nlam = 100, flmin = 0.01, nfold = 5, foldid = NULL, thresh = 1e-08, glmnet_output = NULL)

| Argument | Description |
|---|---|
| x | An n by p design matrix. |
| y | A response vector of size n. |
| lambda | A user-specified sequence of tuning parameter values. Defaults to NULL, in which case the program computes its own sequence from nlam and flmin. |
| intercept | Indicator of whether an intercept should be fitted. Defaults to TRUE. |
| nlam | The number of tuning parameter values. Defaults to 100. |
| flmin | The ratio of the smallest to the largest value in lambda. Defaults to 0.01. |
| nfold | Number of folds in cross-validation. Defaults to 5. If each fold would get too few observations, a warning is thrown and a minimal number of folds is used instead. |
| foldid | A vector of length n giving the fold assignment of each observation. Defaults to NULL. |
| thresh | Threshold value for the underlying optimization algorithm to claim convergence. Defaults to 1e-08. |
| glmnet_output | A user-specified glmnet output from which the estimate should be computed. Defaults to NULL. |
A list object containing:
n and p: The dimension of the problem.
lambda: The path of tuning parameter used.
beta: Estimate of the regression coefficients, in the original scale, corresponding to the tuning parameter selected by cross-validation.
a0: Estimate of intercept
mat_mse: The estimated prediction error on the test sets in cross-validation. A matrix of size nlam by nfold. If glmnet_output is not NULL, then mat_mse will be NULL.
cvm: The averaged estimated prediction error on the test sets over K folds.
cvse: The standard error of the estimated prediction error on the test sets over K folds.
ibest: The index in lambda that attains the minimal mean cross-validated error.
foldid: Fold assignment. A vector of length n.
nfold: The number of folds used in cross-validation.
sig_obj: Natural lasso estimate of standard deviation of the error, with the optimal tuning parameter selected by cross-validation.
sig_obj_path: Natural lasso estimates of standard deviation of the error. A vector of length nlam.
sig_naive: Naive estimate of the error standard deviation based on lasso regression, i.e., $\|y - X\hat\beta\|_2 / \sqrt{n}$, selected by cross-validation.
sig_naive_path: Naive estimate of standard deviation of the error based on lasso regression. A vector of length nlam.
sig_df: Degree-of-freedom adjusted estimate of standard deviation of the error, selected by cross-validation. See Reid et al. (2016).
sig_df_path: Degree-of-freedom adjusted estimate of standard deviation of the error. A vector of length nlam.
type: whether the output is of a natural or an organic lasso.
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
nl_cv <- nlasso_cv(x = sim$x, y = sim$y[, 1])
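The relationship between mat_mse, cvm, cvse, and ibest described above can be sketched in base R as follows. The error matrix is made up for illustration, and the standard-error formula (standard deviation across folds divided by the square root of the number of folds) is an assumption rather than something this page confirms.

```r
# Hypothetical nlam-by-nfold matrix of test-set errors (made-up numbers).
mat_mse <- rbind(c(4.0, 5.0, 4.5),
                 c(2.0, 3.0, 2.5),
                 c(3.0, 3.5, 4.0))
nfold <- ncol(mat_mse)

cvm <- rowMeans(mat_mse)                     # mean test error per lambda
cvse <- apply(mat_mse, 1, sd) / sqrt(nfold)  # assumed standard-error formula
ibest <- which.min(cvm)                      # index of the best lambda
ibest  # 2
```

With a fitted object, the selected tuning parameter would then be `lambda[ibest]`.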
Calculate a solution path of the natural lasso estimate (of the error standard deviation) over a list of tuning parameter values. In particular, this function solves the lasso problems and returns the lasso objective function values as estimates of the error variance.
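For reference, assuming the formulation in Yu and Bien (2017), the natural lasso estimate of the error variance at a tuning parameter value $\lambda$ is the minimal value of the lasso objective:

```latex
\hat{\sigma}^2_{\lambda}
  = \min_{\beta \in \mathbb{R}^p}
    \left\{ \frac{1}{n} \lVert y - X\beta \rVert_2^2
            + 2\lambda \lVert \beta \rVert_1 \right\}
```

Under this assumption, the reported standard deviation estimates correspond to $\hat{\sigma}_{\lambda} = \sqrt{\hat{\sigma}^2_{\lambda}}$.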
The output also includes a path of naive estimates and a path of degree of freedom adjusted estimates of the error standard deviation.
nlasso_path(x, y, lambda = NULL, nlam = 100, flmin = 0.01, thresh = 1e-08, intercept = TRUE, glmnet_output = NULL)

| Argument | Description |
|---|---|
| x | An n by p design matrix. |
| y | A response vector of size n. |
| lambda | A user-specified sequence of tuning parameter values. Defaults to NULL, in which case the program computes its own sequence from nlam and flmin. |
| nlam | The number of tuning parameter values. Defaults to 100. |
| flmin | The ratio of the smallest to the largest value in lambda. Defaults to 0.01. |
| thresh | Threshold value for the underlying optimization algorithm to claim convergence. Defaults to 1e-08. |
| intercept | Indicator of whether an intercept should be fitted. Defaults to TRUE. |
| glmnet_output | A user-specified glmnet output from which the estimate should be computed. Defaults to NULL. |
A list object containing:
n and p: The dimension of the problem.
lambda: The path of tuning parameters used.
beta: Matrix of estimates of the regression coefficients, in the original scale. The matrix is of size p by nlam. The j-th column represents the estimate of coefficient corresponding to the j-th tuning parameter in lambda.
a0: Estimate of intercept. A vector of length nlam.
sig_obj_path: Natural lasso estimates of the error standard deviation. A vector of length nlam.
sig_naive_path: Naive estimates of the error standard deviation based on lasso regression, i.e., $\|y - X\hat\beta\|_2 / \sqrt{n}$. A vector of length nlam.
sig_df_path: Degree-of-freedom adjusted estimate of standard deviation of the error. A vector of length nlam. See Reid et al. (2016).
type: whether the output is of a natural or an organic lasso.
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
nl_path <- nlasso_path(x = sim$x, y = sim$y[, 1])
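The naive and degree-of-freedom adjusted estimates can be illustrated for a single coefficient vector in base R. This is a sketch under the usual definitions: the naive estimate divides the residual sum of squares by n, and the adjusted estimate of Reid et al. (2016) divides it by n minus the number of nonzero coefficients. The helper names are hypothetical, and whether the package computes these quantities in exactly this way is an assumption.

```r
# Hypothetical helpers; x, y, beta and a0 follow the naming on this page.
residual_ss <- function(x, y, beta, a0 = 0) {
  sum((y - a0 - drop(x %*% beta))^2)
}

# Naive estimate: root of the residual sum of squares over n.
sigma_naive <- function(x, y, beta, a0 = 0) {
  sqrt(residual_ss(x, y, beta, a0) / length(y))
}

# Degree-of-freedom adjusted estimate: divide by n minus the
# number of nonzero coefficients instead of n.
sigma_df <- function(x, y, beta, a0 = 0) {
  s_hat <- sum(beta != 0)
  sqrt(residual_ss(x, y, beta, a0) / (length(y) - s_hat))
}

# Toy data: residuals are (0, 2, 3, 4), so the residual sum of squares is 29.
x_toy <- diag(4)
y_toy <- c(1, 2, 3, 4)
beta_toy <- c(1, 0, 0, 0)
sigma_naive(x_toy, y_toy, beta_toy)  # sqrt(29 / 4)
sigma_df(x_toy, y_toy, beta_toy)     # sqrt(29 / 3)
```

Applying such helpers column by column to the beta matrix and a0 vector of a path object would give vectors of length nlam comparable to sig_naive_path and sig_df_path.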
Solve the organic lasso problem with two pre-specified values of the tuning parameter, lam_1 and lam_2 (see getLam_olasso). The second value, lam_2, is a Monte-Carlo estimate of a quantity that depends on an n-dimensional standard normal vector.
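For reference, assuming the formulation in Yu and Bien (2017), the organic lasso replaces the $\ell_1$ penalty of the lasso with its square, and the estimate of the error variance is again the minimal objective value:

```latex
\tilde{\sigma}^2_{\lambda}
  = \min_{\beta \in \mathbb{R}^p}
    \left\{ \frac{1}{n} \lVert y - X\beta \rVert_2^2
            + 2\lambda \lVert \beta \rVert_1^2 \right\}
```

This is the squared-lasso problem referred to in the olasso_path description; the reported standard deviation estimates correspond to the square root of this value.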
olasso(x, y, intercept = TRUE, thresh = 1e-08)

| Argument | Description |
|---|---|
| x | An n by p design matrix. |
| y | A response vector of size n. |
| intercept | Indicator of whether an intercept should be fitted. Defaults to TRUE. |
| thresh | Threshold value for the underlying optimization algorithm to claim convergence. Defaults to 1e-08. |
A list object containing:
n and p: The dimension of the problem.
lam_1, lam_2: The two pre-specified tuning parameter values; lam_2 is a Monte-Carlo estimate based on an n-dimensional standard normal vector.
a0_1, a0_2: Estimate of intercept, corresponding to lam_1 and lam_2.
beta_1, beta_2: Organic lasso estimate of regression coefficients, corresponding to lam_1 and lam_2.
sig_obj_1, sig_obj_2: Organic lasso estimate of the error standard deviation, corresponding to lam_1 and lam_2.
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
ol <- olasso(x = sim$x, y = sim$y[, 1])
Provide the organic lasso estimate of the error standard deviation, using cross-validation to select the tuning parameter value.

olasso_cv(x, y, lambda = NULL, intercept = TRUE, nlam = 100, flmin = 0.01, nfold = 5, foldid = NULL, thresh = 1e-08)

| Argument | Description |
|---|---|
| x | An n by p design matrix. |
| y | A response vector of size n. |
| lambda | A user-specified sequence of tuning parameter values. Defaults to NULL, in which case the program computes its own sequence from nlam and flmin. |
| intercept | Indicator of whether an intercept should be fitted. Defaults to TRUE. |
| nlam | The number of tuning parameter values. Defaults to 100. |
| flmin | The ratio of the smallest to the largest value in lambda. Defaults to 0.01. |
| nfold | Number of folds in cross-validation. Defaults to 5. If each fold would get too few observations, a warning is thrown and a minimal number of folds is used instead. |
| foldid | A vector of length n giving the fold assignment of each observation. Defaults to NULL. |
| thresh | Threshold value for the underlying optimization algorithm to claim convergence. Defaults to 1e-08. |
A list object containing:
n and p: The dimension of the problem.
lambda: The path of tuning parameter used.
beta: Estimate of the regression coefficients, in the original scale, corresponding to the tuning parameter selected by cross-validation.
a0: Estimate of intercept
mat_mse: The estimated prediction error on the test sets in cross-validation. A matrix of size nlam by nfold
cvm: The averaged estimated prediction error on the test sets over K folds.
cvse: The standard error of the estimated prediction error on the test sets over K folds.
ibest: The index in lambda that attains the minimal mean cross-validated error.
foldid: Fold assignment. A vector of length n.
nfold: The number of folds used in cross-validation.
sig_obj: Organic lasso estimate of the error standard deviation, selected by cross-validation.
sig_obj_path: Organic lasso estimates of the error standard deviation. A vector of length nlam.
type: whether the output is of a natural or an organic lasso.
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
ol_cv <- olasso_cv(x = sim$x, y = sim$y[, 1])
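When the natural and organic lasso estimates are to be compared on the same cross-validation split, a fold assignment can be built once and supplied through the foldid argument of both functions. The following base-R sketch shows one way to create a balanced random assignment; how the package generates its own folds when foldid is NULL is not stated on this page.

```r
set.seed(123)
n <- 50
nfold <- 5

# Balanced random assignment: repeat the fold labels to length n,
# then shuffle them.
foldid <- sample(rep(seq_len(nfold), length.out = n))
table(foldid)  # each of the 5 folds holds 10 observations
```

The resulting vector has length n, matching the foldid field of the returned list, and could be passed as `foldid = foldid` to nlasso_cv and olasso_cv.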
Calculate a solution path of the organic lasso estimate (of the error standard deviation) over a list of tuning parameter values. In particular, this function solves the squared-lasso problems and returns the objective function values as estimates of the error variance.

olasso_path(x, y, lambda = NULL, nlam = 100, flmin = 0.01, thresh = 1e-08, intercept = TRUE)

| Argument | Description |
|---|---|
| x | An n by p design matrix. |
| y | A response vector of size n. |
| lambda | A user-specified sequence of tuning parameter values. Defaults to NULL, in which case the program computes its own sequence from nlam and flmin. |
| nlam | The number of tuning parameter values. Defaults to 100. |
| flmin | The ratio of the smallest to the largest value in lambda. Defaults to 0.01. |
| thresh | Threshold value for the underlying optimization algorithm to claim convergence. Defaults to 1e-08. |
| intercept | Indicator of whether an intercept should be fitted. Defaults to TRUE. |

The output also includes the naive and the degree-of-freedom adjusted estimates, in analogy to nlasso_path.
A list object containing:
n and p: The dimension of the problem.
lambda: The path of tuning parameter used.
a0: Estimate of intercept. A vector of length nlam.
beta: Matrix of estimates of the regression coefficients, in the original scale. The matrix is of size p by nlam. The j-th column represents the estimate of coefficient corresponding to the j-th tuning parameter in lambda.
sig_obj_path: Organic lasso estimates of the error standard deviation. A vector of length nlam.
sig_naive: Naive estimate of the error standard deviation based on the squared-lasso regression. A vector of length nlam.
sig_df: Degree-of-freedom adjusted estimate of the error standard deviation, based on the squared-lasso regression. A vector of length nlam.
type: whether the output is of a natural or an organic lasso.
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
ol_path <- olasso_path(x = sim$x, y = sim$y[, 1])
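The roles of nlam and flmin in building a default tuning parameter path can be sketched as follows. The helper `make_lambda_path` is hypothetical, the largest value `lam_max` is taken as given, and the log-scale spacing is an assumption that mirrors the convention used by glmnet rather than a documented property of this package.

```r
# Hypothetical helper: nlam values running from lam_max down to
# flmin * lam_max, equally spaced on the log scale (an assumption).
make_lambda_path <- function(lam_max, nlam = 100, flmin = 0.01) {
  exp(seq(log(lam_max), log(flmin * lam_max), length.out = nlam))
}

lam <- make_lambda_path(lam_max = 2, nlam = 5, flmin = 0.01)
length(lam)      # 5
lam[5] / lam[1]  # 0.01, up to floating-point error
```

A sequence built this way could be supplied through the lambda argument of the path and cross-validation functions.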
Solve the organic lasso problem with a single value of lambda. The lambda values are for slow rates, which could give less satisfying results.

olasso_slow(x, y, thresh = 1e-08)

| Argument | Description |
|---|---|
| x | An n by p design matrix. |
| y | A response vector of size n. |
| thresh | Threshold value for the underlying optimization algorithm to claim convergence. Defaults to 1e-08. |
This function is adapted from the ggb R package.
## S3 method for class 'natural.cv'
plot(x, ...)

| Argument | Description |
|---|---|
| x | an object of class natural.cv |
| ... | additional arguments (not used here; only for S3 generic/method consistency) |
This function is adapted from the ggb R package.
## S3 method for class 'natural.path'
plot(x, ...)

| Argument | Description |
|---|---|
| x | an object of class natural.path |
| ... | additional arguments (not used here; only for S3 generic/method consistency) |
This function is adapted from the ggb R package.
## S3 method for class 'natural.path'
print(x, ...)

| Argument | Description |
|---|---|
| x | an object of class natural.path |
| ... | additional arguments (not used here; only for S3 generic/method consistency) |
Standardize the n-by-p design matrix X to have column means zero and ||X_j||_2^2 = n for all j.

standardize(x, center = TRUE)

| Argument | Description |
|---|---|
| x | design matrix |
| center | whether to set the column means to zero. Defaults to TRUE. |
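The standardization described above can be sketched in base R as follows. The function name `standardize_sketch` is hypothetical and this is not the package's code; it only illustrates the stated target of zero column means and squared column norms equal to n.

```r
# Hypothetical base-R sketch of the standardization described above.
standardize_sketch <- function(x, center = TRUE) {
  n <- nrow(x)
  # Optionally subtract the column means.
  if (center) x <- sweep(x, 2, colMeans(x))
  # Rescale each column so that its sum of squares equals n.
  sweep(x, 2, sqrt(colSums(x^2) / n), "/")
}

set.seed(1)
x_raw <- matrix(rnorm(20 * 3, mean = 5, sd = 2), 20, 3)
x_std <- standardize_sketch(x_raw)
max(abs(colMeans(x_std)))  # numerically zero
colSums(x_std^2)           # 20 20 20
```

With `center = FALSE` the columns are only rescaled, so their means are generally nonzero while the squared norms still equal n.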