Title: | Estimating the Error Variance in a High-Dimensional Linear Model |
---|---|
Description: | Implementation of the two error variance estimation methods for high-dimensional linear models proposed by Yu and Bien (2017) <arXiv:1712.02412>. |
Authors: | Guo Yu [aut, cre] |
Maintainer: | Guo Yu |
License: | GPL-3 |
Version: | 0.9.0 |
Built: | 2024-11-25 04:09:58 UTC |
Source: | https://github.com/hugogogo/natural |
Get the two (theoretical) values of the tuning parameter lambda used in the organic lasso.
getLam_olasso(x)
x | design matrix |
Get the two (theoretical) values of the tuning parameter lambda used in the scaled lasso.
getLam_slasso(n, p)
n | number of observations |
p | number of features |
Generate a design matrix and response following the linear model $y = X\beta + \epsilon$, where $\epsilon \sim N(0, \sigma^2)$, and $X \sim N(0, \Sigma)$.
make_sparse_model(n, p, alpha, rho, snr, nsim)
n | the sample size |
p | the number of features |
alpha | sparsity, i.e., $n^\alpha$ nonzeros in the true regression coefficient |
rho | pairwise correlation among features |
snr | signal to noise ratio, defined as $\beta^T \Sigma \beta / \sigma^2$ |
nsim | the number of simulations |
A list object containing:
x: The n by p design matrix
y: The n by nsim matrix of response vectors, each column representing one replication of the simulation
beta: The true regression coefficient vector
sigma: The true error standard deviation
The package implements the two methods introduced in Yu and Bien (2017), https://arxiv.org/abs/1712.02412.
The main functions are nlasso_cv, olasso_cv, and olasso.
Provide the natural lasso estimate of the error standard deviation, using cross-validation to select the tuning parameter value. The output also includes the cross-validation results for the naive estimate and the degree-of-freedom adjusted estimate of the error standard deviation.
nlasso_cv(x, y, lambda = NULL, intercept = TRUE, nlam = 100, flmin = 0.01, nfold = 5, foldid = NULL, thresh = 1e-08, glmnet_output = NULL)
x | An n by p design matrix |
y | A response vector of size n |
lambda | A user-specified list of tuning parameters. Defaults to NULL, in which case the program computes its own lambda path |
intercept | Indicator of whether an intercept should be fitted. Defaults to TRUE |
nlam | The number of lambda values. Defaults to 100 |
flmin | The ratio of the smallest to the largest value in lambda. Defaults to 0.01 |
nfold | Number of folds in cross-validation. Default value is 5. If each fold gets too few observations, a warning is thrown and the minimal number of folds is used |
foldid | A vector of length n giving the fold assignment of each observation. Defaults to NULL |
thresh | Threshold value for the underlying optimization algorithm to claim convergence. Defaults to 1e-08 |
glmnet_output | Optional user-specified output from the glmnet package from which the estimate is computed. Defaults to NULL |
A list object containing:
n and p: The dimension of the problem.
lambda: The path of tuning parameters used.
beta: Estimate of the regression coefficients, in the original scale, corresponding to the tuning parameter selected by cross-validation.
a0: Estimate of the intercept.
mat_mse: The estimated prediction error on the test sets in cross-validation. A matrix of size nlam by nfold. If glmnet_output is not NULL, then mat_mse will be NULL.
cvm: The averaged estimated prediction error on the test sets over K folds.
cvse: The standard error of the estimated prediction error on the test sets over K folds.
ibest: The index in lambda that attains the minimal mean cross-validated error.
foldid: Fold assignment. A vector of length n.
nfold: The number of folds used in cross-validation.
sig_obj: Natural lasso estimate of the standard deviation of the error, with the optimal tuning parameter selected by cross-validation.
sig_obj_path: Natural lasso estimates of the standard deviation of the error. A vector of length nlam.
sig_naive: Naive estimate of the error standard deviation based on lasso regression, i.e., $\|y - X\hat{\beta}\|_2 / \sqrt{n}$, selected by cross-validation.
sig_naive_path: Naive estimates of the standard deviation of the error based on lasso regression. A vector of length nlam.
sig_df: Degree-of-freedom adjusted estimate of the standard deviation of the error, selected by cross-validation. See Reid et al. (2016).
sig_df_path: Degree-of-freedom adjusted estimates of the standard deviation of the error. A vector of length nlam.
type: Whether the output is of a natural or an organic lasso.
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
nl_cv <- nlasso_cv(x = sim$x, y = sim$y[, 1])
Calculate a solution path of the natural lasso estimate of the error standard deviation over a list of tuning parameter values. In particular, this function solves the lasso problems and returns the lasso objective function values as estimates of the error variance:
$$\hat{\sigma}^2_{\lambda} = \min_{\beta} \left( \frac{1}{n} \|y - X\beta\|_2^2 + 2\lambda \|\beta\|_1 \right).$$
The output also includes a path of naive estimates and a path of degree-of-freedom adjusted estimates of the error standard deviation.
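As a rough illustration of how the three kinds of estimates relate, the base R sketch below evaluates them for a given coefficient vector. It assumes the natural lasso objective has the form (1/n)||y - X beta||_2^2 + 2 lambda ||beta||_1 as in Yu and Bien (2017), that the naive estimate is the root mean squared residual, and that the degree-of-freedom adjustment divides the residual sum of squares by n minus the number of nonzero coefficients (Reid et al., 2016). The name `sigma_estimates_sketch` is hypothetical and not part of the package, and the values are only meaningful as estimates when `beta_hat` is the lasso solution at `lambda`.

```r
# Hypothetical helper for illustration; not part of the package.
sigma_estimates_sketch <- function(x, y, beta_hat, lambda) {
  n <- nrow(x)
  # Residual sum of squares at the supplied coefficients (no intercept).
  rss <- sum((y - drop(x %*% beta_hat))^2)
  n_nonzero <- sum(beta_hat != 0)
  # Assumed objective: (1/n) * RSS + 2 * lambda * l1 norm of beta_hat.
  obj <- rss / n + 2 * lambda * sum(abs(beta_hat))
  c(sig_obj   = sqrt(obj),
    sig_naive = sqrt(rss / n),
    # Not finite when the number of nonzero coefficients reaches n.
    sig_df    = sqrt(rss / (n - n_nonzero)))
}

x_toy <- matrix(c(1, 0, 0, 1, 1, 1), nrow = 3, ncol = 2)
y_toy <- c(1, 2, 3)
sigma_estimates_sketch(x_toy, y_toy, beta_hat = c(1, 0), lambda = 0.5)
```

Because the penalty term is nonnegative and n minus the number of nonzeros is at most n, both `sig_obj` and `sig_df` are at least as large as `sig_naive` for the same coefficients.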
nlasso_path(x, y, lambda = NULL, nlam = 100, flmin = 0.01, thresh = 1e-08, intercept = TRUE, glmnet_output = NULL)
x | An n by p design matrix |
y | A response vector of size n |
lambda | A user-specified list of tuning parameters. Defaults to NULL, in which case the program computes its own lambda path |
nlam | The number of lambda values. Defaults to 100 |
flmin | The ratio of the smallest to the largest value in lambda. Defaults to 0.01 |
thresh | Threshold value for the underlying optimization algorithm to claim convergence. Defaults to 1e-08 |
intercept | Indicator of whether an intercept should be fitted. Defaults to TRUE |
glmnet_output | Optional user-specified output from the glmnet package from which the estimate is computed. Defaults to NULL |
A list object containing:
n and p: The dimension of the problem.
lambda: The path of tuning parameters used.
beta: Matrix of estimates of the regression coefficients, in the original scale. The matrix is of size p by nlam. The j-th column represents the estimate of the coefficients corresponding to the j-th tuning parameter in lambda.
a0: Estimate of the intercept. A vector of length nlam.
sig_obj_path: Natural lasso estimates of the error standard deviation. A vector of length nlam.
sig_naive_path: Naive estimates of the error standard deviation based on lasso regression, i.e., $\|y - X\hat{\beta}\|_2 / \sqrt{n}$. A vector of length nlam.
sig_df_path: Degree-of-freedom adjusted estimates of the standard deviation of the error. A vector of length nlam. See Reid et al. (2016).
type: Whether the output is of a natural or an organic lasso.
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
nl_path <- nlasso_path(x = sim$x, y = sim$y[, 1])
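Solve the organic lasso problem
$$\tilde{\sigma}^2_{\lambda} = \min_{\beta} \left( \frac{1}{n} \|y - X\beta\|_2^2 + 2\lambda \|\beta\|_1^2 \right)$$
with two pre-specified values of the tuning parameter: $\lambda_1 = \log p / n$, and $\lambda_2$, which is a Monte-Carlo estimate of $\|X^T e\|_\infty^2 / n^2$, where $e$ is an n-dimensional standard normal vector.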
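A Monte-Carlo estimate of this kind can be sketched in base R as follows. This is an illustration only: it assumes the target quantity is the expectation of ||X^T e||_inf^2 / n^2 with e an n-dimensional standard normal vector, and the function name `mc_lambda_sketch` and its `nrep` argument are hypothetical rather than part of the package.

```r
# Hypothetical helper for illustration; not the package's getLam_olasso().
mc_lambda_sketch <- function(x, nrep = 1000) {
  n <- nrow(x)
  # Each column of e is one independent n-dimensional standard normal draw.
  e <- matrix(rnorm(n * nrep), n, nrep)
  # crossprod(x, e) is t(x) %*% e; take the largest absolute entry per draw.
  max_abs <- apply(abs(crossprod(x, e)), 2, max)
  # Average the squared sup-norm over draws, scaled by n^2 (assumed target).
  mean(max_abs^2) / n^2
}

set.seed(123)
x_demo <- matrix(rnorm(50 * 200), 50, 200)
mc_lambda_sketch(x_demo, nrep = 500)
```

Increasing `nrep` reduces the Monte-Carlo error of the estimate at the cost of a larger n by nrep matrix of draws.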
olasso(x, y, intercept = TRUE, thresh = 1e-08)
x | An n by p design matrix |
y | A response vector of size n |
intercept | Indicator of whether an intercept should be fitted. Defaults to TRUE |
thresh | Threshold value for the underlying optimization algorithm to claim convergence. Defaults to 1e-08 |
A list object containing:
n and p: The dimension of the problem.
lam_1, lam_2: $\log p / n$, and a Monte-Carlo estimate of $\|X^T e\|_\infty^2 / n^2$, where $e$ is an n-dimensional standard normal vector.
a0_1, a0_2: Estimates of the intercept, corresponding to lam_1 and lam_2.
beta_1, beta_2: Organic lasso estimates of the regression coefficients, corresponding to lam_1 and lam_2.
sig_obj_1, sig_obj_2: Organic lasso estimates of the error standard deviation, corresponding to lam_1 and lam_2.
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
ol <- olasso(x = sim$x, y = sim$y[, 1])
Provide the organic lasso estimate of the error standard deviation, using cross-validation to select the tuning parameter value.
olasso_cv(x, y, lambda = NULL, intercept = TRUE, nlam = 100, flmin = 0.01, nfold = 5, foldid = NULL, thresh = 1e-08)
x | An n by p design matrix |
y | A response vector of size n |
lambda | A user-specified list of tuning parameters. Defaults to NULL, in which case the program computes its own lambda path |
intercept | Indicator of whether an intercept should be fitted. Defaults to TRUE |
nlam | The number of lambda values. Defaults to 100 |
flmin | The ratio of the smallest to the largest value in lambda. Defaults to 0.01 |
nfold | Number of folds in cross-validation. Default value is 5. If each fold gets too few observations, a warning is thrown and the minimal number of folds is used |
foldid | A vector of length n giving the fold assignment of each observation. Defaults to NULL |
thresh | Threshold value for the underlying optimization algorithm to claim convergence. Defaults to 1e-08 |
A list object containing:
n and p: The dimension of the problem.
lambda: The path of tuning parameters used.
beta: Estimate of the regression coefficients, in the original scale, corresponding to the tuning parameter selected by cross-validation.
a0: Estimate of the intercept.
mat_mse: The estimated prediction error on the test sets in cross-validation. A matrix of size nlam by nfold.
cvm: The averaged estimated prediction error on the test sets over K folds.
cvse: The standard error of the estimated prediction error on the test sets over K folds.
ibest: The index in lambda that attains the minimal mean cross-validated error.
foldid: Fold assignment. A vector of length n.
nfold: The number of folds used in cross-validation.
sig_obj: Organic lasso estimate of the error standard deviation, selected by cross-validation.
sig_obj_path: Organic lasso estimates of the error standard deviation. A vector of length nlam.
type: Whether the output is of a natural or an organic lasso.
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
ol_cv <- olasso_cv(x = sim$x, y = sim$y[, 1])
Calculate a solution path of the organic lasso estimate of the error standard deviation over a list of tuning parameter values. In particular, this function solves the squared-lasso problems and returns the objective function values as estimates of the error variance:
$$\tilde{\sigma}^2_{\lambda} = \min_{\beta} \left( \frac{1}{n} \|y - X\beta\|_2^2 + 2\lambda \|\beta\|_1^2 \right).$$
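To make the squared-lasso objective concrete, the base R sketch below evaluates it at a given coefficient vector. It assumes the objective has the form (1/n)||y - X beta||_2^2 + 2 lambda ||beta||_1^2 as in Yu and Bien (2017); the function name `organic_objective_sketch` is hypothetical, and the sketch does not solve the minimization that `olasso_path` performs.

```r
# Hypothetical helper for illustration; evaluates, but does not minimize,
# the assumed squared-lasso objective.
organic_objective_sketch <- function(x, y, beta, lambda) {
  n <- nrow(x)
  # Mean squared residual at the supplied coefficients (no intercept).
  rss <- sum((y - drop(x %*% beta))^2)
  # The penalty uses the squared l1 norm, unlike the ordinary lasso.
  rss / n + 2 * lambda * sum(abs(beta))^2
}

x_toy <- matrix(c(1, 0, 0, 1, 1, 1), nrow = 3, ncol = 2)
y_toy <- c(1, 2, 3)
organic_objective_sketch(x_toy, y_toy, beta = c(1, -1), lambda = 0.5)
```

Evaluated at the minimizing coefficients, the square root of this value would correspond to an entry of `sig_obj_path`.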
olasso_path(x, y, lambda = NULL, nlam = 100, flmin = 0.01, thresh = 1e-08, intercept = TRUE)
x | An n by p design matrix |
y | A response vector of size n |
lambda | A user-specified list of tuning parameters. Defaults to NULL, in which case the program computes its own lambda path |
nlam | The number of lambda values. Defaults to 100 |
flmin | The ratio of the smallest to the largest value in lambda. Defaults to 0.01 |
thresh | Threshold value for the underlying optimization algorithm to claim convergence. Defaults to 1e-08 |
intercept | Indicator of whether an intercept should be fitted. Defaults to TRUE |
The output also includes the naive and the degree-of-freedom adjusted estimates, in analogy to nlasso_path.
A list object containing:
n and p: The dimension of the problem.
lambda: The path of tuning parameters used.
a0: Estimate of the intercept. A vector of length nlam.
beta: Matrix of estimates of the regression coefficients, in the original scale. The matrix is of size p by nlam. The j-th column represents the estimate of the coefficients corresponding to the j-th tuning parameter in lambda.
sig_obj_path: Organic lasso estimates of the error standard deviation. A vector of length nlam.
sig_naive: Naive estimates of the error standard deviation based on the squared-lasso regression. A vector of length nlam.
sig_df: Degree-of-freedom adjusted estimates of the error standard deviation, based on the squared-lasso regression. A vector of length nlam.
type: Whether the output is of a natural or an organic lasso.
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
ol_path <- olasso_path(x = sim$x, y = sim$y[, 1])
Solve the organic lasso problem with a single value of lambda. The lambda values are for slow rates, which could give less satisfying results.
olasso_slow(x, y, thresh = 1e-08)
x | An n by p design matrix |
y | A response vector of size n |
thresh | Threshold value for the underlying optimization algorithm to claim convergence. Defaults to 1e-08 |
This function is adapted from the ggb R package.
## S3 method for class 'natural.cv'
plot(x, ...)
x | an object of class natural.cv |
... | additional arguments (not used here; only for S3 generic/method consistency) |
This function is adapted from the ggb R package.
## S3 method for class 'natural.path'
plot(x, ...)
x | an object of class natural.path |
... | additional arguments (not used here; only for S3 generic/method consistency) |
This function is adapted from the ggb R package.
## S3 method for class 'natural.path'
print(x, ...)
x | an object of class natural.path |
... | additional arguments (not used here; only for S3 generic/method consistency) |
Standardize the n-by-p design matrix X to have column means zero and ||X_j||_2^2 = n for all j.
standardize(x, center = TRUE)
x | design matrix |
center | whether to set column means equal to zero |
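The standardization described for this function can be sketched in base R as below. This is a minimal sketch rather than the package's implementation: the name `standardize_sketch` is hypothetical, and it assumes that with `center = FALSE` the columns are only rescaled, not shifted.

```r
# Hypothetical helper for illustration; not the package's standardize().
standardize_sketch <- function(x, center = TRUE) {
  x <- as.matrix(x)
  n <- nrow(x)
  # Optionally subtract each column's mean.
  if (center) x <- sweep(x, 2, colMeans(x))
  # Rescale each column so that its squared Euclidean norm equals n.
  scale_factor <- sqrt(colSums(x^2) / n)
  sweep(x, 2, scale_factor, "/")
}

set.seed(123)
x_raw <- matrix(rnorm(30 * 4, mean = 2, sd = 3), 30, 4)
x_std <- standardize_sketch(x_raw)
colSums(x_std^2)  # each entry should equal nrow(x_raw), up to rounding
```

A constant column becomes all zeros after centering and would produce a division by zero, so such columns would need to be dropped or handled separately.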