| Title: | Extra Canonical Link Family Objects for Generalized Linear Models |
|---|---|
| Description: | Extra family objects for "weird" scenarios, particularly logistic or log-linear models with unbounded or non-binary/non-integer outcomes. Provides binomial_extra() and poisson_extra() as generalizations of binomial() and poisson(). Using the canonical link with the corresponding working likelihood in glm() ensures convexity, making model fitting reliable and independent of the starting value. Robert WM Wedderburn (1974) <doi:10.1093/biomet/61.3.439> and Peter McCullagh (1983) <doi:10.1214/aos/1176346056> justified this method of fitting generalized linear (mean) models with a quasi-/working likelihood. |
| Authors: | Hongxiang Qiu [aut, cre] |
| Maintainer: | Hongxiang Qiu <[email protected]> |
| License: | GPL (>=2) |
| Version: | 1.0.0 |
| Built: | 2026-05-25 19:43:30 UTC |
| Source: | https://github.com/qiu-hongxiang-david/canonicalfamilyextra |
A family object for fitting binomial models (i.e., generalized linear models with range contained in the open unit interval (0, 1)) with continuous outcomes that may fall outside the unit interval [0, 1]. Also works with glmnet::glmnet(), SL.glmnet.extra(), and screen.glmnet.extra(). Generally, the object aims to fit a model whose range lies in the unit interval.
binomial_extra(link = "logit", variance = "mu(1-mu)", family = c("gaussian", "binomial"))
| Argument | Description |
|---|---|
| link | see |
| variance | see |
| family | The family of the returned family object. Either "gaussian" (the default) or "binomial". |
This family is useful, for example, when the estimand is a conditional probability function but the outcome is a transformed pseudo-outcome (so that the estimator is multiply robust), or when estimating a regression function with known bounds but the outcome might not respect those bounds. Naive approaches such as glm(family=binomial()), glm(family=quasibinomial()), glm(family=gaussian(link="logit")), and glm(family=quasi(link="logit",variance="constant")) might not work appropriately in such cases.
In particular, for the logistic model, because the binomial working likelihood is used together with its canonical link, model fitting is a convex problem and does not depend on the starting value.
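The convexity claim can be checked numerically without the package. The sketch below is an illustration under our own assumptions, not the code of binomial_extra(): it writes out the binomial working negative log-likelihood with the logit link, sum(log(1 + exp(eta)) - y * eta), which is convex in the coefficients for any real y because log(1 + exp(eta)) is convex and -y * eta is linear, and minimizes it by damped Newton-Raphson from two different starting values. The helpers log1pexp(), binomial_working_nll() and fit_logit_working() are hypothetical names.

```r
# Sketch only (not binomial_extra()'s implementation): binomial working
# negative log-likelihood with the canonical logit link, for a real-valued y.
#   objective(beta) = sum(log(1 + exp(eta)) - y * eta),  eta = X %*% beta

# Numerically stable log(1 + exp(eta))
log1pexp <- function(eta) pmax(eta, 0) + log1p(exp(-abs(eta)))

binomial_working_nll <- function(beta, X, y) {
  eta <- drop(X %*% beta)
  sum(log1pexp(eta) - y * eta)
}

# Damped Newton-Raphson (IRLS for a canonical link, plus step halving)
fit_logit_working <- function(X, y, start, tol = 1e-10, maxit = 200) {
  beta <- start
  obj <- binomial_working_nll(beta, X, y)
  for (it in seq_len(maxit)) {
    mu <- plogis(drop(X %*% beta))
    grad <- drop(crossprod(X, mu - y))         # gradient: X'(mu - y)
    hess <- crossprod(X, X * (mu * (1 - mu)))  # Hessian: X'WX, W = mu(1 - mu)
    step <- drop(solve(hess, grad))
    t <- 1
    repeat {                                   # halve the step until no increase
      cand <- beta - t * step
      cand_obj <- binomial_working_nll(cand, X, y)
      if (cand_obj <= obj || t < 1e-12) break
      t <- t / 2
    }
    beta <- cand
    obj <- cand_obj
    if (max(abs(t * step)) < tol) break
  }
  beta
}

set.seed(123)
x <- rnorm(100)
y <- plogis(1 + x) + rnorm(100)  # outcome falls outside [0, 1]
X <- cbind(1, x)

b_zero <- fit_logit_working(X, y, start = c(0, 0))
b_far  <- fit_logit_working(X, y, start = c(-3, 2))
max(abs(b_zero - b_far))  # essentially zero: the minimizer does not depend on the start
```

Since W = mu(1 - mu) > 0, the Hessian is positive definite whenever X has full column rank, so any starting value leads to the same coefficients; this contrasts with the non-convex gaussian(link = "logit") fit shown in the examples.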
By default, the returned object has family="gaussian" for compatibility with other learners in SuperLearner::SuperLearner(): when the outcome is continuous, other learners might not behave correctly with family="binomial".
When running geepack::geeglm() or gee::gee() with binomial_extra, the working mean-variance relationship is determined by the family of binomial_extra. Specify family="binomial" to use the binomial working mean-variance relationship; the default "gaussian" leads to a Gaussian working mean-variance relationship.
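To see why the choice of family matters here, recall that a GLM-type estimating equation (and a GEE with independence working correlation) has the form sum_i x_i * (dmu/deta)_i / V(mu_i) * (y_i - mu_i) = 0, so the working variance V(mu) decides how each residual is weighted. The sketch below uses only base R family objects, not binomial_extra(), and the helper ee_factor() is a hypothetical name; it computes that per-observation factor under the logit link for a binomial and a Gaussian working variance.

```r
# Per-observation factor (dmu/deta) / V(mu) in the estimating equation
#   sum_i x_i * (dmu/deta)_i / V(mu_i) * (y_i - mu_i) = 0
# Base R family objects provide mu.eta() = dmu/deta and variance() = V(mu).
ee_factor <- function(fam, eta) {
  mu <- fam$linkinv(eta)
  fam$mu.eta(eta) / fam$variance(mu)
}

eta <- c(-2, 0, 2)
# Binomial working variance mu(1 - mu) with its canonical logit link:
# the factor is (numerically) 1, so every residual gets the same weight.
ee_factor(binomial(link = "logit"), eta)
# Gaussian working variance (constant) with a logit link:
# the factor is mu(1 - mu), so residuals with mu near 0 or 1 are down-weighted.
ee_factor(gaussian(link = "logit"), eta)
```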
a family object
```r
set.seed(123)
expit <- binomial()$linkinv

# glm
x <- rnorm(100)
y <- expit(1 + x) + rnorm(100)
glm(y ~ x, family = binomial_extra())  # or family = binomial_extra, or family = "binomial_extra"

# Counterexamples: these naive approaches yield errors or are not reliable
try(glm(y ~ x, family = binomial()))
try(glm(y ~ x, family = quasibinomial()))
try(glm(y ~ x, family = gaussian(link = "logit")))
# setting a starting value might fix this approach,
# but the problem is non-convex and requires a valid starting value
try(glm(y ~ x, family = gaussian(link = "logit"), start = c(-1, 0)))
try(glm(y ~ x, family = quasi(link = "logit", variance = "constant")))

# glmnet
X <- matrix(rnorm(100 * 5), nrow = 100)
y <- expit(1 + X[, 1]) + rnorm(100)
require(glmnet)
glmnet(X, y, family = binomial_extra())  # or family = binomial_extra; cannot use family = "binomial_extra"

# Counterexamples: these naive approaches yield errors
try(glmnet(X, y, family = binomial()))
try(glmnet(X, y, family = gaussian(link = "logit")))
try(glmnet(X, y, family = quasi(link = "logit", variance = "constant")))

# other links/variances for glm
x <- rnorm(100)
y <- expit(1 + x) + rnorm(100)
glm(y ~ x, family = binomial_extra("probit"))               # probit regression
glm(y ~ x, family = binomial_extra(variance = "constant"))  # least squares

# within SuperLearner (one noise draw per observation)
X <- matrix(rnorm(100 * 3), nrow = 100)
y <- expit(1 + X[, 1]) + rnorm(100)
require(SuperLearner)
SuperLearner(y, data.frame(X), family = binomial_extra(),
             SL.library = c("SL.glm", "SL.ipredbagg"), cvControl = list(V = 2))

# GEE
x <- rnorm(100)
y <- expit(1 + x) + rnorm(100)
id <- rep(1:20, each = 5)
geepack::geeglm(y ~ x, family = binomial_extra(family = "binomial"), id = id)
gee::gee(y ~ x, family = binomial_extra(family = "binomial"), id = id)
```
A family object for fitting Poisson models (i.e., generalized linear models with range contained in the positive half-line (0, ∞)) with continuous outcomes that need not be non-negative integers. Also works with glmnet::glmnet(), SL.glmnet.extra(), and screen.glmnet.extra(). Generally, the object aims to fit a model whose range lies in the non-negative half-line.
poisson_extra(link = "log", variance = "mu", family = c("gaussian", "poisson"))
| Argument | Description |
|---|---|
| link | see |
| variance | see |
| family | The family of the returned family object. Either "gaussian" (the default) or "poisson". |
This family is useful, for example, when the estimand is a conditional probability function but the outcome is a transformed pseudo-outcome (so that the estimator is multiply robust), or when estimating a positive regression function but the outcome might be negative or non-integer. Naive approaches such as glm(family=poisson()), glm(family=quasipoisson()), glm(family=gaussian(link="log")), and glm(family=quasi(link="log",variance="constant")) might not work appropriately or reliably in such cases.
In particular, for the log-linear model, because the Poisson working likelihood is used together with its canonical link, model fitting is a convex problem and does not depend on the starting value.
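The same argument applies to the log-linear case. As a sketch under our own assumptions (not the code of poisson_extra()), the Poisson working negative log-likelihood with the log link, sum(exp(eta) - y * eta), is convex in the coefficients for any real y, including negative or non-integer values. Here it is minimized with base R's optim() (BFGS with an analytic gradient) from two starting values; poisson_working_nll(), poisson_working_grad() and fit_from() are hypothetical names.

```r
# Sketch only (not poisson_extra()'s implementation): Poisson working
# negative log-likelihood with the canonical log link, for a real-valued y.
#   objective(beta) = sum(exp(eta) - y * eta),  eta = X %*% beta
# exp(eta) is convex and -y * eta is linear, so the objective is convex in beta.
poisson_working_nll <- function(beta, X, y) {
  eta <- drop(X %*% beta)
  sum(exp(eta) - y * eta)
}
poisson_working_grad <- function(beta, X, y) {
  eta <- drop(X %*% beta)
  drop(crossprod(X, exp(eta) - y))  # gradient: X'(mu - y)
}

set.seed(123)
x <- rnorm(100)
y <- exp(-1 + x) + rnorm(100)  # some outcomes are negative
X <- cbind(1, x)

# Quasi-Newton minimization from a given starting value
fit_from <- function(start) {
  optim(start, poisson_working_nll, poisson_working_grad, X = X, y = y,
        method = "BFGS", control = list(reltol = 1e-14, maxit = 1000))$par
}
b_zero  <- fit_from(c(0, 0))
b_other <- fit_from(c(1, -1))
max(abs(b_zero - b_other))  # small: both starts reach the same minimizer
```

A general-purpose optimizer is used here only to emphasize that, for a convex objective, the answer does not hinge on the algorithm or the starting point.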
By default, the returned object has family="gaussian" for compatibility with other learners in SuperLearner::SuperLearner(): when the outcome is continuous, other learners might not behave correctly with family="poisson".
When running geepack::geeglm() or gee::gee() with poisson_extra, the working mean-variance relationship is determined by the family of poisson_extra. Specify family="poisson" to use the Poisson working mean-variance relationship; the default "gaussian" leads to a Gaussian working mean-variance relationship.
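Under the log link, the Poisson variance function V(mu) = mu coincides with dmu/deta, which is what makes the log link canonical: the factor (dmu/deta) / V(mu) in the estimating equation equals 1. With a Gaussian working variance, V(mu) = 1, the same factor equals mu, so observations with large fitted means carry more weight. The sketch below checks this with base R family objects only; it does not call poisson_extra(), geeglm() or gee().

```r
# Compare the estimating-equation factor (dmu/deta) / V(mu) under the log link
eta  <- log(c(0.5, 2, 8))
pois <- poisson(link = "log")   # V(mu) = mu
gaus <- gaussian(link = "log")  # V(mu) = 1

# Canonical link: dmu/deta equals V(mu), so the factor is 1 for every observation
pois$mu.eta(eta) / pois$variance(pois$linkinv(eta))  # 1 1 1

# Gaussian working variance: the factor equals mu itself (0.5, 2, 8 up to rounding)
gaus$mu.eta(eta) / gaus$variance(gaus$linkinv(eta))
```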
a family object
```r
set.seed(123)

# glm
x <- rnorm(100)
y <- exp(-1 + x) + rnorm(100)
glm(y ~ x, family = poisson_extra())  # or family = poisson_extra, or family = "poisson_extra"

# Counterexamples: these naive approaches yield errors or are not reliable
try(glm(y ~ x, family = poisson()))
try(glm(y ~ x, family = quasipoisson()))
try(glm(y ~ x, family = gaussian(link = "log")))
# setting a starting value might fix this approach,
# but the problem is non-convex and requires a valid starting value
try(glm(y ~ x, family = gaussian(link = "log"), start = c(-1, 0)))
try(glm(y ~ x, family = quasi(link = "log", variance = "constant")))
# setting a starting value might fix this approach,
# but the problem is non-convex and requires a valid starting value
try(glm(y ~ x, family = quasi(link = "log", variance = "constant"), start = c(-1, 0)))

# glmnet
X <- matrix(rnorm(100 * 5), nrow = 100)
y <- exp(1 + X[, 1]) + rnorm(100)
require(glmnet)
glmnet(X, y, family = poisson_extra())  # or family = poisson_extra; cannot use family = "poisson_extra"

# Counterexamples: these naive approaches yield errors or are not reliable
try(glmnet(X, y, family = poisson()))
try(glmnet(X, y, family = gaussian(link = "log")))
try(glmnet(X, y, family = quasi(link = "log", variance = "constant")))

# within SuperLearner (one noise draw per observation)
X <- matrix(rnorm(100 * 3), nrow = 100)
y <- exp(1 + X[, 1]) + rnorm(100)
require(SuperLearner)
SuperLearner(y, data.frame(X), family = poisson_extra(),
             SL.library = c("SL.glm", "SL.ipredbagg"), cvControl = list(V = 2))

# GEE
x <- rnorm(100)
y <- exp(-1 + x) + rnorm(100)
id <- rep(1:20, each = 5)
geepack::geeglm(y ~ x, family = poisson_extra(family = "poisson"), id = id)
gee::gee(y ~ x, family = poisson_extra(family = "poisson"), id = id)
```
A wrapper of glmnet::cv.glmnet() similar to SuperLearner::screen.glmnet(), except that family can be binomial_extra. SuperLearner::screen.glmnet() only passes the name of family (e.g., "gaussian", "binomial") and thus cannot pass the full customized families like binomial_extra().
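The limitation described above can be seen with base R's own family objects: two families that differ in their link share the same $family name, so an interface that receives only the name cannot tell them apart. By the same reasoning, and given that binomial_extra() reports family="gaussian" by default, passing only its name would presumably discard its logit link and working variance altogether. The snippet below does not use binomial_extra(); it only illustrates the information lost.

```r
# Two binomial families that differ only in their link function
fam_logit  <- binomial(link = "logit")
fam_probit <- binomial(link = "probit")

fam_logit$family == fam_probit$family  # TRUE: the name alone is "binomial"
c(fam_logit$link, fam_probit$link)     # "logit" "probit": the full objects differ
```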
screen.glmnet.extra(Y, X, family, alpha = 1, minscreen = 2, nfolds = 10, nlambda = 100, ...)
| Argument | Description |
|---|---|
| Y | |
| X | |
| family | similar to |
| alpha | |
| minscreen | |
| nfolds | |
| nlambda | |
| ... | |
A logical vector with the length equal to the number of columns in X. TRUE indicates the variable (column of X) should be included. Similar to the value of SuperLearner::screen.glmnet().
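The logical screen can be understood as "which coefficients are nonzero at the selected penalty". The sketch below assumes a screening rule modeled on SuperLearner::screen.glmnet() as we understand it: keep variables with nonzero coefficients at the chosen lambda, and if fewer than minscreen survive, fall back to the first lambda on the path that keeps at least minscreen variables. Whether screen.glmnet.extra() uses exactly this fallback is an assumption. The coefficient path is a hypothetical base R matrix (glmnet stores it as a sparse matrix), and screen_from_path() is a hypothetical helper.

```r
# Hypothetical lasso coefficient path: rows = 5 predictors (columns of X),
# columns = lambda values ordered from largest to smallest penalty.
beta_path <- cbind(
  c(0,    0,    0, 0,    0),
  c(0.40, 0,    0, 0,    0),
  c(0.70, 0.20, 0, 0,    0),
  c(0.90, 0.35, 0, 0.05, 0)
)

# Logical screen: TRUE means the variable (column of X) is kept
screen_from_path <- function(beta_path, chosen, minscreen = 2) {
  keep <- beta_path[, chosen] != 0
  if (sum(keep) < minscreen) {
    # fall back to the first lambda that keeps at least `minscreen` variables
    n_nonzero <- colSums(beta_path != 0)
    fallback <- which.max(n_nonzero >= minscreen)
    keep <- beta_path[, fallback] != 0
  }
  keep
}

screen_from_path(beta_path, chosen = 2)  # TRUE TRUE FALSE FALSE FALSE (falls back to column 3)
screen_from_path(beta_path, chosen = 4)  # TRUE TRUE FALSE TRUE FALSE
```

Note that which.max() returns 1 when no lambda keeps minscreen variables, so a production version would need an explicit check for that case.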
```r
set.seed(321)
expit <- binomial()$linkinv
X <- matrix(rnorm(100 * 5), nrow = 100)
y <- expit(1 + X[, 1] + X[, 2]) + rnorm(100)
require(SuperLearner)
SL.library <- list(c("SL.glmnet.extra", "All"),
                   c("SL.glmnet.extra", "screen.glmnet.extra"))
SuperLearner(y, data.frame(X), family = binomial_extra(),
             SL.library = SL.library, cvControl = list(V = 2))
```
A wrapper of glmnet::cv.glmnet() similar to SuperLearner::SL.glmnet(), except that family can be binomial_extra. SuperLearner::SL.glmnet() only passes the name of family and thus cannot pass the full customized families like binomial_extra().
SL.glmnet.extra(Y, X, newX, family, obsWeights, id, alpha = 1, nfolds = 10, nlambda = 100, useMin = TRUE, loss = "deviance", ...)
| Argument | Description |
|---|---|
| Y | |
| X | |
| newX | |
| family | similar to |
| obsWeights | |
| id | |
| alpha | |
| nfolds | |
| nlambda | |
| useMin | |
| loss | |
| ... | |
A list with two elements:

- pred: The predicted values for the rows in newX.
- fit: A list containing all objects necessary to get predictions for new observations from this specific algorithm.

Similar to the value of SuperLearner::SL.glmnet().
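The two-element return value follows the general SuperLearner wrapper convention: pred holds predictions for newX, and fit holds a classed object that a predict() method can use later. The sketch below is a hypothetical learner, SL.wmean.sketch, written in the style of SuperLearner's simple mean learner; it is not SL.glmnet.extra() and only illustrates the structure of the value.

```r
# Hypothetical learner following the SuperLearner wrapper convention:
# returns list(pred = predictions for newX, fit = object with a predict method)
SL.wmean.sketch <- function(Y, X, newX, family, obsWeights, ...) {
  meanY <- weighted.mean(Y, w = obsWeights)
  fit <- structure(list(object = meanY), class = "SL.wmean.sketch")
  list(pred = rep(meanY, nrow(newX)), fit = fit)
}

# S3 predict method so the stored fit can score new observations later
predict.SL.wmean.sketch <- function(object, newdata, ...) {
  rep(object$object, nrow(newdata))
}

X <- data.frame(x1 = c(1, 2, 3, 4))
newX <- data.frame(x1 = c(5, 6))
out <- SL.wmean.sketch(Y = c(0.2, 1.4, -0.3, 0.9), X = X, newX = newX,
                       family = gaussian(), obsWeights = rep(1, 4))
names(out)                                      # "pred" "fit"
predict(out$fit, newdata = data.frame(x1 = 7))  # same constant as out$pred
```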
```r
set.seed(321)
expit <- binomial()$linkinv
X <- matrix(rnorm(100 * 5), nrow = 100)
y <- expit(1 + X[, 1] + X[, 2]) + rnorm(100)
require(SuperLearner)
SL.library <- list(c("SL.glmnet.extra", "All"),
                   c("SL.glmnet.extra", "screen.glmnet.extra"))
SuperLearner(y, data.frame(X), family = binomial_extra(),
             SL.library = SL.library, cvControl = list(V = 2))
```