Statsmodels Python Package: How Exactly Are Duplicated Features Handled?
Solution 1:
The short answer:
In this case, GLM uses the Moore-Penrose generalized inverse, pinv, which corresponds to a principal component regression in which components with zero eigenvalues are dropped. An eigenvalue counts as zero when it falls below the default threshold (rcond) in numpy.linalg.pinv.
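To make the effect of the generalized inverse concrete, here is a minimal NumPy-only sketch. The synthetic data and variable names are assumptions made for illustration. It only shows what numpy.linalg.pinv does with an exactly duplicated column: the design matrix is rank deficient, and the minimum-norm solution splits the slope equally between the two copies.

```python
import numpy as np

# Synthetic data, made up purely for illustration.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=n)

# The same feature appears twice, so the columns are perfectly collinear.
X_single = np.column_stack([np.ones(n), x])
X_dup = np.column_stack([np.ones(n), x, x])

# Least-squares coefficients via the Moore-Penrose pseudoinverse.
beta_single = np.linalg.pinv(X_single) @ y
beta_dup = np.linalg.pinv(X_dup) @ y

rank_dup = np.linalg.matrix_rank(X_dup)  # 2, although X_dup has 3 columns

print("rank of duplicated design:", rank_dup)
print("single feature:", beta_single)
print("duplicated feature:", beta_dup)
```

The two duplicated columns receive the same coefficient, and their sum equals the slope from the fit with a single copy, so the fitted values are identical in both cases.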
statsmodels does not have a systematic policy on collinearity. Some nonlinear optimization routines raise an exception when the matrix inverse fails. However, the linear regression models, OLS and WLS, use the generalized inverse by default, which produces the behavior described above.
The default optimization algorithm in GLM.fit is iteratively reweighted least squares (IRLS), which uses WLS and inherits the default behavior of WLS for singular design matrices.
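The way IRLS inherits this behavior from its weighted least-squares step can be sketched in plain NumPy. This is not the statsmodels implementation. It is a hand-written Poisson IRLS loop with a log link, where the helper name irls_poisson, the starting values, and the fixed iteration count are all assumptions chosen for illustration. Each iteration solves a weighted least-squares problem with numpy.linalg.pinv, so a duplicated column is handled exactly as in the linear case.

```python
import numpy as np

def irls_poisson(X, y, n_iter=25):
    """Hypothetical helper: Poisson regression (log link) fitted by IRLS.

    Every weighted least-squares step is solved with the pseudoinverse.
    """
    mu = (y + y.mean()) / 2.0            # crude, strictly positive start
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = mu                           # IRLS weights for Poisson, log link
        z = eta + (y - mu) / mu          # working response
        sw = np.sqrt(w)
        # Weighted least squares via the pseudoinverse of the scaled design.
        beta = np.linalg.pinv(X * sw[:, None]) @ (z * sw)
        eta = X @ beta
        mu = np.exp(eta)
    return beta

# Synthetic count data, made up purely for illustration.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = rng.poisson(np.exp(0.5 + 0.3 * x)).astype(float)

X_single = np.column_stack([np.ones(n), x])
X_dup = np.column_stack([np.ones(n), x, x])   # duplicated feature

glm_single = irls_poisson(X_single, y)
glm_dup = irls_poisson(X_dup, y)

print("single feature:", glm_single)
print("duplicated feature:", glm_dup)
```

Because the weighted design keeps the two columns identical at every iteration, each step returns the minimum-norm solution, and the final coefficients for the duplicated columns are equal and sum to the single-feature slope.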
The version in statsmodels master also has the option of using the standard scipy optimizers, in which case the behavior with respect to singular or near-singular design matrices depends on the details of the optimization algorithm.