# LASSO and RIDGE Linear regressions & L1 and L2 regularizations - the impact of the penalty term
**Teaching (Today for) Tomorrow: Bridging the Gap between the Classroom and Reality**, 3rd International Scientific and Art Conference, Faculty of Teacher Education, University of Zagreb, in cooperation with the Croatian Academy of Sciences and Arts
##### **Siniša Opić**, *Faculty of Education, University of Zagreb, Croatia*, *sinisa.opic@ufzg.hr*
**Section:** Education for innovation and research | **Paper number:** 29 | **Category:** Original scientific paper
##### **Abstract**
Lasso and Ridge regressions are useful enhancements of ordinary least squares (OLS) regression based on the L1 and L2 regularization techniques. The main idea behind these methods is to balance the simplicity of the model with its accuracy by adding a penalty term to the OLS objective. This approach shrinks the regression coefficients substantially, potentially reducing some of them to zero (in the case of L1 regularization). These enhancements improve accuracy, increase R², lower the mean squared error (MSE), eliminate non-significant coefficients, address multicollinearity among predictors, and, most importantly, help prevent overfitting. In the empirical part of the study, both L1 (LASSO) and L2 (RIDGE) regression analyses were conducted on a dataset of 156 cases. A cross-validation procedure was employed with λ values ranging from 0.5 to 20 (λmax = 20, λmin = 0.5, λ increment of 1), using 5 folds. The results indicated that Ridge regression, compared with OLS, reduced the beta coefficients, increased R², and decreased the MSE. Furthermore, no significant discrepancies were found between the R² values for the training and holdout datasets, suggesting that the model is robust. When L1 regularization (LASSO) was performed for comparison, however, the betas for all categories of the predictor variables were shrunk to zero (βstd = 0; B = 0), which makes the regression model impossible to interpret (a type 2 error). LASSO and RIDGE regressions are iterative procedures, and their results depend on the randomization, the number of cross-validation folds and iterations, the proportion of training and testing data, and the value of lambda (although this is a minor problem), so they should be used with care.
***Key words:***
Lasso regression, L1 and L2 regularization, MSE, overfitting, penalty term, Ridge regression, sum of squared residuals
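To make the penalty term from the abstract explicit, the three objective functions can be written in the standard formulation below; λ is the penalty strength, β₀ is the intercept (which is not penalized), and p is the number of predictors:

```latex
% Objective functions minimized by OLS, Ridge (L2), and LASSO (L1)
\begin{aligned}
\text{OLS:}   &\quad \min_{\beta}\; \sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^{2} \\
\text{Ridge:} &\quad \min_{\beta}\; \sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^{2} + \lambda \sum_{j=1}^{p} \beta_j^{2} \\
\text{LASSO:} &\quad \min_{\beta}\; \sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^{2} + \lambda \sum_{j=1}^{p} \lvert\beta_j\rvert
\end{aligned}
```

The squared (L2) penalty shrinks coefficients toward zero but rarely to exactly zero, while the absolute-value (L1) penalty can set coefficients exactly to zero, which is why the LASSO run described in the abstract can zero out every dummy predictor.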
**Table 1 - RIDGE Regression Coefficientsᵃ**

| Alpha | Predictor | Meanᶜ | Std. Dev.ᶜ | Standardized Coefficients | Unstandardized Coefficients |
|---|---|---|---|---|---|
| 20.000 | Interceptᵇ | . | . | 3.892 | 3.846 |
| | \[V4=1\] | .351 | .477 | .098 | .205 |
| | \[V4=2\] | .297 | .457 | .064 | .139 |
| | \[V4=3\] | .153 | .360 | -.042 | -.118 |
| | \[V4=4\] | .072 | .259 | .017 | .066 |
| | \[V4=5\] | .126 | .332 | **-.195** | **-.589** |
| | \[Father's education=1\] | .018 | .133 | -.038 | -.286 |
| | \[Father's education=2\] | .117 | .322 | -.096 | -.298 |
| | \[Father's education=3\] | .126 | .332 | -.015 | -.046 |
| | \[Father's education=4\] | .387 | .487 | .041 | .085 |
| | \[Father's education=5\] | .072 | .259 | -.053 | -.206 |
| | \[Father's education=6\] | .135 | .342 | .102 | .299 |
| | \[Father's education=7\] | .018 | .133 | .034 | .254 |
| | \[Father's education=8\] | .027 | .162 | .023 | .141 |
| | \[Father's education=9\] | .099 | .299 | -.028 | -.094 |
| | \[school=1\] | .523 | .499 | .091 | .182 |
| | \[school=2\] | .477 | .499 | -.091 | -.182 |

a. Dependent Variable: school success
b. The intercept is not penalized during estimation.
c. Values used to standardize predictors for estimation. The dependent variable is not standardized.
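For readers who wish to reproduce this type of Ridge fit, the block below is a minimal sketch in Python with scikit-learn (the tables in this paper come from a different statistical package); the synthetic data, the data frame `df`, and the column names are illustrative assumptions, not the study's data.

```python
# Hedged sketch: Ridge regression with dummy-coded categorical predictors,
# standardized as in Table 1 (note c). All data below are synthetic placeholders.
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 156                                           # sample size reported in the abstract
df = pd.DataFrame({
    "V4": rng.integers(1, 6, n),                  # categories 1..5, as in Table 1
    "father_edu": rng.integers(1, 10, n),         # categories 1..9
    "school": rng.integers(1, 3, n),              # categories 1..2
    "school_success": rng.normal(3.9, 0.8, n),    # synthetic outcome
})

# Dummy-code the categorical predictors, then standardize the indicator columns;
# the dependent variable is left unstandardized (Table 1, note c).
X = pd.get_dummies(df[["V4", "father_edu", "school"]].astype("category"))
X_std = StandardScaler().fit_transform(X)
y = df["school_success"]

ridge = Ridge(alpha=20.0)                         # penalty strength shown in Table 1
ridge.fit(X_std, y)                               # scikit-learn does not penalize the intercept (note b)
print(round(ridge.intercept_, 3))
print(dict(zip(X.columns, ridge.coef_.round(3)))) # coefficients on the standardized-predictor scale
```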
**Table 2 - Model Comparisonsᵃ,ᵇ,ᶜ**

| Alpha/Lambda | Average Test Subset R Square | Average Test Subset MSE |
|---|---|---|
| 20.000 | .066 | .555 |
| 19.500 | .066 | .555 |
| 18.500 | .065 | .556 |
| 17.500 | .063 | .557 |
| 16.500 | .062 | .557 |
| 15.500 | .061 | .558 |
| 14.500 | .059 | .559 |
| 13.500 | .058 | .560 |
| 12.500 | .056 | .561 |
| 11.500 | .055 | .562 |
| 10.500 | .053 | .563 |
| 9.500 | .051 | .564 |
| 8.500 | .049 | .565 |
| 7.500 | .047 | .566 |
| 6.500 | .045 | .568 |
| 5.500 | .043 | .569 |
| 4.500 | .041 | .570 |
| 3.500 | .038 | .572 |
| 2.500 | .036 | .573 |
| 1.500 | .033 | .575 |
| .500 | .030 | .577 |

a. Dependent Variable: school success
b. Model: V4, Father's education, school
c. Number of cross-validation folds: 5
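The comparison summarized in Table 2 can be approximated with the following hedged scikit-learn sketch: each penalty value on the grid (0.5 to 20) is evaluated with 5-fold cross-validation, and the test-subset R² and MSE are averaged across folds. The design matrix and outcome are random placeholders, so the printed numbers will not match the table.

```python
# Hedged sketch of a Table 2-style model comparison over a lambda/alpha grid.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_validate

rng = np.random.default_rng(0)
X = rng.normal(size=(156, 16))                        # placeholder for the 16 dummy predictors in Table 1
y = rng.normal(3.9, 0.8, size=156)                    # placeholder outcome ("school success")

alphas = np.append(np.arange(0.5, 20.0, 1.0), 20.0)   # 0.5, 1.5, ..., 19.5, 20.0, as in Table 2
cv = KFold(n_splits=5, shuffle=True, random_state=0)  # 5 folds, as in note c

for alpha in alphas:
    scores = cross_validate(Ridge(alpha=alpha), X, y, cv=cv,
                            scoring=("r2", "neg_mean_squared_error"))
    avg_r2 = scores["test_r2"].mean()
    avg_mse = -scores["test_neg_mean_squared_error"].mean()
    print(f"alpha={alpha:5.1f}  avg test R2={avg_r2:6.3f}  avg test MSE={avg_mse:.3f}")
```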