Code Like A Girl

Welcome to Code Like A Girl, a space that celebrates redefining society's perceptions of women in technology. Share your story with us!

How to implement and select the best Linear Regression Model

Carla Martins
Published in Code Like A Girl
17 min read · Oct 20, 2021

Example with R and Python code

Contents

1. About Linear Regression
2. Implementation
3. R² and Adjusted-R²
4. Residual Standard Error
5. Bland-Altman Statistics and Plot
6. Akaike Information Criterion (AIC)
7. Bayesian Information Criterion (BIC)
8. Correlation Coefficient (CC)
9. Z-score
10. Model Choice

0 Loading Packages and Libraries

Before we begin, and assuming the necessary packages and libraries are already installed, run the following code so that every function used in the later sections works:

R packages:

library(ggplot2)
library(hrbrthemes)
library(tidyverse)
library(blandr)
library(readr)

Python libraries:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from numpy import linspace
from sklearn.linear_model import LinearRegression
from sklearn import datasets, linear_model
from sklearn.metrics import *
import statistics
import seaborn as sns
import statsmodels.api as sm
from scipy.stats import gaussian_kde

You can download the data sample used in this tutorial, as well as the R and Python code, from GitHub.

1 About Linear Regression

Linear regression is a mathematical model in the form of a line equation:

y = b + a1x1 + a2x2 + a3x3 + …

where y is the dependent variable and x1, x2, x3 are the independent variables. As we know from pre-calculus, b is the intercept with the y-axis, and a1, a2, a3 are the values that set the slope of the line. In practice, y is the variable we want to predict, x1, x2, x3, … are the predictor variables, and b is the value of y when all x or all a are equal to zero (however, this value does not always have a real meaning outside the mathematical expression). Linear regression models can be used in the life sciences, for example to predict hip circumference (y) from weight, height and waist circumference (x1, x2 and x3). The values of a1, a2 and a3 tell us how much hip circumference changes for a one-unit change in x1, x2 and x3, respectively, with the other predictors held fixed. A scatter plot of the 3 predictor variables (x-axis) against the predicted variable (y-axis) is shown in Figure 1.
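To make the roles of b and a1, a2, a3 concrete, here is a minimal sketch that fits the equation above by ordinary least squares. It rests on several assumptions: the data are synthetic and made up for illustration (they are not the tutorial's data set), the "true" coefficients are arbitrary, and the fit uses NumPy's least-squares solver instead of the scikit-learn LinearRegression class imported earlier, so that the example depends on NumPy only.

```python
import numpy as np

# Synthetic, made-up data: NOT the data set used in this tutorial.
rng = np.random.default_rng(0)
n = 200
weight = rng.normal(70, 10, n)   # x1 (kg)
height = rng.normal(170, 8, n)   # x2 (cm)
waist = rng.normal(85, 9, n)     # x3 (cm)

# Hypothetical "true" intercept and slopes, chosen for illustration only.
true_b = 20.0
true_a = np.array([0.5, 0.1, 0.3])

X = np.column_stack([weight, height, waist])
hip = true_b + X @ true_a + rng.normal(0, 1.0, n)  # y, with a little noise

# Prepend a column of ones so the first fitted parameter is the intercept b.
design = np.column_stack([np.ones(n), X])
params, *_ = np.linalg.lstsq(design, hip, rcond=None)
b, a = params[0], params[1:]

print("intercept b:", round(float(b), 2))
print("slopes a1, a2, a3:", np.round(a, 2))

# Predict y for one new observation: y = b + a1*x1 + a2*x2 + a3*x3
new_x = np.array([72.0, 168.0, 88.0])
predicted_hip = b + new_x @ a
print("predicted hip circumference:", round(float(predicted_hip), 1))
```

The fitted slopes should land close to the values used to generate the data. The intercept is usually estimated less precisely, because no observation lies anywhere near x1 = x2 = x3 = 0. This is one reason why b often has no real-world meaning.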

Written by Carla Martins

Compulsive learner. Passionate about technology. Speaks C, R, Python, SQL, Haskell, Java and LaTeX. Interested in creating solutions.
