Linear Regression

Linear regression is one of the simplest ways of determining the linear relationship between a dependent variable and one or more independent explanatory variables. The linear regression model is as follows:

$$Y = \mu + X\beta + \epsilon$$

Here, $\mu$ is the mean (or intercept) effect, $\beta$ is the effect of the explanatory variables $X$ on the response variable $Y$, and $\epsilon$ is the residual error that the linear terms do not explain. For example, $Y$ could be a vector of the heights of a set of $N$ subjects, $X$ could be an $N \times 2$ matrix containing two potentially explanatory variables such as age and (numerically coded) ethnicity, and solving for $\beta$ would give us a $2 \times 1$ vector (one coefficient for each explanatory variable) describing how age and ethnicity contribute to the heights of subjects in general.
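To make the height example concrete, here is a minimal sketch that solves for β by ordinary least squares using NumPy. Everything about the data is an assumption made up for illustration: the variable names (`age`, `group`), the "true" intercept and coefficients used to simulate heights, and the noise level are hypothetical and do not come from a real study.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200

# Synthetic explanatory variables (hypothetical, for illustration only)
age = rng.uniform(5, 18, size=N)      # age in years
group = rng.integers(0, 2, size=N)    # a binary-coded categorical variable (0 or 1)
X = np.column_stack([age, group])     # shape (N, 2)

# Assumed "true" model used to simulate heights: intercept + X @ beta + noise
true_intercept = 80.0
true_beta = np.array([5.0, 3.0])
Y = true_intercept + X @ true_beta + rng.normal(0, 2.0, size=N)

# Append a column of ones so the intercept is estimated alongside beta
X_design = np.column_stack([np.ones(N), X])
estimates, *_ = np.linalg.lstsq(X_design, Y, rcond=None)

intercept_hat = estimates[0]
beta_hat = estimates[1:]              # one coefficient per explanatory variable
print(intercept_hat, beta_hat)
```

With enough samples and modest noise, `beta_hat` should land close to the coefficients used to simulate the data, which is a quick way to convince yourself the fit is doing what the model describes.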

Let's see how we would implement linear regression in Python using scikit-learn.

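from sklearn import datasets
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Note: load_boston() was removed in scikit-learn 1.2, so the diabetes
# dataset is used here as a stand-in; any numeric regression dataset works.
data = datasets.load_diabetes()
X = data.data    # explanatory variables, shape (n_samples, n_features)
Y = data.target  # response variable, shape (n_samples,)

# hold out part of the data so that we have an X_test to predict on
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

model = linear_model.LinearRegression(fit_intercept=True)  # use fit_intercept=True to include the mean effect term
modelfit = model.fit(X_train, Y_train)  # fit the model to the training data
beta = modelfit.coef_  # this gives us the effects of the explanatory variables
intercept = modelfit.intercept_  # this gives us the mean (intercept) effect

# predict the responses for the held-out X_test
Y_pred = modelfit.predict(X_test)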
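As a sanity check on what the fitted model represents, the sketch below verifies that `predict` amounts to the intercept plus the explanatory variables weighted by `coef_`. It uses a small synthetic dataset (an assumption for illustration, with made-up coefficients and names such as `X_demo`) so that it runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic data: 50 samples, 3 explanatory variables (hypothetical values)
X_demo = rng.normal(size=(50, 3))
Y_demo = 1.5 + X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.1, size=50)

demo_fit = LinearRegression(fit_intercept=True).fit(X_demo, Y_demo)

# Reproduce predict() by hand: intercept_ + X @ coef_
manual = demo_fit.intercept_ + X_demo @ demo_fit.coef_
same = np.allclose(manual, demo_fit.predict(X_demo))
print(same)
```

If you also want a single-number summary of fit quality, `demo_fit.score(X_demo, Y_demo)` returns the coefficient of determination (R²) on the data you pass in.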
