Logistic Regression

Logistic regression, in spite of its name, is a classification method used when the response variable Y is categorical (in other words, the numerical value of the variable describes group membership rather than a position on a scale). Binary classification, where the response variable takes only the values 0 or 1 -- e.g., is this subject at risk for breast cancer or not? -- is the classic example. Note that this restriction applies only to the response variable; the explanatory variables may be either discrete or continuous. The fundamental model is the same as in the linear regression case:

$Y = \beta^{T}X$

How do we model the response such that our predictions are binary?

It turns out that we can do this by transforming the output using a logistic function and treating the model as predicting $P(Y=1 \mid X, \beta)$. This is given by:

$P(Y=1 \mid X, \beta) = g(\beta^{T}X)$

Here the function $g(z)$ is the logistic function, given by:

$g(z) = \frac{1}{1 + e^{-z}}$
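
Because $g(z)$ maps any real number into the interval (0, 1), its output can be read as a probability. Below is a minimal NumPy sketch of this function; the name logistic is our own choice, not from any library:

import numpy as np

def logistic(z):
    '''Logistic (sigmoid) function: maps any real z into (0, 1).'''
    return 1.0 / (1.0 + np.exp(-z))

print(logistic(np.array([-5.0, 0.0, 5.0])))  # approx [0.0067 0.5 0.9933]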

We can implement logistic regression in Python without worrying too much about the details of the model. Again, we use scikit-learn. Example code is shown below. The file binary.csv is available on Piazza under Resources.

from sklearn import linear_model
import numpy as np

'''Read the first line of the file (the column names).'''
with open('data_files/binary.csv') as f:
    col_names = f.readline().strip().split(',')

'''Read the data file using the genfromtxt function in NumPy. Skip the
first line with skip_header, since we already have it in the list
col_names.'''
dat_ex = np.genfromtxt('data_files/binary.csv', delimiter=',', skip_header=1)
print(dat_ex.shape)  # dimensions of the matrix (m, n)


'''
The first column of this file is 'admit', a binary variable where 0
means the student was not admitted to the program and 1 means the
student was offered admission. This first column is the response
variable Y. We use the first 300 samples for training and the
remaining 100 for testing. '''

Y = dat_ex[0:300,0]  # column 0 of the first 300 rows; Python indexing begins at 0
X = dat_ex[0:300,1:]
model = linear_model.LogisticRegression(C=1e86)  # C is the inverse regularization strength; a very large C makes the penalty effectively 0
fitted_model = model.fit(X,Y)
beta = fitted_model.coef_
intercept = fitted_model.intercept_
testX = dat_ex[300:,1:]  # features of the 100 held-out test samples
testY = dat_ex[300:,0]   # true labels of the test samples

predicted_Y = fitted_model.predict(testX)  # predict the Y for testX using the fitted model
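
To get a quick sense of how well the fitted model does on the held-out data, we can compare predicted_Y against testY. This evaluation step is our own addition, not part of the original example; predict returns hard 0/1 labels (thresholding the probability at 0.5), while predict_proba returns the estimated probabilities themselves.

accuracy = np.mean(predicted_Y == testY)  # fraction of test samples classified correctly
print('Test accuracy:', accuracy)

probs = fitted_model.predict_proba(testX)  # columns: P(Y=0|X) and P(Y=1|X)
print(probs[:5])  # estimated probabilities for the first five test samples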
