Support Vector Machines

Support vector machines (SVMs) offer another approach to classification, alongside logistic regression. Unlike logistic regression, however, support vector machines are non-probabilistic: they output a class label rather than a class probability. In the classic case, a support vector machine takes two labeled groups of samples in some n-dimensional linear space and finds the linear decision function that best separates the two groups, in the sense of leaving the widest margin between them. It then uses this boundary (a line in two dimensions, a hyperplane in general) to label test samples, based simply on which side of the boundary they fall on.
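To make the "which side of the boundary" rule concrete, here is a minimal sketch on a tiny made-up data set. The arrays `X_toy`, `y_toy`, and `new_points` are illustrative assumptions, not part of the admissions data used later. With a linear kernel, the fitted model exposes a weight vector and an intercept, and the predicted label follows from the sign of w . x + b.

```python
import numpy as np
from sklearn import svm

# Toy, made-up data (assumption): two well-separated groups in 2 dimensions.
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel gives a decision function of the form f(x) = w . x + b.
linear_model = svm.SVC(kernel='linear')
linear_model.fit(X_toy, y_toy)

w = linear_model.coef_[0]        # weight vector (available for linear kernels)
b = linear_model.intercept_[0]   # intercept term

new_points = np.array([[0.5, 0.5], [3.5, 3.5]])
scores = new_points @ w + b                   # signed value of f(x) per point
labels_from_sign = (scores > 0).astype(int)   # positive side -> class 1

print(labels_from_sign)
print(linear_model.predict(new_points))       # agrees with the sign rule
```

The two printed arrays should match, since `predict` for a two-class linear SVM applies this same sign rule.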

In more advanced formulations, we can use the kernel trick to find non-linear functions that separate the groups, and we can perform multi-class classification as well (rather than binary classification alone).
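As a rough illustration of why a non-linear kernel can help, the sketch below compares a linear kernel with an RBF kernel on synthetic data in which one group forms a ring around the other. The data set comes from scikit-learn's `make_circles` helper and is an assumption chosen for illustration only; the variable names are hypothetical.

```python
from sklearn import svm
from sklearn.datasets import make_circles

# Synthetic data (assumption): an inner ring and an outer ring, so no
# straight line can separate the two groups.
X_rings, y_rings = make_circles(n_samples=200, factor=0.3,
                                noise=0.05, random_state=0)

linear_svm = svm.SVC(kernel='linear').fit(X_rings, y_rings)
rbf_svm = svm.SVC(kernel='rbf').fit(X_rings, y_rings)

# Accuracy on the training data, just to compare how well each kernel
# can separate the rings.
linear_acc = linear_svm.score(X_rings, y_rings)
rbf_acc = rbf_svm.score(X_rings, y_rings)
print('linear kernel training accuracy:', linear_acc)
print('RBF kernel training accuracy:', rbf_acc)
```

On data like this the RBF kernel should separate the rings far better than the linear kernel, which can do little more than cut the plane in half.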

In general, fitting an SVM to find the best separating function requires solving an optimization problem (we will cover this in class). However, we can implement the SVM in Python using scikit-learn, again without knowing too many details of the math involved.
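For orientation only, one standard textbook form of this optimization is the hard-margin problem for linearly separable data, sketched below. The notation is introduced here as an assumption: w is the weight vector, b the intercept, x_i the i-th training sample, y_i its label coded as -1 or +1, and m the number of samples. The treatment in class may use a different formulation.

```latex
\min_{w,\,b} \; \frac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad
y_i \,(w^\top x_i + b) \ge 1, \qquad i = 1, \dots, m
```

Minimizing the norm of w is equivalent to maximizing the margin between the two groups. scikit-learn's SVC solves a soft-margin variant of this problem, which tolerates some misclassified samples and is controlled by its C parameter.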

from sklearn import svm
import numpy as np

# Read the first line of the file (the column names)
with open('data_files/binary.csv') as f:
    col_names = f.readline().strip().split(',')

# Read the data file using NumPy's genfromtxt function. skip_header=1 skips
# the first line, since we already have it in the list col_names.
dat_ex = np.genfromtxt('data_files/binary.csv', delimiter=',', skip_header=1)
dat_ex.shape  # dimensions of the matrix (m, n)


# The first column of this file is 'admit', a binary variable where
# 0 means the student was not admitted to the program and 1 means the
# student was offered admission. This first column is the response
# variable Y. We use the first 300 samples for training and the
# remaining 100 for testing.

Y = dat_ex[0:300, 0]   # first 300 rows of column 0 (Python indexing starts at 0)
X = dat_ex[0:300, 1:]  # first 300 rows of all remaining columns (the features)
model = svm.SVC() # default kernel is RBF
fitted_model = model.fit(X,Y)

testX = dat_ex[300:,1:]
testY = dat_ex[300:,0]

predicted_Y = fitted_model.predict(testX)  # predict the Y for testX using the fitted model
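
Once predictions are available, a natural next step is to compare them against the known test labels. The sketch below does this on a synthetic stand-in for the data file so that it runs on its own: the array `demo`, its three random feature columns, and the rule used to generate its labels are all assumptions for illustration, not the admissions data.

```python
import numpy as np
from sklearn import svm

# Synthetic stand-in for the data file (assumption): 400 rows, with a 0/1
# label in column 0 and three numeric features in the remaining columns.
rng = np.random.default_rng(0)
features = rng.normal(size=(400, 3))
labels = (features[:, 0] + features[:, 1] > 0).astype(float)
demo = np.column_stack([labels, features])

# Same layout as above: first 300 rows for training, the rest for testing.
train_X, train_Y = demo[:300, 1:], demo[:300, 0]
test_X, test_Y = demo[300:, 1:], demo[300:, 0]

demo_model = svm.SVC().fit(train_X, train_Y)
demo_pred = demo_model.predict(test_X)

# Accuracy is the fraction of test samples whose predicted label is correct.
accuracy = np.mean(demo_pred == test_Y)
print('test accuracy:', accuracy)
print('same value from score():', demo_model.score(test_X, test_Y))
```

With the variables from the code above, the equivalent check would be `np.mean(predicted_Y == testY)` or `fitted_model.score(testX, testY)`.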
