Project: Predicting heart disease
In this project, we use a k-nearest neighbors (KNN) model to predict heart disease. The data comes from a Kaggle dataset, loaded below from heart.csv.
Highlights:
- Using StandardScaler to scale the features
- Defining Pipelines
- Using KNN model for predictions
#Imports
import numpy as np
import pandas as pd
#Load data
df = pd.read_csv('heart.csv')
#Glimpse of data
df.head(3)
| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
#Dropping the rows where target has null value
df.dropna(axis=0,subset=['target'],inplace=True)
#Separate predictors and target
X = df.drop('target',axis=1)
y = df['target']
#check for null values
[col for col in df.columns if df[col].isnull().any()]
[]
#Imports
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
#Define pipeline
my_pipline = Pipeline(steps=[
('scaler',StandardScaler()),
('model',KNeighborsClassifier())
])
#Calculate cross validation scores
scores = cross_val_score(my_pipline,X,y,cv=5,scoring='accuracy')
scores.mean()
0.8150819672131148
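Keeping the scaler inside the pipeline means that, in each cross-validation split, StandardScaler is fit on the training folds only and then applied to the held-out fold. The sketch below illustrates that equivalence by comparing the pipeline scores with a hand-written loop. It assumes synthetic data from make_classification as a stand-in for heart.csv, and the names X_demo, y_demo, pipe, pipeline_scores and manual_scores are illustrative, not part of the original project.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for the heart.csv predictors and target
X_demo, y_demo = make_classification(n_samples=300, n_features=13, random_state=0)

pipe = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('model', KNeighborsClassifier())
])
pipeline_scores = cross_val_score(pipe, X_demo, y_demo, cv=5, scoring='accuracy')

# Manual version of the same 5 stratified folds:
# the scaler only ever sees the training part of each split
manual_scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X_demo, y_demo):
    scaler = StandardScaler().fit(X_demo[train_idx])
    knn = KNeighborsClassifier().fit(scaler.transform(X_demo[train_idx]), y_demo[train_idx])
    manual_scores.append(knn.score(scaler.transform(X_demo[test_idx]), y_demo[test_idx]))
manual_scores = np.array(manual_scores)

print(pipeline_scores)
print(manual_scores)
```

The two arrays should match fold by fold, because an integer cv with a classifier uses stratified folds without shuffling.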
We achieved a mean accuracy of about 81.5% with the default n_neighbors=5 and no parameter tuning. Let's tune the n_neighbors parameter and see what results we can achieve.
mean_scores = {}
for i in range(1,50):
    my_pipline = Pipeline(steps=[
        ('scaler',StandardScaler()),
        ('model',KNeighborsClassifier(n_neighbors=i))
    ])
    scores = cross_val_score(my_pipline,X,y,cv=5,scoring='accuracy')
    mean_scores[i] = scores.mean()
# Finding the key with the best value
max(mean_scores,key=lambda x:mean_scores[x])
28
#Replugging that value into the model and re-calculating the cross-validation scores
my_pipline = Pipeline(steps=[
('scaler',StandardScaler()),
('model',KNeighborsClassifier(n_neighbors=28))
])
scores = cross_val_score(my_pipline,X,y,cv=5,scoring='accuracy')
scores.mean()
0.8348633879781421
With n_neighbors tuned to 28, the mean cross-validated accuracy rises to about 83.5%.
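The manual loop over n_neighbors can also be written with scikit-learn's GridSearchCV, which runs the same kind of cross-validated search and records the best setting. This is a minimal sketch rather than the method used in this project. It assumes synthetic data from make_classification in place of heart.csv, so the selected value will differ from 28, and the names X_demo, y_demo, pipe, grid, best_k and best_score are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for the heart.csv predictors and target
X_demo, y_demo = make_classification(n_samples=300, n_features=13, random_state=0)

pipe = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('model', KNeighborsClassifier())
])

# 'model__n_neighbors' addresses the n_neighbors parameter of the 'model' step
param_grid = {'model__n_neighbors': list(range(1, 50))}

grid = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy')
grid.fit(X_demo, y_demo)

best_k = grid.best_params_['model__n_neighbors']
best_score = grid.best_score_  # mean accuracy across the 5 folds for best_k
print(best_k, best_score)
```

After fitting, grid.cv_results_['mean_test_score'] holds the mean accuracy for every candidate value, which plays the role of the mean_scores dictionary above.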