Project: Predicting heart diseases

In this project, we use a k-nearest neighbors (KNN) model to predict heart disease. The data is a heart disease dataset from Kaggle, loaded below from heart.csv.

Highlights:

Using StandardScaler to scale the features
Defining Pipelines
Using KNN model for predictions

#Imports
import numpy as np
import pandas as pd

#Load data
df = pd.read_csv('heart.csv')

#Glimpse of data
df.head(3)

	age	sex	cp	trestbps	chol	fbs	restecg	thalach	oldpeak	slope	thal	target
0	63	1	3	145	233	1	0	150	2.3	0	1	1
1	37	1	2	130	250	0	1	187	3.5	0	2	1
2	41	0	1	130	204	0	0	172	1.4	2	2	1

#Dropping the rows where target has null value
df.dropna(axis=0,subset=['target'],inplace=True)

#Separate predictors and target
X = df.drop('target',axis=1)
y = df['target']

#check for null values
[col for col in df.columns if df[col].isnull().any()]

    []

#Imports
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Define pipeline: scale the features, then fit the KNN classifier
my_pipeline = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('model', KNeighborsClassifier())
])

# Calculate 5-fold cross-validation accuracy scores
scores = cross_val_score(my_pipeline, X, y, cv=5, scoring='accuracy')

scores.mean()

    0.8150819672131148
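
As a short aside on why the scaler is part of the pipeline: KNN classifies by distance, so a feature with a large numeric range (such as chol) can outweigh one with a small range (such as oldpeak). The sketch below shows what StandardScaler does to each column. The toy array is an illustrative assumption, not the full dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative toy features on different scales (an assumption for this sketch):
# column 0 looks like an age, column 1 looks like a cholesterol reading.
X_toy = np.array([
    [63.0, 233.0],
    [37.0, 250.0],
    [41.0, 204.0],
    [56.0, 236.0],
])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_toy)

# After scaling, each column has mean 0 and unit (population) standard deviation,
# so no single feature dominates the distances that KNN relies on.
col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)
print(col_means.round(6), col_stds.round(6))
```

Placing the scaler inside the Pipeline also means that, during cross-validation, it is fitted on the training folds only and then applied to the validation fold.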

We achieved a mean accuracy of about 81.5% across the 5 folds with no parameter tuning (KNeighborsClassifier uses n_neighbors=5 by default). Let's tune the n_neighbors parameter and see what results we can achieve.

# Mean cross-validation accuracy for each candidate n_neighbors
mean_scores = {}

for k in range(1, 50):
    my_pipeline = Pipeline(steps=[
        ('scaler', StandardScaler()),
        ('model', KNeighborsClassifier(n_neighbors=k))
    ])
    scores = cross_val_score(my_pipeline, X, y, cv=5, scoring='accuracy')
    mean_scores[k] = scores.mean()

# Find the n_neighbors value with the highest mean accuracy
max(mean_scores, key=mean_scores.get)

    28
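
The manual loop above can also be written with scikit-learn's GridSearchCV, which runs the same kind of search and keeps track of the best setting. This is a minimal sketch and not the method used in this post. It runs on synthetic stand-in data from make_classification (an assumption, so it works without heart.csv), so the best value it finds will differ from the 28 above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption, used so the sketch runs on its own).
X_demo, y_demo = make_classification(n_samples=300, n_features=11, random_state=0)

pipeline = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('model', KNeighborsClassifier()),
])

# 'model__n_neighbors' targets the n_neighbors parameter of the step named 'model'.
param_grid = {'model__n_neighbors': list(range(1, 50))}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring='accuracy')
search.fit(X_demo, y_demo)

best_k = search.best_params_['model__n_neighbors']
best_score = search.best_score_
print(best_k, best_score)
```

To try it on the real data, pass X and y from above in place of X_demo and y_demo.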

# Plug the best value back into the model and recalculate the cross-validation scores
my_pipeline = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('model', KNeighborsClassifier(n_neighbors=28))
])

scores = cross_val_score(my_pipeline, X, y, cv=5, scoring='accuracy')

scores.mean()

    0.8348633879781421

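With n_neighbors=28, the mean cross-validation accuracy rises from about 81.5% to about 83.5%. Since the same folds were used to pick n_neighbors, this figure is likely to be slightly optimistic.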
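
One way to get a less optimistic estimate is to tune on a training split and score once on data that played no part in choosing n_neighbors. This is a minimal sketch of that idea, not part of the original analysis. The synthetic data, the 80/20 split and the random_state values are all assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption, used so the sketch runs on its own).
X_demo, y_demo = make_classification(n_samples=300, n_features=11, random_state=0)

# Hold out 20% of the rows; they are never seen while tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)

pipeline = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('model', KNeighborsClassifier()),
])

# Tune n_neighbors with 5-fold cross-validation on the training split only.
search = GridSearchCV(
    pipeline,
    {'model__n_neighbors': list(range(1, 50))},
    cv=5,
    scoring='accuracy',
)
search.fit(X_train, y_train)

# Accuracy of the refitted best pipeline on the held-out rows.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, test_accuracy)
```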

Muzammil Iftikhar
