Predict Diabetes With Machine Learning Algorithms

Predict Diabetes With Machine Learning Algorithms

In this blog, our objective is to predict based on diagnostic measurements whether a patient has diabetes or not.

Diabetes is a common chronic disease and it is a great threat to human health.

The characteristic of diabetes is that the blood glucose is higher than the normal level, which is caused by defective insulin secretion or its impaired biological effects, or both.

Diabetes can lead to chronic damage and dysfunction of various tissues, especially eyes, kidneys, heart, blood vessels and nerves. Diabetes can be divided into two categories, type 1 diabetes (T1D) and type 2 diabetes (T2D).

Patients with type 1 diabetes are normally younger, mostly less than 30 years old. The typical clinical symptoms are increased thirst and frequent urination, high blood glucose levels. This type of diabetes cannot be cured effectively with oral medications alone and the patients are required insulin therapy.

Type 2 diabetes occurs more commonly in middle-aged and elderly people, which is often associated with the occurrence of obesity, hypertension, dyslipidemia, arteriosclerosis, and other diseases.

Pima Indians Diabetes Data set information?

This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage. In this database there are nine columns:

1. Pregnancies: Number of times pregnant
2. Glucose: Plasma glucose concentration a 2 hours in an oral glucose tolerance test
3. BloodPressure: Diastolic blood pressure (mm Hg)
4. SkinThickness: Triceps skin fold thickness (mm)
5. Insulin: 2-Hour serum insulin (mu U/ml)
6. BMI: Body mass index (weight in kg/(height in m)^2)
7. DiabetesPedigreeFunction: Diabetes pedigree function
8. Age: Age (years)
9. Outcome: Class variable (0 or 1)

This dataset is taken from UCI Machine Learning Repository. If you want to download click link below:

Download

Load the data

In [1]:
import pandas as pd
import numpy as np
In [2]:
df = pd.read_csv("input/pima-indians-diabetes.csv")
df.head()
Out[2]:
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome
0 6 148 72 35 0 33.6 0.627 50 1
1 1 85 66 29 0 26.6 0.351 31 0
2 8 183 64 0 0 23.3 0.672 32 1
3 1 89 66 23 94 28.1 0.167 21 0
4 0 137 40 35 168 43.1 2.288 33 1

Dataset details

Shape of dataframe

In [3]:
df.shape
Out[3]:
(768, 9)

View dataframe information

In [4]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Pregnancies               768 non-null    int64  
 1   Glucose                   768 non-null    int64  
 2   BloodPressure             768 non-null    int64  
 3   SkinThickness             768 non-null    int64  
 4   Insulin                   768 non-null    int64  
 5   BMI                       768 non-null    float64
 6   DiabetesPedigreeFunction  768 non-null    float64
 7   Age                       768 non-null    int64  
 8   Outcome                   768 non-null    int64  
dtypes: float64(2), int64(7)
memory usage: 54.1 KB

View dataset column names

In [5]:
df.columns
Out[5]:
Index(['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
       'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome'],
      dtype='object')

Statistical summary of data

In [6]:
df.describe()
Out[6]:
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome
count 768.000000 768.000000 768.000000 768.000000 768.000000 768.000000 768.000000 768.000000 768.000000
mean 3.845052 120.894531 69.105469 20.536458 79.799479 31.992578 0.471876 33.240885 0.348958
std 3.369578 31.972618 19.355807 15.952218 115.244002 7.884160 0.331329 11.760232 0.476951
min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.078000 21.000000 0.000000
25% 1.000000 99.000000 62.000000 0.000000 0.000000 27.300000 0.243750 24.000000 0.000000
50% 3.000000 117.000000 72.000000 23.000000 30.500000 32.000000 0.372500 29.000000 0.000000
75% 6.000000 140.250000 80.000000 32.000000 127.250000 36.600000 0.626250 41.000000 1.000000
max 17.000000 199.000000 122.000000 99.000000 846.000000 67.100000 2.420000 81.000000 1.000000

Data pre-processing

Check null records

In [7]:
df.isnull().values.any()
Out[7]:
False
In [8]:
df.isnull().sum()
Out[8]:
Pregnancies                 0
Glucose                     0
BloodPressure               0
SkinThickness               0
Insulin                     0
BMI                         0
DiabetesPedigreeFunction    0
Age                         0
Outcome                     0
dtype: int64
In [9]:
df.isnull().head()
Out[9]:
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome
0 False False False False False False False False False
1 False False False False False False False False False
2 False False False False False False False False False
3 False False False False False False False False False
4 False False False False False False False False False

Drop records if there is null values

In [10]:
df.dropna(axis = 0, inplace = True)
In [11]:
df.shape
Out[11]:
(768, 9)

Visualize data

In [12]:
import seaborn as sns
import matplotlib.pyplot as plt

View diabetic and non diabetic patient

In [13]:
df['Outcome'].value_counts()
Out[13]:
0    500
1    268
Name: Outcome, dtype: int64

As per our dataset, 560 patients having no diabetes and 268 patients have diabetes.

Plot diabetic patient distribution

In [14]:
plt.figure(figsize =(8, 6))
f = sns.countplot(x = 'Outcome', data = df)
f.set_title("Diabetic Patient Distribution")
f.set_xticklabels(['No', 'Yes'])
plt.xlabel("");

Correlation between all variables

Correlation is used to denote association between two quantitative variables means relationship between two variables is called correlation. The degree of association is measured by a correlation coefficient. A correlation coefficient matrix is a simple table to summarize the correlations between all variables. Correlation give us a basic understanding of the relationship among variables of the dataset.

In [15]:
df.corr()
Out[15]:
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome
Pregnancies 1.000000 0.129459 0.141282 -0.081672 -0.073535 0.017683 -0.033523 0.544341 0.221898
Glucose 0.129459 1.000000 0.152590 0.057328 0.331357 0.221071 0.137337 0.263514 0.466581
BloodPressure 0.141282 0.152590 1.000000 0.207371 0.088933 0.281805 0.041265 0.239528 0.065068
SkinThickness -0.081672 0.057328 0.207371 1.000000 0.436783 0.392573 0.183928 -0.113970 0.074752
Insulin -0.073535 0.331357 0.088933 0.436783 1.000000 0.197859 0.185071 -0.042163 0.130548
BMI 0.017683 0.221071 0.281805 0.392573 0.197859 1.000000 0.140647 0.036242 0.292695
DiabetesPedigreeFunction -0.033523 0.137337 0.041265 0.183928 0.185071 0.140647 1.000000 0.033561 0.173844
Age 0.544341 0.263514 0.239528 -0.113970 -0.042163 0.036242 0.033561 1.000000 0.238356
Outcome 0.221898 0.466581 0.065068 0.074752 0.130548 0.292695 0.173844 0.238356 1.000000

Correlation plot for all variables

In [16]:
plt.figure(figsize =(10, 6))
sns.heatmap(df.corr(),annot=True)
Out[16]:
<AxesSubplot:>

Distribution of glucose of diabetic patients

When value of Glucose is higher than 110, patients are more like to be diabetic. Same when value of Insulin is higher than 150, patients are more like to be diabetic. Let us plot.

In [17]:
df['glucose_category'] = pd.cut(df['Glucose'], bins=list(np.arange(45, 200, 65)))
df['glucose_category']
Out[17]:
0      (110.0, 175.0]
1       (45.0, 110.0]
2                 NaN
3       (45.0, 110.0]
4      (110.0, 175.0]
            ...      
763     (45.0, 110.0]
764    (110.0, 175.0]
765    (110.0, 175.0]
766    (110.0, 175.0]
767     (45.0, 110.0]
Name: glucose_category, Length: 768, dtype: category
Categories (2, interval[int64]): [(45, 110] < (110, 175]]
In [18]:
df.head()
Out[18]:
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome glucose_category
0 6 148 72 35 0 33.6 0.627 50 1 (110.0, 175.0]
1 1 85 66 29 0 26.6 0.351 31 0 (45.0, 110.0]
2 8 183 64 0 0 23.3 0.672 32 1 NaN
3 1 89 66 23 94 28.1 0.167 21 0 (45.0, 110.0]
4 0 137 40 35 168 43.1 2.288 33 1 (110.0, 175.0]
In [19]:
count_of_positive_diabetes_diagnosed = df[df['Outcome'] == 1].groupby('glucose_category')['Glucose'].count()
count_of_positive_diabetes_diagnosed
Out[19]:
glucose_category
(45, 110]      42
(110, 175]    178
Name: Glucose, dtype: int64
In [20]:
count_of_positive_diabetes_diagnosed.plot(kind='bar')
plt.title('Glucose Distribution of Diabetic Patients')
Out[20]:
Text(0.5, 1.0, 'Glucose Distribution of Diabetic Patients')

Distribution of glucose by diabetes

In [21]:
fig = plt.figure(figsize=(10,6))
sns.distplot(df['Glucose'], kde=True)
plt.show()
/opt/tljh/user/lib/python3.7/site-packages/seaborn/distributions.py:2557: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
  warnings.warn(msg, FutureWarning)

Age distribution of diabetes patients

In [22]:
df['age_category'] = pd.cut(df['Age'], bins=list(np.arange(20, 80, 10)))
df['age_category']
Out[22]:
0      (40, 50]
1      (30, 40]
2      (30, 40]
3      (20, 30]
4      (30, 40]
         ...   
763    (60, 70]
764    (20, 30]
765    (20, 30]
766    (40, 50]
767    (20, 30]
Name: age_category, Length: 768, dtype: category
Categories (5, interval[int64]): [(20, 30] < (30, 40] < (40, 50] < (50, 60] < (60, 70]]
In [23]:
count_of_positive_diabetes_diagnosed_by_age = df[df['Outcome'] == 1].groupby('age_category')['Age'].count()
count_of_positive_diabetes_diagnosed_by_age
Out[23]:
age_category
(20, 30]    90
(30, 40]    76
(40, 50]    64
(50, 60]    31
(60, 70]     7
Name: Age, dtype: int64
In [24]:
count_of_positive_diabetes_diagnosed_by_age.plot(kind='bar')
plt.title('Age Distribution of Diabetic Patients')
Out[24]:
Text(0.5, 1.0, 'Age Distribution of Diabetic Patients')

Distribution of Age

In [25]:
fig = plt.figure(figsize=(10,6))
sns.distplot(df['Age'], kde = True)
plt.show()
/opt/tljh/user/lib/python3.7/site-packages/seaborn/distributions.py:2557: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
  warnings.warn(msg, FutureWarning)

Distribution of BMI by Diabetes

In [26]:
fig = plt.figure(figsize=(10,6))
sns.distplot(df['BMI'], kde = True)
plt.show()
/opt/tljh/user/lib/python3.7/site-packages/seaborn/distributions.py:2557: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
  warnings.warn(msg, FutureWarning)

Plot Diabetes Patients

In [27]:
df[df['Outcome'] == 1].hist(figsize = (20,20))
plt.title('Diabetes Patients')
Out[27]:
Text(0.5, 1.0, 'Diabetes Patients')

Plot Non Diabetes Patients

In [28]:
df[df['Outcome'] == 0].hist(figsize = (20,20))
plt.title('Non Diabetes Patients')
Out[28]:
Text(0.5, 1.0, 'Non Diabetes Patients')

For the distribution of Non-diabetes people, the peak at 25 is very sharp and decrease very fast.

Create feature and target columns

In [29]:
x = df.iloc[:,0:8]
y = df.iloc[:,8]
In [30]:
x.head()
Out[30]:
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age
0 6 148 72 35 0 33.6 0.627 50
1 1 85 66 29 0 26.6 0.351 31
2 8 183 64 0 0 23.3 0.672 32
3 1 89 66 23 94 28.1 0.167 21
4 0 137 40 35 168 43.1 2.288 33
In [31]:
y.head()
Out[31]:
0    1
1    0
2    1
3    0
4    1
Name: Outcome, dtype: int64

Split data into train and test

In [32]:
from sklearn.model_selection import train_test_split
In [33]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size= 0.2, random_state=42)
In [34]:
x_train.shape
Out[34]:
(614, 8)
In [35]:
y_train.shape
Out[35]:
(614,)
In [36]:
x_test.shape
Out[36]:
(154, 8)
In [37]:
y_test.shape
Out[37]:
(154,)

Rescale training and test data

In [38]:
from sklearn.preprocessing import StandardScaler
In [39]:
ss = StandardScaler()

x_train = ss.fit_transform(x_train)
x_test = ss.fit_transform(x_test)

Logistic Regression

In [40]:
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()

Train model

In [41]:
lr.fit(x_train, y_train)
Out[41]:
LogisticRegression()

Predict test data

In [42]:
predictions = lr.predict(x_test)

View predicted and actual value

In [43]:
print("Predicted value: ", predictions)
print("Actual value: ", y_test)
Predicted value:  [0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 1 1 1 1 1 1
 0 0 1 0 1 1 0 0 0 1 0 0 1 0 1 1 0 0 0 1 0 0 1 1 0 0 0 0 1 0 1 0 1 1 0 0 0
 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 1 0 1 0 0 1 0 1 0
 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0
 0 1 0 0 0 0]
Actual value:  668    0
324    0
624    0
690    0
473    0
      ..
355    1
534    0
344    0
296    1
462    0
Name: Outcome, Length: 154, dtype: int64

View the accuracy

In [44]:
from sklearn.metrics import accuracy_score

print('Accuracy: ', accuracy_score(y_test, predictions))
Accuracy:  0.7857142857142857

Random Forest Classifier

In [45]:
from sklearn.ensemble import RandomForestClassifier

Create RandomForestClassifier model

In [46]:
rfc = RandomForestClassifier()

Train model

In [47]:
rfc.fit(x_train, y_train)
Out[47]:
RandomForestClassifier()

Predict test data

In [48]:
rfcpredictions = rfc.predict(x_test)
In [49]:
print("Predicted value: ", rfcpredictions)
print("Actual value: ", y_test)
Predicted value:  [0 0 0 0 0 1 0 1 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 1 1 1 0 1 1
 1 0 1 0 0 1 0 0 1 1 0 0 1 0 1 1 0 0 0 1 0 0 1 1 0 0 0 0 1 0 1 0 1 0 0 0 0
 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 1 0 1 1 0 1 0 1 0 0 0 1 0 0 1 0 1 0
 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 1 1 1 1 1 0 0 1 0 0 1 1 0 0 0 0 1 0 0 0 0
 0 1 0 0 0 0]
Actual value:  668    0
324    0
624    0
690    0
473    0
      ..
355    1
534    0
344    0
296    1
462    0
Name: Outcome, Length: 154, dtype: int64

View accuracy

In [50]:
print('Accuracy: ', accuracy_score(y_test, rfcpredictions))
Accuracy:  0.7792207792207793

SVC (support Vector Classifier)

In [51]:
from sklearn.svm import SVC

Create model

In [52]:
svc = SVC()

Train model

In [53]:
svc.fit(x_train, y_train)
Out[53]:
SVC()
In [54]:
svcpredictions = svc.predict(x_test)
In [55]:
print("Predicted value: ", svcpredictions)
print("Actual value: ", y_test)
Predicted value:  [0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 1 1 1 1 1 0
 0 0 1 0 1 1 0 0 1 1 0 0 1 0 1 1 0 0 0 1 0 0 1 1 0 0 0 0 1 0 1 0 0 1 0 0 0
 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 0 1 0
 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 1 1 1 1 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0
 0 1 0 0 0 0]
Actual value:  668    0
324    0
624    0
690    0
473    0
      ..
355    1
534    0
344    0
296    1
462    0
Name: Outcome, Length: 154, dtype: int64
In [56]:
print('Accuracy: ', accuracy_score(y_test, svcpredictions))
Accuracy:  0.7402597402597403

KNeighborsClassifier

In [57]:
from sklearn.neighbors import KNeighborsClassifier

Create a model

In [58]:
kn = KNeighborsClassifier()

Train the model

In [59]:
kn.fit(x_train, y_train)
Out[59]:
KNeighborsClassifier()

Predict the test data

In [60]:
knprediction = kn.predict(x_test)
In [61]:
print("Predicted value: ", knprediction)
print("Actual value: ", y_test)
Predicted value:  [0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 1 1 1 1 1 0 1
 0 0 1 0 1 1 0 0 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 1 0 1 0 0 1 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 1 0 0 0 0 0 0 1 0
 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
 0 1 0 0 0 0]
Actual value:  668    0
324    0
624    0
690    0
473    0
      ..
355    1
534    0
344    0
296    1
462    0
Name: Outcome, Length: 154, dtype: int64
In [62]:
print('Accuracy: ', accuracy_score(y_test, knprediction))
Accuracy:  0.6948051948051948

Decision Tree Classifier

In [63]:
from sklearn.tree import DecisionTreeClassifier
In [64]:
dtc = DecisionTreeClassifier(random_state=0)
In [65]:
dtc.fit(x_train, y_train)
Out[65]:
DecisionTreeClassifier(random_state=0)
In [66]:
dtcprediction = dtc.predict(x_test)
In [67]:
print("Predicted value: ", dtcprediction)
print("Actual value: ", y_test)
Predicted value:  [0 0 0 0 1 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 1 0 0 0 0 1 0 0 0 0 1 1 1 1 0 1 1
 1 0 1 0 0 1 1 0 0 1 0 0 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 1 0 0 0
 0 1 0 0 0 0 1 0 0 1 0 1 1 1 1 0 0 0 0 0 1 0 1 0 1 0 1 0 1 0 1 0 0 1 0 1 0
 1 0 1 1 1 1 0 0 1 0 0 0 0 1 1 1 1 1 1 1 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0
 0 1 0 1 0 0]
Actual value:  668    0
324    0
624    0
690    0
473    0
      ..
355    1
534    0
344    0
296    1
462    0
Name: Outcome, Length: 154, dtype: int64
In [68]:
print('Accuracy: ', accuracy_score(y_test, dtcprediction))
Accuracy:  0.7402597402597403

GradientBoostingClassifier

In [69]:
from sklearn.ensemble import GradientBoostingClassifier
In [70]:
gbc = GradientBoostingClassifier()
In [71]:
gbc.fit(x_train, y_train)
Out[71]:
GradientBoostingClassifier()
In [72]:
gbcprediction = gbc.predict(x_test)
In [73]:
print("Predicted value: ", gbcprediction)
print("Actual value: ", y_test)
Predicted value:  [0 0 0 0 1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 1 1 1 1 1 1
 0 0 1 0 0 1 0 0 1 0 0 0 1 0 1 1 0 0 0 1 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0 0
 0 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 1 0 1 0 1 0 1 0 1 0 0 1 0 0 0
 1 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 1 1 1 1 1 0 0 1 0 0 1 1 0 0 0 0 1 0 0 0 0
 0 1 0 0 0 0]
Actual value:  668    0
324    0
624    0
690    0
473    0
      ..
355    1
534    0
344    0
296    1
462    0
Name: Outcome, Length: 154, dtype: int64
In [74]:
print('Accuracy: ', accuracy_score(y_test, gbcprediction))
Accuracy:  0.7337662337662337

Model performance summary

Accuracy

In [78]:
print('Logistic Regression: ', accuracy_score(y_test, predictions))
print('Random Forest Classifier: ', accuracy_score(y_test, rfcpredictions))
print('Support Vector Classifier: ', accuracy_score(y_test, svcpredictions))
print('KNeighbors Classifier: ', accuracy_score(y_test, knprediction))
print('Decision Tree Classifier: ', accuracy_score(y_test, dtcprediction))
print('Gradient Boosting Classifier: ', accuracy_score(y_test, gbcprediction))
Logistic Regression:  0.7857142857142857
Random Forest Classifier:  0.7792207792207793
Support Vector Classifier:  0.7402597402597403
KNeighbors Classifier:  0.6948051948051948
Decision Tree Classifier:  0.7402597402597403
Gradient Boosting Classifier:  0.7337662337662337

We have tried 6 different models. The performance scores of each model are list above. According to the prediction, the Logistic Regression model has the highest accuracy score of 78%.

In [ ]:
 
In [ ]:
 
In [ ]:
 

Machine Learning

  1. Deal Banking Marketing Campaign Dataset With Machine Learning

TensorFlow

  1. Difference Between Scalar, Vector, Matrix and Tensor
  2. TensorFlow Deep Learning Model With IRIS Dataset
  3. Sequence to Sequence Learning With Neural Networks To Perform Number Addition
  4. Image Classification Model MobileNet V2 from TensorFlow Hub
  5. Step by Step Intent Recognition With BERT
  6. Sentiment Analysis for Hotel Reviews With NLTK and Keras
  7. Simple Sequence Prediction With LSTM
  8. Image Classification With ResNet50 Model
  9. Predict Amazon Inc Stock Price with Machine Learning
  10. Predict Diabetes With Machine Learning Algorithms
  11. TensorFlow Build Custom Convolutional Neural Network With MNIST Dataset
  12. Deal Banking Marketing Campaign Dataset With Machine Learning

PySpark

  1. How to Parallelize and Distribute Collection in PySpark
  2. Role of StringIndexer and Pipelines in PySpark ML Feature - Part 1
  3. Role of OneHotEncoder and Pipelines in PySpark ML Feature - Part 2
  4. Feature Transformer VectorAssembler in PySpark ML Feature - Part 3
  5. Logistic Regression in PySpark (ML Feature) with Breast Cancer Data Set

PyTorch

  1. Build the Neural Network with PyTorch
  2. Image Classification with PyTorch
  3. Twitter Sentiment Classification In PyTorch
  4. Training an Image Classifier in Pytorch

Natural Language Processing

  1. Spelling Correction Of The Text Data In Natural Language Processing
  2. Handling Text For Machine Learning
  3. Extracting Text From PDF File in Python Using PyPDF2
  4. How to Collect Data Using Twitter API V2 For Natural Language Processing
  5. Converting Text to Features in Natural Language Processing
  6. Extract A Noun Phrase For A Sentence In Natural Language Processing