Predict Amazon Inc Stock Price with Machine Learning


In this article we are going to see how to predict the Amazon stock price with the help of Machine Learning.

Import the pandas library

In [1]:
import pandas as pd

Load data

I have used five years of historical data for Amazon.com, Inc. (AMZN). You can download the data from the following link: Amazon.com, Inc. (AMZN)

In [2]:
inputFolder = "input/"
In [3]:
filePath = inputFolder + "AMZN.csv"
filePath
Out[3]:
'input/AMZN.csv'

Read the CSV file using the pandas library

pandas.read_csv(): Read a comma-separated values (csv) file into DataFrame.
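As an aside, read_csv can also parse the Date column into real datetime values at load time, which gives a proper date axis when plotting later. A minimal sketch (not used in the rest of this walkthrough):

# optional: parse the Date column as datetime while reading
df_dates = pd.read_csv(filePath, parse_dates=['Date'])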

In [4]:
df = pd.read_csv(filePath)
df
Out[4]:
Date Open High Low Close Adj Close Volume
0 2016-04-14 615.070007 624.380005 615.070007 620.750000 620.750000 3512100
1 2016-04-15 621.919983 626.770020 618.109985 625.890015 625.890015 2887700
2 2016-04-18 625.349976 637.640015 624.960022 635.349976 635.349976 4360900
3 2016-04-19 637.140015 638.010010 620.799988 627.900024 627.900024 4055900
4 2016-04-20 630.000000 636.549988 623.000000 632.989990 632.989990 2609400
... ... ... ... ... ... ... ...
1253 2021-04-07 3233.800049 3303.610107 3223.649902 3279.389893 3279.389893 3346200
1254 2021-04-08 3310.899902 3324.500000 3292.000000 3299.300049 3299.300049 2812100
1255 2021-04-09 3304.699951 3372.199951 3288.899902 3372.199951 3372.199951 4334600
1256 2021-04-12 3355.209961 3395.040039 3351.149902 3379.389893 3379.389893 3281800
1257 2021-04-13 3400.850098 3432.000000 3395.629883 3400.000000 3400.000000 3304900

1258 rows × 7 columns

View dataframe shape

pandas.DataFrame.shape:

Return a tuple representing the dimensionality of the DataFrame.
In [5]:
df.shape
Out[5]:
(1258, 7)

Data has 1258 rows and 7 columns.

DataFrame.head(n=5):
Return the first n rows.

This function returns the first n rows of the object based on position. It is useful for quickly testing whether your object has the right type of data in it. The default is n=5 rows.

For negative values of n, this function returns all rows except the last n rows, equivalent to df[:-n].
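For instance, with our 1258-row dataframe, a negative n drops rows from the end (a quick illustration, assuming df is loaded as above):

# negative n: everything except the last |n| rows
df.head(-1253).shape   # (5, 7), since 1258 - 1253 = 5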
In [6]:
df.head()
Out[6]:
Date Open High Low Close Adj Close Volume
0 2016-04-14 615.070007 624.380005 615.070007 620.750000 620.750000 3512100
1 2016-04-15 621.919983 626.770020 618.109985 625.890015 625.890015 2887700
2 2016-04-18 625.349976 637.640015 624.960022 635.349976 635.349976 4360900
3 2016-04-19 637.140015 638.010010 620.799988 627.900024 627.900024 4055900
4 2016-04-20 630.000000 636.549988 623.000000 632.989990 632.989990 2609400

DataFrame.tail(n=5)

Return the last n rows.

This function returns the last n rows of the object based on position. It is useful for quickly verifying data, for example after sorting or appending rows. The default is n=5 rows.

For negative values of n, this function returns all rows except the first n rows, equivalent to df[n:].
In [7]:
df.tail()
Out[7]:
Date Open High Low Close Adj Close Volume
1253 2021-04-07 3233.800049 3303.610107 3223.649902 3279.389893 3279.389893 3346200
1254 2021-04-08 3310.899902 3324.500000 3292.000000 3299.300049 3299.300049 2812100
1255 2021-04-09 3304.699951 3372.199951 3288.899902 3372.199951 3372.199951 4334600
1256 2021-04-12 3355.209961 3395.040039 3351.149902 3379.389893 3379.389893 3281800
1257 2021-04-13 3400.850098 3432.000000 3395.629883 3400.000000 3400.000000 3304900

Create a new dataframe

Create a new dataframe with two columns, 'Date' and 'Close'. For stock prediction we need only the date and the closing price. We use the length of the original dataframe to build the index.

class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
Two-dimensional, size-mutable, potentially heterogeneous tabular data.

Data structure also contains labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.


Parameters
data: ndarray, Iterable, dict, or DataFrame
    Dict can contain Series, arrays, constants, dataclass or list-like objects. If data is a dict, column order follows insertion-order.

index: Index or array-like
    Index to use for resulting frame. Will default to RangeIndex if no indexing information part of input data and no index provided.

columns: Index or array-like
    Column labels to use for resulting frame. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

dtype: dtype, default None
    Data type to force. Only a single dtype is allowed. If None, infer.

copy: bool, default False
    Copy data from inputs. Only affects DataFrame / 2d ndarray input.
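For example, the constructor also accepts a plain dict of column data (a tiny illustration, unrelated to the stock data):

# two labeled columns and a default RangeIndex of 0..1
example = pd.DataFrame({'Date': ['2016-04-14', '2016-04-15'],
                        'Close': [620.75, 625.89]})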
In [8]:
new_df = pd.DataFrame(index = range(0,len(df)), columns=['Date', 'Close'])
new_df
Out[8]:
Date Close
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
... ... ...
1253 NaN NaN
1254 NaN NaN
1255 NaN NaN
1256 NaN NaN
1257 NaN NaN

1258 rows × 2 columns

Sort the dataframe

DataFrame.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)
Sort object by labels (along an axis).

Returns a new DataFrame sorted by label if inplace argument is False, otherwise updates the original DataFrame and returns None.
In [9]:
df = df.sort_index(ascending = True, axis = 0)
df
Out[9]:
Date Open High Low Close Adj Close Volume
0 2016-04-14 615.070007 624.380005 615.070007 620.750000 620.750000 3512100
1 2016-04-15 621.919983 626.770020 618.109985 625.890015 625.890015 2887700
2 2016-04-18 625.349976 637.640015 624.960022 635.349976 635.349976 4360900
3 2016-04-19 637.140015 638.010010 620.799988 627.900024 627.900024 4055900
4 2016-04-20 630.000000 636.549988 623.000000 632.989990 632.989990 2609400
... ... ... ... ... ... ... ...
1253 2021-04-07 3233.800049 3303.610107 3223.649902 3279.389893 3279.389893 3346200
1254 2021-04-08 3310.899902 3324.500000 3292.000000 3299.300049 3299.300049 2812100
1255 2021-04-09 3304.699951 3372.199951 3288.899902 3372.199951 3372.199951 4334600
1256 2021-04-12 3355.209961 3395.040039 3351.149902 3379.389893 3379.389893 3281800
1257 2021-04-13 3400.850098 3432.000000 3395.629883 3400.000000 3400.000000 3304900

1258 rows × 7 columns

Fill data in new dataframe

We take the 'Date' and 'Close' values from the original dataframe (df) and fill them into the new dataframe (new_df).

In [10]:
for i in range(0, len(df)):
    new_df['Date'][i] = df['Date'][i]
    new_df['Close'][i] = df['Close'][i]
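The loop above copies values one row at a time. A more idiomatic pandas one-liner achieves the same fill (a sketch, equivalent in content, though it keeps Close as float64 rather than object):

# same content in one step, without a Python-level loop
new_df = df[['Date', 'Close']].copy()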
In [11]:
new_df
Out[11]:
Date Close
0 2016-04-14 620.75
1 2016-04-15 625.89
2 2016-04-18 635.35
3 2016-04-19 627.9
4 2016-04-20 632.99
... ... ...
1253 2021-04-07 3279.39
1254 2021-04-08 3299.3
1255 2021-04-09 3372.2
1256 2021-04-12 3379.39
1257 2021-04-13 3400

1258 rows × 2 columns

Set date as index

In [12]:
new_df.index = new_df.Date
new_df
Out[12]:
Date Close
Date
2016-04-14 2016-04-14 620.75
2016-04-15 2016-04-15 625.89
2016-04-18 2016-04-18 635.35
2016-04-19 2016-04-19 627.9
2016-04-20 2016-04-20 632.99
... ... ...
2021-04-07 2021-04-07 3279.39
2021-04-08 2021-04-08 3299.3
2021-04-09 2021-04-09 3372.2
2021-04-12 2021-04-12 3379.39
2021-04-13 2021-04-13 3400

1258 rows × 2 columns

Drop Date column

Now we don't need the 'Date' column (the index already holds the dates), so we drop it.

DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

Drop specified labels from rows or columns.

Remove rows or columns by specifying label names and corresponding axis, or by specifying directly index or column names. 

Parameters
labels: single label or list-like
Index or column labels to drop.

axis: {0 or 'index', 1 or 'columns'}, default 0
Whether to drop labels from the index (0 or 'index') or columns (1 or 'columns').

index: single label or list-like
Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).

columns: single label or list-like
Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).

inplace: bool, default False
If False, return a copy. Otherwise, do operation inplace and return None.

Returns
    DataFrame or None. DataFrame without the removed index or column labels or None if inplace=True.
In [13]:
new_df.drop('Date', axis=1, inplace=True)
new_df
Out[13]:
Close
Date
2016-04-14 620.75
2016-04-15 625.89
2016-04-18 635.35
2016-04-19 627.9
2016-04-20 632.99
... ...
2021-04-07 3279.39
2021-04-08 3299.3
2021-04-09 3372.2
2021-04-12 3379.39
2021-04-13 3400

1258 rows × 1 columns

pandas.DataFrame.values

DataFrame.values: Return a Numpy representation of the DataFrame.

In [14]:
dataset = new_df.values
dataset[:10]
Out[14]:
array([[620.75],
       [625.8900150000001],
       [635.349976],
       [627.900024],
       [632.98999],
       [631.0],
       [620.5],
       [626.200012],
       [616.880005],
       [606.570007]], dtype=object)

Scaling features to a range

It is important to scale features before training a neural network. Normalization is a common way of doing this scaling.

One way to normalize the input features/variables is the min-max scaler: each feature is transformed into the range [0, 1], so that its minimum maps to 0 and its maximum maps to 1.
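Concretely, min-max scaling computes x_scaled = (x - x_min) / (x_max - x_min). A small hand-rolled illustration with made-up prices:

import numpy as np

prices = np.array([[620.75], [625.89], [3400.00]])   # toy values
lo, hi = prices.min(), prices.max()
scaled = (prices - lo) / (hi - lo)   # minimum -> 0.0, maximum -> 1.0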

In [15]:
from sklearn.preprocessing import MinMaxScaler
class sklearn.preprocessing.MinMaxScaler(feature_range=(0, 1), *, copy=True, clip=False)
Transform features by scaling each feature to a given range.

This estimator scales and translates each feature individually such that it is in the given range on the training set, e.g. between zero and one.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
X: array-like of shape (n_samples, n_features)

    Input samples.

y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

    Target values (None for unsupervised transformations).

**fit_params: dict

    Additional fit parameters.

Returns
X_new: ndarray array of shape (n_samples, n_features_new)

Transformed array.
In [16]:
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(dataset)
scaled_data[:10]
Out[16]:
array([[0.00640052],
       [0.00815512],
       [0.01138438],
       [0.00884126],
       [0.01057877],
       [0.00989947],
       [0.00631518],
       [0.00826094],
       [0.00507945],
       [0.00156002]])

Split data into train and test

We divide the data into training and validation sets. We have 1258 records: indices 0 to 699 for training and 700 onward for validation. (Note: since the scaler was fitted on the full dataset above, a little information from the validation period leaks into the training data; a stricter setup would fit the scaler on the training split only.)

In [17]:
train = dataset[0:700,:]
valid = dataset[700:,:]
In [18]:
train.shape
Out[18]:
(700, 1)
In [19]:
valid.shape
Out[19]:
(558, 1)
In [20]:
train[:5]
Out[20]:
array([[620.75],
       [625.8900150000001],
       [635.349976],
       [627.900024],
       [632.98999]], dtype=object)
In [21]:
valid[:5]
Out[21]:
array([[1670.569946],
       [1637.890015],
       [1593.880005],
       [1670.430054],
       [1718.72998]], dtype=object)

Convert the dataset into X_train and y_train

In [22]:
len(train)
Out[22]:
700
In [23]:
X_train, y_train = [], []

# each sample is the previous 60 scaled closing prices; the target is the next day's price
for i in range(60, len(train)):
    X_train.append(scaled_data[i-60:i, 0])
    y_train.append(scaled_data[i, 0])
In [24]:
X_train[0]
Out[24]:
array([0.00640052, 0.00815512, 0.01138438, 0.00884126, 0.01057877,
       0.00989947, 0.00631518, 0.00826094, 0.00507945, 0.00156002,
       0.        , 0.01965899, 0.02794039, 0.02366315, 0.02351978,
       0.01948831, 0.02456093, 0.02654082, 0.03450136, 0.03796958,
       0.03957398, 0.03683967, 0.03709228, 0.03183875, 0.03258291,
       0.03294817, 0.03440919, 0.03234396, 0.0348871 , 0.03630374,
       0.03854306, 0.03763163, 0.04123299, 0.04008944, 0.04309341,
       0.04217173, 0.04257795, 0.04155729, 0.04254724, 0.04289202,
       0.03956715, 0.03865572, 0.04004164, 0.03832119, 0.03943061,
       0.03563468, 0.03823585, 0.03885371, 0.0370718 , 0.04099064,
       0.03309837, 0.03050401, 0.0361672 , 0.0387786 , 0.03878544,
       0.04221953, 0.04304562, 0.04629196, 0.04593695, 0.04909113])

Convert X_train and y_train into numpy array

In [25]:
import numpy as np
In [26]:
X_train, y_train = np.array(X_train), np.array(y_train)

print(X_train[1])
print(y_train[1])
[0.00815512 0.01138438 0.00884126 0.01057877 0.00989947 0.00631518
 0.00826094 0.00507945 0.00156002 0.         0.01965899 0.02794039
 0.02366315 0.02351978 0.01948831 0.02456093 0.02654082 0.03450136
 0.03796958 0.03957398 0.03683967 0.03709228 0.03183875 0.03258291
 0.03294817 0.03440919 0.03234396 0.0348871  0.03630374 0.03854306
 0.03763163 0.04123299 0.04008944 0.04309341 0.04217173 0.04257795
 0.04155729 0.04254724 0.04289202 0.03956715 0.03865572 0.04004164
 0.03832119 0.03943061 0.03563468 0.03823585 0.03885371 0.0370718
 0.04099064 0.03309837 0.03050401 0.0361672  0.0387786  0.03878544
 0.04221953 0.04304562 0.04629196 0.04593695 0.04909113 0.05181178]
0.049910401080615674
In [27]:
X_train.shape[0]
Out[27]:
640
In [28]:
X_train.shape[1]
Out[28]:
60
In [29]:
X_train.shape
Out[29]:
(640, 60)

Reshape X_train array

Use NumPy reshape to convert this two-dimensional array into a three-dimensional array with 640 samples, 60 time steps, and 1 feature at each time step.

numpy.reshape(a, newshape, order='C')
Gives a new shape to an array without changing its data.
In [30]:
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
In [31]:
X_train.shape
Out[31]:
(640, 60, 1)

Now X_train data is ready to be used as input (X) to the LSTM with an input_shape of (60, 1).
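As a side note, the same trailing feature axis can be added with np.newaxis; a minimal equivalent, shown on a stand-in array:

# equivalent to np.reshape(x, (640, 60, 1))
x = np.zeros((640, 60))        # stand-in for the pre-reshape X_train
x_3d = x[..., np.newaxis]      # shape (640, 60, 1)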

Create model

In [32]:
import tensorflow as tf
In [33]:
model = tf.keras.Sequential()

Add layers to the model

The input to every LSTM layer must be three-dimensional.

The three dimensions of this input are:

Samples. One sequence is one sample. A batch is comprised of one or more samples.
Time Steps. One time step is one point of observation in the sample.
Features. One feature is one observation at a time step.

This means that the input layer expects a 3D array of data when fitting the model and when making predictions, even if specific dimensions of the array contain a single value, e.g. one sample or one feature.

Units: the number of neurons (cells) the layer contains.

The LSTM input layer is defined by the input_shape argument on the first hidden layer. The input_shape argument takes a tuple of two values that define the number of time steps and features.

Hidden layer 1: LSTM with 50 units
Hidden layer 2: LSTM with 50 units
Output layer: Dense with 1 unit
In [34]:
model.add(tf.keras.layers.LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], 1)))
model.add(tf.keras.layers.LSTM(units = 50))
model.add(tf.keras.layers.Dense(1))

Model summary

In [35]:
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm (LSTM)                  (None, 60, 50)            10400     
_________________________________________________________________
lstm_1 (LSTM)                (None, 50)                20200     
_________________________________________________________________
dense (Dense)                (None, 1)                 51        
=================================================================
Total params: 30,651
Trainable params: 30,651
Non-trainable params: 0
_________________________________________________________________
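The parameter counts in the summary can be checked by hand: a Keras LSTM layer has 4 × (units × (input_dim + units) + units) weights, one kernel, recurrent kernel, and bias per gate. A quick sanity check against the numbers above:

def lstm_params(input_dim, units):
    # 4 gates, each with a kernel, a recurrent kernel and a bias
    return 4 * (units * (input_dim + units) + units)

print(lstm_params(1, 50))    # 10400, first LSTM layer
print(lstm_params(50, 50))   # 20200, second LSTM layer
print(50 * 1 + 1)            # 51, final Dense layer (weights + bias)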

Compile model

The mean squared error (MSE), or mean squared deviation (MSD), of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors, that is, the average squared difference between the estimated values and the actual values.
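In code, the definition is only a couple of lines; a toy illustration with made-up values:

y_true = np.array([3.0, -0.5, 2.0])
y_pred = np.array([2.5,  0.0, 2.0])
mse = np.mean((y_true - y_pred) ** 2)   # (0.25 + 0.25 + 0.0) / 3 ≈ 0.1667
print(mse)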

In [36]:
model.compile(loss = 'mean_squared_error', optimizer = 'adam')

Train the model

In [37]:
history = model.fit(X_train, y_train, epochs = 100, batch_size=10)
Epoch 1/100
64/64 [==============================] - 1s 14ms/step - loss: 0.0059
Epoch 2/100
64/64 [==============================] - 1s 14ms/step - loss: 3.9926e-04
Epoch 3/100
64/64 [==============================] - 1s 14ms/step - loss: 4.2879e-04
Epoch 4/100
64/64 [==============================] - 1s 14ms/step - loss: 4.6579e-04
Epoch 5/100
64/64 [==============================] - 1s 14ms/step - loss: 4.0886e-04
Epoch 6/100
64/64 [==============================] - 1s 14ms/step - loss: 3.3498e-04
Epoch 7/100
64/64 [==============================] - 1s 15ms/step - loss: 3.2333e-04
Epoch 8/100
64/64 [==============================] - 1s 15ms/step - loss: 3.7809e-04
Epoch 9/100
64/64 [==============================] - 1s 14ms/step - loss: 3.5955e-04
Epoch 10/100
64/64 [==============================] - 1s 14ms/step - loss: 3.2380e-04
Epoch 11/100
64/64 [==============================] - 1s 14ms/step - loss: 3.0329e-04
Epoch 12/100
64/64 [==============================] - 1s 14ms/step - loss: 3.2191e-04
Epoch 13/100
64/64 [==============================] - 1s 14ms/step - loss: 2.8677e-04
Epoch 14/100
64/64 [==============================] - 1s 14ms/step - loss: 2.5705e-04
Epoch 15/100
64/64 [==============================] - 1s 14ms/step - loss: 2.5431e-04
Epoch 16/100
64/64 [==============================] - 1s 14ms/step - loss: 2.9078e-04
Epoch 17/100
64/64 [==============================] - 1s 14ms/step - loss: 2.5680e-04
Epoch 18/100
64/64 [==============================] - 1s 14ms/step - loss: 2.7394e-04
Epoch 19/100
64/64 [==============================] - 1s 14ms/step - loss: 2.6471e-04
Epoch 20/100
64/64 [==============================] - 1s 14ms/step - loss: 2.6454e-04
Epoch 21/100
64/64 [==============================] - 1s 14ms/step - loss: 2.0183e-04
Epoch 22/100
64/64 [==============================] - 1s 14ms/step - loss: 2.2347e-04
Epoch 23/100
64/64 [==============================] - 1s 14ms/step - loss: 2.0584e-04
Epoch 24/100
64/64 [==============================] - 1s 14ms/step - loss: 2.0493e-04
Epoch 25/100
64/64 [==============================] - 1s 14ms/step - loss: 1.8271e-04
Epoch 26/100
64/64 [==============================] - 1s 14ms/step - loss: 1.7624e-04
Epoch 27/100
64/64 [==============================] - 1s 14ms/step - loss: 1.6914e-04
Epoch 28/100
64/64 [==============================] - 1s 14ms/step - loss: 1.5706e-04
Epoch 29/100
64/64 [==============================] - 1s 14ms/step - loss: 1.5758e-04
Epoch 30/100
64/64 [==============================] - 1s 14ms/step - loss: 1.7418e-04
Epoch 31/100
64/64 [==============================] - 1s 14ms/step - loss: 1.7235e-04
Epoch 32/100
64/64 [==============================] - 1s 14ms/step - loss: 1.5608e-04
Epoch 33/100
64/64 [==============================] - 1s 14ms/step - loss: 1.4036e-04
Epoch 34/100
64/64 [==============================] - 1s 14ms/step - loss: 2.0073e-04
Epoch 35/100
64/64 [==============================] - 1s 14ms/step - loss: 1.4431e-04
Epoch 36/100
64/64 [==============================] - 1s 14ms/step - loss: 1.4923e-04
Epoch 37/100
64/64 [==============================] - 1s 14ms/step - loss: 1.3997e-04
Epoch 38/100
64/64 [==============================] - 1s 14ms/step - loss: 1.3732e-04
Epoch 39/100
64/64 [==============================] - 1s 14ms/step - loss: 1.3894e-04
Epoch 40/100
64/64 [==============================] - 1s 14ms/step - loss: 1.4029e-04
Epoch 41/100
64/64 [==============================] - 1s 14ms/step - loss: 1.1931e-04
Epoch 42/100
64/64 [==============================] - 1s 14ms/step - loss: 1.1095e-04
Epoch 43/100
64/64 [==============================] - 1s 14ms/step - loss: 1.5364e-04
Epoch 44/100
64/64 [==============================] - 1s 14ms/step - loss: 1.2021e-04
Epoch 45/100
64/64 [==============================] - 1s 14ms/step - loss: 1.3986e-04
Epoch 46/100
64/64 [==============================] - 1s 14ms/step - loss: 1.1162e-04
Epoch 47/100
64/64 [==============================] - 1s 16ms/step - loss: 1.1788e-04
Epoch 48/100
64/64 [==============================] - 1s 15ms/step - loss: 1.1198e-04
Epoch 49/100
64/64 [==============================] - 1s 14ms/step - loss: 1.1119e-04
Epoch 50/100
64/64 [==============================] - 1s 14ms/step - loss: 1.2146e-04
Epoch 51/100
64/64 [==============================] - 1s 14ms/step - loss: 1.0934e-04
Epoch 52/100
64/64 [==============================] - 1s 14ms/step - loss: 1.3719e-04
Epoch 53/100
64/64 [==============================] - 1s 15ms/step - loss: 1.5263e-04
Epoch 54/100
64/64 [==============================] - 1s 15ms/step - loss: 1.0821e-04
Epoch 55/100
64/64 [==============================] - 1s 14ms/step - loss: 1.0546e-04
Epoch 56/100
64/64 [==============================] - 1s 15ms/step - loss: 1.1087e-04
Epoch 57/100
64/64 [==============================] - 1s 14ms/step - loss: 1.0679e-04
Epoch 58/100
64/64 [==============================] - 1s 14ms/step - loss: 1.0365e-04
Epoch 59/100
64/64 [==============================] - 1s 14ms/step - loss: 1.2739e-04
Epoch 60/100
64/64 [==============================] - 1s 14ms/step - loss: 9.7388e-05
Epoch 61/100
64/64 [==============================] - 1s 14ms/step - loss: 1.0564e-04
Epoch 62/100
64/64 [==============================] - 1s 14ms/step - loss: 1.2045e-04
Epoch 63/100
64/64 [==============================] - 1s 14ms/step - loss: 1.1745e-04
Epoch 64/100
64/64 [==============================] - 1s 14ms/step - loss: 1.1334e-04
Epoch 65/100
64/64 [==============================] - 1s 14ms/step - loss: 9.8233e-05
Epoch 66/100
64/64 [==============================] - 1s 14ms/step - loss: 1.0251e-04
Epoch 67/100
64/64 [==============================] - 1s 14ms/step - loss: 9.1577e-05
Epoch 68/100
64/64 [==============================] - 1s 14ms/step - loss: 1.5062e-04
Epoch 69/100
64/64 [==============================] - 1s 14ms/step - loss: 1.2784e-04
Epoch 70/100
64/64 [==============================] - 1s 14ms/step - loss: 1.2438e-04
Epoch 71/100
64/64 [==============================] - 1s 14ms/step - loss: 9.2123e-05
Epoch 72/100
64/64 [==============================] - 1s 14ms/step - loss: 8.7226e-05
Epoch 73/100
64/64 [==============================] - 1s 14ms/step - loss: 8.9998e-05
Epoch 74/100
64/64 [==============================] - 1s 14ms/step - loss: 1.0895e-04
Epoch 75/100
64/64 [==============================] - 1s 14ms/step - loss: 1.0280e-04
Epoch 76/100
64/64 [==============================] - 1s 14ms/step - loss: 1.0889e-04
Epoch 77/100
64/64 [==============================] - 1s 14ms/step - loss: 1.0240e-04
Epoch 78/100
64/64 [==============================] - 1s 14ms/step - loss: 9.2744e-05
Epoch 79/100
64/64 [==============================] - 1s 14ms/step - loss: 1.0128e-04
Epoch 80/100
64/64 [==============================] - 1s 14ms/step - loss: 8.8985e-05
Epoch 81/100
64/64 [==============================] - 1s 14ms/step - loss: 9.8151e-05
Epoch 82/100
64/64 [==============================] - 1s 14ms/step - loss: 1.0025e-04
Epoch 83/100
64/64 [==============================] - 1s 14ms/step - loss: 1.2269e-04
Epoch 84/100
64/64 [==============================] - 1s 14ms/step - loss: 9.3485e-05
Epoch 85/100
64/64 [==============================] - 1s 14ms/step - loss: 9.5440e-05
Epoch 86/100
64/64 [==============================] - 1s 14ms/step - loss: 8.4447e-05
Epoch 87/100
64/64 [==============================] - 1s 14ms/step - loss: 8.2244e-05
Epoch 88/100
64/64 [==============================] - 1s 14ms/step - loss: 8.3451e-05
Epoch 89/100
64/64 [==============================] - 1s 14ms/step - loss: 8.5823e-05
Epoch 90/100
64/64 [==============================] - 1s 14ms/step - loss: 9.1595e-05
Epoch 91/100
64/64 [==============================] - 1s 14ms/step - loss: 1.0452e-04
Epoch 92/100
64/64 [==============================] - 1s 14ms/step - loss: 1.0908e-04
Epoch 93/100
64/64 [==============================] - 1s 14ms/step - loss: 1.0312e-04
Epoch 94/100
64/64 [==============================] - 1s 14ms/step - loss: 1.0181e-04
Epoch 95/100
64/64 [==============================] - 1s 14ms/step - loss: 1.0090e-04
Epoch 96/100
64/64 [==============================] - 1s 14ms/step - loss: 8.3450e-05
Epoch 97/100
64/64 [==============================] - 1s 15ms/step - loss: 8.9355e-05
Epoch 98/100
64/64 [==============================] - 1s 15ms/step - loss: 8.7632e-05
Epoch 99/100
64/64 [==============================] - 1s 14ms/step - loss: 8.4219e-05
Epoch 100/100
64/64 [==============================] - 1s 14ms/step - loss: 9.7970e-05

Model history

In [38]:
history.history['loss'][:10]
Out[38]:
[0.005947669502347708,
 0.000399257056415081,
 0.00042879246757365763,
 0.0004657871031668037,
 0.0004088585264980793,
 0.0003349783073645085,
 0.00032332821865566075,
 0.0003780880942940712,
 0.0003595463349483907,
 0.0003238031640648842]
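Plotting the recorded loss is a quick way to confirm that training converged; a minimal sketch (matplotlib is imported later in this notebook, so import it here if you run this first):

import matplotlib.pyplot as plt

plt.plot(history.history['loss'])
plt.title('Training loss per epoch')
plt.xlabel('Epoch')
plt.ylabel('MSE loss')
plt.show()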

Prepare validation data for prediction

To predict the first day of the validation set we need the 60 closing prices that precede it, so we take the last len(valid) + 60 rows of new_df.

In [39]:
print(len(new_df))
print(len(valid))
1258
558
In [40]:
test_inputs = new_df[len(new_df) - len(valid) - 60:].values
test_inputs[:10]
Out[40]:
array([[1642.810059],
       [1538.880005],
       [1530.420044],
       [1598.01001],
       [1665.530029],
       [1665.530029],
       [1627.800049],
       [1642.810059],
       [1755.4899899999998],
       [1754.910034]], dtype=object)

Reshape and transform test_inputs

In [41]:
test_inputs = test_inputs.reshape(-1,1)
test_inputs  = scaler.transform(test_inputs)
test_inputs[:10]
Out[41]:
array([[0.35529198],
       [0.31981431],
       [0.31692641],
       [0.33999899],
       [0.36304769],
       [0.36304769],
       [0.35016814],
       [0.35529198],
       [0.39375651],
       [0.39355854]])

Create X_test

In [42]:
X_test = []
for i in range(60, test_inputs.shape[0]):
    X_test.append(test_inputs[i-60:i, 0])

Convert X_test into numpy array

In [43]:
X_test = np.array(X_test)
In [45]:
print(X_test.shape)
(558, 60)

Reshape X_test

In [46]:
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
print(X_test.shape)
(558, 60, 1)
In [53]:
print(X_test)
print(X_test.shape)
[[[0.35529198]
  [0.31981431]
  [0.31692641]
  ...
  [0.35165989]
  [0.35433956]
  [0.35942927]]

 [[0.31981431]
  [0.31692641]
  [0.33999899]
  ...
  [0.35433956]
  [0.35942927]
  [0.36476812]]

 [[0.31692641]
  [0.33999899]
  [0.36304769]
  ...
  [0.35942927]
  [0.36476812]
  [0.35361246]]

 ...

 [[0.85983038]
  [0.87521205]
  [0.86209699]
  ...
  [0.89498715]
  [0.91395652]
  [0.92075307]]

 [[0.87521205]
  [0.86209699]
  [0.85417059]
  ...
  [0.91395652]
  [0.92075307]
  [0.94563826]]

 [[0.86209699]
  [0.85417059]
  [0.85980647]
  ...
  [0.92075307]
  [0.94563826]
  [0.94809262]]]
(558, 60, 1)

Predict X_test data

In [47]:
closing_price = model.predict(X_test)
closing_price[:10]
Out[47]:
array([[0.35466692],
       [0.3603092 ],
       [0.35042325],
       [0.3362159 ],
       [0.36019176],
       [0.37736833],
       [0.34817305],
       [0.3489977 ],
       [0.35752347],
       [0.35195965]], dtype=float32)

Scaler inverse transformation

In [48]:
closing_price = scaler.inverse_transform(closing_price)
closing_price[:10]
Out[48]:
array([[1640.979 ],
       [1657.5078],
       [1628.5474],
       [1586.9277],
       [1657.1638],
       [1707.4817],
       [1621.9556],
       [1624.3713],
       [1649.347 ],
       [1633.0482]], dtype=float32)
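Before plotting, a single number summarizing accuracy is handy. A minimal sketch computing RMSE on the validation window (assuming valid still holds the raw prices from the split above):

# root mean squared error between actual and predicted prices, in dollars
rmse = np.sqrt(np.mean((valid.astype(float) - closing_price) ** 2))
print(rmse)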

Visualize actual and predicted stock price

In [49]:
import matplotlib.pyplot as plt

Actual and predicted stock price for test data

In [50]:
train = new_df[:700]
valid = new_df[700:].copy()   # take an explicit copy so assigning a new column doesn't raise SettingWithCopyWarning
valid['Predictions'] = closing_price
In [51]:
plt.figure(figsize=(16,8)) 
plt.plot(valid['Close'], color = 'green', label = 'Actual Amazon Inc. Stock Price',ls='--')
plt.plot(valid['Predictions'], color = 'red', label = 'Predicted Amazon Inc. Stock Price',ls='-')
plt.title('Predicted Amazon Inc. Stock Price')
plt.xlabel('Time in days')
plt.ylabel('Stock Price')
plt.legend()
Out[51]:
<matplotlib.legend.Legend at 0x7f38e14585d0>

Visualize training and test data

In [52]:
plt.figure(figsize=(16,8)) 
plt.plot(train['Close'], color = 'blue')
plt.plot(valid[['Close','Predictions']])
plt.title('Amazon Inc. Stock Price')
plt.xlabel('Time in days')
plt.ylabel('Stock Price')
Out[52]:
Text(0, 0.5, 'Stock Price')
