Image Classification with PyTorch

In this blog, we will work with a small cats and dogs dataset. We will build a neural network step by step in PyTorch, train the model, and then use it to predict the class of a new image.

What is PyTorch?

PyTorch is an open source machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing, primarily developed by Facebook's AI Research lab (FAIR).

PyTorch is also an optimized tensor library for deep learning on both GPUs and CPUs.

There are many reasons for the recent surge in deep learning, but prime among them is the leap in graphical processing unit (GPU) performance and their increasing affordability. Designed originally for gaming, GPUs are built to perform countless millions of matrix operations per second.

PyTorch defines a class called Tensor (torch.Tensor) to store and operate on homogeneous multidimensional rectangular arrays of numbers. PyTorch Tensors are similar to NumPy arrays, but they can also be placed on a CUDA-capable Nvidia GPU.
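For example, a tensor can be created from a Python list or a NumPy array, operated on much like a NumPy array, and moved to a GPU when one is available (a minimal sketch):

import numpy as np
import torch

# create a tensor from a nested list and from a NumPy array
t1 = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
t2 = torch.from_numpy(np.ones((2, 2), dtype=np.float32))

# tensor arithmetic works much like NumPy
print(t1 + t2)    # elementwise addition
print(t1 @ t2)    # matrix multiplication

# move the tensor to a CUDA GPU if one is available
if torch.cuda.is_available():
    t1 = t1.to("cuda")
print(t1.device)  # cpu (or cuda:0 when a GPU is present)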

Import libraries

In [1]:
import torch
import torchvision
from torchvision import transforms
import os

Import libraries for PIL

Some images in datasets like this one are truncated, and PIL raises an error when PyTorch tries to load them. To avoid this, we tell PIL to accept truncated images by adding the following lines:

In [2]:
from PIL import Image, ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

Check if GPU (CUDA) is available

In [3]:
use_cuda = torch.cuda.is_available()
use_cuda 
Out[3]:
False

View training, validation and test data

Our data is organized in the following directory structure:

In [4]:
base_dir = '/home/jupyter-thakur/xv-shared-folders/training/cats_and_dogs_small/'
In [5]:
train_dir = os.path.join(base_dir, 'train/')
validation_dir = os.path.join(base_dir, 'validation/')
test_dir = os.path.join(base_dir, 'test/')

print(f"""
train_dir = {train_dir}
validation_dir = {validation_dir}
test_dir = {test_dir}
""")

train_cats_dir = os.path.join(train_dir, 'cats')
train_dogs_dir = os.path.join(train_dir, 'dogs')

validation_cats_dir = os.path.join(validation_dir, 'cats')
validation_dogs_dir = os.path.join(validation_dir, 'dogs')


test_cats_dir = os.path.join(test_dir, 'cats')
test_dogs_dir = os.path.join(test_dir, 'dogs')

print(f"""
train_cats_dir = {train_cats_dir}
train_dogs_dir = {train_dogs_dir}

validation_cats_dir = {validation_cats_dir}
validation_dogs_dir = {validation_dogs_dir}

test_cats_dir = {test_cats_dir}
test_dogs_dir = {test_dogs_dir}

""")
train_dir = /home/jupyter-thakur/xv-shared-folders/training/cats_and_dogs_small/train/
validation_dir = /home/jupyter-thakur/xv-shared-folders/training/cats_and_dogs_small/validation/
test_dir = /home/jupyter-thakur/xv-shared-folders/training/cats_and_dogs_small/test/


train_cats_dir = /home/jupyter-thakur/xv-shared-folders/training/cats_and_dogs_small/train/cats
train_dogs_dir = /home/jupyter-thakur/xv-shared-folders/training/cats_and_dogs_small/train/dogs

validation_cats_dir = /home/jupyter-thakur/xv-shared-folders/training/cats_and_dogs_small/validation/cats
validation_dogs_dir = /home/jupyter-thakur/xv-shared-folders/training/cats_and_dogs_small/validation/dogs

test_cats_dir = /home/jupyter-thakur/xv-shared-folders/training/cats_and_dogs_small/test/cats
test_dogs_dir = /home/jupyter-thakur/xv-shared-folders/training/cats_and_dogs_small/test/dogs


In [6]:
print('total training cat images:', len(os.listdir(train_cats_dir)))
total training cat images: 1500
In [7]:
print('total training dog images:', len(os.listdir(train_dogs_dir)))
total training dog images: 1000
In [8]:
print('total validation cat images:', len(os.listdir(validation_cats_dir)))
total validation cat images: 500
In [9]:
print('total validation dog images:', len(os.listdir(validation_dogs_dir)))
total validation dog images: 501
In [10]:
print('total test cat images:', len(os.listdir(test_cats_dir)))
total test cat images: 500
In [11]:
print('total test dog images:', len(os.listdir(test_dogs_dir)))
total test dog images: 500

Check that image files are valid

In [12]:
def checkImage(path):
    # return True if PIL can open the file, False otherwise
    try:
        Image.open(path)
        return True
    except Exception:
        return False
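As a quick usage check, checkImage returns True for a readable image and False otherwise (the file names below are hypothetical, for illustration only):

# hypothetical paths, for illustration only
print(checkImage(os.path.join(train_cats_dir, 'cat.0.jpg')))  # True if this file exists and opens
print(checkImage('/nonexistent/path.jpg'))                    # False: PIL cannot open it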

What is torchvision?

The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision.

Transform the images

torchvision.transforms

Transforms are common image transformations. They can be chained together using Compose.

class

torchvision.transforms.Normalize(mean, std, inplace=False)

Normalize a tensor image with mean and standard deviation. This transform does not support PIL Image. Given mean: (mean[1],...,mean[n]) and std: (std[1],..,std[n]) for n channels, this transform will normalize each channel of the input torch.*Tensor i.e., output[channel] = (input[channel] - mean[channel]) / std[channel]

We resize every image to the same resolution of 64 × 64, then convert the images to tensors, and finally normalize the tensors with a specific set of per-channel means and standard deviations.

We can see that both parameters are "Sequences for each channel". Color images have three channels (red, green, blue), therefore you need three parameters to normalize each channel. The first list [0.485, 0.456, 0.406] is the mean for all three channels and the second [0.229, 0.224, 0.225] is the standard deviation for all three channels.

Normalizing is important because a lot of multiplication happens as the input passes through the layers of the neural network, and keeping the values in a similar range helps training. ToTensor already scales the pixel values to the range 0 to 1; Normalize then shifts and scales each channel using the mean and standard deviation given above (these particular numbers are the commonly used ImageNet statistics).
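As a quick sanity check of the formula, here is Normalize applied to a single pixel by hand (the pixel values are made up for illustration):

# a 3-channel "image" of one pixel with values (1.0, 0.5, 0.0) after ToTensor
pixel = torch.tensor([[[1.0]], [[0.5]], [[0.0]]])

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# output[c] = (input[c] - mean[c]) / std[c]
print(normalize(pixel).flatten())
# red:   (1.0 - 0.485) / 0.229 ≈  2.249
# green: (0.5 - 0.456) / 0.224 ≈  0.196
# blue:  (0.0 - 0.406) / 0.225 ≈ -1.804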

In [13]:
img_transforms = transforms.Compose([
    transforms.Resize((64,64)),    
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                    std=[0.229, 0.224, 0.225] )
    ])

torchvision.datasets.ImageFolder class

torchvision.datasets.ImageFolder(root: str, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, loader: Callable[[str], Any] = <function default_loader>, is_valid_file: Optional[Callable[[str], bool]] = None)

A generic data loader where the images are arranged in this way by default:

root/dog/xxx.png
root/dog/xxy.png
root/dog/[...]/xxz.png

root/cat/123.png
root/cat/nsdf3.png
root/cat/[...]/asd932_.png
In [14]:
train_data = torchvision.datasets.ImageFolder(root = train_dir, transform = img_transforms, is_valid_file = checkImage)
In [15]:
val_data = torchvision.datasets.ImageFolder(root = validation_dir, transform = img_transforms, is_valid_file = checkImage)
In [16]:
test_data = torchvision.datasets.ImageFolder(root = test_dir, transform = img_transforms, is_valid_file = checkImage) 

Create a DataLoader

A data loader is what feeds data from the dataset into the network.

In [17]:
# By default, PyTorch's data loaders use a batch_size of 1
BATCH_SIZE = 64
In [18]:
# note: shuffle=True is usually passed for the training loader; without it,
# batches follow the folder order, so the first batches contain only cats
train_data_loader = torch.utils.data.DataLoader(train_data, batch_size = BATCH_SIZE)
val_data_loader  = torch.utils.data.DataLoader(val_data, batch_size = BATCH_SIZE)
test_data_loader  = torch.utils.data.DataLoader(test_data, batch_size = BATCH_SIZE)
In [19]:
sample = next(iter(train_data_loader))
imgs, lbls = sample
In [20]:
lbls
Out[20]:
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
In [21]:
imgs[0]
Out[21]:
tensor([[[ 1.4269,  1.5297,  1.6667,  ...,  2.1290,  2.1119,  2.0434],
         [ 1.4269,  1.5297,  1.6667,  ...,  2.1290,  2.1119,  2.0605],
         [ 1.4440,  1.5468,  1.6667,  ...,  2.1119,  2.1119,  2.0948],
         ...,
         [ 0.6734,  0.7248,  0.7933,  ..., -2.0665, -2.0665, -2.0665],
         [ 0.6221,  0.6734,  0.7419,  ..., -2.0665, -2.0665, -2.0665],
         [ 0.5364,  0.6049,  0.6906,  ..., -2.0837, -2.0837, -2.0837]],

        [[ 0.9055,  1.0105,  1.0980,  ...,  1.7283,  1.6232,  1.5357],
         [ 0.9055,  0.9930,  1.0980,  ...,  1.7633,  1.6758,  1.5882],
         [ 0.8880,  0.9755,  1.0805,  ...,  1.7808,  1.7283,  1.6583],
         ...,
         [ 0.2227,  0.2752,  0.3102,  ..., -1.9657, -1.9657, -1.9832],
         [ 0.1702,  0.2227,  0.2577,  ..., -1.9657, -1.9657, -1.9832],
         [ 0.1352,  0.1877,  0.2402,  ..., -2.0007, -2.0007, -2.0007]],

        [[-0.2184, -0.1312, -0.0441,  ...,  0.6008,  0.4439,  0.3393],
         [-0.2184, -0.1312, -0.0441,  ...,  0.6705,  0.5311,  0.4091],
         [-0.2010, -0.1138, -0.0441,  ...,  0.6705,  0.6008,  0.4962],
         ...,
         [-0.7936, -0.7413, -0.7064,  ..., -1.8044, -1.8044, -1.8044],
         [-0.8284, -0.7936, -0.7587,  ..., -1.8044, -1.8044, -1.8044],
         [-0.8284, -0.7761, -0.7761,  ..., -1.8044, -1.8044, -1.8044]]])

Create the Neural Networks

In [22]:
import torch.nn as nn
import torch.nn.functional as F

Neural networks can be constructed using the torch.nn package.

nn depends on autograd, PyTorch's automatic differentiation engine, to define models and differentiate them. An nn.Module contains layers, and a method forward(input) that returns the output.

We do any setup required in __init__(), in this case calling our superclass constructor and creating the three fully connected layers (called Linear in PyTorch, as opposed to Dense in Keras). The forward() method describes how data flows through the network, both during training and when making predictions (inference).

First, we have to convert the 3D tensor of an image (x and y dimensions plus three color channels: red, green, blue) into a 1D tensor so that it can be fed into the first Linear layer, and we do that using view(). From there, we apply the layers and the activation functions in order, finally returning the raw outputs (logits) for that image. Note that we do not apply softmax inside the network: CrossEntropyLoss, which we use below, expects logits and applies log-softmax internally.

(As an aside, if you wanted to create a recurrent network, you could simply reuse the same Linear layer multiple times, without having to think about sharing weights.) The input size will be 64 × 64 × 3 = 12288.
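To see what view() does to a batch of images, here is a small illustration with a random batch (shapes only, the data is random):

batch = torch.rand(64, 3, 64, 64)   # 64 images, 3 channels, 64 x 64 pixels
flat = batch.view(-1, 64 * 64 * 3)  # flatten each image into one row
print(flat.shape)                   # torch.Size([64, 12288])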

In [23]:
class MyNeuralNetwork(nn.Module):
    def __init__(self, input_size = 12288):
        super(MyNeuralNetwork, self).__init__()
        self.fc1 = nn.Linear(input_size, 84)
        self.fc2 = nn.Linear(84, 50)
        self.fc3 = nn.Linear(50, 2)

    def forward(self, x):
        # flatten each image from (3, 64, 64) into a vector of 12288 values
        x = x.view(-1, 12288)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)  # raw logits; CrossEntropyLoss applies log-softmax
        return x
    
model = MyNeuralNetwork()
In [24]:
print(model)
MyNeuralNetwork(
  (fc1): Linear(in_features=12288, out_features=84, bias=True)
  (fc2): Linear(in_features=84, out_features=50, bias=True)
  (fc3): Linear(in_features=50, out_features=2, bias=True)
)

Define a loss function

In [25]:
loss_function = torch.nn.CrossEntropyLoss()
loss_function
Out[25]:
CrossEntropyLoss()
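CrossEntropyLoss expects raw logits of shape (batch, classes) together with integer class targets; a minimal illustration with made-up numbers:

# two samples, two classes: the logits favour class 1 and class 0 respectively
logits = torch.tensor([[0.2, 2.0], [1.5, 0.1]])
targets = torch.tensor([1, 0])         # the correct classes

print(loss_function(logits, targets))  # small loss, since both predictions match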

Create an optimizer

The weights are updated by an optimization function, called an optimizer.

torch.optim

is a package implementing various optimization algorithms. To use torch.optim, we construct an optimizer object that holds the current state and updates the parameters based on the computed gradients.

To construct an optimizer, you give it an iterable containing the parameters to optimize. You can then specify optimizer-specific options such as the learning rate, weight decay, etc.
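For example, SGD with momentum and weight decay could be constructed as below; this is only to illustrate the options, the actual optimizer we use is Adam:

# illustration only: the model parameters plus optimizer-specific options
sgd = torch.optim.SGD(model.parameters(), lr=0.01,
                      momentum=0.9, weight_decay=1e-4)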

In [26]:
import torch.optim as optim
In [27]:
optimizer = optim.Adam(model.parameters(), lr=0.001)

Copy the model to GPU if available

In [28]:
if torch.cuda.is_available():
    device = torch.device("cuda") 
else:
    device = torch.device("cpu")

model.to(device)
Out[28]:
MyNeuralNetwork(
  (fc1): Linear(in_features=12288, out_features=84, bias=True)
  (fc2): Linear(in_features=84, out_features=50, bias=True)
  (fc3): Linear(in_features=50, out_features=2, bias=True)
)

Train the model

Step 1: Create a function called train and loop through the epochs

In [29]:
def train(start_epochs, n_epochs, model):
    for epoch in range(start_epochs, n_epochs + 1):
        print(f"epoch = {epoch}")

    # return trained model
    return model


train(0, 2, model)
epoch = 0
epoch = 1
epoch = 2
Out[29]:
MyNeuralNetwork(
  (fc1): Linear(in_features=12288, out_features=84, bias=True)
  (fc2): Linear(in_features=84, out_features=50, bias=True)
  (fc3): Linear(in_features=50, out_features=2, bias=True)
)

Step 2: For each epoch, initialize the loss variables

We initialize the training and validation loss to zero, and set the model to training mode.

In [30]:
def train(start_epochs, n_epochs, model):
    for epoch in range(start_epochs, n_epochs + 1):

        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0

        # set the model in training mode
        model.train()

        print(f"epoch = {epoch}")

    # return trained model
    return model


train(0, 2, model)
epoch = 0
epoch = 1
epoch = 2
Out[30]:
MyNeuralNetwork(
  (fc1): Linear(in_features=12288, out_features=84, bias=True)
  (fc2): Linear(in_features=84, out_features=50, bias=True)
  (fc3): Linear(in_features=50, out_features=2, bias=True)
)

Step 3: Iterate over the train_loader in each epoch

In [31]:
def train(start_epochs, n_epochs, model, train_loader):
    for epoch in range(start_epochs, n_epochs + 1):

        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0

        # set the model in training mode
        model.train()

        print(f"batch started: ")
        for batch_idx, (data, target) in enumerate(train_loader):
            if batch_idx % 50 == 0:
                print(f"{batch_idx}, ", end = "")

        print(f"epoch = {epoch}")

    # return trained model
    return model


train(0, 2, model, train_data_loader)
batch started: 
0, epoch = 0
batch started: 
0, epoch = 1
batch started: 
0, epoch = 2
Out[31]:
MyNeuralNetwork(
  (fc1): Linear(in_features=12288, out_features=84, bias=True)
  (fc2): Linear(in_features=84, out_features=50, bias=True)
  (fc3): Linear(in_features=50, out_features=2, bias=True)
)

Step 4: Compute the training loss over the batches of training data

Create a new function called train_process_batches that computes the training loss over the batches of training data.

In [32]:
def train_process_batches(model, train_loader, optimizer, loss_function, verbose = True):
    train_loss = 0.0

    model.train()
    if verbose:
        print(f"Training data batch process: ", end = "")

    for batch_idx, (data, target) in enumerate(train_loader):
        # move to GPU
        if use_cuda:
            data, target = data.cuda(), target.cuda()

        # we need to set the gradients to zero before starting backpropagation
        # because PyTorch accumulates the gradients on subsequent backward passes
        optimizer.zero_grad()

        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)

        # calculate the batch loss
        loss = loss_function(output, target)

        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()

        # perform a single optimization step (parameter update)
        optimizer.step()

        # incremental average: after k batches, train_loss is the mean of the first k batch losses
        train_loss = train_loss + ((1 / (batch_idx + 1)) * (loss.item() - train_loss))

        if batch_idx % 50 == 0:
            if verbose:
                print(f"\t{batch_idx}, {train_loss}", end = "\n")
            else:
                print(f"\t{batch_idx}, ", end = "")

    return train_loss

Step 5: Call the train_process_batches() function from the train() function

In [33]:
def train(start_epochs, n_epochs, model, train_loader):
    for epoch in range(start_epochs, n_epochs + 1):
        print(f"Epoch: {epoch}, ", end = "\n")

        # initialize variables to monitor training and validation loss
        valid_loss = 0.0
        
        #train model
        train_loss = train_process_batches(model, train_loader, optimizer, loss_function)
        
        print(f"\ntrain_loss = {train_loss}")
    # return trained model
    return model

train(0, 1, model, train_data_loader)
Epoch: 0, 
Training data batch process: 	0, 0.5668985843658447

train_loss = 3.087343454360962
Epoch: 1, 
Training data batch process: 	0, 9.285696029663086

train_loss = 2.1793875694274902
Out[33]:
MyNeuralNetwork(
  (fc1): Linear(in_features=12288, out_features=84, bias=True)
  (fc2): Linear(in_features=84, out_features=50, bias=True)
  (fc3): Linear(in_features=50, out_features=2, bias=True)
)

Step 6: Compute the validation loss over the batches of validation data

In [34]:
def eval_process_batches(model, val_loader, optimizer, loss_function, verbose = True):
    valid_loss = 0.0

    # set the model in evaluation mode
    # (the optimizer is passed for symmetry with train_process_batches but is not used here;
    #  wrapping the loop in torch.no_grad() would additionally skip gradient tracking)
    model.eval()
    if verbose:
        print(f"Test data batch process: ", end = "")

    for batch_idx, (data, target) in enumerate(val_loader):

        # move to GPU
        if use_cuda:
            data, target = data.cuda(), target.cuda()

        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)

        # calculate the batch loss
        loss = loss_function(output, target)

        # update the running average validation loss
        valid_loss = valid_loss + ((1 / (batch_idx + 1)) * (loss.item() - valid_loss))

        if batch_idx % 20 == 0:
            if verbose:
                print(f"\t{batch_idx}, {valid_loss}", end = "\n")
            else:
                print(f"\t{batch_idx}, ", end = "")

    return valid_loss
    

Step 7: Finally, call the eval_process_batches() function from the train() function

In [35]:
def train(start_epochs, n_epochs, model, train_loader, val_loader):
    for epoch in range(start_epochs, n_epochs+1):
        print(f"Epoch: {epoch}, ", end = "\n")

        # initialize variables to monitor training and validation loss
        valid_loss = 0.0
        
        #train model
        train_loss = train_process_batches(model, train_loader, optimizer, loss_function, verbose = False)
        valid_loss = eval_process_batches(model, val_loader, optimizer, loss_function, verbose = True)
        
          
        print(f"\ntrain_loss = {train_loss}")
        print(f"\nvalid_loss = {valid_loss}")
        
    # return trained model
    return model

train(0, 5, model, train_data_loader, val_data_loader)
Epoch: 0, 
	0, Test data batch process: 	0, 0.6833963990211487

train_loss = 1.912941336631775

valid_loss = 0.6985856890678406
Epoch: 1, 
	0, Test data batch process: 	0, 0.6675243377685547

train_loss = 0.7882331609725952

valid_loss = 0.6882128119468689
Epoch: 2, 
	0, Test data batch process: 	0, 0.6130846738815308

train_loss = 0.7254152297973633

valid_loss = 0.6899538040161133
Epoch: 3, 
	0, Test data batch process: 	0, 0.6083475947380066

train_loss = 0.6715722680091858

valid_loss = 0.6884120106697083
Epoch: 4, 
	0, Test data batch process: 	0, 0.5992617011070251

train_loss = 0.6625670790672302

valid_loss = 0.6871464252471924
Epoch: 5, 
	0, Test data batch process: 	0, 0.5854282975196838

train_loss = 0.6569139361381531

valid_loss = 0.6850653290748596
Out[35]:
MyNeuralNetwork(
  (fc1): Linear(in_features=12288, out_features=84, bias=True)
  (fc2): Linear(in_features=84, out_features=50, bias=True)
  (fc3): Linear(in_features=50, out_features=2, bias=True)
)

Predict the test data

Open a test image with the PIL Image class

In [36]:
img = Image.open(test_dir + "dogs/dog.1500.jpg") 

Transform the image

torch.unsqueeze(input, dim) → Tensor

Returns a new tensor with a dimension of size one inserted at the specified position.

Example:
x = torch.tensor([1, 2, 3, 4])

torch.unsqueeze(x, 0)
Output: tensor([[ 1,  2,  3,  4]])

torch.unsqueeze(x, 1)
Output:
tensor([[ 1],
        [ 2],
        [ 3],
        [ 4]])
In [37]:
img = img_transforms(img).to(device)
img = torch.unsqueeze(img, 0)

Predict

In [38]:
model.eval()
prediction = F.softmax(model(img), dim = 1)
prediction
Out[38]:
tensor([[0.2418, 0.7582]], grad_fn=<SoftmaxBackward>)

PyTorch provides the argmax() function, which returns the index of the highest value of the tensor.

In [39]:
prediction = prediction.argmax()
prediction
Out[39]:
tensor(1)

Check the predicted label

In [40]:
labels = ['cats','dogs']

print(labels[prediction]) 
dogs
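Putting the prediction steps together, a small helper could look like this (a sketch; predict_image is our own name, not a PyTorch API):

def predict_image(path, model, device, labels = ('cats', 'dogs')):
    # open, transform and batch the image, then return the predicted label
    img = Image.open(path)
    img = torch.unsqueeze(img_transforms(img), 0).to(device)
    model.eval()
    with torch.no_grad():
        prediction = F.softmax(model(img), dim = 1)
    return labels[prediction.argmax()]

print(predict_image(test_dir + "dogs/dog.1500.jpg", model, device))  # dogs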
