Image Classification with PyTorch

In this blog, we will work with a small cats and dogs dataset. We will build a neural network step by step in PyTorch, train the model, and then use it to predict the class of a new image.

What is PyTorch?

PyTorch is an open source machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing, primarily developed by Facebook's AI Research lab (FAIR).

PyTorch is also an optimized tensor library for deep learning on both GPUs and CPUs.

There are many reasons for the recent surge in deep learning, but prime among them is the leap in graphical processing unit (GPU) performance and their increasing affordability. Designed originally for gaming, GPUs are built to perform countless millions of matrix operations per second.

PyTorch defines a class called Tensor (torch.Tensor) to store and operate on homogeneous multidimensional rectangular arrays of numbers. PyTorch Tensors are similar to NumPy arrays, but they can also be placed on a CUDA-capable Nvidia GPU.
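For example, a tensor can be created from a Python list or a NumPy array, operated on much like a NumPy array, and moved to a GPU when one is available (a minimal sketch):

import numpy as np
import torch

# create a tensor from a nested list and from a NumPy array
t1 = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
t2 = torch.from_numpy(np.ones((2, 2), dtype=np.float32))

# tensor arithmetic works much like NumPy
print(t1 + t2)    # elementwise addition
print(t1 @ t2)    # matrix multiplication

# move the tensor to a CUDA GPU if one is available
if torch.cuda.is_available():
    t1 = t1.to("cuda")
print(t1.device)  # cpu (or cuda:0 when a GPU is present)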

Import libraries

In [1]:
import torch
import torchvision
from torchvision import transforms
import os

Import libraries for PIL

Some images in datasets like this one are truncated, and PIL raises an error when PyTorch tries to load them. To avoid this, we tell PIL to accept truncated images by adding the following lines:

In [2]:
from PIL import Image, ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

Check if GPU (CUDA) is available

In [3]:
use_cuda = torch.cuda.is_available()
use_cuda 
Out[3]:
False

View training, validation and test data

Our data is organized in the following directory structure:

In [4]:
base_dir = '/home/jupyter-thakur/xv-shared-folders/training/cats_and_dogs_small/'
In [5]:
train_dir = os.path.join(base_dir, 'train/')
validation_dir = os.path.join(base_dir, 'validation/')
test_dir = os.path.join(base_dir, 'test/')

print(f"""
train_dir = {train_dir}
validation_dir = {validation_dir}
test_dir = {test_dir}
""")

train_cats_dir = os.path.join(train_dir, 'cats')
train_dogs_dir = os.path.join(train_dir, 'dogs')

validation_cats_dir = os.path.join(validation_dir, 'cats')
validation_dogs_dir = os.path.join(validation_dir, 'dogs')


test_cats_dir = os.path.join(test_dir, 'cats')
test_dogs_dir = os.path.join(test_dir, 'dogs')

print(f"""
train_cats_dir = {train_cats_dir}
train_dogs_dir = {train_dogs_dir}

validation_cats_dir = {validation_cats_dir}
validation_dogs_dir = {validation_dogs_dir}

test_cats_dir = {test_cats_dir}
test_dogs_dir = {test_dogs_dir}

""")
train_dir = /home/jupyter-thakur/xv-shared-folders/training/cats_and_dogs_small/train/
validation_dir = /home/jupyter-thakur/xv-shared-folders/training/cats_and_dogs_small/validation/
test_dir = /home/jupyter-thakur/xv-shared-folders/training/cats_and_dogs_small/test/


train_cats_dir = /home/jupyter-thakur/xv-shared-folders/training/cats_and_dogs_small/train/cats
train_dogs_dir = /home/jupyter-thakur/xv-shared-folders/training/cats_and_dogs_small/train/dogs

validation_cats_dir = /home/jupyter-thakur/xv-shared-folders/training/cats_and_dogs_small/validation/cats
validation_dogs_dir = /home/jupyter-thakur/xv-shared-folders/training/cats_and_dogs_small/validation/dogs

test_cats_dir = /home/jupyter-thakur/xv-shared-folders/training/cats_and_dogs_small/test/cats
test_dogs_dir = /home/jupyter-thakur/xv-shared-folders/training/cats_and_dogs_small/test/dogs


In [6]:
print('total training cat images:', len(os.listdir(train_cats_dir)))
total training cat images: 1500
In [7]:
print('total training dog images:', len(os.listdir(train_dogs_dir)))
total training dog images: 1000
In [8]:
print('total validation cat images:', len(os.listdir(validation_cats_dir)))
total validation cat images: 500
In [9]:
print('total validation dog images:', len(os.listdir(validation_dogs_dir)))
total validation dog images: 501
In [10]:
print('total test cat images:', len(os.listdir(test_cats_dir)))
total test cat images: 500
In [11]:
print('total test dog images:', len(os.listdir(test_dogs_dir)))
total test dog images: 500

Check that image files are valid

In [12]:
def checkImage(path):
    # return True if PIL can open the file, False otherwise
    try:
        Image.open(path)
        return True
    except Exception:
        return False
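As a quick usage check, checkImage returns True for a readable image and False otherwise (the file names below are hypothetical, for illustration only):

# hypothetical paths, for illustration only
print(checkImage(os.path.join(train_cats_dir, 'cat.0.jpg')))  # True if this file exists and opens
print(checkImage('/nonexistent/path.jpg'))                    # False: PIL cannot open it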

What is torchvision?

The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision.

Transform the images

torchvision.transforms

Transforms are common image transformations. They can be chained together using Compose.

class

torchvision.transforms.Normalize(mean, std, inplace=False)

Normalize a tensor image with mean and standard deviation. This transform does not support PIL Image. Given mean: (mean[1],...,mean[n]) and std: (std[1],..,std[n]) for n channels, this transform will normalize each channel of the input torch.*Tensor i.e., output[channel] = (input[channel] - mean[channel]) / std[channel]

We resize every image to the same resolution of 64 × 64, then convert the images to tensors, and finally normalize the tensors with a specific set of per-channel means and standard deviations.

We can see that both parameters are "Sequences for each channel". Color images have three channels (red, green, blue), therefore you need three parameters to normalize each channel. The first list [0.485, 0.456, 0.406] is the mean for all three channels and the second [0.229, 0.224, 0.225] is the standard deviation for all three channels.

Normalizing is important because a lot of multiplication happens as the input passes through the layers of the neural network, and keeping the values in a similar range helps training. ToTensor already scales the pixel values to the range 0 to 1; Normalize then shifts and scales each channel using the mean and standard deviation given above (these particular numbers are the commonly used ImageNet statistics).
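As a quick sanity check of the formula, here is Normalize applied to a single pixel by hand (the pixel values are made up for illustration):

# a 3-channel "image" of one pixel with values (1.0, 0.5, 0.0) after ToTensor
pixel = torch.tensor([[[1.0]], [[0.5]], [[0.0]]])

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# output[c] = (input[c] - mean[c]) / std[c]
print(normalize(pixel).flatten())
# red:   (1.0 - 0.485) / 0.229 ≈  2.249
# green: (0.5 - 0.456) / 0.224 ≈  0.196
# blue:  (0.0 - 0.406) / 0.225 ≈ -1.804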

In [13]:
img_transforms = transforms.Compose([
    transforms.Resize((64,64)),    
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                    std=[0.229, 0.224, 0.225] )
    ])

torchvision.datasets.ImageFolder class

torchvision.datasets.ImageFolder(root: str, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, loader: Callable[[str], Any] = <function default_loader>, is_valid_file: Optional[Callable[[str], bool]] = None)

A generic data loader where the images are arranged in this way by default:

root/dog/xxx.png
root/dog/xxy.png
root/dog/[...]/xxz.png

root/cat/123.png
root/cat/nsdf3.png
root/cat/[...]/asd932_.png
In [14]:
train_data = torchvision.datasets.ImageFolder(root = train_dir, transform = img_transforms, is_valid_file = checkImage)
In [15]:
val_data = torchvision.datasets.ImageFolder(root = validation_dir, transform = img_transforms, is_valid_file = checkImage)
In [16]:
test_data = torchvision.datasets.ImageFolder(root = test_dir, transform = img_transforms, is_valid_file = checkImage) 

Create a DataLoader

A data loader is what feeds data from the dataset into the network.

In [17]:
# By default, PyTorch's data loaders use a batch_size of 1
BATCH_SIZE = 64
In [18]:
# note: shuffle=True is usually passed for the training loader; without it,
# batches follow the folder order, so the first batches contain only cats
train_data_loader = torch.utils.data.DataLoader(train_data, batch_size = BATCH_SIZE)
val_data_loader  = torch.utils.data.DataLoader(val_data, batch_size = BATCH_SIZE)
test_data_loader  = torch.utils.data.DataLoader(test_data, batch_size = BATCH_SIZE)
In [19]:
sample = next(iter(train_data_loader))
imgs, lbls = sample
In [20]:
lbls
Out[20]:
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
In [21]:
imgs[0]
Out[21]:
tensor([[[ 1.4269,  1.5297,  1.6667,  ...,  2.1290,  2.1119,  2.0434],
         [ 1.4269,  1.5297,  1.6667,  ...,  2.1290,  2.1119,  2.0605],
         [ 1.4440,  1.5468,  1.6667,  ...,  2.1119,  2.1119,  2.0948],
         ...,
         [ 0.6734,  0.7248,  0.7933,  ..., -2.0665, -2.0665, -2.0665],
         [ 0.6221,  0.6734,  0.7419,  ..., -2.0665, -2.0665, -2.0665],
         [ 0.5364,  0.6049,  0.6906,  ..., -2.0837, -2.0837, -2.0837]],

        [[ 0.9055,  1.0105,  1.0980,  ...,  1.7283,  1.6232,  1.5357],
         [ 0.9055,  0.9930,  1.0980,  ...,  1.7633,  1.6758,  1.5882],
         [ 0.8880,  0.9755,  1.0805,  ...,  1.7808,  1.7283,  1.6583],
         ...,
         [ 0.2227,  0.2752,  0.3102,  ..., -1.9657, -1.9657, -1.9832],
         [ 0.1702,  0.2227,  0.2577,  ..., -1.9657, -1.9657, -1.9832],
         [ 0.1352,  0.1877,  0.2402,  ..., -2.0007, -2.0007, -2.0007]],

        [[-0.2184, -0.1312, -0.0441,  ...,  0.6008,  0.4439,  0.3393],
         [-0.2184, -0.1312, -0.0441,  ...,  0.6705,  0.5311,  0.4091],
         [-0.2010, -0.1138, -0.0441,  ...,  0.6705,  0.6008,  0.4962],
         ...,
         [-0.7936, -0.7413, -0.7064,  ..., -1.8044, -1.8044, -1.8044],
         [-0.8284, -0.7936, -0.7587,  ..., -1.8044, -1.8044, -1.8044],
         [-0.8284, -0.7761, -0.7761,  ..., -1.8044, -1.8044, -1.8044]]])

Create the Neural Networks

In [22]:
import torch.nn as nn
import torch.nn.functional as F

Neural networks can be constructed using the torch.nn package.

nn depends on autograd, PyTorch's automatic differentiation engine, to define models and differentiate them. An nn.Module contains layers, and a method forward(input) that returns the output.

We do any setup required in __init__(), in this case calling our superclass constructor and creating the three fully connected layers (called Linear in PyTorch, as opposed to Dense in Keras). The forward() method describes how data flows through the network, both during training and when making predictions (inference).

First, we have to convert the 3D tensor of an image (x and y dimensions plus three color channels: red, green, blue) into a 1D tensor so that it can be fed into the first Linear layer, and we do that using view(). From there, we apply the layers and the activation functions in order, finally returning the raw outputs (logits) for that image. Note that we do not apply softmax inside the network: CrossEntropyLoss, which we use below, expects logits and applies log-softmax internally.

(As an aside, if you wanted to create a recurrent network, you could simply reuse the same Linear layer multiple times, without having to think about sharing weights.) The input size will be 64 × 64 × 3 = 12288.
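To see what view() does to a batch of images, here is a small illustration with a random batch (shapes only, the data is random):

batch = torch.rand(64, 3, 64, 64)   # 64 images, 3 channels, 64 x 64 pixels
flat = batch.view(-1, 64 * 64 * 3)  # flatten each image into one row
print(flat.shape)                   # torch.Size([64, 12288])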

In [23]:
class MyNeuralNetwork(nn.Module):
    def __init__(self, input_size = 12288):
        super(MyNeuralNetwork, self).__init__()
        self.fc1 = nn.Linear(input_size, 84)
        self.fc2 = nn.Linear(84, 50)
        self.fc3 = nn.Linear(50, 2)

    def forward(self, x):
        # flatten each image from (3, 64, 64) into a vector of 12288 values
        x = x.view(-1, 12288)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)  # raw logits; CrossEntropyLoss applies log-softmax
        return x
    
model = MyNeuralNetwork()
In [24]:
print(model)
MyNeuralNetwork(
  (fc1): Linear(in_features=12288, out_features=84, bias=True)
  (fc2): Linear(in_features=84, out_features=50, bias=True)
  (fc3): Linear(in_features=50, out_features=2, bias=True)
)

Define a loss function

In [25]:
loss_function = torch.nn.CrossEntropyLoss()
loss_function
Out[25]:
CrossEntropyLoss()
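CrossEntropyLoss expects raw logits of shape (batch, classes) together with integer class targets; a minimal illustration with made-up numbers:

# two samples, two classes: the logits favour class 1 and class 0 respectively
logits = torch.tensor([[0.2, 2.0], [1.5, 0.1]])
targets = torch.tensor([1, 0])         # the correct classes

print(loss_function(logits, targets))  # small loss, since both predictions match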

Create an optimizer

The weights are updated by an optimization function, called an optimizer.

torch.optim

is a package implementing various optimization algorithms. To use torch.optim, we construct an optimizer object that holds the current state and updates the parameters based on the computed gradients.

To construct an optimizer, you give it an iterable containing the parameters to optimize. You can then specify optimizer-specific options such as the learning rate, weight decay, etc.
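For example, SGD with momentum and weight decay could be constructed as below; this is only to illustrate the options, the actual optimizer we use is Adam:

# illustration only: the model parameters plus optimizer-specific options
sgd = torch.optim.SGD(model.parameters(), lr=0.01,
                      momentum=0.9, weight_decay=1e-4)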

In [26]:
import torch.optim as optim
In [27]:
optimizer = optim.Adam(model.parameters(), lr=0.001)

Copy the model to GPU if available

In [28]:
if torch.cuda.is_available():
    device = torch.device("cuda") 
else:
    device = torch.device("cpu")

model.to(device)
Out[28]:
MyNeuralNetwork(
  (fc1): Linear(in_features=12288, out_features=84, bias=True)
  (fc2): Linear(in_features=84, out_features=50, bias=True)
  (fc3): Linear(in_features=50, out_features=2, bias=True)
)

Train the model

Step 1: Create a function called train and loop through the epochs

In [29]:
def train(start_epochs, n_epochs, model):
    for epoch in range(start_epochs, n_epochs + 1):
        print(f"epoch = {epoch}")

    # return trained model
    return model


train(0, 2, model)
epoch = 0
epoch = 1
epoch = 2
Out[29]:
MyNeuralNetwork(
  (fc1): Linear(in_features=12288, out_features=84, bias=True)
  (fc2): Linear(in_features=84, out_features=50, bias=True)
  (fc3): Linear(in_features=50, out_features=2, bias=True)
)

Step 2: For each epoch, initialize the loss variables

We initialize the training and validation loss to zero, and set the model to training mode.

In [30]:
def train(start_epochs, n_epochs, model):
    for epoch in range(start_epochs, n_epochs + 1):

        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0

        # set the model in training mode
        model.train()

        print(f"epoch = {epoch}")

    # return trained model
    return model


train(0, 2, model)
epoch = 0
epoch = 1
epoch = 2
Out[30]:
MyNeuralNetwork(
  (fc1): Linear(in_features=12288, out_features=84, bias=True)
  (fc2): Linear(in_features=84, out_features=50, bias=True)
  (fc3): Linear(in_features=50, out_features=2, bias=True)
)

Step 3: Iterate over the train_loader in each epoch

In [31]:
def train(start_epochs, n_epochs, model, train_loader):
    for epoch in range(start_epochs, n_epochs + 1):

        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0

        # set the model in training mode
        model.train()

        print(f"batch started: ")
        for batch_idx, (data, target) in enumerate(train_loader):
            if batch_idx % 50 == 0:
                print(f"{batch_idx}, ", end = "")

        print(f"epoch = {epoch}")

    # return trained model
    return model


train(0, 2, model, train_data_loader)
batch started: 
0, epoch = 0
batch started: 
0, epoch = 1
batch started: 
0, epoch = 2
Out[31]:
MyNeuralNetwork(
  (fc1): Linear(in_features=12288, out_features=84, bias=True)
  (fc2): Linear(in_features=84, out_features=50, bias=True)
  (fc3): Linear(in_features=50, out_features=2, bias=True)
)

Step 4: Compute the training loss over the batches of training data

Create a new function called train_process_batches that computes the training loss over the batches of training data.

In [32]:
def train_process_batches(model, train_loader, optimizer, loss_function, verbose = True):
    train_loss = 0.0

    model.train()
    if verbose:
        print(f"Training data batch process: ", end = "")

    for batch_idx, (data, target) in enumerate(train_loader):
        # move to GPU
        if use_cuda:
            data, target = data.cuda(), target.cuda()

        # we need to set the gradients to zero before starting backpropagation
        # because PyTorch accumulates the gradients on subsequent backward passes
        optimizer.zero_grad()

        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)

        # calculate the batch loss
        loss = loss_function(output, target)

        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()

        # perform a single optimization step (parameter update)
        optimizer.step()

        # incremental average: after k batches, train_loss is the mean of the first k batch losses
        train_loss = train_loss + ((1 / (batch_idx + 1)) * (loss.item() - train_loss))

        if batch_idx % 50 == 0:
            if verbose:
                print(f"\t{batch_idx}, {train_loss}", end = "\n")
            else:
                print(f"\t{batch_idx}, ", end = "")

    return train_loss

Step 5: Call the train_process_batches() function from the train() function

In [33]:
def train(start_epochs, n_epochs, model, train_loader):
    for epoch in range(start_epochs, n_epochs + 1):
        print(f"Epoch: {epoch}, ", end = "\n")

        # initialize variables to monitor training and validation loss
        valid_loss = 0.0
        
        #train model
        train_loss = train_process_batches(model, train_loader, optimizer, loss_function)
        
        print(f"\ntrain_loss = {train_loss}")
    # return trained model
    return model

train(0, 1, model, train_data_loader)
Epoch: 0, 
Training data batch process: 	0, 0.5668985843658447

train_loss = 3.087343454360962
Epoch: 1, 
Training data batch process: 	0, 9.285696029663086

train_loss = 2.1793875694274902
Out[33]:
MyNeuralNetwork(
  (fc1): Linear(in_features=12288, out_features=84, bias=True)
  (fc2): Linear(in_features=84, out_features=50, bias=True)
  (fc3): Linear(in_features=50, out_features=2, bias=True)
)

Step 6: Compute the validation loss over the batches of validation data

In [34]:
def eval_process_batches(model, val_loader, optimizer, loss_function, verbose = True):
    valid_loss = 0.0

    # set the model in evaluation mode
    # (the optimizer is passed for symmetry with train_process_batches but is not used here;
    #  wrapping the loop in torch.no_grad() would additionally skip gradient tracking)
    model.eval()
    if verbose:
        print(f"Test data batch process: ", end = "")

    for batch_idx, (data, target) in enumerate(val_loader):

        # move to GPU
        if use_cuda:
            data, target = data.cuda(), target.cuda()

        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)

        # calculate the batch loss
        loss = loss_function(output, target)

        # update the running average validation loss
        valid_loss = valid_loss + ((1 / (batch_idx + 1)) * (loss.item() - valid_loss))

        if batch_idx % 20 == 0:
            if verbose:
                print(f"\t{batch_idx}, {valid_loss}", end = "\n")
            else:
                print(f"\t{batch_idx}, ", end = "")

    return valid_loss
    

Step 7: Finally, call the eval_process_batches() function from the train() function

In [35]:
def train(start_epochs, n_epochs, model, train_loader, val_loader):
    for epoch in range(start_epochs, n_epochs+1):
        print(f"Epoch: {epoch}, ", end = "\n")

        # initialize variables to monitor training and validation loss
        valid_loss = 0.0
        
        #train model
        train_loss = train_process_batches(model, train_loader, optimizer, loss_function, verbose = False)
        valid_loss = eval_process_batches(model, val_loader, optimizer, loss_function, verbose = True)
        
          
        print(f"\ntrain_loss = {train_loss}")
        print(f"\nvalid_loss = {valid_loss}")
        
    # return trained model
    return model

train(0, 5, model, train_data_loader, val_data_loader)
Epoch: 0, 
	0, Test data batch process: 	0, 0.6833963990211487

train_loss = 1.912941336631775

valid_loss = 0.6985856890678406
Epoch: 1, 
	0, Test data batch process: 	0, 0.6675243377685547

train_loss = 0.7882331609725952

valid_loss = 0.6882128119468689
Epoch: 2, 
	0, Test data batch process: 	0, 0.6130846738815308

train_loss = 0.7254152297973633

valid_loss = 0.6899538040161133
Epoch: 3, 
	0, Test data batch process: 	0, 0.6083475947380066

train_loss = 0.6715722680091858

valid_loss = 0.6884120106697083
Epoch: 4, 
	0, Test data batch process: 	0, 0.5992617011070251

train_loss = 0.6625670790672302

valid_loss = 0.6871464252471924
Epoch: 5, 
	0, Test data batch process: 	0, 0.5854282975196838

train_loss = 0.6569139361381531

valid_loss = 0.6850653290748596
Out[35]:
MyNeuralNetwork(
  (fc1): Linear(in_features=12288, out_features=84, bias=True)
  (fc2): Linear(in_features=84, out_features=50, bias=True)
  (fc3): Linear(in_features=50, out_features=2, bias=True)
)

Predict the test data

Open a test image with the PIL Image class

In [36]:
img = Image.open(test_dir + "dogs/dog.1500.jpg") 

Transform the image

torch.unsqueeze(input, dim) → Tensor

Returns a new tensor with a dimension of size one inserted at the specified position.

Example:
x = torch.tensor([1, 2, 3, 4])

torch.unsqueeze(x, 0)
Output: tensor([[ 1,  2,  3,  4]])

torch.unsqueeze(x, 1)
Output:
tensor([[ 1],
        [ 2],
        [ 3],
        [ 4]])
In [37]:
img = img_transforms(img).to(device)
img = torch.unsqueeze(img, 0)

Predict

In [38]:
model.eval()
prediction = F.softmax(model(img), dim = 1)
prediction
Out[38]:
tensor([[0.2418, 0.7582]], grad_fn=<SoftmaxBackward>)

PyTorch provides the argmax() function, which returns the index of the highest value of the tensor.

In [39]:
prediction = prediction.argmax()
prediction
Out[39]:
tensor(1)

Check the predicted label

In [40]:
labels = ['cats','dogs']

print(labels[prediction]) 
dogs
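Putting the prediction steps together, a small helper could look like this (a sketch; predict_image is our own name, not a PyTorch API):

def predict_image(path, model, device, labels = ('cats', 'dogs')):
    # open, transform and batch the image, then return the predicted label
    img = Image.open(path)
    img = torch.unsqueeze(img_transforms(img), 0).to(device)
    model.eval()
    with torch.no_grad():
        prediction = F.softmax(model(img), dim = 1)
    return labels[prediction.argmax()]

print(predict_image(test_dir + "dogs/dog.1500.jpg", model, device))  # dogs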
