Searching Text in Multiple Files in Python

In this post, we search for a piece of text (a string) across multiple files, including files inside subfolders.

In [1]:
text = input("Please enter text: ")
Please enter text: machine
In [2]:
print(f"You have entered \"{text}\" as the search text.")
You have entered "machine" as the search text.

Import the os module

The os module in Python provides functions for interacting with the operating system. It is part of Python's standard library and offers a portable way to use operating-system-dependent functionality.

In [3]:
import os

Get the current working directory

In [4]:
current_path = os.getcwd()
current_path
Out[4]:
'E:\\jupyter-notebook-workspace'

Declare the path in which you want to search for the text

In [5]:
path = 'G:/data/input'
path
Out[5]:
'G:/data/input'

Create a function and list the directory

Create a function that changes the working directory to the folder you want to search, then lists that folder and prints its entries.

os.chdir(path) - Changes the current working directory to the specified path.

os.listdir(path='.') - Returns a list containing the names of the entries in the directory given by path. The list is in arbitrary order and does not include the special entries '.' and '..', even if they are present in the directory. If a file is removed from or added to the directory during the call, whether its name is included is unspecified.
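
As a small illustration of the os.listdir behaviour described above, here is a minimal sketch that builds a throwaway folder with tempfile and lists it. The helper name list_entries and the entry names a.txt, b.txt and sub are made up for this example. Because the order returned by os.listdir is arbitrary, the sketch sorts the result before using it.

```python
import os
import tempfile

def list_entries(directory):
    """Return the entry names in `directory`, sorted for a stable order."""
    return sorted(os.listdir(directory))

# Build a scratch directory so the example does not depend on any real folder.
# The entry names below are hypothetical.
with tempfile.TemporaryDirectory() as scratch:
    for name in ("b.txt", "a.txt"):
        with open(os.path.join(scratch, name), "w", encoding="utf-8") as f:
            f.write("sample")
    os.mkdir(os.path.join(scratch, "sub"))

    entries = list_entries(scratch)
    print(entries)
```

Note that the list contains both files and folders, and neither '.' nor '..' appears in it.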

In [6]:
def searchText(path):

    # Switch to the target folder, then list its entries
    os.chdir(path)
    files = os.listdir()
    print(files)

searchText(path)
['001-001-intent-recognition-with-bert.ipynb', '002-001-deploy-machine-learning-model-with-flask-on-heroku.ipynb', 'add-signup-form-in-react-native-mobile-app-part-7.ipynb', 'build-the-neural-network-with-pytorch.ipynb', 'call-graphql-api-in-react-native-mobile-app-part-5.ipynb', 'capture-date-phone-and-email-from-text-with-regular-expression-in-python.ipynb', 'classify-images-of-clothing-with-neural network.ipynb', 'color-and-shape.ipynb', 'samples']

This gives us the list of files and folders in the directory. Next, print each entry on its own line.

In [8]:
def searchText(path):

    os.chdir(path)
    files = os.listdir()
    # Print one entry per line instead of the whole list
    for file_name in files:
        print(file_name)

searchText(path)
001-001-intent-recognition-with-bert.ipynb
002-001-deploy-machine-learning-model-with-flask-on-heroku.ipynb
add-signup-form-in-react-native-mobile-app-part-7.ipynb
build-the-neural-network-with-pytorch.ipynb
call-graphql-api-in-react-native-mobile-app-part-5.ipynb
capture-date-phone-and-email-from-text-with-regular-expression-in-python.ipynb
classify-images-of-clothing-with-neural network.ipynb
color-and-shape.ipynb
samples

Get the absolute path of each entry

In [9]:
def searchText(path):

    os.chdir(path)
    files = os.listdir()
    for file_name in files:
        # abspath resolves the name against the current working directory,
        # which os.chdir(path) set above
        abs_path = os.path.abspath(file_name)
        print("Absolute path of the file:", abs_path)

searchText(path)
Absolute path of the file: G:\data\input\001-001-intent-recognition-with-bert.ipynb
Absolute path of the file: G:\data\input\002-001-deploy-machine-learning-model-with-flask-on-heroku.ipynb
Absolute path of the file: G:\data\input\add-signup-form-in-react-native-mobile-app-part-7.ipynb
Absolute path of the file: G:\data\input\build-the-neural-network-with-pytorch.ipynb
Absolute path of the file: G:\data\input\call-graphql-api-in-react-native-mobile-app-part-5.ipynb
Absolute path of the file: G:\data\input\capture-date-phone-and-email-from-text-with-regular-expression-in-python.ipynb
Absolute path of the file: G:\data\input\classify-images-of-clothing-with-neural network.ipynb
Absolute path of the file: G:\data\input\color-and-shape.ipynb
Absolute path of the file: G:\data\input\samples
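
One detail behind the paths printed above: os.path.abspath resolves a bare name against the current working directory, which is why the function calls os.chdir(path) before listing. The sketch below contrasts that with joining the name onto an explicit folder. The helper resolve and the names notes.txt and samples are hypothetical and only serve the illustration.

```python
import os

def resolve(name, base=None):
    """Return an absolute path for `name`.

    With no `base`, this mirrors os.path.abspath, which joins the name onto
    the current working directory. With a `base`, the name is joined onto
    that folder instead, independent of the working directory.
    """
    if base is None:
        return os.path.abspath(name)
    return os.path.abspath(os.path.join(base, name))

# "notes.txt" and "samples" are made-up names; no file needs to exist,
# because abspath only manipulates the path string.
relative_to_cwd = resolve("notes.txt")
relative_to_base = resolve("notes.txt", base=os.path.join(os.getcwd(), "samples"))
print(relative_to_cwd)
print(relative_to_base)
```

The same name yields two different absolute paths depending on which folder it is resolved against, so the working directory matters whenever abspath receives a bare file name.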

The output includes a samples folder. We need to list the files inside it as well, which calls for a recursive search.

If the entry is a directory, search inside it

In [10]:
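def searchText(path):

    os.chdir(path)
    folder = os.getcwd()  # absolute path of the folder being listed
    for file_name in os.listdir():
        abs_path = os.path.abspath(file_name)
        print("Absolute path of the file: ", abs_path)

        if os.path.isdir(abs_path):
            searchText(abs_path)
            # The recursive call changed the working directory; switch back
            # so the remaining names resolve against this folder.
            os.chdir(folder)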

searchText(path)
Absolute path of the file:  G:\data\input\001-001-intent-recognition-with-bert.ipynb
Absolute path of the file:  G:\data\input\002-001-deploy-machine-learning-model-with-flask-on-heroku.ipynb
Absolute path of the file:  G:\data\input\add-signup-form-in-react-native-mobile-app-part-7.ipynb
Absolute path of the file:  G:\data\input\build-the-neural-network-with-pytorch.ipynb
Absolute path of the file:  G:\data\input\call-graphql-api-in-react-native-mobile-app-part-5.ipynb
Absolute path of the file:  G:\data\input\capture-date-phone-and-email-from-text-with-regular-expression-in-python.ipynb
Absolute path of the file:  G:\data\input\classify-images-of-clothing-with-neural network.ipynb
Absolute path of the file:  G:\data\input\color-and-shape.ipynb
Absolute path of the file:  G:\data\input\samples
Absolute path of the file:  G:\data\input\samples\alice-in-wonderland.txt
Absolute path of the file:  G:\data\input\samples\ramayana.txt
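
The recursive listing above can also be sketched with os.walk from the standard library, which visits every subfolder without changing the working directory. This is an alternative to the hand-written recursion, not the approach used in this post. The helper name list_all_files and the scratch file names are assumptions made for the example.

```python
import os
import tempfile

def list_all_files(root):
    """Return the absolute path of every file under `root`, recursively."""
    found = []
    # os.walk yields one (folder, subfolders, file names) tuple per directory.
    for folder, _subfolders, file_names in os.walk(root):
        for file_name in file_names:
            found.append(os.path.abspath(os.path.join(folder, file_name)))
    return sorted(found)

# Scratch tree with one nested folder (names are made up for the example).
with tempfile.TemporaryDirectory() as scratch:
    os.mkdir(os.path.join(scratch, "samples"))
    for relative in ("top.txt", os.path.join("samples", "nested.txt")):
        with open(os.path.join(scratch, relative), "w", encoding="utf-8") as f:
            f.write("sample")

    # Show paths relative to the scratch folder to keep the output short.
    all_files = [os.path.relpath(p, scratch) for p in list_all_files(scratch)]
    print(all_files)
```

Unlike the recursive function, this version returns only files, since os.walk reports folder names separately.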

Open each file, read it, and search for the text

In [12]:
def searchText(path):

    os.chdir(path)
    folder = os.getcwd()  # absolute path of the folder being searched
    for file_name in os.listdir():
        abs_path = os.path.abspath(file_name)

        if os.path.isdir(abs_path):
            searchText(abs_path)
            # The recursive call changed the working directory; switch back
            # so the remaining names resolve against this folder.
            os.chdir(folder)
        elif os.path.isfile(abs_path):
            with open(abs_path, 'r', encoding='utf-8') as f:
                # `text` is the search string read with input() earlier
                if text in f.read():
                    print(text + " word found in this path " + abs_path)
                else:
                    print("No match found in " + abs_path)

searchText(path)
No match found in G:\data\input\001-001-intent-recognition-with-bert.ipynb
machine word found in this path G:\data\input\002-001-deploy-machine-learning-model-with-flask-on-heroku.ipynb
No match found in G:\data\input\add-signup-form-in-react-native-mobile-app-part-7.ipynb
No match found in G:\data\input\build-the-neural-network-with-pytorch.ipynb
machine word found in this path G:\data\input\call-graphql-api-in-react-native-mobile-app-part-5.ipynb
No match found in G:\data\input\capture-date-phone-and-email-from-text-with-regular-expression-in-python.ipynb
machine word found in this path G:\data\input\classify-images-of-clothing-with-neural network.ipynb
No match found in G:\data\input\color-and-shape.ipynb
No match found in G:\data\input\samples\alice-in-wonderland.txt
No match found in G:\data\input\samples\ramayana.txt
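
As a possible variation on the function above, the sketch below takes the search text as a parameter and returns the matching paths instead of printing them, which makes the result easier to reuse. It relies on os.walk rather than os.chdir, and it opens files with errors="ignore" so that a file that is not valid UTF-8 does not stop the search. The name find_text, its parameters and the scratch files are assumptions for illustration, not part of the original notebook.

```python
import os
import tempfile

def find_text(root, needle, encoding="utf-8"):
    """Return sorted paths of files under `root` whose contents contain `needle`."""
    matches = []
    for folder, _subfolders, file_names in os.walk(root):
        for file_name in file_names:
            file_path = os.path.join(folder, file_name)
            try:
                # errors="ignore" drops bytes that are not valid in `encoding`,
                # so a binary or differently encoded file does not raise.
                with open(file_path, "r", encoding=encoding, errors="ignore") as f:
                    if needle in f.read():
                        matches.append(file_path)
            except OSError:
                # Unreadable file (for example, missing permissions): skip it.
                continue
    return sorted(matches)

# Scratch tree with made-up file names and contents.
with tempfile.TemporaryDirectory() as scratch:
    os.mkdir(os.path.join(scratch, "samples"))
    contents = {
        "intro.txt": "an introduction to machine learning",
        "colors.txt": "red, green and blue",
        os.path.join("samples", "story.txt"): "the machine stopped",
    }
    for relative, body in contents.items():
        with open(os.path.join(scratch, relative), "w", encoding="utf-8") as f:
            f.write(body)

    hits = [os.path.relpath(p, scratch) for p in find_text(scratch, "machine")]
    print(hits)
```

The match is case-sensitive, like the `in` check in the original function. Reading each file fully into memory keeps the code short but may be slow for very large files.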