Capture Date, Phone and Email From Text With Regular Expression in Python

In this blog post, we will capture dates, phone numbers and email addresses from text with the help of regular expressions.

If you are new to regular expressions, you can start with the following blog post:

Regular Expression Operations in Python

Import Module

In [1]:
import re

Capture Date from text

In [2]:
text = "Kaifi Azmi (born Athar Husain Rizvi; 14 January 1919 – 10 May 2002) was an Indian Urdu poet. \
He is remembered as the one who brought Urdu literature to Indian motion pictures. Shaukat Kaifi was born into a \
Shia Muslim family of Uttar Pradesh migrants in Hyderabad State. She grew up in Aurangabad, India. At a young age, \
she fell in love and married the Urdu poet Kaifi Azmi.Shaukat Kaifi (21 October 1926 – 22 November 2019) \
also credited as Shaukat Azmi, was an Indian theater and film actress. Shaukat and Kaifi's daughter, Shabana Azmi \
(born 18 September 1950) is an Indian actress of Hindi film, television and theatre."
print(text)
Kaifi Azmi (born Athar Husain Rizvi; 14 January 1919 – 10 May 2002) was an Indian Urdu poet. He is remembered as the one who brought Urdu literature to Indian motion pictures. Shaukat Kaifi was born into a Shia Muslim family of Uttar Pradesh migrants in Hyderabad State. She grew up in Aurangabad, India. At a young age, she fell in love and married the Urdu poet Kaifi Azmi.Shaukat Kaifi (21 October 1926 – 22 November 2019) also credited as Shaukat Azmi, was an Indian theater and film actress. Shaukat and Kaifi's daughter, Shabana Azmi (born 18 September 1950) is an Indian actress of Hindi film, television and theatre.
In [4]:
# Build the pattern step by step; raw strings (r"...") keep the backslashes literal.
#pattern = r"\d{2}"
#pattern = r"\d{2}\s"
#pattern = r"\d{2}\s[A-Za-z]*"
#pattern = r"\d{2}\s[A-Za-z]*\s"
#pattern = r"\d{2}\s[A-Za-z]*\s[0-9]{4}"
pattern = r"\d{2}\s[A-Za-z]*\s\d{4}"
In [5]:
matches = re.findall(pattern, text)

for match in matches:
    print(match)
14 January 1919
10 May 2002
21 October 1926
22 November 2019
18 September 1950
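
If we need the day, month and year separately, named groups are one option. The following is a minimal sketch and not part of the original notebook: the names `sample`, `date_pattern` and `dates` are chosen here for illustration, and using `\d{1,2}` assumes that single-digit days should be accepted as well.

```python
import re

# One sentence taken from the text above.
sample = ("Kaifi Azmi (born Athar Husain Rizvi; 14 January 1919 – 10 May 2002) "
          "was an Indian Urdu poet.")

# Named groups label each part of the date.
# Assumption: \d{1,2} so that days such as "5 May 2002" would also match.
date_pattern = re.compile(r"(?P<day>\d{1,2})\s(?P<month>[A-Za-z]+)\s(?P<year>\d{4})")

# groupdict() returns the named groups of each match as a dictionary.
dates = [m.groupdict() for m in date_pattern.finditer(sample)]

for d in dates:
    print(d["day"], d["month"], d["year"])
```

Each entry in `dates` is a plain dictionary, so the parts can be passed on to other code without splitting the matched string again.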

Capture phone numbers from the text

In [5]:
text = "today is Jan 28, 2022 and tomorrow call me at 234 567-8763 or 234-578-8763"
print(text)
today is Jan 28, 2022 and tomorrow call me at 234 567-8763 or 234-578-8763
In [6]:
#pattern = r"\d{3}"
#pattern = r"\d{3}[-]"
#pattern = r"\d{3}[-\s]"
#pattern = r"\d{3}[-\s]\d{3}"
#pattern = r"\d{3}[-\s]\d{3}-\d{4}"

#another pattern
pattern = r"[0-9\s\-]{10,13}"
In [7]:
matches = re.findall(pattern, text)

for match in matches:
    print(match)
 234 567-8763
 234-578-8763
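
Both matches above start with a space, because the character class `[0-9\s\-]` also accepts whitespace. The sketch below compares that pattern with the stricter step-by-step pattern from the commented lines, wrapped in `\b` word boundaries. The variable names `loose` and `strict` and the added `\b` anchors are assumptions made for this illustration.

```python
import re

text = "today is Jan 28, 2022 and tomorrow call me at 234 567-8763 or 234-578-8763"

# Loose pattern: any run of 10 to 13 digits, spaces or hyphens.
loose = re.findall(r"[0-9\s\-]{10,13}", text)

# Stricter pattern: 3 digits, a hyphen or space, 3 digits, a hyphen, 4 digits.
# \b keeps the match from starting or ending inside a longer run of digits.
strict = re.findall(r"\b\d{3}[-\s]\d{3}-\d{4}\b", text)

print(loose)   # each match keeps its leading space
print(strict)  # matches contain only the number itself
```

If the loose pattern is preferred, calling `match.strip()` on each result is another way to drop the surrounding whitespace.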

Capture phone numbers of your friends

In [2]:
text = "Tomorrow we are going to watch a movie and after that dinner. Rohan you will inform \
John, Kartik and Manisha. Their numbers are +91 8124564397, +91 (755) 322 6754 and +1 (812)-654-6754. \
Please don't forget to call them. "
print(text)
Tomorrow we are going to watch a movie and after that dinner. Rohan you will inform John, Kartik and Manisha. Their numbers are +91 8124564397, +91 (755) 322 6754 and +1 (812)-654-6754. Please don't forget to call them. 
In [13]:
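# Two alternatives joined with |: a plain ten-digit number, or a number with an area code in parentheses.
#pattern = r"\+\d{2}\s\d{10}"
#pattern = r"\+\d{1,3}\s[\(]\d{3}[\)][\s-]\d{3}[\s-]\d{4}"
pattern = r"\+\d{2}\s\d{10}|\+\d{1,3}\s[\(]\d{3}[\)][\s-]\d{3}[\s-]\d{4}"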
In [14]:
matches = re.findall(pattern, text)

for match in matches:
    print(match)
+91 8124564397
+91 (755) 322 6754
+1 (812)-654-6754
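
A long alternation like this one can be easier to read with the `re.VERBOSE` flag, which ignores whitespace in the pattern and allows comments. The following is a sketch of the same pattern in that style. The shortened text and the names `phone` and `numbers` are chosen here for illustration only.

```python
import re

# Shortened version of the text above, keeping the three numbers.
text = ("Their numbers are +91 8124564397, +91 (755) 322 6754 "
        "and +1 (812)-654-6754.")

# With re.VERBOSE, whitespace and comments inside the pattern are ignored,
# so each part of the pattern can sit on its own line.
phone = re.compile(r"""
    \+\d{2}\s\d{10}      # two-digit country code, space, ten digits
    |                    # or
    \+\d{1,3}\s          # country code of one to three digits, space
    \(\d{3}\)[\s-]       # three-digit area code in parentheses, separator
    \d{3}[\s-]\d{4}      # three digits, separator, four digits
""", re.VERBOSE)

numbers = phone.findall(text)
for number in numbers:
    print(number)
```

The pattern contains no capturing groups, so `findall` returns the full matched strings, just as in the one-line version.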

Capture valid dates

In [21]:
text =  "05/3/2017  3/01/2017 1/6/17  34/11/937 may 21, 2017 21st mar 2017"
print(text)
05/3/2017  3/01/2017 1/6/17  34/11/937 may 21, 2017 21st mar 2017
In [12]:
#patterndays = "(0?[1-9])"
patterndays = "((0?[1-9])|([12][0-9])|(3[01]))"
patternmonth = "((0?[1-9])|(1[0-2]))"
patternyear = "((19[0-9]{2})|(20[0-9]{2}))"
patternsep = "/"
In [13]:
pattern = patterndays + patternsep + patternmonth + patternsep + patternyear
print(pattern)
((0?[1-9])|([12][0-9])|(3[01]))/((0?[1-9])|(1[0-2]))/((19[0-9]{2})|(20[0-9]{2}))
In [14]:
n = re.finditer(pattern, text)
In [15]:
for item in n:
    print(item)
<re.Match object; span=(0, 9), match='05/3/2017'>
<re.Match object; span=(11, 20), match='3/01/2017'>
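
The pattern only checks the shape of a date, so a string such as 31/02/2017 would still match. One possible follow-up, sketched below, is to pass each candidate to `datetime.strptime` and keep only those that parse. This is not part of the original notebook: the sample text with the impossible date `31/02/2017`, the non-capturing groups, the `\b` boundaries and the day/month/year reading of the format are all assumptions made for this illustration.

```python
import re
from datetime import datetime

# Same idea as the pattern above, written with non-capturing groups (?:...)
# so that re.findall returns the whole matched string.
pattern = r"\b(?:0?[1-9]|[12][0-9]|3[01])/(?:0?[1-9]|1[0-2])/(?:19|20)[0-9]{2}\b"

# Hypothetical sample text; 31/02/2017 has a valid shape but is not a real date.
text = "05/3/2017 31/02/2017 3/01/2017 34/11/937"

valid = []
for candidate in re.findall(pattern, text):
    try:
        # Assumption: the strings are day/month/year.
        valid.append(datetime.strptime(candidate, "%d/%m/%Y").date())
    except ValueError:
        # Raised for impossible dates such as 31 February.
        pass

for d in valid:
    print(d.isoformat())
```

The regular expression does the searching and `strptime` does the calendar check, which keeps the pattern itself short.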

Match emails

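\w matches - [a-zA-Z0-9_] (and, for str patterns, other Unicode word characters)
\d matches - [0-9] (and, for str patterns, other Unicode decimal digits)
. matches - any character except a newline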
In [16]:
candidates = [
    'info@xcelvations.com',
    'nutan.xcelvations@gmail.com',
    'training@mail.xcelvations.com',
    'training123@xcelvations.in',
    'not-valid@example.zoo',
    'nutan@yahoo.com',
    'john_mathew@yahoo.xyz'
]
In [17]:
pattern = r'[\w\d.+-]+@([\w\d.]+\.)+(com|in)'
In [18]:
address = re.compile(pattern)
In [19]:
for candidate in candidates:
    match = address.search(candidate)
    print('{:<30}  {}'.format(
        candidate, 'Matches' if match else 'No match'))
info@xcelvations.com            Matches
nutan.xcelvations@gmail.com     Matches
training@mail.xcelvations.com   Matches
training123@xcelvations.in      Matches
not-valid@example.zoo           No match
nutan@yahoo.com                 Matches
john_mathew@yahoo.xyz           No match
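
One thing to keep in mind is that `search` accepts a match anywhere in the string, so an address whose domain merely starts with `com` or `in` would also be reported as a match. The sketch below compares `search` with `fullmatch`, which requires the whole string to fit the pattern. The addresses `user@example.info` and `admin@example.community` are hypothetical examples added here, and the pattern is a slightly simplified form of the one above (`\d` is dropped because `\w` already covers digits).

```python
import re

# Simplified form of the pattern above, with non-capturing groups.
address = re.compile(r"[\w.+-]+@(?:[\w.]+\.)+(?:com|in)")

# The last two addresses are hypothetical and only used for this comparison.
candidates = [
    "info@xcelvations.com",
    "user@example.info",
    "admin@example.community",
]

results = {}
for candidate in candidates:
    found = bool(address.search(candidate))     # match anywhere in the string
    exact = bool(address.fullmatch(candidate))  # the whole string must match
    results[candidate] = (found, exact)
    print('{:<26}  search={!s:<5}  fullmatch={}'.format(candidate, found, exact))
```

When each candidate is expected to be a complete address, `fullmatch` is usually the safer choice; `search` is more suitable for pulling addresses out of longer text.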