Extract A Noun Phrase For A Sentence In Natural Language Processing

In this blog, we will extract noun phrases from a piece of text using the TextBlob, spaCy and NLTK libraries.

What is TextBlob?

TextBlob: Simplified Text Processing

TextBlob is a Python library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.

Installation

Install or upgrade TextBlob with pip:

pip install -U textblob

Then download the corpora it needs:

python -m textblob.download_corpora

What is spaCy?

Processing raw text intelligently is difficult: most words are rare, and it's common for words that look completely different to mean almost the same thing. The same words in a different order can mean something completely different. Even splitting text into useful word-like units can be difficult in many languages. While it’s possible to solve some problems starting from only the raw characters, it’s usually better to use linguistic knowledge to add useful information. That’s exactly what spaCy is designed to do: you put in raw text, and get back a Doc object, that comes with a variety of annotations.

Install spaCy with pip, then download the small English pipeline, en_core_web_sm. The download command installs the pipeline package via pip and places it in your site-packages directory.

pip install -U spacy
python -m spacy download en_core_web_sm

What is NLTK?

The Natural Language Toolkit (NLTK) is an open source Python library for Natural Language Processing.

Install NLTK

pip install --user -U nltk

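If this is your first time using NLTK, download the following data packages from a Python session:

import nltk

nltk.download('brown')

nltk.download('punkt')

nltk.download('averaged_perceptron_tagger')

Recent NLTK releases (3.9 and later) may look for 'punkt_tab' and 'averaged_perceptron_tagger_eng' instead. If word_tokenize or pos_tag raises a LookupError, download the resource named in the error message.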
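Before running the examples, you may want to confirm that the libraries can be imported. The following is a minimal, stdlib-only sketch. The helper name check_installed is hypothetical, and it only checks importable Python packages (the en_core_web_sm model is installed as one). It does not check the NLTK data packages downloaded above.

```python
import importlib.util

# Importable module names used in this post (hypothetical checklist).
REQUIRED = ["textblob", "spacy", "en_core_web_sm", "nltk"]


def check_installed(modules):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_installed(REQUIRED).items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```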

Now that the required libraries are installed, let us extract noun phrases from some text.

Extract Noun Phrases using TextBlob

In [1]:
from textblob import TextBlob

Define the text

In [2]:
text = TextBlob("Machine learning (ML) is a field of inquiry devoted to understanding and building methods that learn, \
that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial \
intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make \
predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide \
variety of applications, such as in medicine, email filtering, speech recognition, and computer vision, where it is \
difficult or unfeasible to develop conventional algorithms to perform the needed tasks.")

Noun Phrase Extraction

Noun phrases are accessed through the noun_phrases property.

In [3]:
for np in text.noun_phrases:
    print(np)
machine
ml
building methods
leverage data
artificial intelligence
machine
learning algorithms
sample data
training data
machine
learning algorithms
wide variety
speech recognition
computer vision
conventional algorithms

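TextBlob returns the noun phrases in lowercase. Its default extractor does not always segment phrases cleanly: "machine learning" is split into "machine" and "learning algorithms", and the same phrase can appear several times.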
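Because phrases repeat ("machine" appears three times), it is often useful to count or de-duplicate them. The sketch below uses only the standard library and works on the phrases copied from the output above. Since noun_phrases is a list-like WordList, the same calls should also work on text.noun_phrases directly, though that is untested here.

```python
from collections import Counter

# Noun phrases copied from the TextBlob output above.
noun_phrases = [
    "machine", "ml", "building methods", "leverage data",
    "artificial intelligence", "machine", "learning algorithms",
    "sample data", "training data", "machine", "learning algorithms",
    "wide variety", "speech recognition", "computer vision",
    "conventional algorithms",
]

# How often each phrase occurs.
counts = Counter(noun_phrases)

# Unique phrases, keeping the order of first appearance.
unique_phrases = list(dict.fromkeys(noun_phrases))

print(counts.most_common(2))  # [('machine', 3), ('learning algorithms', 2)]
print(len(unique_phrases))    # 12
```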

Noun phrase extraction using spaCy

In [4]:
import spacy

Load the small English pipeline as a Language object

In [5]:
nlp = spacy.load("en_core_web_sm")

Define the text

In [6]:
text = "Machine learning (ML) is a field of inquiry devoted to understanding and building methods that learn, \
that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial \
intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make \
predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide \
variety of applications, such as in medicine, email filtering, speech recognition, and computer vision, where it is \
difficult or unfeasible to develop conventional algorithms to perform the needed tasks."

Pass the text to the Language object to get a processed Doc

In [7]:
doc = nlp(text)
doc
Out[7]:
Machine learning (ML) is a field of inquiry devoted to understanding and building methods that learn, that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide variety of applications, such as in medicine, email filtering, speech recognition, and computer vision, where it is difficult or unfeasible to develop conventional algorithms to perform the needed tasks.

Get all noun phrases with the noun_chunks property

In [8]:
for np in doc.noun_chunks:
    print(np)
Machine learning
ML
a field
inquiry
understanding and building methods
that
methods
that
data
performance
some set
tasks
It
a part
artificial intelligence
Machine learning algorithms
a model
sample data
training data
order
predictions
decisions
Machine learning algorithms
a wide variety
applications
medicine
email filtering
speech recognition
computer vision
it
conventional algorithms
the needed tasks

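spaCy returns more chunks than TextBlob. Its noun chunks keep the original casing and leading determiners ("a field", "the needed tasks"), and they include pronouns such as "that", "It" and "it".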
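If you only want content-bearing phrases, you can post-process the chunks. The following is a plain-string sketch on a few chunks copied from the output above. The small DETERMINERS and PRONOUNS sets are hand-written assumptions for illustration, not spaCy data. Inside spaCy you would more likely inspect token attributes such as pos_ on each chunk.

```python
# A few noun chunks copied from the spaCy output above.
chunks = ["Machine learning", "ML", "a field", "that", "It",
          "some set", "the needed tasks", "it", "artificial intelligence"]

# Hand-written word lists (assumptions for this sketch only).
DETERMINERS = {"a", "an", "the", "some"}
PRONOUNS = {"it", "that", "this", "they", "them"}


def clean_chunk(chunk):
    """Drop a leading determiner; return None for single-word pronoun chunks."""
    words = chunk.split()
    if len(words) == 1 and words[0].lower() in PRONOUNS:
        return None
    if len(words) > 1 and words[0].lower() in DETERMINERS:
        words = words[1:]
    return " ".join(words)


cleaned = [c for c in (clean_chunk(ch) for ch in chunks) if c is not None]
print(cleaned)
# ['Machine learning', 'ML', 'field', 'set', 'needed tasks', 'artificial intelligence']
```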

Noun extraction using NLTK

In [9]:
import nltk
from nltk import word_tokenize, pos_tag
In [10]:
text = "Machine learning (ML) is a field of inquiry devoted to understanding and building methods that learn, \
that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial \
intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make \
predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide \
variety of applications, such as in medicine, email filtering, speech recognition, and computer vision, where it is \
difficult or unfeasible to develop conventional algorithms to perform the needed tasks."

Split the text into word tokens

In [11]:
tokens = word_tokenize(text)
tokens
Out[11]:
['Machine',
 'learning',
 '(',
 'ML',
 ')',
 'is',
 'a',
 'field',
 'of',
 'inquiry',
 'devoted',
 'to',
 'understanding',
 'and',
 'building',
 'methods',
 'that',
 'learn',
 ',',
 'that',
 'is',
 ',',
 'methods',
 'that',
 'leverage',
 'data',
 'to',
 'improve',
 'performance',
 'on',
 'some',
 'set',
 'of',
 'tasks',
 '.',
 'It',
 'is',
 'seen',
 'as',
 'a',
 'part',
 'of',
 'artificial',
 'intelligence',
 '.',
 'Machine',
 'learning',
 'algorithms',
 'build',
 'a',
 'model',
 'based',
 'on',
 'sample',
 'data',
 ',',
 'known',
 'as',
 'training',
 'data',
 ',',
 'in',
 'order',
 'to',
 'make',
 'predictions',
 'or',
 'decisions',
 'without',
 'being',
 'explicitly',
 'programmed',
 'to',
 'do',
 'so',
 '.',
 'Machine',
 'learning',
 'algorithms',
 'are',
 'used',
 'in',
 'a',
 'wide',
 'variety',
 'of',
 'applications',
 ',',
 'such',
 'as',
 'in',
 'medicine',
 ',',
 'email',
 'filtering',
 ',',
 'speech',
 'recognition',
 ',',
 'and',
 'computer',
 'vision',
 ',',
 'where',
 'it',
 'is',
 'difficult',
 'or',
 'unfeasible',
 'to',
 'develop',
 'conventional',
 'algorithms',
 'to',
 'perform',
 'the',
 'needed',
 'tasks',
 '.']
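Unlike a plain str.split(), word_tokenize separates punctuation from words, so "(ML)" becomes "(", "ML" and ")". The sketch below approximates that behaviour with a single regular expression. It is a rough illustration with a hypothetical helper name, not NLTK's tokenizer, which also handles contractions, quotes and other cases.

```python
import re


def rough_tokenize(text):
    """Split text into runs of word characters and single punctuation marks.

    A rough regex approximation for illustration only; it is not
    NLTK's word_tokenize.
    """
    return re.findall(r"\w+|[^\w\s]", text)


print(rough_tokenize("Machine learning (ML) is a field of inquiry."))
# ['Machine', 'learning', '(', 'ML', ')', 'is', 'a', 'field', 'of', 'inquiry', '.']

print("Machine learning (ML)".split())
# ['Machine', 'learning', '(ML)']
```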

Tag each token with its part of speech

In [12]:
parts_of_speech = nltk.pos_tag(tokens)
parts_of_speech
Out[12]:
[('Machine', 'NN'),
 ('learning', 'NN'),
 ('(', '('),
 ('ML', 'NNP'),
 (')', ')'),
 ('is', 'VBZ'),
 ('a', 'DT'),
 ('field', 'NN'),
 ('of', 'IN'),
 ('inquiry', 'NN'),
 ('devoted', 'VBN'),
 ('to', 'TO'),
 ('understanding', 'JJ'),
 ('and', 'CC'),
 ('building', 'NN'),
 ('methods', 'NNS'),
 ('that', 'WDT'),
 ('learn', 'VBP'),
 (',', ','),
 ('that', 'DT'),
 ('is', 'VBZ'),
 (',', ','),
 ('methods', 'NNS'),
 ('that', 'IN'),
 ('leverage', 'NN'),
 ('data', 'NNS'),
 ('to', 'TO'),
 ('improve', 'VB'),
 ('performance', 'NN'),
 ('on', 'IN'),
 ('some', 'DT'),
 ('set', 'NN'),
 ('of', 'IN'),
 ('tasks', 'NNS'),
 ('.', '.'),
 ('It', 'PRP'),
 ('is', 'VBZ'),
 ('seen', 'VBN'),
 ('as', 'IN'),
 ('a', 'DT'),
 ('part', 'NN'),
 ('of', 'IN'),
 ('artificial', 'JJ'),
 ('intelligence', 'NN'),
 ('.', '.'),
 ('Machine', 'NNP'),
 ('learning', 'VBG'),
 ('algorithms', 'JJ'),
 ('build', 'VB'),
 ('a', 'DT'),
 ('model', 'NN'),
 ('based', 'VBN'),
 ('on', 'IN'),
 ('sample', 'NN'),
 ('data', 'NNS'),
 (',', ','),
 ('known', 'VBN'),
 ('as', 'IN'),
 ('training', 'NN'),
 ('data', 'NNS'),
 (',', ','),
 ('in', 'IN'),
 ('order', 'NN'),
 ('to', 'TO'),
 ('make', 'VB'),
 ('predictions', 'NNS'),
 ('or', 'CC'),
 ('decisions', 'NNS'),
 ('without', 'IN'),
 ('being', 'VBG'),
 ('explicitly', 'RB'),
 ('programmed', 'VBN'),
 ('to', 'TO'),
 ('do', 'VB'),
 ('so', 'RB'),
 ('.', '.'),
 ('Machine', 'NNP'),
 ('learning', 'VBG'),
 ('algorithms', 'NNS'),
 ('are', 'VBP'),
 ('used', 'VBN'),
 ('in', 'IN'),
 ('a', 'DT'),
 ('wide', 'JJ'),
 ('variety', 'NN'),
 ('of', 'IN'),
 ('applications', 'NNS'),
 (',', ','),
 ('such', 'JJ'),
 ('as', 'IN'),
 ('in', 'IN'),
 ('medicine', 'NN'),
 (',', ','),
 ('email', 'NN'),
 ('filtering', 'NN'),
 (',', ','),
 ('speech', 'NN'),
 ('recognition', 'NN'),
 (',', ','),
 ('and', 'CC'),
 ('computer', 'NN'),
 ('vision', 'NN'),
 (',', ','),
 ('where', 'WRB'),
 ('it', 'PRP'),
 ('is', 'VBZ'),
 ('difficult', 'JJ'),
 ('or', 'CC'),
 ('unfeasible', 'JJ'),
 ('to', 'TO'),
 ('develop', 'VB'),
 ('conventional', 'JJ'),
 ('algorithms', 'NNS'),
 ('to', 'TO'),
 ('perform', 'VB'),
 ('the', 'DT'),
 ('needed', 'JJ'),
 ('tasks', 'NNS'),
 ('.', '.')]
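These tags can also be grouped into noun phrases rather than single nouns. The sketch below is a simplified, hand-written chunker that collects adjectives followed by nouns, roughly the pattern JJ* NN+. NLTK ships its own RegexpParser, which accepts a grammar such as "NP: {<JJ.*>*<NN.*>+}"; the function here is a hypothetical stand-in for illustration, not that parser. The input is a contiguous slice of the tagged output above.

```python
def chunk_noun_phrases(tagged):
    """Group (word, tag) pairs into phrases of adjectives followed by nouns."""
    phrases, adjectives, nouns = [], [], []

    def flush():
        # Keep the pending phrase only if it contains at least one noun.
        if nouns:
            phrases.append(" ".join(adjectives + nouns))
        adjectives.clear()
        nouns.clear()

    for word, tag in tagged:
        if tag.startswith("NN"):
            nouns.append(word)
        elif tag.startswith("JJ"):
            if nouns:  # an adjective after nouns starts a new phrase
                flush()
            adjectives.append(word)
        else:
            flush()
    flush()
    return phrases


# Slice of the pos_tag output above, from "It" to "data".
tagged = [("It", "PRP"), ("is", "VBZ"), ("seen", "VBN"), ("as", "IN"),
          ("a", "DT"), ("part", "NN"), ("of", "IN"), ("artificial", "JJ"),
          ("intelligence", "NN"), (".", "."), ("Machine", "NNP"),
          ("learning", "VBG"), ("algorithms", "JJ"), ("build", "VB"),
          ("a", "DT"), ("model", "NN"), ("based", "VBN"), ("on", "IN"),
          ("sample", "NN"), ("data", "NNS")]

print(chunk_noun_phrases(tagged))
# ['part', 'artificial intelligence', 'Machine', 'model', 'sample data']
```

Any chunker built on these tags inherits the tagger's mistakes. Because "learning" is tagged VBG and "algorithms" JJ, the phrase "Machine learning algorithms" is reduced to "Machine".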

Keep only the tokens tagged NN

In [13]:
nouns = list(filter(lambda x: x[1] == "NN", parts_of_speech))
nouns
Out[13]:
[('Machine', 'NN'),
 ('learning', 'NN'),
 ('field', 'NN'),
 ('inquiry', 'NN'),
 ('building', 'NN'),
 ('leverage', 'NN'),
 ('performance', 'NN'),
 ('set', 'NN'),
 ('part', 'NN'),
 ('intelligence', 'NN'),
 ('model', 'NN'),
 ('sample', 'NN'),
 ('training', 'NN'),
 ('order', 'NN'),
 ('variety', 'NN'),
 ('medicine', 'NN'),
 ('email', 'NN'),
 ('filtering', 'NN'),
 ('speech', 'NN'),
 ('recognition', 'NN'),
 ('computer', 'NN'),
 ('vision', 'NN')]

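This keeps only tokens tagged exactly NN, that is, singular common nouns. Plural nouns (NNS) such as "methods" and "tasks", and proper nouns (NNP) such as "ML", are left out. The tagger also makes mistakes: "leverage" is tagged as a noun, while "algorithms" after "Machine learning" is tagged JJ.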
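All Penn Treebank noun tags start with "NN" (NN, NNS, NNP, NNPS), so a prefix test captures plural and proper nouns as well. A small sketch on a few pairs copied from the pos_tag output above; on the full result you would apply the same test to parts_of_speech.

```python
# A few (word, tag) pairs copied from the pos_tag output above.
tagged = [("Machine", "NN"), ("ML", "NNP"), ("methods", "NNS"),
          ("data", "NNS"), ("performance", "NN"), ("tasks", "NNS"),
          ("It", "PRP"), ("improve", "VB")]

# Exact match: singular common nouns only.
exact_nn = [word for word, tag in tagged if tag == "NN"]

# Prefix match: NN, NNS, NNP and NNPS.
all_nouns = [word for word, tag in tagged if tag.startswith("NN")]

print(exact_nn)   # ['Machine', 'performance']
print(all_nouns)  # ['Machine', 'ML', 'methods', 'data', 'performance', 'tasks']
```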
