Text Data Handling

Input Text

In [1]:
text = """

Once, there were two friends who were crossing the jungle.
After some time, they saw a bear coming towards them.
Then, one of the friends quickly climbed the nearby tree, and the other one did not know how to climb the tree.
So he lays down on the ground, holding his breath.
The bear reaches towards him and sniffs him in the ear.
After some time, the bear left the place, thinking the man was dead.
Now the other friend climbs down and asks his friend, What did bear say to him in his ear?
He replied, "
To be safe from the fake friends."

"""
In [2]:
text
Out[2]:
'\n\nOnce, there were two friends who were crossing the jungle.\nAfter some time, they saw a bear coming towards them.\nThen, one of the friends quickly climbed the nearby tree, and the other one did not know how to climb the tree.\nSo he lays down on the ground, holding his breath.\nThe bear reaches towards him and sniffs him in the ear.\nAfter some time, the bear left the place, thinking the man was dead.\nNow the other friend climbs down and asks his friend, What did bear say to him in his ear?\nHe replied, "\nTo be safe from the fake friends."\n\n'
In [3]:
print(text)

Once, there were two friends who were crossing the jungle.
After some time, they saw a bear coming towards them.
Then, one of the friends quickly climbed the nearby tree, and the other one did not know how to climb the tree.
So he lays down on the ground, holding his breath.
The bear reaches towards him and sniffs him in the ear.
After some time, the bear left the place, thinking the man was dead.
Now the other friend climbs down and asks his friend, What did bear say to him in his ear?
He replied, "
To be safe from the fake friends."


In [7]:
mod_text = text.replace(",", "").replace(".", "").replace("?", "").replace("\"", "")
print(mod_text)

Once there were two friends who were crossing the jungle
After some time they saw a bear coming towards them
Then one of the friends quickly climbed the nearby tree and the other one did not know how to climb the tree
So he lays down on the ground holding his breath
The bear reaches towards him and sniffs him in the ear
After some time the bear left the place thinking the man was dead
Now the other friend climbs down and asks his friend What did bear say to him in his ear
He replied 
To be safe from the fake friends


In [8]:
mod_text = text.replace(",", "").replace(".", "").replace("?", "").replace('"', "")
print(mod_text)

Once there were two friends who were crossing the jungle
After some time they saw a bear coming towards them
Then one of the friends quickly climbed the nearby tree and the other one did not know how to climb the tree
So he lays down on the ground holding his breath
The bear reaches towards him and sniffs him in the ear
After some time the bear left the place thinking the man was dead
Now the other friend climbs down and asks his friend What did bear say to him in his ear
He replied 
To be safe from the fake friends


In [9]:
mod_text
Out[9]:
'\n\nOnce there were two friends who were crossing the jungle\nAfter some time they saw a bear coming towards them\nThen one of the friends quickly climbed the nearby tree and the other one did not know how to climb the tree\nSo he lays down on the ground holding his breath\nThe bear reaches towards him and sniffs him in the ear\nAfter some time the bear left the place thinking the man was dead\nNow the other friend climbs down and asks his friend What did bear say to him in his ear\nHe replied \nTo be safe from the fake friends\n\n'
In [11]:
mod_text = text.replace(",", "").replace(".", "").replace("?", "").replace('"', ""
            ).replace("\n", " ")
print(mod_text)
  Once there were two friends who were crossing the jungle After some time they saw a bear coming towards them Then one of the friends quickly climbed the nearby tree and the other one did not know how to climb the tree So he lays down on the ground holding his breath The bear reaches towards him and sniffs him in the ear After some time the bear left the place thinking the man was dead Now the other friend climbs down and asks his friend What did bear say to him in his ear He replied  To be safe from the fake friends  
In [12]:
mod_text
Out[12]:
'  Once there were two friends who were crossing the jungle After some time they saw a bear coming towards them Then one of the friends quickly climbed the nearby tree and the other one did not know how to climb the tree So he lays down on the ground holding his breath The bear reaches towards him and sniffs him in the ear After some time the bear left the place thinking the man was dead Now the other friend climbs down and asks his friend What did bear say to him in his ear He replied  To be safe from the fake friends  '
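The chained replace calls above can also be written as a single pass with a translation table. This is a minimal sketch, assuming only the same five characters need handling; `sample`, `table`, `cleaned` and `chained` are illustrative names, and `sample` is a short stand-in for the notebook's `text`.

```python
# Short stand-in for the notebook's `text` variable (illustrative only).
sample = 'He replied, "\nTo be safe from the fake friends."\n'

# Map each punctuation mark to None (delete it) and each newline to a space.
table = str.maketrans({",": None, ".": None, "?": None, '"': None, "\n": " "})
cleaned = sample.translate(table)

# The chained version from the notebook, for comparison.
chained = (
    sample.replace(",", "").replace(".", "").replace("?", "")
    .replace('"', "").replace("\n", " ")
)

print(cleaned == chained)
print(repr(cleaned))
```

Both approaches give the same string here; the table version keeps all the character rules in one place.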
In [13]:
type(mod_text)
Out[13]:
str
In [14]:
words = mod_text.split(" ")
In [16]:
print(words)
['', '', 'Once', 'there', 'were', 'two', 'friends', 'who', 'were', 'crossing', 'the', 'jungle', 'After', 'some', 'time', 'they', 'saw', 'a', 'bear', 'coming', 'towards', 'them', 'Then', 'one', 'of', 'the', 'friends', 'quickly', 'climbed', 'the', 'nearby', 'tree', 'and', 'the', 'other', 'one', 'did', 'not', 'know', 'how', 'to', 'climb', 'the', 'tree', 'So', 'he', 'lays', 'down', 'on', 'the', 'ground', 'holding', 'his', 'breath', 'The', 'bear', 'reaches', 'towards', 'him', 'and', 'sniffs', 'him', 'in', 'the', 'ear', 'After', 'some', 'time', 'the', 'bear', 'left', 'the', 'place', 'thinking', 'the', 'man', 'was', 'dead', 'Now', 'the', 'other', 'friend', 'climbs', 'down', 'and', 'asks', 'his', 'friend', 'What', 'did', 'bear', 'say', 'to', 'him', 'in', 'his', 'ear', 'He', 'replied', '', 'To', 'be', 'safe', 'from', 'the', 'fake', 'friends', '', '']
In [17]:
[]
Out[17]:
[]
for word in words:
    print(word)

for word in words:
    if len(word) == 0:
        continue  # go back
    print(len(word))
In [30]:
[ word  for word in words  if len(word) == 0  ]
Out[30]:
['', '', '', '', '']
In [32]:
mod_words = [ word  for word in words  if len(word) != 0  ]
In [33]:
print(mod_words)
['Once', 'there', 'were', 'two', 'friends', 'who', 'were', 'crossing', 'the', 'jungle', 'After', 'some', 'time', 'they', 'saw', 'a', 'bear', 'coming', 'towards', 'them', 'Then', 'one', 'of', 'the', 'friends', 'quickly', 'climbed', 'the', 'nearby', 'tree', 'and', 'the', 'other', 'one', 'did', 'not', 'know', 'how', 'to', 'climb', 'the', 'tree', 'So', 'he', 'lays', 'down', 'on', 'the', 'ground', 'holding', 'his', 'breath', 'The', 'bear', 'reaches', 'towards', 'him', 'and', 'sniffs', 'him', 'in', 'the', 'ear', 'After', 'some', 'time', 'the', 'bear', 'left', 'the', 'place', 'thinking', 'the', 'man', 'was', 'dead', 'Now', 'the', 'other', 'friend', 'climbs', 'down', 'and', 'asks', 'his', 'friend', 'What', 'did', 'bear', 'say', 'to', 'him', 'in', 'his', 'ear', 'He', 'replied', 'To', 'be', 'safe', 'from', 'the', 'fake', 'friends']
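The empty strings appear because split(" ") cuts at every single space, so two spaces in a row produce an empty item. A small sketch of an alternative, assuming the goal is simply a list of words: calling split() with no argument splits on runs of whitespace and never returns empty strings. The sample string below is illustrative, not the notebook's full text.

```python
# Illustrative string with doubled spaces, like the notebook's mod_text.
mod_text = "  Once there were two friends  "

# Splitting on a single space keeps an empty string for each extra space.
words_with_gaps = mod_text.split(" ")

# Splitting with no argument treats any run of whitespace as one separator.
words = mod_text.split()

print(words_with_gaps)
print(words)
```

With this form the filtering list comprehension is not needed, although the comprehension is still a useful pattern for other conditions.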
[x for x in mod_words]
[len(x) for x in mod_words]
In [38]:
word_lengths = [len(x)  for x in mod_words ]
In [39]:
print(word_lengths)
[4, 5, 4, 3, 7, 3, 4, 8, 3, 6, 5, 4, 4, 4, 3, 1, 4, 6, 7, 4, 4, 3, 2, 3, 7, 7, 7, 3, 6, 4, 3, 3, 5, 3, 3, 3, 4, 3, 2, 5, 3, 4, 2, 2, 4, 4, 2, 3, 6, 7, 3, 6, 3, 4, 7, 7, 3, 3, 6, 3, 2, 3, 3, 5, 4, 4, 3, 4, 4, 3, 5, 8, 3, 3, 3, 4, 3, 3, 5, 6, 6, 4, 3, 4, 3, 6, 4, 3, 4, 3, 2, 3, 2, 3, 3, 2, 7, 2, 2, 4, 4, 3, 4, 7]

How many words are there?

In [41]:
n = len(word_lengths)
n
Out[41]:
104

What is the sum of all the word lengths?

In [43]:
s = sum(word_lengths)
s
Out[43]:
417

Average

In [44]:
avg = s / n
avg
Out[44]:
4.009615384615385
In [47]:
import numpy as np
import statistics as st
In [46]:
np.mean(word_lengths)
Out[46]:
4.009615384615385
In [48]:
st.mean(word_lengths)
Out[48]:
4.009615384615385

Median

The median is the middle element of the sorted list, so the first step is to sort the lengths with sorted(word_lengths).
In [51]:
sorted_lengths = sorted(word_lengths)
print(sorted_lengths)
[1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8]

What is the middle position?

In [52]:
n / 2
Out[52]:
52.0

As n is even, two elements share the middle: the 52nd and the 53rd. Python lists are 0-indexed, so they sit at indices 51 and 52.

In [53]:
sorted_lengths[51]
Out[53]:
4
In [54]:
sorted_lengths[52]
Out[54]:
4
In [55]:
(sorted_lengths[51] + sorted_lengths[52]) / 2
Out[55]:
4.0
In [56]:
median = (sorted_lengths[51] + sorted_lengths[52]) / 2
median
Out[56]:
4.0
In [57]:
np.median(word_lengths)
Out[57]:
4.0
In [58]:
st.median(word_lengths)
Out[58]:
4.0
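The manual calculation above covers the even case only. A minimal sketch of a general version, assuming a non-empty list of numbers; `median_of` is a hypothetical helper name, not something from the notebook or a library.

```python
import statistics as st


def median_of(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2  # 0-based index of the upper middle element
    if n % 2 == 1:
        # Odd count: exactly one middle element.
        return ordered[mid]
    # Even count: average the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_sample = [3, 1, 2]
even_sample = [4, 1, 3, 2]

print(median_of(odd_sample), st.median(odd_sample))
print(median_of(even_sample), st.median(even_sample))
```

For a list of 104 items, `mid` is 52, so the two elements averaged are at indices 51 and 52.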

What is the mode?

In [59]:
st.mode(word_lengths)
Out[59]:
3
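NumPy has no mode function, so np.mode(word_lengths) raises an AttributeError.

The mode is the item that appears the highest number of times, so it can also be found by counting each value.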

In [61]:
word_lengths.count(0)
Out[61]:
0
In [62]:
word_lengths.count(1)
Out[62]:
1
In [63]:
word_lengths.count(2)
Out[63]:
11
In [64]:
word_lengths.count(3)
Out[64]:
37
In [65]:
word_lengths.count(4)
Out[65]:
27
In [66]:
word_lengths.count(5)
Out[66]:
7
In [67]:
min(word_lengths)
Out[67]:
1
In [68]:
max(word_lengths)
Out[68]:
8
In [72]:
for x in range(1, 9):
    #print(x)
    print(x,  word_lengths.count(x)    )
1 1
2 11
3 37
4 27
5 7
6 9
7 10
8 2

The mode is 3, as it appears 37 times, more than any other length.

In [73]:
import collections  as cl
In [74]:
cl.Counter(word_lengths)
Out[74]:
Counter({4: 27, 5: 7, 3: 37, 7: 10, 8: 2, 6: 9, 1: 1, 2: 11})

So Counter gives the frequency of every item in one step.
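The frequency table also gives the mode directly. A small sketch, assuming a single most frequent value is wanted; the `lengths` list here is a shortened illustrative sample rather than the full word_lengths list.

```python
from collections import Counter

# Shortened illustrative sample of word lengths.
lengths = [4, 5, 4, 3, 7, 3, 4, 8, 3, 3]

counter = Counter(lengths)

# most_common(1) returns a list holding the single (value, count) pair
# with the highest count.
mode_value, mode_count = counter.most_common(1)[0]

print(mode_value, mode_count)
```

If several values tie for the highest count, most_common(1) returns only one of them, so a tie needs a separate check.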

Plot the frequency

In [75]:
counter = cl.Counter(word_lengths)
In [76]:
import matplotlib.pyplot as plt
In [81]:
tuples = tuple(   counter.items()     )
tuples
Out[81]:
((4, 27), (5, 7), (3, 37), (7, 10), (8, 2), (6, 9), (1, 1), (2, 11))
In [83]:
plt.plot(tuples)
Out[83]:
[<matplotlib.lines.Line2D at 0x7fe5737b68b0>,
 <matplotlib.lines.Line2D at 0x7fe5737b6910>]
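Passing the tuple of pairs to plt.plot treats it as a two-column table, so it draws two lines (the lengths and the counts, each against their position), which is why two Line2D objects come back. A frequency chart usually needs the lengths on one axis and the counts on the other, sorted by length. The sketch below prepares the data that way and prints a rough text chart; it uses only the standard library, and the names `pairs`, `lengths` and `counts` are illustrative.

```python
from collections import Counter

# Frequencies copied from the Counter output shown earlier.
counter = Counter({4: 27, 5: 7, 3: 37, 7: 10, 8: 2, 6: 9, 1: 1, 2: 11})

# Sort the (length, count) pairs by length so the axis reads 1, 2, 3, ...
pairs = sorted(counter.items())
lengths = [length for length, _ in pairs]
counts = [count for _, count in pairs]

# Rough text bar chart: one '#' per occurrence.
for length, count in pairs:
    print(f"{length} | {'#' * count} {count}")
```

The same `lengths` and `counts` lists can be passed to plt.bar(lengths, counts) to get a graphical bar chart.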