Working With MongoDB & Python Using PyMongo and Pandas

In this blog, we will connect to MongoDB from Python with the help of PyMongo. Then we will create a database, a collection, and some documents. After that, we will load the documents into a Pandas DataFrame.

Prerequisites

  1. Install Python and pip
  2. Install Jupyter notebook
  3. Install MongoDB community server

What is MongoDB?

MongoDB is a source-available cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas. MongoDB is developed by MongoDB Inc. and licensed under the Server Side Public License (SSPL).

What is PyMongo?

PyMongo is a Python distribution containing tools for working with MongoDB, and is the recommended way to work with MongoDB from Python. The pymongo package is a native Python driver for MongoDB.

Install PyMongo

PyMongo can be installed with pip:

pip install pymongo

Or with conda:

conda install -c anaconda pymongo

Connect to MongoDB

In [16]:
import pymongo
In [17]:
from pymongo import MongoClient

class pymongo.mongo_client.MongoClient(host='localhost', port=27017, document_class=dict, tz_aware=False, connect=True, **kwargs)

Client for a MongoDB instance, a replica set, or a set of mongoses. In other words, MongoClient is the entry point for connecting to MongoDB.
In [18]:
client = MongoClient()
client
Out[18]:
MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True)

The same connection can be made explicitly by passing a connection URI to MongoClient.

In [19]:
client = MongoClient("mongodb://localhost:27017")
client
Out[19]:
MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True)
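Creating a MongoClient does not by itself prove that a server is running, because PyMongo connects lazily in the background. The sketch below is one way to check reachability by sending the cheap "ping" command. The helper name is_server_reachable is hypothetical and not part of PyMongo, and the import fallback is only an assumption made so the helper can be loaded without pymongo installed.

```python
# Fall back to the built-in Exception so this helper can still be imported
# when pymongo is not installed (an assumption made for portability).
try:
    from pymongo.errors import PyMongoError
except ImportError:
    PyMongoError = Exception


def is_server_reachable(client):
    """Return True if the MongoDB server behind `client` answers a ping."""
    try:
        # "ping" is a lightweight server command that forces a round trip.
        client.admin.command("ping")
        return True
    except PyMongoError:
        return False
```

For example, is_server_reachable(MongoClient("mongodb://localhost:27017", serverSelectionTimeoutMS=2000)) gives up after about two seconds instead of waiting for the default server selection timeout.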

Show databases

In [20]:
print(client.list_database_names())
['admin', 'config', 'local']

Create a new database

I am creating a new database called "sampledb". You can choose any database name you like.

In [21]:
db = client.sampledb
db
Out[21]:
Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'sampledb')

View the list of databases again after creating the new database

In [22]:
print(client.list_database_names())
['admin', 'config', 'local']

If the database doesn't exist, MongoDB creates it for you, but only when the first write operation is performed on it. That is why "sampledb" does not appear in the list above yet.
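To make this lazy creation visible, we can check whether a name is already known to the server. The following is a minimal sketch with two hypothetical helpers, database_exists and collection_exists, built on the list_database_names() and list_collection_names() methods.

```python
def database_exists(client, name):
    """Return True if the server already lists a database called `name`."""
    return name in client.list_database_names()


def collection_exists(db, name):
    """Return True if the database already contains a collection called `name`."""
    return name in db.list_collection_names()
```

At this point in the walkthrough, database_exists(client, "sampledb") is expected to be False, and it should become True once the first document has been inserted.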

Create a collection

In [23]:
student_collection = db["students"]
student_collection
Out[23]:
Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'sampledb'), 'students')

Insert a document into the collection

An ObjectId is generated automatically for the _id field, so there is no need to add one yourself.

In [24]:
student1 = { "name": "John", "age": 10, "class": "VI", "section": "A" }
In [25]:
result = student_collection.insert_one(student1)
result
Out[25]:
<pymongo.results.InsertOneResult at 0x2223a301700>
In [26]:
print(f"Inserted new record id: {result.inserted_id}")
Inserted new record id: 623df5c9a2f230bd4ff5fe28

Get the inserted record/document using find()

find(filter=None, projection=None, skip=0, limit=0, no_cursor_timeout=False, cursor_type=CursorType.NON_TAILABLE, sort=None, allow_partial_results=False, oplog_replay=False, batch_size=0, collation=None, hint=None, max_scan=None, max_time_ms=None, max=None, min=None, return_key=False, show_record_id=False, snapshot=False, comment=None, session=None, allow_disk_use=None)

Query the database.

The filter argument is a prototype document that all results must match.
In [27]:
for student in student_collection.find():
    print(student) 
{'_id': ObjectId('623df5c9a2f230bd4ff5fe28'), 'name': 'John', 'age': 10, 'class': 'VI', 'section': 'A'}
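Since the filter is just a dictionary, it can be built step by step before it is passed to find(). The sketch below assumes the student fields used in this blog ("class" and "age"); the helper name build_student_filter is hypothetical. It uses the $gte query operator for "greater than or equal to".

```python
def build_student_filter(class_name=None, min_age=None):
    """Build a filter document for the students collection.

    Only the conditions that are actually given end up in the filter;
    an empty filter ({}) matches every document.
    """
    query = {}
    if class_name is not None:
        # Exact match on the "class" field.
        query["class"] = class_name
    if min_age is not None:
        # Range condition: age greater than or equal to min_age.
        query["age"] = {"$gte": min_age}
    return query
```

The result can be passed straight to the collection, for example student_collection.find(build_student_filter(class_name="VI", min_age=10)).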

Insert multiple documents

insert_many(documents, ordered=True, bypass_document_validation=False, session=None)

Insert an iterable of documents.

Create the list of students.

In [28]:
students = [
    { "name": "Maria", "age": 9, "class": "VI", "section": "B"},
    { "name": "Michel", "age": 11, "class": "VII", "section": "A"},
    { "name": "Priyanka", "age": 8, "class": "IV", "section": "B"},
    { "name": "Jeena", "age": 12, "class": "X", "section": "A" }
]
In [29]:
result = student_collection.insert_many(students)
result
Out[29]:
<pymongo.results.InsertManyResult at 0x2223a301b80>
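The result object printed above is not very readable on its own. An InsertManyResult exposes the generated ids through its inserted_ids attribute, so a small hypothetical helper such as describe_insert below can turn it into a short summary.

```python
def describe_insert(result):
    """Summarise an insert_many() result as a human-readable string."""
    # inserted_ids holds one _id per inserted document, in insertion order.
    ids = [str(doc_id) for doc_id in result.inserted_ids]
    return f"Inserted {len(ids)} documents: {', '.join(ids)}"
```

Calling print(describe_insert(result)) right after insert_many() lists the new ids, similar to what result.inserted_id does for a single insert.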

View all the inserted records

In [30]:
for student in student_collection.find():
    print(student) 
{'_id': ObjectId('623df5c9a2f230bd4ff5fe28'), 'name': 'John', 'age': 10, 'class': 'VI', 'section': 'A'}
{'_id': ObjectId('623df5cca2f230bd4ff5fe29'), 'name': 'Maria', 'age': 9, 'class': 'VI', 'section': 'B'}
{'_id': ObjectId('623df5cca2f230bd4ff5fe2a'), 'name': 'Michel', 'age': 11, 'class': 'VII', 'section': 'A'}
{'_id': ObjectId('623df5cca2f230bd4ff5fe2b'), 'name': 'Priyanka', 'age': 8, 'class': 'IV', 'section': 'B'}
{'_id': ObjectId('623df5cca2f230bd4ff5fe2c'), 'name': 'Jeena', 'age': 12, 'class': 'X', 'section': 'A'}

Find the first document in the student collection:

find_one(filter=None, *args, **kwargs)

Get a single document from the database.
In [31]:
student = student_collection.find_one()
student
Out[31]:
{'_id': ObjectId('623df5c9a2f230bd4ff5fe28'),
 'name': 'John',
 'age': 10,
 'class': 'VI',
 'section': 'A'}

Convert the collection into a Pandas DataFrame

In [32]:
import pandas as pd
In [33]:
students = student_collection.find()
students
Out[33]:
<pymongo.cursor.Cursor at 0x2223a2fd040>

Convert the students Cursor to a list

In [34]:
list_students = list(students)
list_students
Out[34]:
[{'_id': ObjectId('623df5c9a2f230bd4ff5fe28'),
  'name': 'John',
  'age': 10,
  'class': 'VI',
  'section': 'A'},
 {'_id': ObjectId('623df5cca2f230bd4ff5fe29'),
  'name': 'Maria',
  'age': 9,
  'class': 'VI',
  'section': 'B'},
 {'_id': ObjectId('623df5cca2f230bd4ff5fe2a'),
  'name': 'Michel',
  'age': 11,
  'class': 'VII',
  'section': 'A'},
 {'_id': ObjectId('623df5cca2f230bd4ff5fe2b'),
  'name': 'Priyanka',
  'age': 8,
  'class': 'IV',
  'section': 'B'},
 {'_id': ObjectId('623df5cca2f230bd4ff5fe2c'),
  'name': 'Jeena',
  'age': 12,
  'class': 'X',
  'section': 'A'}]

Convert the list to a Pandas DataFrame

In [35]:
df = pd.DataFrame(list_students)
df
Out[35]:
                        _id      name  age  class  section
0  623df5c9a2f230bd4ff5fe28      John   10     VI        A
1  623df5cca2f230bd4ff5fe29     Maria    9     VI        B
2  623df5cca2f230bd4ff5fe2a    Michel   11    VII        A
3  623df5cca2f230bd4ff5fe2b  Priyanka    8     IV        B
4  623df5cca2f230bd4ff5fe2c     Jeena   12      X        A
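The _id column of this DataFrame holds ObjectId objects rather than plain strings, which can get in the way when exporting to CSV or joining with other data. One possible approach, sketched here with a hypothetical helper named stringify_ids, is to convert the ids to strings before building the DataFrame.

```python
def stringify_ids(documents):
    """Return copies of the documents with `_id` converted to a plain string."""
    cleaned = []
    for doc in documents:
        copy = dict(doc)  # shallow copy so the original documents stay untouched
        if "_id" in copy:
            # str() of an ObjectId gives its 24-character hex representation.
            copy["_id"] = str(copy["_id"])
        cleaned.append(copy)
    return cleaned
```

With this in place, pd.DataFrame(stringify_ids(list_students)) builds the same table with a string-valued _id column.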

Convert the result of a query into a Pandas DataFrame

In [36]:
students1 = student_collection.find({ "class": "VI" })
students1
Out[36]:
<pymongo.cursor.Cursor at 0x2223c6625e0>
In [37]:
list_students1 = list(students1)
list_students1
Out[37]:
[{'_id': ObjectId('623df5c9a2f230bd4ff5fe28'),
  'name': 'John',
  'age': 10,
  'class': 'VI',
  'section': 'A'},
 {'_id': ObjectId('623df5cca2f230bd4ff5fe29'),
  'name': 'Maria',
  'age': 9,
  'class': 'VI',
  'section': 'B'}]
In [39]:
df1 = pd.DataFrame(list_students1)
df1
Out[39]:
                        _id   name  age  class  section
0  623df5c9a2f230bd4ff5fe28   John   10     VI        A
1  623df5cca2f230bd4ff5fe29  Maria    9     VI        B
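The query, list, and DataFrame steps can be wrapped into one reusable function. The sketch below is only one way to do it: find_records is a hypothetical helper, and it uses the projection argument of find() to fetch just the requested fields and leave out _id.

```python
def find_records(collection, query=None, fields=None):
    """Run a query and return the matching documents as a list of dicts.

    `fields` is an optional list of field names to keep; when it is given,
    the `_id` field is excluded from the results.
    """
    projection = None
    if fields:
        # 1 means "include this field"; _id must be switched off explicitly.
        projection = {field: 1 for field in fields}
        projection["_id"] = 0
    return list(collection.find(query or {}, projection))
```

For example, pd.DataFrame(find_records(student_collection, {"class": "VI"}, ["name", "age"])) would build a DataFrame containing only the name and age columns for class VI.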