Working With MongoDB & Python Using PyMongo and Pandas

In this post, we will connect MongoDB to Python with the help of PyMongo. Then we will create a database, a collection, and some documents. After that, we will load the documents into a Pandas DataFrame.


Prerequisites:

  1. Install Python and pip
  2. Install Jupyter notebook
  3. Install MongoDB community server

What is MongoDB?

MongoDB is a source-available cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas. MongoDB is developed by MongoDB Inc. and licensed under the Server Side Public License (SSPL).

What is PyMongo?

PyMongo is a Python distribution containing tools for working with MongoDB, and is the recommended way to work with MongoDB from Python. The pymongo package is a native Python driver for MongoDB.

Install PyMongo

PyMongo can be installed with pip:

pip install pymongo

Or with conda:

conda install -c anaconda pymongo

Connect to MongoDB

In [16]:
import pymongo
In [17]:
from pymongo import MongoClient

class pymongo.mongo_client.MongoClient(host='localhost', port=27017, document_class=dict, tz_aware=False, connect=True, **kwargs)

A client for a MongoDB instance, a replica set, or a set of mongoses. In other words, it is the entry point for connecting to MongoDB.
In [18]:
client = MongoClient()
client
MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True)

The host and port can also be given explicitly as a connection URI.

In [19]:
client = MongoClient("mongodb://localhost:27017")
client
MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True)

Show databases

In [20]:
client.list_database_names()
['admin', 'config', 'local']

Create a new database

I am creating a new database called "sampledb". You can choose any database name you like.

In [21]:
db = client.sampledb
db
Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'sampledb')

View the list of databases again after creating the new database

In [22]:
client.list_database_names()
['admin', 'config', 'local']

Note that "sampledb" is not listed yet. If a database doesn't exist, MongoDB creates it for you, but only when data is first stored in it, for example when the first document is inserted.

Create a collection

In [23]:
student_collection = db["students"]
student_collection
Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'sampledb'), 'students')

Insert a document into the collection

If a document has no "_id" field, PyMongo adds one with an automatically generated ObjectId, so there is no need to set an id yourself.

In [24]:
student1 = { "name": "John", "age": 10, "class": "VI", "section": "A" }
In [25]:
result = student_collection.insert_one(student1)
result
<pymongo.results.InsertOneResult at 0x2223a301700>
In [26]:
print(f"Inserted new record id: {result.inserted_id}")
Inserted new record id: 623df5c9a2f230bd4ff5fe28
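The generated id is not random noise: an ObjectId is 12 bytes long, and its first 4 bytes hold the creation time as a Unix timestamp in seconds. The sketch below decodes that timestamp from the hex string printed above using only the standard library. The helper name objectid_timestamp is made up for this illustration. PyMongo exposes the same value through the generation_time attribute of an ObjectId.

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex: str) -> datetime:
    """Return the creation time encoded in a 24-character ObjectId hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    # The first 4 bytes (8 hex characters) are a big-endian Unix timestamp in seconds.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# The id returned by insert_one() earlier in this post.
created = objectid_timestamp("623df5c9a2f230bd4ff5fe28")
print(created.isoformat())
```

Because the timestamp comes first, sorting documents by "_id" roughly orders them by creation time.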

Get the inserted record/document using find()

find(filter=None, projection=None, skip=0, limit=0, no_cursor_timeout=False, cursor_type=CursorType.NON_TAILABLE, sort=None, allow_partial_results=False, oplog_replay=False, batch_size=0, collation=None, hint=None, max_scan=None, max_time_ms=None, max=None, min=None, return_key=False, show_record_id=False, snapshot=False, comment=None, session=None, allow_disk_use=None)

Query the database.

The filter argument is a prototype document that all results must match.
In [27]:
for student in student_collection.find():
    print(student)
{'_id': ObjectId('623df5c9a2f230bd4ff5fe28'), 'name': 'John', 'age': 10, 'class': 'VI', 'section': 'A'}
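To make the idea of a "prototype document" concrete, here is a small pure-Python sketch of what a plain equality filter means. It is only an illustration of the semantics for top-level scalar fields, not how MongoDB evaluates queries, and it ignores query operators such as $gt. The helpers matches and find_local and the sample list are hypothetical names introduced for this example, with records shaped like the ones in this post.

```python
def matches(document: dict, filter_doc: dict) -> bool:
    """True if every key in filter_doc is present in document with an equal value."""
    return all(
        key in document and document[key] == value
        for key, value in filter_doc.items()
    )


def find_local(documents, filter_doc=None):
    """Yield the documents matching an equality filter; an empty filter matches all."""
    filter_doc = filter_doc or {}
    return (doc for doc in documents if matches(doc, filter_doc))


# Hypothetical in-memory records, no database involved.
sample = [
    {"name": "John", "age": 10, "class": "VI", "section": "A"},
    {"name": "Maria", "age": 9, "class": "VI", "section": "B"},
    {"name": "Michel", "age": 11, "class": "VII", "section": "A"},
]

class_vi = [doc["name"] for doc in find_local(sample, {"class": "VI"})]
print(class_vi)
```

Calling find() with no filter corresponds to the empty filter here: every document matches.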

Insert multiple documents

insert_many(documents, ordered=True, bypass_document_validation=False, session=None):

Insert an iterable of documents.

Create the list of students.

In [28]:
students = [
    { "name": "Maria", "age": 9, "class": "VI", "section": "B"},
    { "name": "Michel", "age": 11, "class": "VII", "section": "A"},
    { "name": "Priyanka", "age": 8, "class": "IV", "section": "B"},
    { "name": "Jeena", "age": 12, "class": "X", "section": "A" }
]
In [29]:
result = student_collection.insert_many(students)
result
<pymongo.results.InsertManyResult at 0x2223a301b80>

View all the inserted records

In [30]:
for student in student_collection.find():
    print(student)
{'_id': ObjectId('623df5c9a2f230bd4ff5fe28'), 'name': 'John', 'age': 10, 'class': 'VI', 'section': 'A'}
{'_id': ObjectId('623df5cca2f230bd4ff5fe29'), 'name': 'Maria', 'age': 9, 'class': 'VI', 'section': 'B'}
{'_id': ObjectId('623df5cca2f230bd4ff5fe2a'), 'name': 'Michel', 'age': 11, 'class': 'VII', 'section': 'A'}
{'_id': ObjectId('623df5cca2f230bd4ff5fe2b'), 'name': 'Priyanka', 'age': 8, 'class': 'IV', 'section': 'B'}
{'_id': ObjectId('623df5cca2f230bd4ff5fe2c'), 'name': 'Jeena', 'age': 12, 'class': 'X', 'section': 'A'}

Find the first document in the student collection:

find_one(filter=None, *args, **kwargs)

Get a single document from the database.
In [31]:
student = student_collection.find_one()
student
{'_id': ObjectId('623df5c9a2f230bd4ff5fe28'),
 'name': 'John',
 'age': 10,
 'class': 'VI',
 'section': 'A'}

Convert a collection into a Pandas DataFrame

In [32]:
import pandas as pd
In [33]:
students = student_collection.find()
students
<pymongo.cursor.Cursor at 0x2223a2fd040>

Convert the students cursor to a list

In [34]:
list_students = list(students)
list_students
[{'_id': ObjectId('623df5c9a2f230bd4ff5fe28'),
  'name': 'John',
  'age': 10,
  'class': 'VI',
  'section': 'A'},
 {'_id': ObjectId('623df5cca2f230bd4ff5fe29'),
  'name': 'Maria',
  'age': 9,
  'class': 'VI',
  'section': 'B'},
 {'_id': ObjectId('623df5cca2f230bd4ff5fe2a'),
  'name': 'Michel',
  'age': 11,
  'class': 'VII',
  'section': 'A'},
 {'_id': ObjectId('623df5cca2f230bd4ff5fe2b'),
  'name': 'Priyanka',
  'age': 8,
  'class': 'IV',
  'section': 'B'},
 {'_id': ObjectId('623df5cca2f230bd4ff5fe2c'),
  'name': 'Jeena',
  'age': 12,
  'class': 'X',
  'section': 'A'}]
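One reason to materialize the cursor into a list first is that a cursor can only be consumed once: after a full pass it is exhausted, and iterating it again yields nothing. The sketch below shows the same behaviour with a plain Python generator as a stand-in, so it runs without a database. The function fake_cursor is hypothetical and only mimics this single aspect of a PyMongo cursor.

```python
def fake_cursor(records):
    """Hypothetical stand-in for a PyMongo cursor: yields each record exactly once."""
    for record in records:
        yield record


records = [{"name": "John"}, {"name": "Maria"}]
cursor = fake_cursor(records)

first_pass = list(cursor)   # consumes the cursor
second_pass = list(cursor)  # nothing left to read

print(len(first_pass), len(second_pass))
```

Keeping the result in a list means it can be reused, for example both for printing and for building a DataFrame.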

Convert the list to a Pandas DataFrame

In [35]:
df = pd.DataFrame(list_students)
df
                        _id      name  age class section
0  623df5c9a2f230bd4ff5fe28      John   10    VI       A
1  623df5cca2f230bd4ff5fe29     Maria    9    VI       B
2  623df5cca2f230bd4ff5fe2a    Michel   11   VII       A
3  623df5cca2f230bd4ff5fe2b  Priyanka    8    IV       B
4  623df5cca2f230bd4ff5fe2c     Jeena   12     X       A

Convert a filtered query result into a Pandas DataFrame

In [36]:
students1 = student_collection.find({ "class": "VI" })
students1
<pymongo.cursor.Cursor at 0x2223c6625e0>
In [37]:
list_students1 = list(students1)
list_students1
[{'_id': ObjectId('623df5c9a2f230bd4ff5fe28'),
  'name': 'John',
  'age': 10,
  'class': 'VI',
  'section': 'A'},
 {'_id': ObjectId('623df5cca2f230bd4ff5fe29'),
  'name': 'Maria',
  'age': 9,
  'class': 'VI',
  'section': 'B'}]
In [39]:
df1 = pd.DataFrame(list_students1)
df1
                        _id   name  age class section
0  623df5c9a2f230bd4ff5fe28   John   10    VI       A
1  623df5cca2f230bd4ff5fe29  Maria    9    VI       B
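The same subset can also be obtained after loading, by filtering in Pandas instead of in the query. The sketch below uses hand-written records (without the "_id" field) in place of the collection so that it runs on its own. The names all_students and class_vi are illustrative.

```python
import pandas as pd

# Stand-in for the five documents stored in the students collection.
all_students = pd.DataFrame([
    {"name": "John", "age": 10, "class": "VI", "section": "A"},
    {"name": "Maria", "age": 9, "class": "VI", "section": "B"},
    {"name": "Michel", "age": 11, "class": "VII", "section": "A"},
    {"name": "Priyanka", "age": 8, "class": "IV", "section": "B"},
    {"name": "Jeena", "age": 12, "class": "X", "section": "A"},
])

# Boolean mask: keep only the rows whose class column equals "VI".
class_vi = all_students[all_students["class"] == "VI"].reset_index(drop=True)
print(class_vi)
```

For a small collection either approach works. For a large one, filtering in the find() query is usually preferable because only the matching documents are sent to Python.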