
MongoDB and pandas

stanislawbartkowski edited this page May 31, 2021 · 2 revisions

This article contains tips on how to access MongoDB from Python 3 and load a collection into a pandas DataFrame.

Links

H2O

https://github.com/h2oai/driverlessai-recipes/blob/master/data/databases/create_dataset_from_mongodb_collection.py

https://docs.h2o.ai/h2o/latest-stable/h2o-docs/downloading.html

PyMongo

https://pymongo.readthedocs.io/en/stable/tutorial.html

Prerequisites

Local Python3

sudo pip3 install pandas
sudo pip3 install matplotlib
sudo pip3 install pymongo
sudo pip3 install datatable

sudo pip3 install -f https://docs.h2o.ai/h2o/latest-stable/h2o-docs/downloading.html h2o

Jupyter

The required prerequisites depend on the Jupyter image used. If any are missing, install them with the !conda command:

import sys
!conda install --yes --prefix {sys.prefix} pymongo
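
Before installing anything, it can help to see which of the packages listed under Prerequisites are already importable in the current environment. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of any of the libraries mentioned here.

```python
import importlib.util

def missing_packages(names):
    """Return the subset of `names` that cannot be found as importable modules."""
    # find_spec() returns None for a top-level module that is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]

# Packages taken from the Prerequisites section of this page
required = ["pandas", "matplotlib", "pymongo", "datatable", "h2o"]
print("Missing packages:", missing_packages(required))
```

Only the packages reported as missing then need to be installed with pip3 or !conda.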

Load MongoDB collection

from pymongo import MongoClient
import pandas as pd

MONGO_CONNECTION_STRING = "mongodb://admin:secret@adown-inf:27017/querydb?authSource=testdb"
MONGO_DB = "testdb"
MONGO_COLLECTION = "orders"

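client = MongoClient(MONGO_CONNECTION_STRING)
db = client.get_database(MONGO_DB)
coll = db.get_collection(MONGO_COLLECTION)
# find() returns a cursor over all documents; list() reads them into memory.
# find_one() would return a single dict, which does not map to rows.
docs = list(coll.find())

# One row per document, one column per top-level field
df = pd.DataFrame(docs)
print(df.dtypes)
print(df.size)  # total number of cells (rows * columns)

# To iterate over columns instead of rows:
#for doc in df.items():
#    print(doc)

# Each item is an (index, Series) tuple
for row in df.iterrows():
    print(row)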
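
The connection string in the example above embeds the user name and password directly. PyMongo expects reserved characters such as `@`, `:` or `/` in credentials to be percent-encoded, so a small helper can make the string safer to assemble. This is a sketch only: `build_mongo_uri` is a hypothetical helper, not a PyMongo function, and the password used in the call is a made-up placeholder.

```python
from urllib.parse import quote_plus

def build_mongo_uri(user, password, host, port=27017, database="", auth_source=None):
    """Assemble a mongodb:// URI with percent-encoded credentials (illustrative helper)."""
    uri = f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"
    if auth_source:
        # authSource names the database that holds the user's credentials
        uri += f"?authSource={quote_plus(auth_source)}"
    return uri

# Host, database and authSource mirror the example above; the password is a placeholder
uri = build_mongo_uri("admin", "p@ss/word", "adown-inf", 27017, "querydb", "testdb")
print(uri)
```

The resulting string can be passed to `MongoClient` in place of the hard-coded `MONGO_CONNECTION_STRING`.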