Hi, I'm Harlin and welcome to my blog. I write about Python, Alfresco and other cheesy comestibles.

Python - Coding for MongoDB

MongoDB

MongoDB is an open source database that uses a document-based model. An RDBMS like MySQL, PostgreSQL or Oracle organizes data into tables, each with a defined set of columns, and into relationships between those tables. Instead of tables and rows, MongoDB uses collections (the counterpart of tables) and documents (the counterpart of rows of data).

Using MongoDB or most other NoSQL databases eliminates the often-complex object-relational mapping (ORM) layer that relates objects to relational tables. MongoDB's flexible data model allows for schemas that can change as business requirements change.
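
To make the "flexible data model" idea concrete, here is a small pure-Python sketch. It does not talk to MongoDB at all: it just models a collection as a list of dicts, and the 'age' field is a hypothetical addition of mine for illustration.

```python
# A "collection" modelled as a plain list of dict "documents" (illustration only).
dogs = [
    {'name': 'Spike', 'breed': 'English Bulldog'},
    # This document carries an extra, hypothetical 'age' field.
    # Nothing like an ALTER TABLE is needed before adding it.
    {'name': 'Sparky', 'breed': 'Beagle', 'age': 3},
]


def field_names(collection):
    """Return the sorted field names used across all documents."""
    names = set()
    for document in collection:
        names.update(document)  # iterating a dict yields its keys
    return sorted(names)


print(field_names(dogs))
```

Documents in the same collection can have different fields, which is roughly what "schemas that can change" means in practice.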

What I want to do with this post is give you an idea of how to install MongoDB and how to add and manipulate data, both from the MongoDB client and from Python. You will see that both are very simple to use and are not very different from each other.

Install MongoDB

First, let's install MongoDB. The instructions for installation assume you are using Linux. If you're on Windows, you can follow these instructions here.

On Linux ...

$ curl -O https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-3.6.2.tgz
$ tar -zxvf mongodb-linux-x86_64-3.6.2.tgz
$ rm mongodb-linux-x86_64-3.6.2.tgz 
$ mv mongodb-linux-x86_64-3.6.2/ mongodb

So that mongod and mongo can be run from the command line, add the following to your .bashrc file or, if you're on a RHEL-based system, to .bash_profile:

export PATH=<mongodb-install-directory>/bin:$PATH

To make this modification effective, do the following:

$ source ~/.bashrc

or

$ source ~/.bash_profile
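
If you want to double-check that the PATH change took effect, a quick sketch like this one will do. It only uses Python's standard library, and the helper name find_mongo_binaries is my own, not part of MongoDB.

```python
import shutil


def find_mongo_binaries(names=('mongod', 'mongo')):
    """Map each binary name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_mongo_binaries().items():
    print(f'{name}: {path or "not found on PATH"}')
```

If either one shows up as not found, re-check the export line and source the file again.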

Start MongoDB

Here's how to start the MongoDB service.

First, create a data directory so that your database info can be stored:

$ mkdir -p data/db

Now, let's start MongoDB:

$ mongod --dbpath data/db

Verify that MongoDB started successfully by looking for a line like this in its output:

...
2018-01-28T09:55:10.898-0700 I NETWORK  [initandlisten] waiting for connections on port 27017
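
Besides reading the log, you can check from another terminal that something is accepting connections on that port. This is a rough sketch using only the standard library. It assumes the default host and port shown in the log line, and it only proves that a TCP listener is there, not that the listener is really mongod.

```python
import socket


def is_listening(host='127.0.0.1', port=27017, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


print('mongod reachable:', is_listening())
```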

Use MongoDB

We'll need to use the mongo client so that we can interface with MongoDB.

$ mongo --host 127.0.0.1:27017

The first time you start it, you may see this error:

 [main] Error loading history file: FileOpenFailed: Unable to fopen() file /home/hseritt/.dbshell: No such file or directory

It can be ignored. It just means that no history file has been created yet. The next time you start the client, you shouldn't see the error.

Database

A database in MongoDB is very much the same concept as a database in RDBMS systems.

When you start the client you will see a prompt like: '>'

This command shows you which database you're using:

> db
test

We're going to create a database of pets. So, let's switch to a database called pets. Note that we don't have to create it first: MongoDB creates the 'pets' database automatically the first time we store data in it:

> use pets
switched to db pets
> db
pets

Collection

A collection is similar to an RDBMS table: it is simply a "collection" of similar documents. A document is similar to a row of data in an RDBMS.

Create

Let's create a document related to dogs. Note that when we use db.dogs... the "dogs" collection is automatically created too.

Here we're creating a dog document that has a name and a breed. The db.dogs.find() command that follows returns all dogs created so far:

> db.dogs.insertOne({'name': 'Spike', 'breed': 'English Bulldog'})
{
    "acknowledged" : true,
    "insertedId" : ObjectId("5a6e0117d53c52ede0133ad3")
}
> db.dogs.find()
{ "_id" : ObjectId("5a6e0117d53c52ede0133ad3"), "name" : "Spike", "breed" : "English Bulldog" }
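
A side note on that "_id" value: MongoDB generated it for us, and the first 4 bytes of an ObjectId hold its creation time as seconds since the Unix epoch. Here is a small sketch that decodes it with the standard library only. The helper name objectid_timestamp is my own.

```python
from datetime import datetime, timezone


def objectid_timestamp(hex_id):
    """Decode the creation time stored in the first 4 bytes of an ObjectId."""
    seconds = int(hex_id[:8], 16)  # first 8 hex characters = 4 bytes
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(objectid_timestamp('5a6e0117d53c52ede0133ad3'))
# 2018-01-28 16:57:59+00:00
```

That is 09:57:59 at the -0700 offset shown in the server log, a few minutes after mongod started waiting for connections.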

If we need to create many dog documents, we can use db.dogs.insertMany():

> db.dogs.insertMany(
... [
... {'name': 'Sparky', 'breed': 'Beagle'},
... {'name': 'Rusty', 'breed': 'Chihuahua'},
... ]
... )
{
    "acknowledged" : true,
    "insertedIds" : [
        ObjectId("5a6e0146d53c52ede0133ad4"),
        ObjectId("5a6e0146d53c52ede0133ad5")
    ]
}
> db.dogs.find()
{ "_id" : ObjectId("5a6e0117d53c52ede0133ad3"), "name" : "Spike", "breed" : "English Bulldog" }
{ "_id" : ObjectId("5a6e0146d53c52ede0133ad4"), "name" : "Sparky", "breed" : "Beagle" }
{ "_id" : ObjectId("5a6e0146d53c52ede0133ad5"), "name" : "Rusty", "breed" : "Chihuahua" }

Now, let's create some cat documents:

> db.cats.insertMany(
... [
... {'name': 'Koko', 'breed': 'American Domestic Shorthair'},
... {'name': 'Gracie', 'breed': 'American Domestic Shorthair'},
... {'name': 'Cheshire', 'breed': 'Snow Leopard'},
... ]
... )
{
    "acknowledged" : true,
    "insertedIds" : [
        ObjectId("5a6e016fd53c52ede0133ad6"),
        ObjectId("5a6e016fd53c52ede0133ad7"),
        ObjectId("5a6e016fd53c52ede0133ad8")
    ]
}
> db.cats.find()
{ "_id" : ObjectId("5a6e016fd53c52ede0133ad6"), "name" : "Koko", "breed" : "American Domestic Shorthair" }
{ "_id" : ObjectId("5a6e016fd53c52ede0133ad7"), "name" : "Gracie", "breed" : "American Domestic Shorthair" }
{ "_id" : ObjectId("5a6e016fd53c52ede0133ad8"), "name" : "Cheshire", "breed" : "Snow Leopard" }

Query

We can use db.getCollection('dogs')... or db.dogs... to start off our query. But we'll mostly keep it simple and use db.dogs:

> db.getCollection('dogs').find({'name': 'Spike'})
{ "_id" : ObjectId("5a6e0117d53c52ede0133ad3"), "name" : "Spike", "breed" : "English Bulldog" }

> db.dogs.find({'name': 'Spike'})
{ "_id" : ObjectId("5a6e0117d53c52ede0133ad3"), "name" : "Spike", "breed" : "English Bulldog" }

Let's query our cats now:

> db.cats.find({'breed': 'American Domestic Shorthair'})
{ "_id" : ObjectId("5a6e016fd53c52ede0133ad6"), "name" : "Koko", "breed" : "American Domestic Shorthair" }
{ "_id" : ObjectId("5a6e016fd53c52ede0133ad7"), "name" : "Gracie", "breed" : "American Domestic Shorthair" }
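
If the filter syntax feels a bit magical, this pure-Python sketch shows roughly what a simple equality filter means. It is only an illustration of the semantics under my own simplifying assumptions: it ignores query operators, nested fields and arrays, and it says nothing about how MongoDB actually evaluates queries. The names matches and find are hypothetical helpers.

```python
def matches(document, query):
    """True if every key in query is present in document with an equal value."""
    return all(key in document and document[key] == value
               for key, value in query.items())


def find(collection, query):
    """Return the documents that satisfy an equality-only query."""
    return [document for document in collection if matches(document, query)]


cats = [
    {'name': 'Koko', 'breed': 'American Domestic Shorthair'},
    {'name': 'Gracie', 'breed': 'American Domestic Shorthair'},
    {'name': 'Cheshire', 'breed': 'Snow Leopard'},
]

for cat in find(cats, {'breed': 'American Domestic Shorthair'}):
    print(cat['name'])
```

Note that an empty filter {} matches every document, since there are no conditions to fail.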

After a while, your client screen will get a little crowded. When it does, you can clear it by running:

> cls

Update

We can also modify our documents with updateOne(). For example, when our family got the cat named "Cheshire", my daughter said it was a snow leopard. You don't find many of those in Georgia and we knew it wasn't really a snow leopard. We found out though that it was an Egyptian Mau. So, we'll use the updateOne() function to change the breed type from Snow Leopard to Egyptian Mau.

> db.cats.find({'name': 'Cheshire'})
{ "_id" : ObjectId("5a6e016fd53c52ede0133ad8"), "name" : "Cheshire", "breed" : "Snow Leopard" }

> db.cats.updateOne(
... {'name': 'Cheshire'},
... {
... $set: {'breed': 'Egyptian Mau'}
... }
... )
{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }

> db.cats.find({'name': 'Cheshire'})
{ "_id" : ObjectId("5a6e016fd53c52ede0133ad8"), "name" : "Cheshire", "breed" : "Egyptian Mau" }
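
The $set operator only touches the fields you list and leaves the rest of the document alone. Here is a minimal pure-Python sketch of that behaviour for top-level fields. The apply_set helper is hypothetical and is not how MongoDB implements updates.

```python
def apply_set(document, update):
    """Return a copy of document with the fields under '$set' overwritten or added."""
    updated = dict(document)  # shallow copy, the original stays untouched
    updated.update(update.get('$set', {}))
    return updated


cheshire = {'name': 'Cheshire', 'breed': 'Snow Leopard'}
print(apply_set(cheshire, {'$set': {'breed': 'Egyptian Mau'}}))
```

The name stays as it was and only the breed changes, just like in the client output above.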

Delete

If we need to, we can delete a document.

> db.dogs.find()
{ "_id" : ObjectId("5a6e0117d53c52ede0133ad3"), "name" : "Spike", "breed" : "English Bulldog" }
{ "_id" : ObjectId("5a6e0146d53c52ede0133ad4"), "name" : "Sparky", "breed" : "Beagle" }
{ "_id" : ObjectId("5a6e0146d53c52ede0133ad5"), "name" : "Rusty", "breed" : "Chihuahua" }

To me, Chihuahuas are a noisy and nervous breed. Luckily, we found someone who likes them and we gave them our dog named 'Rusty'.

> db.dogs.deleteOne({'name': 'Rusty'})
{ "acknowledged" : true, "deletedCount" : 1 }
> db.dogs.find()
{ "_id" : ObjectId("5a6e0117d53c52ede0133ad3"), "name" : "Spike", "breed" : "English Bulldog" }
{ "_id" : ObjectId("5a6e0146d53c52ede0133ad4"), "name" : "Sparky", "breed" : "Beagle" }

Note that 'Rusty' is no longer with us. :-) So, before I show you some Python code that will do the same things we just did with the Mongo client, we'll go ahead and clear the dogs and cats collections but keep them available.

> db.dogs.deleteMany({})
{ "acknowledged" : true, "deletedCount" : 2 }
> db.cats.deleteMany({})
{ "acknowledged" : true, "deletedCount" : 3 }
> db.dogs.find()
> db.cats.find()
> exit

PyMongo

I am using Python 3.6.4 but I believe you can use the following code with any Python 3.x version.

$ python
Python 3.6.4 (default, Jan  7 2018, 10:19:13) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.

For our MongoDB sandbox, let's use pyenv to create a virtual environment (the virtualenv command comes from the pyenv-virtualenv plugin). This is optional but it can help you keep your environments from conflicting with each other.

$ pyenv virtualenv mongodb-sandbox
$ pyenv local mongodb-sandbox

Let's install the pymongo module that allows our Python code to "talk" to MongoDB.

$ pip install pymongo

Connection and Close

Create a script called demo.py and put the following code in it.

#!/usr/bin/env python

from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client.pets

print(db.name)

client.close()

Let's run our simple connection demo. You should see something like this.

$ chmod +x demo.py
$ ./demo.py
pets

Create and Query

Just as we used the MongoDB client to create documents and query them, we can do the same using Python and pymongo.

#!/usr/bin/env python

from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client.pets

db.dogs.insert_one({'name': 'Spike', 'breed': 'English Bulldog'})
db.dogs.insert_many(
    [
        {'name': 'Sparky', 'breed': 'Beagle'},
        {'name': 'Rusty', 'breed': 'Chihuahua'},
    ]
)

db.cats.insert_many(
    [
        {'name': 'Koko', 'breed': 'American Domestic Shorthair'},
        {'name': 'Gracie', 'breed': 'American Domestic Shorthair'},
        {'name': 'Cheshire', 'breed': 'Snow Leopard'},
    ]
)

print('Dogs: ')

for document in db.dogs.find({}):
    print(document['name'])
    print(document['breed'])
    print('\n')

print('Cats: ')
for document in db.cats.find({}):
    print(document['name'])
    print(document['breed'])
    print('\n')

client.close()

When we run it, we should see output like this:

$ ./demo.py
Dogs: 
Spike
English Bulldog


Sparky
Beagle


Rusty
Chihuahua


Cats: 
Koko
American Domestic Shorthair


Gracie
American Domestic Shorthair


Cheshire
Snow Leopard
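
The two print loops in that script are nearly identical, so one possible tidy-up is to pull them into a small helper. This is just a sketch: describe_pets is my own name, and it assumes every document has both a 'name' and a 'breed' field. It works with anything that has a find() method, such as db.dogs or db.cats.

```python
def describe_pets(collection):
    """Return one 'name (breed)' string per document in the collection."""
    return ['{} ({})'.format(document['name'], document['breed'])
            for document in collection.find({})]


# Usage with the db object from the script above:
#     print('Dogs:')
#     print('\n'.join(describe_pets(db.dogs)))
```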

Update

Just as we updated our cat's breed using the MongoDB client, we can do the same thing with our Python code.

#!/usr/bin/env python

from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client.pets

for document in db.cats.find({'name': 'Cheshire'}):
    print(document)

db.cats.update_one(
    {'name': 'Cheshire'},
    {'$set': {'breed': 'Egyptian Mau'}}
)

for document in db.cats.find({'name': 'Cheshire'}):
    print(document)

client.close()

The output should look like:

$ ./demo.py 
{'_id': ObjectId('5a6e04aa68f1881c2524c64d'), 'name': 'Cheshire', 'breed': 'Snow Leopard'}
{'_id': ObjectId('5a6e04aa68f1881c2524c64d'), 'name': 'Cheshire', 'breed': 'Egyptian Mau'}
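
Instead of querying before and after, you can also look at what update_one() returns. In pymongo the result object has matched_count and modified_count attributes, the counterparts of matchedCount and modifiedCount in the client. Here is a sketch of a wrapper that uses them. The update_breed helper and its messages are my own invention.

```python
def update_breed(collection, name, breed):
    """Set the breed of the first document matching name and report what happened."""
    result = collection.update_one({'name': name}, {'$set': {'breed': breed}})
    if result.matched_count == 0:
        return 'no pet named {!r}'.format(name)
    if result.modified_count == 0:
        # The filter matched, but the stored value was already the same.
        return '{!r} already has breed {!r}'.format(name, breed)
    return 'updated {!r}'.format(name)


# Usage with the db object from the script above:
#     print(update_breed(db.cats, 'Cheshire', 'Egyptian Mau'))
```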

Delete

Of course, we can also delete documents from our collections, first a single one and then all of them:

#!/usr/bin/env python

from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client.pets

for document in db.dogs.find({}):
    print(document['name'])
    print(document['breed'])

db.dogs.delete_one({'name': 'Rusty'})

for document in db.dogs.find({}):
    print(document['name'])
    print(document['breed'])

db.dogs.delete_many({})

print([document for document in db.dogs.find({})])

client.close()

Our output:

$ ./demo.py 
Spike
English Bulldog
Sparky
Beagle
Rusty
Chihuahua
Spike
English Bulldog
Sparky
Beagle

# Our query result is an empty set.

[]
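
As with updates, delete_one() returns a result object. Its deleted_count attribute mirrors the deletedCount field the client printed earlier. Here is a sketch of a helper that tells you whether anything was actually removed. remove_pet is a hypothetical name of mine.

```python
def remove_pet(collection, name):
    """Delete the first document with this name and return True if one was removed."""
    result = collection.delete_one({'name': name})
    return result.deleted_count == 1


# Usage with the db object from the script above:
#     if not remove_pet(db.dogs, 'Rusty'):
#         print('Rusty was not in the collection')
```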

MongoDB is very simple to use and very simple to code with. Its document-oriented data model allows our code to be very flexible without the often painful schema changes. This is great for unstructured, dynamic and often chaotic data. But I would still stick with an RDBMS if your data has a definite structure and the schema doesn't change often. Hope this was helpful for you.

Any Comments, Always Welcome!