Hi, I'm Harlin and welcome to my blog. I write about Python, Alfresco and other cheesy comestibles.

Python + Alfresco: How to Get Document Information Using CMIS

Recently, a customer asked for a way to query the Alfresco database directly for document information. Direct database queries are generally not supported, whether for transactions or for read-only access.

First off, the Alfresco database is highly normalized and designed for efficiency and performance rather than readability.

Secondly, we have often handed out queries meant as one-offs, only to find them later embedded in scripts and custom apps that run them continuously, which often causes performance issues and inconvenient table locks.

Personally, I recommend using either CMIS or JavaScript to get whatever you need. If it's a one-off or part of an Alfresco folder rule, JavaScript is usually the best choice. However, if it's going to be part of a custom app, why not consider CMIS? You can then use either Java or Python to interface with it.

Below is a script I use to get general info on documents.

Note: You can also check out the code on my GitHub page.

#!/usr/bin/env python
"""
A script to get all documents' info in the repository.
Runs successfully with Python 2.7.12 and cmislib 0.5.1.
"""

from cmislib import CmisClient

cmis_url = 'http://alfrescodemo.com:8080/alfresco/cmisatom'
cmis_uid = 'admin'
cmis_pwd = 'admin'

client = CmisClient(
    cmis_url,
    cmis_uid,
    cmis_pwd
)

repo = client.defaultRepository

print(repo)

results = repo.query("SELECT * FROM cmis:document")

for row in results:
    # Uncomment below to see all property keys and values
    #print row.properties
    print

    # Will give the name of the document:
    print 'Name: {}'.format(row.properties['cmis:name'])

    # Shows noderef which can be used with any public Alfresco API
    print 'NodeRef: {}'.format(row.properties['alfcmis:nodeRef'])

    # Shows the user who created the document (creator is also owner)
    print 'Owner: {}'.format(row.properties['cmis:createdBy'])

    # Shows last modified date and last modified by and latest version of document.
    print 'Last Mod Date: {}'.format(row.properties['cmis:lastModificationDate'])
    print 'Last Mod By: {}'.format(row.properties['cmis:lastModifiedBy'])
    print 'Latest Version: {}'.format(row.properties['cmis:versionLabel'])

    # Get the document object so we can get the path and the available ACLs for it.
    doc = repo.getObject(row.properties['cmis:objectId'])

    try:
        print 'Paths: {}'.format(doc.getPaths()[0])
    except IndexError:
        print 'Paths: N/A'
    acl = doc.getACL()
    print 'ACLs: {}'.format(acl.getEntries())
    print
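
The property lookups in the loop assume every key is present on every row. As a minimal sketch, a small helper can format a plain dict of properties and fall back to 'N/A' when a key is missing, which may happen, for instance, with the Alfresco-specific alfcmis:nodeRef on a different CMIS server. The names describe_properties and FIELDS are my own for illustration and are not part of cmislib.

```python
# Label/property pairs taken from the script in this post.
FIELDS = [
    ('Name', 'cmis:name'),
    ('NodeRef', 'alfcmis:nodeRef'),
    ('Owner', 'cmis:createdBy'),
    ('Last Mod Date', 'cmis:lastModificationDate'),
    ('Last Mod By', 'cmis:lastModifiedBy'),
    ('Latest Version', 'cmis:versionLabel'),
]


def describe_properties(properties, missing='N/A'):
    """Return 'Label: value' lines for a dict of CMIS properties.

    Hypothetical helper: any key that is absent (or None) is shown
    as `missing` instead of raising KeyError.
    """
    lines = []
    for label, key in FIELDS:
        value = properties.get(key)
        if value is None:
            value = missing
        lines.append('{}: {}'.format(label, value))
    return lines


if __name__ == '__main__':
    # Sample data only; a real run would pass row.properties instead.
    sample = {'cmis:name': 'report.pdf', 'cmis:createdBy': 'admin'}
    print('\n'.join(describe_properties(sample)))
```

Inside the loop, this would be called as describe_properties(row.properties). It uses only single-argument print calls, so it should run unchanged on Python 2.7.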

CMIS, even with Python, is very flexible and can be used to get a ton of information on repository objects. Our customers still use Java quite a bit when it comes to writing code, but I would also suggest Groovy for CMIS work. Jeff Potts's book, CMIS and Apache Chemistry in Action, uses Groovy in most of its examples.
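
As one illustration of that flexibility, the query does not have to be SELECT * over every document. Here is a rough sketch of a helper that builds a narrower CMIS query string with an optional WHERE clause. The function names and parameters are hypothetical (they are not part of cmislib), and it assumes the repository treats cmis:name and cmis:createdBy as queryable.

```python
def escape_literal(value):
    """Escape backslashes and single quotes for a CMIS QL string literal.

    Note: a backslash is always treated as a literal backslash here, so
    this sketch does not support the LIKE escapes for '%' and '_'.
    """
    return value.replace('\\', '\\\\').replace("'", "\\'")


def build_document_query(columns=('cmis:objectId', 'cmis:name'),
                         name_like=None, created_by=None):
    """Build a CMIS query over cmis:document with optional filters.

    Hypothetical helper for illustration only.
    """
    query = 'SELECT {} FROM cmis:document'.format(', '.join(columns))
    conditions = []
    if name_like is not None:
        # '%' and '_' in name_like pass through as LIKE wildcards.
        conditions.append(
            "cmis:name LIKE '{}'".format(escape_literal(name_like)))
    if created_by is not None:
        conditions.append(
            "cmis:createdBy = '{}'".format(escape_literal(created_by)))
    if conditions:
        query += ' WHERE ' + ' AND '.join(conditions)
    return query


if __name__ == '__main__':
    print(build_document_query(name_like='%.pdf', created_by='admin'))
```

The resulting string can be passed to repo.query() in place of the SELECT * statement in the script. Keeping cmis:objectId in the column list means the repo.getObject() lookup still has an ID to work with.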

One limitation, though: the Python cmislib supports only Python 2.7.x. It is not yet compatible with Python 3.x.

Comments are always welcome!