Hi, I'm Harlin and welcome to my blog. I write about Python, Alfresco and other cheesy comestibles.

Python - How to Use Counters to Make Counting Easier

One use case for dictionaries is to pull data from a file or other stream and count the occurrences of substrings.
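
For comparison, here is a minimal sketch of that dictionary approach done by hand. The sample sentence and the name tally are made up for illustration.

```python
# Count words by hand with a plain dictionary.
text = "the quick brown fox jumps over the lazy dog the end"

tally = {}
for word in text.split():
    # dict.get() supplies 0 the first time a word is seen
    tally[word] = tally.get(word, 0) + 1

print(tally["the"])  # 3
```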

The Counter object is available for quick and easy calculations based on a list of objects and their counts.

Python's Counter object is a dict subclass in which items are stored as keys and their counts are stored as values. Counts can be any integer value, including zero or negative numbers.
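
As a quick illustration of zero and negative counts, here is a small sketch. The name stock and the fruit keys are invented for the example.

```python
from collections import Counter

stock = Counter(apples=3, pears=0)  # a zero count is stored like any other
stock['apples'] -= 5                # counts are allowed to drop below zero

print(stock['apples'])    # -2
print(stock['pears'])     # 0
print('pears' in stock)   # True
```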

When initialized as Counter(word_list), it counts the individual strings and stores each word as a key with its number of occurrences as the value. For example, we can break down a file containing the Gettysburg Address into a list of words:

>>> from collections import Counter

>>> my_file = 'gettysburg_address.txt'

>>> with open(my_file, 'r') as f:
...     words = [word.lower() for word in f.read().split()
...              if '-' not in word]
...

>>> word_counter = Counter(words)

>>> for k, v in word_counter.items():
...     print(k, v)
... 
four 1
score 1
and 6
seven 1
years 1
ago 1
our 2
fathers 1
...

When I worked at IBM ELearning, I had to write a script that read an Apache log file, picked out the file names and types requested (these were in mp3, avi and wav format), counted the number of requests for each one and output the results to a web page.
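
I no longer have that script, but a rough sketch of how it might look with Counter is below. The log lines are invented sample data in Apache's common log format, and requested_path and MEDIA_TYPES are hypothetical names for this example only.

```python
from collections import Counter
import posixpath

# Hypothetical log lines in Apache common log format (not real data).
log_lines = [
    '10.0.0.1 - - [01/Jan/2010:10:00:00 +0000] "GET /media/intro.mp3 HTTP/1.1" 200 5120',
    '10.0.0.2 - - [01/Jan/2010:10:00:05 +0000] "GET /media/demo.avi HTTP/1.1" 200 9000',
    '10.0.0.3 - - [01/Jan/2010:10:00:09 +0000] "GET /media/intro.mp3 HTTP/1.1" 200 5120',
    '10.0.0.1 - - [01/Jan/2010:10:01:00 +0000] "GET /index.html HTTP/1.1" 200 300',
]

MEDIA_TYPES = {'.mp3', '.avi', '.wav'}

def requested_path(line):
    # The request is the first double-quoted field: 'GET /path HTTP/1.1'
    request = line.split('"')[1]
    return request.split()[1]

# Count only the requests whose file extension is one of the media types.
file_counts = Counter(
    path for path in map(requested_path, log_lines)
    if posixpath.splitext(path)[1].lower() in MEDIA_TYPES
)

for path, count in file_counts.most_common():
    print(path, count)
```

A real log would need more defensive parsing (malformed lines, query strings), which this sketch leaves out.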

This is fairly easy to do in either Java or Python, but it would have been a complete cinch if the Counter object had been at my disposal then. The most_common() method makes it even easier: it returns the elements and their counts sorted from most to least common. For example, here is a top 10 listing of the most common words in the Gettysburg Address:

for k, v in word_counter.most_common(10):
    print('Word: {} | {}'.format(k, v))

Word: that | 13
Word: the | 11
Word: we | 10
Word: to | 8
Word: a | 7
Word: and | 6
Word: can | 5
Word: of | 5
Word: have | 5
Word: for | 5
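
To show how most_common() behaves on something smaller, here is a short sketch using a made-up string. It also uses a reversed slice to get the least common elements.

```python
from collections import Counter

letters = Counter('abracadabra')

# Passing n returns only the n most common (element, count) pairs.
print(letters.most_common(1))   # [('a', 5)]

# With no argument every element is returned, most common first,
# so a reversed slice gives the least common ones.
least_two = letters.most_common()[:-3:-1]
print(least_two)
```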

The Counter object can handle non-existent keys as well. If you ask for the count of an element that is not in the object, no exception is raised and the count simply comes back as zero.

c = Counter('abcd')
print('Count is {}'.format(c['e']))

Count is 0
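
One detail worth knowing is that looking up a missing element does not add it to the Counter. A small sketch to confirm this:

```python
from collections import Counter

c = Counter('abcd')

print(c['e'])     # 0, no KeyError
print('e' in c)   # False, the lookup did not create the key
print(len(c))     # 4
```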

We don't necessarily have to pull the data for the Counter from a file. If we need to, we can initialize the Counter with known counts passed as keyword arguments:

c = Counter(a=1, b=2, c=3, d=4, e=5)

Getting at the raw elements of a Counter is doable as well. If you need to print out all the elements, you can call the elements() method, which returns an iterator that repeats each element as many times as its count:

elements = c.elements()
for element in elements:
    print(element)

a
b
b
c
c
c
d
d
d
d
e
e
e
e
e
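
Note that elements() leaves out anything whose count is less than one. A quick sketch with made-up counts:

```python
from collections import Counter

c = Counter(a=2, b=0, c=-1, d=1)

# 'b' (count 0) and 'c' (count -1) do not appear at all.
print(sorted(c.elements()))   # ['a', 'a', 'd']
```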

It's also possible to do some simple math operations. The subtract() method subtracts the counts of another Counter in place. We just need to define another Counter object and we can do this:

d = Counter(a=0, b=1, c=2, d=3, e=4)
c.subtract(d)
print(c)
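
For reference, here is a self-contained sketch of the same subtraction that checks what is left in c afterwards. Every count ends up at 1.

```python
from collections import Counter

c = Counter(a=1, b=2, c=3, d=4, e=5)
d = Counter(a=0, b=1, c=2, d=3, e=4)

result = c.subtract(d)   # subtract() returns None and updates c itself
print(result)            # None
print(c == Counter(a=1, b=1, c=1, d=1, e=1))   # True
```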

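Keep in mind that there is no add() method available, but you can use either update(), which adds the counts of d to c in place, or the + operator, which returns a new Counter:

c.update(d)

# or

c + d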
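
The two ways of adding differ slightly in how they treat zero and negative counts. A small sketch with made-up counters:

```python
from collections import Counter

c = Counter(a=1, b=2)
d = Counter(a=0, b=1, z=-3)

total = c + d     # new Counter; only positive counts survive
print(total)      # Counter({'b': 3, 'a': 1})

c.update(d)       # in-place addition; zero and negative counts are kept
print(c['z'])     # -3
```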

To be honest though, I would probably use numpy for anything more complex concerning math operations.

The next time you need to do quick tallies on string occurrences, consider using the Counter object. It will save you some time and produce more Pythonic code.

Any Comments, Always Welcome!