A new and improved English sentiment analysis engine: more accurate, faster, slimmer.

Never content with where we are, we've been working hard at improving all aspects of our software offerings. Our original service, sentiment analysis, has long been overdue for an upgrade, and today I'm pleased to announce a new version of our engine. It's faster, much more accurate, has a much smaller code footprint, and is easier for us developers to reason about.

If you're using our API, you don't have to change a thing. The API itself is unchanged, but you should now see more accurate results.

To show just how much sleeker the new engine is, I ran the same tweet through both the old and the new engine and used Python's cProfile module to count the function calls each version made.
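If you want to take the same kind of measurement on your own code, here is a minimal sketch using Python 3's standard cProfile and pstats modules (the output below came from Python 2.6, so formatting differs slightly). The tokenize and score_sentiment functions are hypothetical toy stand-ins, not our engine; only the profiling calls are standard library.

```python
import cProfile
import io
import pstats


def tokenize(text):
    # Hypothetical pipeline step for illustration: lowercase and split on whitespace.
    return [word.lower() for word in text.split()]


def score_sentiment(text):
    # Hypothetical toy scorer, NOT Repustate's engine: +1/-1 per lexicon hit.
    lexicon = {"love": 1, "great": 1, "hate": -1, "awful": -1}
    return sum(lexicon.get(word, 0) for word in tokenize(text))


def profile_call_count(func, *args, limit=20):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # The "calls" key sorts by call count, i.e. "Ordered by: call count".
    stats.sort_stats("calls").print_stats(limit)
    return result, stream.getvalue()


if __name__ == "__main__":
    score, report = profile_call_count(score_sentiment, "I love this great phone")
    print(score)
    print(report)
```

Running the old and new versions of a function through profile_call_count with the same input gives two reports whose header lines can be compared directly, which is what the summaries below do.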

Below is the summary cProfile produced for each version (I've edited out some lines for the sake of brevity & secrecy 🙂).

The old sentiment analysis engine:

10067 function calls (9866 primitive calls) in 0.017 CPU seconds
Ordered by: call count
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
2666/2632    0.000    0.000    0.000    0.000 {len}
      943    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
      932    0.002    0.000    0.002    0.000 /Repustate/dev/PYTHON_ENV/django-1.2.5/lib/python2.6/sre_parse.py:188(__next)
      832    0.001    0.000    0.002    0.000 /Repustate/dev/PYTHON_ENV/django-1.2.5/lib/python2.6/sre_parse.py:207(get)
      490    0.001    0.000    0.001    0.000 {built-in method match}
      482    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
      354    0.000    0.000    0.000    0.000 {isinstance}
      285    0.000    0.000    0.001    0.000 /Repustate/dev/PYTHON_ENV/django-1.2.5/lib/python2.6/sre_parse.py:136(__getitem__)
      250    0.000    0.000    0.000    0.000 /Repustate/dev/PYTHON_ENV/django-1.2.5/lib/python2.6/sre_parse.py:201(match)
      204    0.000    0.000    0.000    0.000 /Repustate/dev/PYTHON_ENV/django-1.2.5/lib/python2.6/site-packages/nltk/stem/porter.py:258(ends)
      200    0.000    0.000    0.001    0.000 /Repustate/dev/PYTHON_ENV/django-1.2.5/lib/python2.6/site-packages/nltk/tag/brill.py:112(apply)
      168    0.000    0.000    0.000    0.000 {min}
      131    0.000    0.000    0.000    0.000 /Repustate/dev/PYTHON_ENV/django-1.2.5/lib/python2.6/sre_parse.py:132(__len__)
      107    0.000    0.000    0.000    0.000 {ord}
      104    0.000    0.000    0.000    0.000 /Repustate/dev/PYTHON_ENV/django-1.2.5/lib/python2.6/sre_parse.py:144(append)
       82    0.000    0.000    0.000    0.000 /Repustate/dev/PYTHON_ENV/django-1.2.5/lib/python2.6/sre_parse.py:96(__init__)
    82/11    0.001    0.000    0.002    0.000 /Repustate/dev/PYTHON_ENV/django-1.2.5/lib/python2.6/sre_compile.py:38(_compile)
       81    0.000    0.000    0.000    0.000 {method 'has_key' of 'dict' objects}
       75    0.000    0.000    0.000    0.000 /Repustate/dev/PYTHON_ENV/django-1.2.5/lib/python2.6/sre_compile.py:24(_identityfunction)
    70/35    0.000    0.000    0.000    0.000 /Repustate/dev/PYTHON_ENV/django-1.2.5/lib/python2.6/sre_parse.py:146(getwidth)

Here’s the output for the new sentiment analysis engine:

2941 function calls (2937 primitive calls) in 0.010 CPU seconds
Ordered by: call count
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      490    0.001    0.000    0.001    0.000 {built-in method match}
      384    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
      336    0.000    0.000    0.000    0.000 {len}
      204    0.000    0.000    0.000    0.000 /Repustate/dev/PYTHON_ENV/django-1.2.5/lib/python2.6/site-packages/nltk/stem/porter.py:258(ends)
      200    0.000    0.000    0.001    0.000 /Repustate/dev/PYTHON_ENV/django-1.2.5/lib/python2.6/site-packages/nltk/tag/brill.py:112(apply)
       81    0.000    0.000    0.000    0.000 {method 'has_key' of 'dict' objects}
       58    0.000    0.000    0.000    0.000 {method 'lower' of 'unicode' objects}
       58    0.000    0.000    0.000    0.000 {range}
       54    0.000    0.000    0.000    0.000 /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py:64(_note)
       54    0.000    0.000    0.000    0.000 {thread.get_ident}
       54    0.000    0.000    0.000    0.000 {max}
       49    0.000    0.000    0.000    0.000 /Repustate/dev/PYTHON_ENV/django-1.2.5/lib/python2.6/site-packages/nltk/tag/brill.py:351(extract_property)
       48    0.000    0.000    0.000    0.000 {min}
       47    0.000    0.000    0.000    0.000 /Repustate/dev/PYTHON_ENV/django-1.2.5/lib/python2.6/site-packages/nltk/tag/brill.py:246(applies)
       39    0.000    0.000    0.000    0.000 {isinstance}
    36/32    0.000    0.000    0.000    0.000 /Repustate/dev/PYTHON_ENV/django-1.2.5/lib/python2.6/site-packages/nltk/stem/porter.py:172(cons)
       27    0.000    0.000    0.000    0.000 /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py:149(__exit__)
       27    0.000    0.000    0.000    0.000 /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py:116(acquire)

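Big difference, huh? That's roughly 3.4X fewer function calls (10,067 down to 2,941), and CPU time for this tweet dropped from 0.017 to 0.010 seconds. Those savings add up quickly when you multiply them across the millions of pieces of data that pass through our API each month.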
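As a quick check on the arithmetic, here is a small sketch that parses the two cProfile summary lines above and divides them. The regular expression is our own assumption about the header format; it accepts both Python 2's "CPU seconds" wording and Python 3's plain "seconds".

```python
import re

# Summary lines copied verbatim from the two profiles above.
OLD = "10067 function calls (9866 primitive calls) in 0.017 CPU seconds"
NEW = "2941 function calls (2937 primitive calls) in 0.010 CPU seconds"

# Assumed header shape: "<total> function calls [(<prim> primitive calls)] in <t> [CPU ]seconds"
SUMMARY = re.compile(
    r"(\d+) function calls(?: \((\d+) primitive calls\))? in ([\d.]+) (?:CPU )?seconds"
)


def parse_summary(line):
    """Return (total_calls, primitive_calls, seconds) from a pstats header line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError(f"not a cProfile summary line: {line!r}")
    total = int(match.group(1))
    # pstats omits the primitive count when it equals the total.
    primitive = int(match.group(2)) if match.group(2) else total
    return total, primitive, float(match.group(3))


old_calls, old_prim, old_time = parse_summary(OLD)
new_calls, new_prim, new_time = parse_summary(NEW)
print(f"total calls:     {old_calls / new_calls:.2f}x fewer")  # 3.42x
print(f"primitive calls: {old_prim / new_prim:.2f}x fewer")    # 3.36x
print(f"CPU time:        {old_time / new_time:.2f}x faster")   # 1.70x
```

Note that the timings come from a single tweet and are reported with only three decimal places, so the call-count ratio is the more reliable of the three figures.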