Why Our Customers Love Us: Data Science, Ngrams Analysis and NPS

Why Do Our Customers Love Us?

Or, more specifically: of the customers who love us enough to recommend Jungle Disk to a friend or colleague, what are they saying?

I would like to walk you through a method that uses Python and the requests, pandas, and scikit-learn packages to help answer this question.

First off, why answer this question? What are we hoping to influence? For whom? I like to think of NPS (Net Promoter Score) analysis as a product management and service delivery tool. If we know what we are doing well, it is easier to double down on it. For example:

  • If your Promoters love how easy your software is to use, double down on user experience (UX).
  • If they call out a specific feature, double down on making it perfect.
  • If they feel that they are getting good value from your product, double down on materials and copy that reinforce this.
  • If they love how helpful and knowledgeable your support staff are, double down on their training and tools.

This perspective was greatly influenced by the book Switch by Chip and Dan Heath.

Getting and Making Sense of Our Data

At Jungle Disk, we use the service Promoter.io to survey our customers for NPS data. It offers a wonderful API that allows us to retrieve our survey results programmatically. The following Python code uses this API to retrieve all customer comments, along with the NPS score attached to each, and stores them in a pandas DataFrame:

import pandas as pd
import requests

_headers = {'Authorization': 'Token <SECRET_API_TOKEN_GOES_HERE>', 'Content-Type': 'application/json'}

# Follow the paginated API: each response carries a 'next' URL (None on the last page).
pages = []
url = 'https://app.promoter.io/api/feedback/'
while url is not None:
    p = requests.get (url, headers = _headers)
    pages.append (pd.DataFrame (p.json ()['results']))
    print (len (pages))
    url = p.json ()['next']

# DataFrame.append was removed in pandas 2.0; concatenate the pages instead.
df = pd.concat (pages).reset_index (drop = True)
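
The pagination pattern in the loop above can be tried out without network access. The following is a minimal sketch, assuming only that each page is a dictionary with a 'results' list and a 'next' link, as in the code above. The helper iter_results, the fetch_page argument, and the fake page names are hypothetical and not part of the Promoter.io API.

```python
def iter_results(fetch_page, first_url):
    """Yield every record from a paginated API, following 'next' links."""
    url = first_url
    while url is not None:
        page = fetch_page(url)      # expected to return a dict
        yield from page['results']
        url = page.get('next')      # None (or missing) ends the loop


# Hypothetical stand-in for the real API: two pages keyed by fake URLs.
fake_pages = {
    'page-1': {'results': [{'score': 10}, {'score': 9}], 'next': 'page-2'},
    'page-2': {'results': [{'score': 3}], 'next': None},
}

rows = list(iter_results(fake_pages.__getitem__, 'page-1'))
print(len(rows))
```

Against the real service, fetch_page could be a small function that wraps requests.get with the authorization header and returns the decoded JSON.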

Use pandas to extract applicable features

a = pd.concat ([
    df['contact'].apply (pd.Series).attributes.apply (pd.Series).cus,
    df[['comment', 'posted_date', 'comment_updated_date', 'score', 'score_type']]
], axis = 1, join = 'inner')

a = a[a.cus.notnull ()]

a.columns = ['cus', 'jot', 'datp', 'datx', 'score', 'typ']

a['cus'] = a.cus.astype ('int64')
a['datp'] = a.datp.astype ('datetime64[s]')
a['datx'] = a.datx.astype ('datetime64[s]')
a['typ'] = pd.Categorical (a.typ)
a['category'] = a.typ.cat.codes
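
As a small illustration of the last two lines above: pd.Categorical sorts string labels alphabetically by default, and cat.codes returns each label's position in that sorted list. The sketch below assumes the score types are 'detractor', 'passive', and 'promoter'; only 'promoter' appears in this post, so the other two labels are an assumption.

```python
import pandas as pd

# Assumed label set; only 'promoter' is confirmed by the data shown in this post.
typ = pd.Categorical(['promoter', 'detractor', 'passive', 'promoter'])

# Map each label to its integer code (its position in the sorted categories).
code_of = {label: code for code, label in enumerate(typ.categories)}
print(code_of)
print(list(typ.codes))
```

Under that assumption, promoters receive the code 2.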

For distinct customers

d = a.sort_values (['datp']).groupby (['cus']).last ()
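
One detail worth knowing about this step: groupby (...).last () returns the last non-null value of each column, so a customer's row can combine values from different surveys. The sketch below uses made-up data to show the difference from keeping each customer's most recent row whole. The toy frame and the names per_column and whole_row are hypothetical.

```python
import pandas as pd

# Made-up data: customer 1 answered twice, the second time without a comment.
toy = pd.DataFrame({
    'cus': [1, 1, 2],
    'jot': ['Easy to use', None, 'Great support'],
    'datp': pd.to_datetime(['2017-01-01', '2018-01-01', '2017-06-01']),
    'score': [9, 10, 10],
})
ordered = toy.sort_values('datp')

# Last non-null value per column: the 2017 comment is paired with the 2018 score.
per_column = ordered.groupby('cus').last()

# Most recent row per customer, kept whole: the missing comment stays missing.
whole_row = ordered.drop_duplicates('cus', keep='last').set_index('cus')

print(per_column.loc[1, 'jot'], per_column.loc[1, 'score'])
print(whole_row.loc[1, 'jot'], whole_row.loc[1, 'score'])
```

Which behavior is preferable depends on the question being asked. For collecting comments, carrying the most recent non-empty comment forward may be exactly what is wanted.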

Let’s take a quick look at what this DataFrame d looks like for Jungle Disk Promoters:

>>> d[d.typ == 'promoter'].sample (10)
                                                         jot  \
100XXXXXX                                                NaN   
653XXXXXX  The representatives are always very helpful, v...   
886XXXXXX  Jungle Disk employees are amazing.  The few ti...   
129XXXXXX                                                NaN   
677XXXXXX                 Backup procedure with compression.   
118XXXXXX                      Our IT guy told us to get it.   
340XXXXXX  Ease of use, Ease of recoverability, reasonabl...   
111XXXXXX          Ease of use and quick to answer questions   
384XXXXXX             Very secure way to protect your files.   
721XXXXXX                                              Price   
                         datp                datx  score       typ  category
100XXXXXX 2017-05-08 17:46:46                 NaT     10  promoter         2
653XXXXXX 2018-02-15 03:12:27 2018-02-15 03:14:15     10  promoter         2
886XXXXXX 2017-12-13 20:04:06 2017-12-26 05:30:52     10  promoter         2
129XXXXXX 2018-01-17 00:47:25                 NaT     10  promoter         2
677XXXXXX 2017-10-26 18:17:12 2017-12-26 05:29:47     10  promoter         2
118XXXXXX 2017-11-12 11:44:51 2017-12-26 05:30:44      9  promoter         2
340XXXXXX 2017-12-12 17:08:23 2017-12-26 05:30:51     10  promoter         2
111XXXXXX 2017-11-29 03:13:24 2017-12-26 05:30:48      9  promoter         2
384XXXXXX 2017-11-22 18:00:59 2017-12-26 05:29:56     10  promoter         2
721XXXXXX 2017-08-30 16:16:35 2017-12-26 05:30:26     10  promoter         2

Excellent, but how do we make sense of these comments? What are the themes and general ideas our customers are expressing?

Ngrams Analysis

For this analysis, an ngram is simply a group of consecutive words, with a length in a set range, taken after the stopwords (e.g. i, a, the, it, etc.) have been removed.

For the sentence "There must be some kind of way outta here said the joker to the thief.", the sentence with stopwords removed is "must kind way said joker thief". Its 2- to 4-word ngrams are: (must, kind), (must, kind, way), (must, kind, way, said), (kind, way), (kind, way, said), (kind, way, said, joker), … etc.
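
To make the example concrete, here is a minimal pure-Python sketch that starts from the already-filtered word list given above and enumerates its 2- to 4-word ngrams in the same order. The function name ngrams and its parameters are illustrative only.

```python
def ngrams(tokens, n_min=2, n_max=4):
    """Return all runs of n_min..n_max consecutive tokens, grouped by start position."""
    out = []
    for start in range(len(tokens)):
        for n in range(n_min, n_max + 1):
            if start + n <= len(tokens):
                out.append(tuple(tokens[start:start + n]))
    return out


# The example sentence after stopword removal, as given in the text.
tokens = ['must', 'kind', 'way', 'said', 'joker', 'thief']
grams = ngrams(tokens)

for gram in grams[:6]:
    print(gram)
print(len(grams))
```

A six-word sentence yields five 2-word, four 3-word, and three 4-word ngrams, twelve in total.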

In Python, scikit-learn’s CountVectorizer class offers a convenient way to do this:

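from sklearn.feature_extraction.text import CountVectorizer

cvz = CountVectorizer (
    analyzer = 'word', stop_words = 'english', ngram_range = (2, 4)
)

# Fit on the non-empty promoter comments; the result is a sparse document-by-ngram count matrix.
transformed = cvz.fit_transform (d[(d.typ == 'promoter') & (d.jot.notnull ())].jot.tolist ())
# get_feature_names was removed in scikit-learn 1.2; get_feature_names_out replaces it.
e = pd.DataFrame (transformed.toarray (), columns = cvz.get_feature_names_out ())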

To quickly view the 20 most frequently mentioned ngrams, run the following:

>>> pd.Series (e.sum ().sort_values (ascending = False).head (20).index.values)
0                   easy use
1                   ease use
2                jungle disk
3           customer service
4                 just works
5              great service
6              great support
7           reliable service
8                 peace mind
9             great customer
10            cost effective
11                set forget
12             great product
13    great customer service
14              tech support
15                easy setup
16             product works
17                  easy set
18              good service
19               great price
dtype: object

Given this view of our promoters’ feedback, what investments in the future of Jungle Disk would you advocate for as a product manager? A continued, simple UX? Continued training for our service delivery team? A faster and more reliable back-end? The ability to back up and restore larger datasets more quickly?

Reach out and let us know your thoughts.
