As a data scientist, I spend a lot of time analyzing how our users interact with WordPress.com. However, WordPress.com isn’t the only place to gain insight into how people use and talk about our services. Many WordPress.org and WordPress.com discussions take place on social media. Analyzing these discussions can help us understand what our users are saying about WordPress[*] and Automattic, the topics closely associated with our services, and who is leading these discussions.
In every social network, there are people who steer the topic and sentiment of the conversation. These influencers usually have large followings and are positioned centrally within the network. Brands often reach out to influencers to organize focus groups or invite them to events, since they’re usually knowledgeable about the brand and can offer insight into how consumers use the product and how it could be improved.
At Automattic, we don’t do traditional influencer marketing. However, since the discussions around WordPress in general can offer insight into our users’ experience with our products, I explored the topics, influencers, and ecosystem around WordPress and Automattic products on Twitter. These are the questions I set out to answer:
- Who is talking about WordPress and Automattic on Twitter?
- What are they saying?
- Who are the influencers within the online WP communities?
This post outlines the main findings of my mini-analysis and emphasizes the need to introduce context into social media analytics.
Tweets are public and can be searched and retrieved from Twitter in two ways: via the REST API and via the Streaming API. (Of course, neither API can access non-public tweets.) I relied on the REST API for this analysis.
With the REST API, we can track two things: user timelines and search terms. User timelines give us all tweets posted and retweeted by a given user, whereas search term tracking gives us tweets containing specific terms. I used search term tracking over about 10 days to track 13 terms associated with WordPress.com, WordPress.org, and Automattic. The tracking resulted in ~118,000 tweets and ~10,000 unique hashtags.
Who, what, how?
Topics around WordPress
A very straightforward way to extract terms from tweets is to just use hashtags. However, if we look at a plot of the top hashtags within our dataset, we see nothing surprising or very enlightening:
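As a rough illustration of this volume-based view, the sketch below counts hashtag frequencies with the Python standard library. The sample tweets and the function names (`extract_hashtags`, `top_hashtags`) are made up for the example and are not the pipeline used in this analysis.

```python
import re
from collections import Counter

# Simplified pattern: a '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the lowercased hashtags in a tweet, without the leading '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def top_hashtags(tweets, n=10):
    """Count how often each hashtag appears and return the n most common."""
    counts = Counter()
    for text in tweets:
        counts.update(extract_hashtags(text))
    return counts.most_common(n)


# Hypothetical sample data, not taken from the real dataset.
sample_tweets = [
    "New post on my #WordPress blog #blogging",
    "Speeding up #wordpress with caching #webdev",
    "Why I love #blogging",
]

for tag, count in top_hashtags(sample_tweets, 2):
    print(tag, count)
```

Lowercasing treats `#WordPress` and `#wordpress` as the same tag, which is a choice made for this sketch.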
This is what I call a volume-based approach, where the data analysis only looks at the frequency of things (such as how many retweets a user has received, or how many times a hashtag was used) and doesn’t introduce context. A more enlightening approach is to look at relationships formed by hashtags. To do so, I created a network where two hashtags are connected if they appeared in the same tweet together. The more frequently they appeared together, the stronger the connection between them. I also clustered this network, so that different topic communities become visible. Note that the image below depicts the breadth of the topic clusters. We zoom in below as we examine specific clusters.
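The network construction described above can be sketched in a few lines: every pair of hashtags that shares a tweet becomes an edge, and the edge weight is the number of tweets in which the pair appears together. The input data and the name `cooccurrence_edges` are illustrative assumptions. The clustering step is left out here; it would typically be handled by a community detection routine from a graph library, run on these weighted edges.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(tagged_tweets):
    """Map each unordered hashtag pair to the number of tweets containing both."""
    weights = Counter()
    for tags in tagged_tweets:
        # Deduplicate and sort so that (a, b) and (b, a) share one key.
        unique = sorted(set(tags))
        for a, b in combinations(unique, 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical input: one list of hashtags per tweet.
tagged = [
    ["wordpress", "blogging"],
    ["wordpress", "blogging", "writing"],
    ["wordpress", "security"],
    ["woocommerce"],  # a single hashtag creates no edge
]

edges = cooccurrence_edges(tagged)
for (a, b), weight in edges.most_common():
    print(a, b, weight)
```

The heavier an edge, the more often the two hashtags were used together, which corresponds to the connection strength mentioned above.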
Zooming in, we can see the top terms within this topic network as well as how they are connected to each other to distinguish between different contexts. The terms form clusters, or closely connected groups. Identifying these clusters helps us to distinguish between different topics of parallel discussions, providing us with a context to understand where certain terms fit within the discussions. For example, we might see that a negative term appears a lot, and might believe that the discussions around WordPress have a negative spin, but when we look at a term network, we could see that the negativity is specific to a single topic cluster.
We see four or five larger topics surrounding WordPress and Automattic. Zooming in to the center of the network, we see that the largest topic is about business and businesses built around the WordPress ecosystem, in purple:
Writing, in turquoise, is a topic that featured prominently in the internal WP.com network as well:
Then there is a more technical topic around competitors and ecommerce in red:
Finally, there is a net security topic cluster in yellow:
Looking at hashtags and contextualizing terms via networks gives us a lot more insight into the actual topics being discussed as opposed to just looking at top terms or any other approach purely based on volume. We can also prepare tag clouds to look at these topics separately:
Influencers in the conversation around WordPress
One way of looking at who is talking about WordPress is to plot the most-retweeted tweets and the most-retweeted tweeters, or maybe even the people who mentioned any of the tracked terms most often.
The top retweeted tweeter is @imprincessjones. However, as we will see, context and a map of the actual conversation matter more, and give better results, than a volume-based approach.
Twitter offers us a very straightforward way to understand who is talking to whom. There are three ways that users can interact with each other: replies, mentions, and retweets. The following network depicts this conversation flow. Users are nodes and interactions are edges, and each edge is colored to show whether the user replied to, mentioned, or retweeted another user.
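A minimal sketch of how such an interaction network could be assembled follows. The record fields (`user`, `retweet_of`, `reply_to`, `mentions`) are hypothetical names chosen for this example, not the field names of the Twitter API, and counting a retweet as a single interaction is a simplification made here.

```python
from collections import Counter


def interaction_edges(tweets):
    """Count directed (source, target, kind) interactions between users."""
    edges = Counter()
    for tweet in tweets:
        source = tweet["user"]
        if tweet.get("retweet_of"):
            edges[(source, tweet["retweet_of"], "retweet")] += 1
            continue  # count a retweet once; ignore mentions copied with it
        reply_to = tweet.get("reply_to")
        if reply_to:
            edges[(source, reply_to, "reply")] += 1
        for mentioned in tweet.get("mentions", []):
            if mentioned != reply_to:  # a reply already covers its target
                edges[(source, mentioned, "mention")] += 1
    return edges


# Hypothetical records with made-up field names and users.
tweets = [
    {"user": "alice", "retweet_of": "bob", "mentions": ["bob"]},
    {"user": "carol", "reply_to": "bob", "mentions": ["bob", "dave"]},
    {"user": "alice", "retweet_of": "bob"},
    {"user": "dave", "mentions": ["alice"]},
]

edges = interaction_edges(tweets)
for (source, target, kind), count in edges.items():
    print(source, "->", target, kind, count)
```

The `kind` label on each edge is what a plotting tool would map to an edge color.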
Nodes are sized based on their PageRank, as opposed to the number of interactions received. The idea behind PageRank, in very superficial terms, is that if I receive 100 retweets but they all come from my uninfluential friends and family, then my retweets are worth less than if I had received 100 retweets from The New York Times, for example.
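To make the intuition concrete, here is a small power-iteration sketch of PageRank in plain Python. It is a textbook-style simplification with an assumed damping factor of 0.85, not the implementation used to size the nodes in this analysis, and the accounts in the example are invented.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Approximate PageRank for (source, target) pairs by power iteration.

    An edge means that the source interacted with (e.g. retweeted) the target.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source in nodes:
            targets = out_links[source]
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical accounts: "me" and "other" each receive exactly one retweet,
# but "other" is retweeted by "nyt", which is itself retweeted by three readers.
edges = [
    ("friend", "me"),
    ("nyt", "other"),
    ("reader1", "nyt"),
    ("reader2", "nyt"),
    ("reader3", "nyt"),
]

ranks = pagerank(edges)
print(ranks["other"] > ranks["me"])
```

Both accounts have the same retweet count, yet the one retweeted by the better-connected account ends up with the higher score.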
It is immediately obvious that an apparent “influencer” such as @imprincessjones sits on the periphery of the network and receives interactions only from a limited set of accounts. This shows that not only context, but also position in the network is important. This kind of analysis can reveal clusters of users that only interact with each other, acting as retweet farms.
Such accounts might have high metrics, but when they are put into the big picture, they lose their importance because they are pushed to the periphery of the network.
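One simple way to surface groups of accounts that only interact among themselves is to split the interaction network into connected components, ignoring edge direction. The sketch below assumes the strictest reading, in which a group has no links at all to the rest of the network; real retweet farms may have a few outside links and would need a softer, clustering-based approach. The function name and sample edges are illustrative.

```python
from collections import defaultdict


def interaction_groups(edges):
    """Group users into weakly connected components, largest group first."""
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)

    seen = set()
    groups = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk over everything reachable from this user.
        seen.add(start)
        stack = [start]
        group = set()
        while stack:
            user = stack.pop()
            group.add(user)
            for other in neighbours[user]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        groups.append(group)
    return sorted(groups, key=len, reverse=True)


# Hypothetical edges: a main conversation plus an isolated pair of accounts.
edges = [
    ("a", "b"), ("b", "c"), ("c", "a"), ("d", "a"),
    ("x", "y"), ("y", "x"),
]

groups = interaction_groups(edges)
for group in groups:
    print(sorted(group))
```

A small component with many internal retweets and no outside links is the kind of pattern worth a closer look.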
There is another large conversation group around @photomatt (Matt Mullenweg, our CEO), as well as conversations led by themes and ecommerce / WooCommerce.
There are also some smaller but tightly knit conversation groups, such as this one, which seems to be among open source WordPress developers.
The network lets us see how the conversations around WordPress are divided into smaller communities that interact with each other. It also gives us a list of influential Twitter users we could contact to understand their experiences with our products, or for marketing and PR purposes.
This has been an overview of a small part of the analyses that can be done with social media data to help PR and marketing efforts. Stay tuned for other similar case studies!
* WordPress.org, WordPress.com, Jetpack, Akismet, and VaultPress were included in the study. Here are the terms tracked as part of this analysis: #wordpress, @wordpressdotcom, automattic, mullenweg, @photomatt, @wordpress, @jetpack, @akismet, akismet, @vaultpress, vaultpress, @woocommerce, woocommerce