Using Sociology(!) to Explain Unfollows on Twitter
January 19th, 2011 by naaman
Well, he still does, not least because he knows I will send roadkill to his office address if he stops. But surely, people stop following one another on Twitter all the time. Right? Right? Yes, right, as we show in our recent paper (caution, PDF), with my PhD students Funda Kivran-Swaine and Priya Govindan, to be published at CHI 2011.
Many studies, in academia and industry, in computer science and sociology (this one too), examine the creation of new ties in social networks, but very few examine tie breaks and persistence. Why? One reason is that, in computer science, models of tie creation have immediate consequences for systems (e.g., recommending new contacts). Another reason is that tie breaks are rare, or hard to detect/define in many social networks, especially those networks studied by sociologists (when does Naaman’s tie with Ayman break? after 3 years of not communicating? 20?). Ron Burt’s work is an exception, but Ron is always an exception, isn’t he?
Enter Twitter, where we can witness a dynamic social system, and where ties are created and broken for all to observe. Op-por-tu-ni-ty! Can we shed some light on the tie-break phenomenon in Twitter? How widespread is this phenomenon, and what are the factors that can help predict tie breaks?
We started with a random set of 715 Twitter users, and the 245,586 Twitter users that “follow” them at Time 1 (July 2009). We looked at these users and followers again after nine months (April 2010, Time 2). Did these follow edges still exist? How many dropped over that period? The image below captures one of our 715 users and the network around them at Time 1. Those users that stopped following our user at Time 2 (the “unfollowing” users) and their connections are marked in blue. Now it’s time to pause and guess what the overall “unfollow” rate is in our data: 5%? 15%? 25%? 75%? OK, scroll down.
Turns out, over nine months, 30% of the follow edges disappeared. On average, a single user lost about 39% of their followers over that period. How come it’s not 30%? Because the 39% is an average of averages: people with a large number of followers (of which there are fewer) probably lost a smaller portion of their followers, but still a large number. Does more followers mean relatively fewer unfollowers? I’ll come back to that in a second.
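To see how a pooled edge rate and a per-user average can diverge, here is a minimal sketch. The three seeds and their counts are made up for illustration and are not taken from our data; the function names are hypothetical too.

```python
# Hypothetical (followers_at_time1, followers_lost) pairs for three seeds.
# These numbers are invented to illustrate the effect, not taken from the study.
seeds = [(10, 5), (20, 8), (1000, 250)]

def pooled_rate(seeds):
    """Share of all follow edges that disappeared, pooling edges across seeds."""
    total = sum(n for n, _ in seeds)
    lost = sum(k for _, k in seeds)
    return lost / total

def mean_per_seed_rate(seeds):
    """Average of each seed's own loss rate (an 'average of averages')."""
    return sum(k / n for n, k in seeds) / len(seeds)

print(f"pooled edge rate:   {pooled_rate(seeds):.3f}")
print(f"mean per-seed rate: {mean_per_seed_rate(seeds):.3f}")
```

The one large seed dominates the pooled rate but counts only once in the per-seed mean, so the mean comes out higher whenever small accounts lose a larger share of their followers.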
For this work, we were mainly interested in looking at whether well-known sociological processes are in play on Twitter with respect to unfollowing activity. So we did our lit review, and discovered that strength of ties, embeddedness within networks, and power/status are some of the key related sociological concepts (the paper explains those in detail, of course). The question then was: can we look at the network structure alone, and based on these theories, see if there are network factors that are highly correlated with unfollows?
The details of the dataset are in the paper, but for now, just imagine that for each “follow” relationship, we had the complete network graph of both nodes. So if “@ayman following @informor” was one of the edges we looked at, we could get the entire network neighborhood of @informor, and @ayman. (This network data is presented to you courtesy of Kwak et al.) What properties of @informor’s network, and of the network around @informor and @ayman, correlate with a higher probability that @ayman would stop following me?
We calculated a bunch of variables, including for example, for each of our 715 initial users (let’s call them “seeds”):
- The seed’s number of followers.
- The seed’s clustering coefficient: how connected their followers are.
- The seed’s reciprocity rate: what portion of the people following them do they follow back?
- The seed’s follow-back rate: what portion of the people they follow also follow them back?
- The seed’s follower-to-friends ratio.
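The seed-level variables above can be sketched on a toy graph as follows. This is only an illustration, not the code we used: the edge-set representation, the `seed_metrics` helper, and the toy edges are assumptions, and the clustering measure here (the share of ordered follower pairs joined by a follow edge) is one plausible reading of “how connected their followers are”, not necessarily the definition in the paper.

```python
from itertools import permutations

def seed_metrics(edges, seed):
    """Seed-level variables from a set of directed (follower, followee) edges.

    Hypothetical helper for illustration; definitions are assumptions.
    """
    followers = {a for a, b in edges if b == seed}   # people who follow the seed
    friends = {b for a, b in edges if a == seed}     # people the seed follows
    mutual = followers & friends
    # Assumed clustering measure: fraction of ordered follower pairs (u, v)
    # where u follows v.
    pairs = list(permutations(followers, 2))
    linked = sum(1 for pair in pairs if pair in edges)
    return {
        "followers": len(followers),
        "reciprocity_rate": len(mutual) / len(followers) if followers else 0.0,
        "follow_back_rate": len(mutual) / len(friends) if friends else 0.0,
        "follower_to_friends": len(followers) / len(friends) if friends else float("inf"),
        "follower_clustering": linked / len(pairs) if pairs else 0.0,
    }

# Toy graph: a, b, c follow seed s; s follows a and d; a also follows b.
edges = {
    ("a", "s"), ("b", "s"), ("c", "s"),
    ("s", "a"), ("s", "d"),
    ("a", "b"),
}
m = seed_metrics(edges, "s")
print(m)
```

Scanning the whole edge set for every lookup is fine for a toy graph; anything at Twitter scale would want adjacency indexes instead.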
And for each seed and follower pair in our data, we computed aspects of their relationship:
- How many connections do they have in common (i.e., users the seed and follower both connect to)?
- What is the difference in prestige between the two (in terms of number of followers)?
- Does the seed reciprocate the connection to the follower?
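A rough sketch of the pair-level variables, again on a toy edge set. The `pair_metrics` helper is hypothetical, and treating a “connection” as a link in either direction is an assumption made for this example; the paper has the exact definitions.

```python
def pair_metrics(edges, follower, seed):
    """Pair-level variables from a set of directed (follower, followee) edges.

    Hypothetical helper for illustration only.
    """
    def neighbors(u):
        # Assumption: a "connection" is anyone u follows or is followed by.
        return {b for a, b in edges if a == u} | {a for a, b in edges if b == u}

    def follower_count(u):
        return sum(1 for _, b in edges if b == u)

    # Common connections, excluding the two users themselves.
    common = (neighbors(follower) & neighbors(seed)) - {follower, seed}
    return {
        "common_neighbors": len(common),
        # Positive when the seed has more followers than the follower does.
        "prestige_gap": follower_count(seed) - follower_count(follower),
        "reciprocated": (seed, follower) in edges,
    }

# Toy graph: f follows s and s follows f back; both follow x; y and z follow s.
edges = {
    ("f", "s"), ("s", "f"),
    ("f", "x"), ("s", "x"),
    ("y", "s"), ("z", "s"),
}
p = pair_metrics(edges, "f", "s")
print(p)
```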
So, which factors correlated most with unfollow activity? We ran quite a sophisticated analysis (multi-level logistic regression), but I’ll keep it simple here, with a basic look at the factors that our analysis showed to contribute to the probability that a follower will unfollow a seed. For the more “scientific” study, check out the paper.
First, what did *NOT* have an impact: the number of followers a seed had at Time 1 had very limited impact on the probability of unfollows for that seed, and that impact was mitigated by other factors. A figure (limited to seeds who had fewer than 500 followers) demonstrates this.
So what played a major role? Reciprocity, for one, did. Do you follow someone that follows you? If you do, they are much less likely to unfollow you. Remember our 245,586 connections? Half of them were reciprocated (the seed also followed their follower). When the relationship was reciprocated, 16% of the followers unfollowed. When it wasn’t, a whopping 45% did. Before I throw a figure in, an important note about causality: we don’t know the causality. For example, pairs of users who are closer in real life (“strong ties”) are likely to have a reciprocated relationship and of course, their connection is not likely to break (because they are close). A deeper examination is needed to show whether the reciprocity act *alone* helps in maintaining the tie, although the analysis in the paper suggests that it contributes more than other factors that typically signify strong relationships.
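As a quick sanity check, the two group rates should roughly add up to the overall 30% reported earlier. A minimal sketch, assuming the reciprocated share is exactly one half (the post only says “half”, so the result is approximate); the function name is hypothetical.

```python
def overall_unfollow_rate(share_reciprocated, rate_reciprocated, rate_one_way):
    """Weighted average of the unfollow rates of the two groups of edges."""
    return (share_reciprocated * rate_reciprocated
            + (1 - share_reciprocated) * rate_one_way)

# Assumption: "half" is treated as exactly 0.5; the true share may differ slightly.
overall = overall_unfollow_rate(0.5, 0.16, 0.45)
print(f"implied overall unfollow rate: {overall:.3f}")
```

The implied rate lands close to the 30% of edges that disappeared, which is what you would expect if the rounded figures are consistent.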
Here’s one more thing to think about: a user’s follow-back rate was highly correlated with a lower ratio of unfollows, but the ratio of followers to followees wasn’t. The follow-back rate is the portion of the people a user follows that follow them back. For example, I may have 15 followers and 10 followees (people I follow) on Twitter. Out of the people I follow, 8 follow me back. So my follow-back rate is 80%, and my follower-to-followee ratio is 1.5. Both these metrics are potential measures of “importance” on Twitter, but the fact that only one (the follow-back rate) impacts the rate at which people stop following me hints that the follow-back rate might be a better measure of importance and success on Twitter. Makes sense, Ayman? What’s your follow-back rate?
What else? Embeddedness is the last thing I will touch on; you can read the paper for more (it’s only a 4 pager, don’t be too easy on yourself). And by embeddedness I do not mean the number of YouTube videos you post on your Twitter stream, but the sociological concept that captures the set of relationships that exist between the individuals in a relationship through third parties (i.e., common friends). More common friends? Your relationship is presumed to be stronger. It is not a surprise, then, that the larger the number of common neighbors two Twitter users have, the less likely one is to unfollow the other. From our data, this figure shows, for each level of common neighbors a “follow” relationship had, what percent of these follows became “unfollows”. For example, from all follow relationships that had no common neighbors at Time 1, 78% did not exist at Time 2; one common neighbor was enough to drop that number to 46% (and it keeps dropping; I stopped at 15 because you get the idea).
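The tabulation behind a figure like this can be sketched in a few lines: group the follow edges by their number of common neighbors and compute the share that broke in each group. The record format, the function name, and the toy records below are assumptions for illustration, not our data.

```python
from collections import defaultdict

def unfollow_rate_by_common_neighbors(records):
    """Map each common-neighbor count to the share of follows that broke.

    `records` is an iterable of (common_neighbors, unfollowed) pairs,
    a hypothetical format chosen for this sketch.
    """
    totals = defaultdict(int)
    broken = defaultdict(int)
    for common, unfollowed in records:
        totals[common] += 1
        if unfollowed:
            broken[common] += 1
    return {k: broken[k] / totals[k] for k in sorted(totals)}

# Invented toy records: (common neighbors at Time 1, edge gone at Time 2?)
records = [
    (0, True), (0, True), (0, True), (0, False),
    (1, True), (1, False),
    (2, False), (2, False),
]
rates = unfollow_rate_by_common_neighbors(records)
print(rates)
```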
What didn’t we look at? Pretty much everything else! We relied on network structure alone to investigate these unfollows, as a first step. But there’s a lot more: how often do you tweet (or not)? How interesting are your posts? How similar are your topics to those of the people following you? We are now exploring all these factors and additional variables. Stay tuned.
[update: slideshare presentation here].