Using Sociology(!) to Explain Unfollows on Twitter

What gives, @ayman is no longer following me on Twitter!

Well, he still does, not least because he knows I will send roadkill to his office address if he stops. But surely, people stop following one another on Twitter all the time. Right? Right? Yes, right, as we show in our recent paper (caution, PDF), with my PhD students Funda Kivran-Swaine and Priya Govindan, to be published at CHI 2011.

Many studies, in academia and industry, in computer science and sociology (this one too), examine the creation of new ties in social networks, but very few examine tie breaks and persistence. Why? One reason is that, in computer science, models of tie creation have immediate consequences for systems (e.g., recommending new contacts). Another reason is that tie breaks are rare, or hard to detect and define, in many social networks, especially the networks studied by sociologists (when does Naaman’s tie with Ayman break? After 3 years of not communicating? 20?). Ron Burt’s work is an exception, but Ron is always an exception, isn’t he?

Enter Twitter, a dynamic social system where ties are created and broken for all to observe. Op-por-tu-ni-ty! Can we shed some light on the tie-break phenomenon on Twitter? How widespread is it, and what factors can help predict tie breaks?

We started with a random set of 715 Twitter users and the 245,586 Twitter users who “followed” them at Time 1 (July 2009). We looked at these users and followers again nine months later (April 2010, Time 2). Did these follow edges still exist? How many were dropped over that period? The image below shows one of our 715 users and the network around them at Time 1. The users who had stopped following our user by Time 2 (the “unfollowing” users) and their connections are marked in blue. Now pause and guess the overall “unfollow” rate in our data: 5%? 15%? 25%? 75%? OK, scroll down.
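[Figure: Unfollowing on Twitter. One seed’s network at Time 1; unfollowing users and their connections are marked in blue.]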
It turns out that over nine months, 30% of the follow edges disappeared. On average, a single user lost about 39% of their followers over that period. Why isn’t that 30% as well? Because the 39% is an average of per-user rates (an average of averages), while the 30% pools all edges together. This is probably because users with many followers, of whom there are relatively few, lost a smaller portion of their followers, even though that portion was still a large number of people. Does having more followers mean relatively fewer unfollowers? I’ll come back to that in a second.
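To see how the two numbers can differ, here is a toy calculation with made-up follower counts (three hypothetical seeds, not data from the study). Pooling weights every edge equally, while averaging per-seed rates weights every seed equally, so one large seed that loses a small share pulls the pooled rate down far more than the per-seed average.

```python
# Hypothetical toy data, NOT from the study:
# seed name -> (followers at Time 1, how many of them unfollowed by Time 2)
seeds = {
    "big_seed": (1000, 200),  # loses 20% of its followers
    "small_a": (10, 5),       # loses 50%
    "small_b": (20, 10),      # loses 50%
}


def pooled_rate(seeds):
    """Share of all follow edges that disappeared; every edge counts once."""
    total = sum(n for n, _ in seeds.values())
    lost = sum(u for _, u in seeds.values())
    return lost / total


def mean_per_seed_rate(seeds):
    """Average of each seed's own loss rate; every seed counts once."""
    rates = [u / n for n, u in seeds.values()]
    return sum(rates) / len(rates)


print(f"pooled:   {pooled_rate(seeds):.1%}")         # 215 of 1030 edges
print(f"per-seed: {mean_per_seed_rate(seeds):.1%}")  # (20% + 50% + 50%) / 3
```

This prints a pooled rate of 20.9% and a per-seed average of 40.0%: the same pattern, exaggerated, as the 30% vs. 39% gap in our data.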

For this work, we were mainly interested in whether well-known sociological processes are in play on Twitter with respect to unfollowing activity. So we did our lit review and found that strength of ties, embeddedness within networks, and power/status are some of the key related sociological concepts (the paper explains those in detail, of course). The question then was: can we look at the network structure alone and, based on these theories, find network factors that are highly correlated with unfollows?

The details of the dataset are in the paper, but for now, just imagine that for each “follow” relationship, we had the complete network graph of both nodes. So if “@ayman following @informor” was one of the edges we looked at, we could get the entire network neighborhood of both @informor and @ayman. (This network data is presented to you courtesy of Kwak et al.) What properties of @informor’s network, and of the network around @informor and @ayman, correlate with a higher probability that @ayman would stop following me?

We calculated a bunch of variables, including for example, for each of our 715 initial users (let’s call them “seeds”):

  • The seed’s number of followers.
  • The seed’s clustering coefficient: how connected their followers are to one another.
  • The seed’s reciprocity rate: what portion of the people following them do they follow back?
  • The seed’s follow-back rate: what portion of the people they follow follow them back?
  • The seed’s follower-to-friend ratio (“friends” being the people the seed follows).
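For concreteness, here is one way these seed-level variables could be computed from a set of follow edges. This is a sketch under our own assumptions: the graph is a set of (x, y) pairs meaning “x follows y”, and “clustering” is approximated as the directed density among the seed’s followers. The paper’s exact definitions may differ, and the function name is hypothetical.

```python
from itertools import permutations


def seed_metrics(seed, edges):
    """Seed-level variables from a set of (follower, followee) pairs.

    Illustrative definitions only; the paper's exact formulas may differ.
    """
    followers = {a for a, b in edges if b == seed}
    friends = {b for a, b in edges if a == seed}  # accounts the seed follows
    mutual = followers & friends                  # reciprocated ties

    # Stand-in for clustering: share of ordered follower pairs (x, y)
    # in which x follows y (directed density among the seed's followers).
    pairs = list(permutations(followers, 2))
    linked = sum(1 for x, y in pairs if (x, y) in edges)

    return {
        "followers": len(followers),
        "clustering": linked / len(pairs) if pairs else 0.0,
        "reciprocity_rate": len(mutual) / len(followers) if followers else 0.0,
        "follow_back_rate": len(mutual) / len(friends) if friends else 0.0,
        "follower_friend_ratio": (
            len(followers) / len(friends) if friends else float("inf")
        ),
    }


# Hypothetical toy graph: (x, y) means "x follows y".
edges = {("a", "s"), ("s", "a"), ("b", "s"), ("c", "s"), ("b", "c"), ("s", "d")}
print(seed_metrics("s", edges))
```

In this toy graph, seed “s” has 3 followers and follows 2 accounts, with one mutual tie, so its reciprocity rate is 1/3, its follow-back rate is 1/2, and its follower-to-friend ratio is 1.5.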

And for each seed and follower pair in our data, we computed aspects of their relationship:

  • How many connections do they have in common (i.e., users whom both the seed and the follower are connected to)?
  • What is the difference in prestige between the two (in terms of number of followers)?
  • Does the seed reciprocate the connection to the follower?
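A matching sketch for the pair-level variables, using the same hypothetical edge representation. Two choices here are our assumptions, not necessarily the paper’s: direction is ignored when counting common neighbors, and prestige difference is measured as a difference of log follower counts with +1 smoothing (so users with zero followers don’t break the calculation).

```python
import math


def dyad_features(seed, follower, edges):
    """Dyad-level variables for a (seed, follower) pair.

    `edges` is a set of (x, y) pairs meaning "x follows y".
    Illustrative definitions only; the paper's exact formulas may differ.
    """
    def neighbors(u):
        # Everyone u follows or is followed by (direction ignored).
        return {y for x, y in edges if x == u} | {x for x, y in edges if y == u}

    def n_followers(u):
        return sum(1 for _, y in edges if y == u)

    common = (neighbors(seed) & neighbors(follower)) - {seed, follower}
    return {
        "common_neighbors": len(common),
        # log(f_seed + 1) - log(f_follower + 1): positive if the seed has more followers.
        "log_prestige_diff": math.log(n_followers(seed) + 1)
        - math.log(n_followers(follower) + 1),
        "reciprocated": (seed, follower) in edges,  # does the seed follow back?
    }


# Hypothetical toy graph: (x, y) means "x follows y".
edges = {("a", "s"), ("s", "a"), ("b", "s"), ("c", "s"), ("b", "c"), ("s", "d")}
print(dyad_features("s", "b", edges))
```

Here follower “b” shares one neighbor (“c”) with seed “s”, has no followers of its own, and is not followed back by “s”.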

So, which factors correlated most with unfollow activity? We ran a fairly sophisticated analysis (multi-level logistic regression), but I’ll keep it simple here and walk through the factors that the analysis showed contribute to the probability that a follower will unfollow a seed. For the more “scientific” treatment, check out the paper.
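For readers who want the shape of the model, a generic varying-intercept (multi-level) logistic regression looks like the following. This is a sketch of the model family, not the paper’s exact specification; the actual variable sets and the progression from Model 1 to Model 3 are described in the paper.

```latex
\operatorname{logit} \Pr(y_{sf} = 1)
  = \beta_0 + u_s + \boldsymbol{\beta}^{\top} \mathbf{x}_{sf} + \boldsymbol{\gamma}^{\top} \mathbf{z}_s,
  \qquad u_s \sim \mathcal{N}(0, \sigma_u^2)
```

Here y_sf = 1 if follower f no longer follows seed s at Time 2, x_sf holds pair-level variables (common neighbors, prestige difference, reciprocation), z_s holds seed-level variables (number of followers, clustering, reciprocity and follow-back rates), and u_s is a seed-specific intercept that lets the baseline unfollow probability vary from seed to seed.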

First, what did *NOT* have an impact: the number of followers a seed had at Time 1 had a very limited effect on the probability of unfollows for that seed, and that effect was mitigated by other factors. A figure (limited to seeds who had fewer than 500 followers) demonstrates this.
[Figure: number of followers vs. unfollows, for seeds with fewer than 500 followers]

So what played a major role? Reciprocity, for one. Do you follow someone who follows you? If you do, they are much less likely to unfollow you. Remember our 245,586 connections? About half of them were reciprocated (the seed also followed their follower). When the relationship was reciprocated, 16% of the followers unfollowed; when it wasn’t, a whopping 45% did. Before I throw a figure in, an important note about causality: we can’t establish it from this data. For example, pairs of users who are close in real life (“strong ties”) are likely to have a reciprocated relationship and, of course, their connection is unlikely to break (because they are close). A deeper examination is needed to show whether the act of reciprocating *alone* helps maintain the tie, although the analysis in the paper suggests that it contributes more than other factors that typically signify strong relationships.
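As a quick sanity check on these numbers, weighting the two groups’ unfollow rates by their share of edges should roughly recover the overall 30% figure. The sketch uses the rounded percentages from the text (“about half”, 16%, 45%), so the result is approximate; the variable names are ours.

```python
# Rounded figures from the text: about half of the follow edges were
# reciprocated; 16% of reciprocated and 45% of non-reciprocated follows
# were broken by Time 2.
share_reciprocated = 0.5
p_unfollow_recip = 0.16
p_unfollow_nonrecip = 0.45

# Law of total probability: weight each group's rate by its share of edges.
overall = (share_reciprocated * p_unfollow_recip
           + (1 - share_reciprocated) * p_unfollow_nonrecip)

# How many times more likely a non-reciprocated follow was to break.
relative_risk = p_unfollow_nonrecip / p_unfollow_recip

print(f"implied overall unfollow rate: {overall:.1%}")
print(f"relative risk: {relative_risk:.1f}x")
```

This prints an implied overall rate of 30.5%, consistent with the 30% reported above, and shows that a non-reciprocated follow was about 2.8 times as likely to break.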

[Figure: unfollows for reciprocated vs. non-reciprocated follows]

We can also look at a user’s overall tendency to reciprocate follow relationships, and its effect on the percentage of followers they lose:
[Figure: seed reciprocity rate vs. percent of followers lost]

Here’s one more thing to think about: a user’s follow-back rate was highly correlated with a lower rate of unfollows, but the ratio of followers to followees wasn’t. The follow-back rate is the portion of the people a user follows who follow them back. For example, I may have 15 followers and 10 followees (people I follow) on Twitter. Of the people I follow, 8 follow me back. So my follow-back rate is 80%, and my follower-to-followee ratio is 1.5. Both metrics are potential measures of “importance” on Twitter, but the fact that only one of them, the follow-back rate, affects the rate at which people stop following me hints that the follow-back rate might be a better measure of importance and success on Twitter. Makes sense, Ayman? What’s your follow-back rate?
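The example’s arithmetic, spelled out (a trivial sketch; the variable names are ours). It also shows that the same counts give a reciprocity rate, the share of followers I follow back, which is a different quantity from the follow-back rate.

```python
# The worked example from the text (one hypothetical user).
followers = 15
followees = 10                 # people I follow
followees_following_back = 8   # of those 10, how many follow me

follow_back_rate = followees_following_back / followees    # 8 / 10
follower_followee_ratio = followers / followees            # 15 / 10

# Those 8 mutual ties are also 8 of my 15 followers, so the same example
# gives a reciprocity rate (share of followers I follow back) of 8 / 15.
reciprocity_rate = followees_following_back / followers

print(f"follow-back rate: {follow_back_rate:.0%}")           # 80%
print(f"follower/followee ratio: {follower_followee_ratio}")  # 1.5
print(f"reciprocity rate: {reciprocity_rate:.0%}")           # 53%
```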

[Figure: Unfollowing on Twitter: follow-back rate.]

What else? Embeddedness is the last thing I will touch on; you can read the paper for more (it’s only a 4-pager, don’t be too easy on yourself). And by embeddedness I do not mean the number of YouTube videos you post on your Twitter stream, but the sociological concept that captures the set of relationships that exist between the two individuals in a relationship through third parties (i.e., common friends). More common friends? Your relationship is presumed to be stronger. It is no surprise, then, that the more common neighbors two Twitter users have, the less likely one is to unfollow the other. From our data, this figure shows, for each number of common neighbors a “follow” relationship had, what percent of those follows became “unfollows”. For example, of all follow relationships that had no common neighbors at Time 1, 78% no longer existed at Time 2; a single common neighbor was enough to drop that number to 46% (and it keeps dropping; I stopped at 15 because you get the idea).
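The tabulation behind a figure like this is simple: group follow relationships by their number of common neighbors at Time 1 and compute the share that were gone at Time 2. A minimal sketch, assuming each relationship has already been reduced to a (common neighbors, unfollowed) pair; the function name and toy records are hypothetical, not the study’s data.

```python
from collections import defaultdict


def unfollow_rate_by_common_neighbors(records, max_k=15):
    """Percent of follows that became unfollows, per common-neighbor count.

    `records` is an iterable of (common_neighbors_at_t1, unfollowed_by_t2)
    pairs; counts above `max_k` are left out, like a plot cut off at 15.
    """
    totals = defaultdict(int)
    broken = defaultdict(int)
    for k, unfollowed in records:
        if k > max_k:
            continue
        totals[k] += 1
        broken[k] += int(unfollowed)
    return {k: 100 * broken[k] / totals[k] for k in sorted(totals)}


# Hypothetical toy records, not the study's data.
records = [(0, True), (0, True), (0, False), (1, True), (1, False), (2, False), (20, True)]
print(unfollow_rate_by_common_neighbors(records))
```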

[Figure: percent of follows that became unfollows, by number of common neighbors at Time 1]

What didn’t we look at? Pretty much everything else! As a first step, we relied on network structure alone to investigate these unfollows. But there’s a lot more: how often do you tweet (or not)? How interesting are your posts? How similar are your topics to those of the people following you? We are now exploring all these factors and additional variables. Stay tuned.

[update: slideshare presentation here].

27 thoughts on “Using Sociology(!) to Explain Unfollows on Twitter”

  1. naaman Post author

    Good question, person in the first row!

    There’s definitely room for second-order information (e.g., the follower’s own ranking or the follower’s typical tendency to follow back). In fact, a correct model will probably be a complete, PageRank-like (e.g., TunkRank) model that recursively considers all the connections and nodes. However, what we wanted to show here is that even with just local information (a 1.5-hop egocentric network) we can identify patterns that correlate with unfollow activity.

  2. naaman Post author

    But it can “account” for many things (you can choose many variables for a node-level iterative metric). So my point is that by showing which of these variables correspond to fewer unfollows, we identify a good candidate for a status measure that can be computed iteratively.

  3. Pingback: Tweets that mention Using Sociology(!) to Explain Unfollows on Twitter › The Ayman and Naaman Show -- Topsy.com

  4. Dean Eckles

    Interesting stuff. I like the results on reciprocity and overall follow-back rate. Here and elsewhere, it seems like it would have been enlightening to include interaction terms in your model. Likewise, allowing the effect of reciprocity to vary by seed node would be fascinating.

    The paper is so short! When I started reading, I was expecting a 10-page CHI paper. I hope you release something longer — whether as a journal article or working paper.

  5. naaman Post author

    I think our model already captures some interaction (just moving from Model 1 -> Model 2 -> Model 3), but you are right, there is more. The danger is complicating an already overloaded methodological approach… For instance, our multi-level regression does allow the effects of all seed-level variables to vary by seed, including seed-level reciprocity.

    And yes, a journal article will be written :)

  6. Tim Maly

    I’d love to know what portion of that churn of followers was spambots and other auto-follow accounts (people following you in the hope that you’ll follow back and, if you don’t, moving on because of the 2,000 limit).

  7. naaman Post author

    Good question; while we are not sure about that (we vetted our “seeds” but not their followers), a few points:
    a. The data is from July 2009: Twitter was smaller, and had much less spam
    b. Account deletions were very rare in the data; you’d hope the spammers were all caught and deleted
    c. Since this is likely to affect all relationships in a uniform manner, the overall findings (i.e., the variables that affect unfollows) still stand, even if the magnitude of the phenomenon is a little different than in our data.

  8. Dean Eckles

    I was thinking of both (seed’s) reciprocity and follow-back rates and (follower’s) ratio interacted with reciprocity, as the latter would likely have meaning relative to the former. From my reading, you do not include any interaction terms. There are only varying intercepts by seed, not varying slopes (which would have to be varying slopes for follower/dyad level variables, not seed level variables). You’re right to worry about model complexity though: you might use a hierarchical Lasso (L1 penalty) to do shrinkage and variable selection while keeping the multilevel structure.

    On spammers: spammers would be expected to have a small ratio of followers to followees. They might have a lot of leverage on your follower’s ratio coefficient.

  9. Munmun

    Nice study of role of network structure variables on severing of ties! I am guessing the dataset left out non-human accounts (not spam necessarily, e.g. @cnn, @mashable etc.). I think it will also be interesting to see in future work how common interests (in terms of content) or other forms of homophily e.g. location drive these dynamics.

  10. naaman Post author

    Dean, still I maintain that the transition from Model 2 to Model 3 (in paper) tells us a lot about the relationship between recip-dyad and recip-seed, as well as the reciprocation and follow-back. But you’re clearly more comfortable w/ the methods, maybe we can hack it together if you want. Hey, I’ll be at Stanford next month…

    Munmun, we’ll hack it together too! (e.g., the homophily).

  11. Dean Eckles

    Yes, I like the addition of the network characteristics in Model 3. And I do see how the addition of the prestige ratio allows the number of followers for s and f to interact.

    Nonetheless, it seems worthwhile to allow the key dyad-level behavior, reciprocity, to interact with the overall following / being-followed behaviors of the two nodes. Allowing for an interaction with the prestige ratio could be part of this. My guess is that this would further highlight the importance of reciprocity (one of your main conclusions) by contextualizing it in the overall behavior of the two nodes. For example, if s has a low reciprocity rate but follows f, I would expect a very low probability of tie breaking; but if s has a high reciprocity rate and doesn’t follow f, I would expect a much higher probability of tie breaking.

    That’s neat you’re coming to Stanford. Let’s meet up. I’ve recently also been working with dyadic social network data.

  12. Joe McCarthy

    Wow, you sure know how to pack a lot of interesting content into a short format paper!

    I have a question, and a few related items to share.

    I’ve seen a variety of estimates of reciprocal following rates in less scholarly contexts, but most of them suggest higher rates. I don’t see any mention of the mean reciprocal following rate in the paper, but you suggest it is “about half” in the blog post. Can you offer a bit more precision?

    About a year ago, I described a range of automatic, semi-automatic and manual “techniques” used to promote reciprocal following in a rant blog post about the commoditization of Twitter followers (unintended but implicit subtitle: “How to write a blog post that will never get tweeted”). My favorite example was someone who followed (and unfollowed) me twice; when I checked out his Twitter feed, it was filled with directed messages of the form “@newfollowee How about returning the follow?”. My least favorite example was a Ponzi-like fully automated reciprocal following system through which anyone who signed up would immediately follow – and be followed by – everyone who had previously signed up.

    Today, catching up on a backlog of tweets from my followees, I enjoyed reading – and tweeting a link to – a short blog post about the fallacy of social media reciprocation, which offers some [admittedly less scholarly] insights into the practice of non-following (vs. unfollowing), and a strong argument in favor of one of your topics for future work: worthwhile participation (or, as you put it, How interesting are your tweets?).

  13. naaman Post author

    Thanks and good question. Overall, 49.5% of the edges we looked at were reciprocated. However, this of course varied between seeds (the 715 users whose followers we examined). I just ran the numbers; the reciprocity rate for seeds is 42% on average (SD=20%). Note that it is still the case that the number of followers is highly correlated with the number of followees (as most previous research has shown).

    Thanks for the pointers (I read your post back when it was published). An important detail that I glossed over in this post is that *all* our users had < 5000 followers to begin with (filtered to non-celebrities, essentially), which makes the dynamics a bit different from those observed in other “top Twitterers” studies.

  14. Joe McCarthy

    Interesting – thanks for the clarification.

    Regarding the “number of followers is highly correlated with number of followees”, one could say there are three types of people on Twitter:
    * those who have more followers than followees
    * those who have about the same number of followees and followers
    * those who have more followees than followers

    I tend to follow Twitter users who have a much larger number of followers than followees, as I generally find their signal-to-noise ratio to be significantly higher (i.e., they tend to be more informers vs. meformers or conversationalists).

    If you have the time and inclination to further indulge my curiosity (now or in future work), I’d be very interested to know whether there are statistically different unfollowing patterns among the three groups (perhaps restricting the first group to those who have 2x more followers than followees, the third to those who have 2x more followees than followers, and the second group including all those in the middle, with less than 2x difference between the two counts). My hypothesis is that unfollowing rates among the first group would be much lower than the second, which would in turn be much lower than the third.

  15. Pingback: Weekend Reading: The Midnight Hour Edition - ProfHacker - The Chronicle of Higher Education

  16. Dean Eckles

    According to the table, followers with high follower to followee ratio are much less likely to unfollow. But the strength of this relationship is reduced when the dyad properties are included.

    So this doesn’t answer your categorical question, but it does answer a version that asks about the relationship between unfollowing and log-ratio.

    Again, I would expect that the full(er) story is in the interaction with other variables, like number of followers. A 1.5:1 ratio means something very different at different scales, I think. Users with few followers and a high ratio may just not really know how to use Twitter or not use it much (so they haven’t even bothered to reciprocate).

  17. naaman Post author

    The seed’s follower-to-followee ratio had no significant effect across all models in the paper. The analysis treated the ratio as a continuous variable, not as groups like you propose, but I assume it would give us the same results. Interesting, huh? I thought it would have a strong effect too.

  18. engelo

    There is a short section in the following paper with a lit-review on the dissolution of ties, starting at page 95. Might be of some use…

    Rivera, M T, S B Soderstrom, and B Uzzi. 2010. “Dynamics of Dyads in Social Networks: Assortative, Relational, and Proximity Mechanisms.” Annual Review of Sociology 36:91–115.

  19. Brad

    It’s not all about following back; nowadays there are too many bots that follow you and then unfollow you within 72 hours, as mentioned above. It seems Twitter is becoming a game to some people, and it upsets me. Great post, naaman.
