January 20, 2011

Analytics with Twitter Data

Twitter is currently one of the largest "data producers" on the Web. I am not sure of the exact storage that tweets require on a daily basis, but a few TBs would not surprise me; add to that the spikes in volume when there is a controversy or a major event. All of this leads to interesting data that needs to be deciphered, and to some awesome research that can be applied to manage the data efficiently for users and engage them more with Twitter.

When I was looking for possible features that I might actively use, I could actually list a few. I am pretty sure that the Product Managers at Twitter have some of these features on their TODO list, but it would be interesting to see when they actually get implemented, or the rationale behind not implementing them.

1. Users who follow you, but you don't follow them.
2. Users whom you follow, but they don't follow back.
3. Notification when a user stops following you. I need to research why Twitter does not have this - was it by design?
4. Trend analysis of users who follow/quit you - based on the tweets that you post.
5. Show the most active users and lazy users - active and lazy are defined by the number of tweets and also the popularity of the tweets. Popularity can also be measured by how much discussion a tweet generates, or how many retweets it gets.
6. Automatic lists and follow suggestions: when we follow a user, Twitter can suggest which list would be the most likely fit for that user based on their tweet patterns. The present suggestion scheme is not very powerful and needs some tweaking.
7. Discover clusters/groups of the followers. Centrality of users - show a graph wherein this relationship can be displayed.
8. Decipher moods/sentiments from the tweets; or other possible natural language processing techniques that can be applied on the tweets to gather interesting patterns or insights.
9. Usage analysis
  a. Based on the hour of the day, we can find out whether people tweet more often during mornings or evenings.
  b. Do people prefer the web or mobile devices for tweeting? What % of people use other apps?
  c. Who retweets you often? Or which category of your tweets gets retweeted often or generates the most discussion?
10. Most famous tweets for the day/week/month - based on retweets, follow-up discussions, celebrity status of the tweeter, number of followers.
11. Duplicate detection of tweets. Also, automatic compression of tweets which fall in a thread. This would help a lot in reducing the information clutter.
12. What is the similarity between two users - based on the nature of their tweets? A corollary would be: what topics/categories does a user often tweet on?
13. Better trend analysis.
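
Items 1 to 3 boil down to set operations on follower lists, so here is a minimal sketch of how I would check them by hand. It assumes you already have the user names (or IDs) of your followers and of the people you follow as plain collections; the function names and the sample data are made up for illustration, and nothing here calls the Twitter API.

```python
def follow_gaps(followers, following):
    """Split two collections of users into the relationships from items 1 and 2."""
    followers, following = set(followers), set(following)
    return {
        "fans": followers - following,                # item 1: they follow you, you don't follow them
        "not_following_back": following - followers,  # item 2: you follow them, they don't follow back
        "mutual": followers & following,
    }


def unfollowers(previous_followers, current_followers):
    """Item 3: users present in an earlier snapshot but missing from the current one."""
    return set(previous_followers) - set(current_followers)


# Hypothetical data; real lists would come from whatever follower export you have.
followers = {"alice", "bob", "carol"}
following = {"bob", "dave"}

gaps = follow_gaps(followers, following)
lost = unfollowers({"alice", "bob", "carol", "erin"}, followers)

print(gaps)
print(lost)
```

The unfollow check only works if a snapshot of the follower list is saved regularly, which is exactly the kind of upkeep that would be easier if Twitter did it on its own side.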

3 comments:

musically_ut said...

Hi,

I have been playing around with Twitter of late too, and here are my 2 cents on the features requested (split across comments because of the character limit):


1. Users who follow you, but you don't follow them.
2. Users whom you follow, but they don't follow back.

True.


3. Notification when a user stops following you. I need to research why Twitter does not have this - was it by design?

There is an app for that, which apparently also got nominated for the Shorty Awards as one of the best Twitter apps. However, I think that not sending this notification is itself a feature: it lets people keep their Twitter accounts useful while avoiding a social confrontation. :)


4. Trend analysis of users who follow/quit you - based on the tweets that you post.

I am not sure how to quantitatively define the causal link between what one tweeted last/15 minutes ago/1 hour ago/a day ago and the changes happening in one's social network. The best one can do is perhaps put them all in a single timeline showing tweets interspersed with follow/quit events. Is that what you had in mind?


5. Show the most active users and lazy users - active and lazy are defined by the number of tweets and also the popularity of the tweets. Popularity can also be measured by how much discussion a tweet generates, or how many retweets it gets.

I think people would be heavily underwhelmed if they knew this about themselves. :)


6. Automatic lists and follow suggestions: when we follow a user, Twitter can suggest which list would be the most likely fit for that user based on their tweet patterns. The present suggestion scheme is not very powerful and needs some tweaking.

I think this is a part of the current million dollar question: how to recommend the right thing to a customer.
We'll be seeing improvements in it till hell freezes over.


7. Discover clusters/groups of the followers. Centrality of users - show a graph wherein this relationship can be displayed.

FlowingData had a great post regarding this some time ago.
I think it is only a matter of time before @TwitterFolks find one which suits them and employ it.

contd ...

musically_ut said...

contd ...


8. Decipher moods/sentiments from the tweets; or other possible natural language processing techniques that can be applied on the tweets to gather interesting patterns or insights.

People at Northwest university, among others, are working on it. Though sentiment analysis is generally tricky to offer upfront to customers, it might give some insights behind the curtains on anonymized data.



9. Usage analysis
a. Based on the hour of the day, we can find out whether people tweet more often during mornings or evenings.
b. Do people prefer the web or mobile devices for tweeting? What % of people use other apps?
c. Who retweets you often? Or which category of your tweets gets retweeted often or generates the most discussion?
10. Most famous tweets for the day/week/month - based on retweets, follow-up discussions, celebrity status of the tweeter, number of followers.

There is Klout for that, though I am not sure how good it is.


11. Duplicate detection of tweets. Also, automatic compression of tweets which fall in a thread. This would help a lot in reducing the information clutter.

Threading of tweets is not easy to detect. Personal experience.
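
The duplicate-detection half is more approachable than threading. A minimal sketch, assuming that "duplicate" means a high word overlap (Jaccard similarity) once a leading "RT @user:" prefix and any URLs are stripped; the 0.8 threshold, the function names, and the sample tweets are all illustrative assumptions, not anything Twitter is known to use.

```python
import re


def tokens(tweet):
    """Lowercased word set, ignoring a leading 'RT @user:' prefix and URLs."""
    text = tweet.lower()
    text = re.sub(r"^rt\s+@\w+:?\s*", "", text)   # drop the retweet prefix
    text = re.sub(r"https?://\S+", "", text)      # drop links
    return set(re.findall(r"[a-z0-9#@']+", text))


def jaccard(a, b):
    """Size of the intersection over size of the union; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(tweets, threshold=0.8):
    """Return index pairs (i, j) with i < j whose word sets overlap at least `threshold`."""
    sets = [tokens(t) for t in tweets]
    return [
        (i, j)
        for i in range(len(sets))
        for j in range(i + 1, len(sets))
        if jaccard(sets[i], sets[j]) >= threshold
    ]


# Hypothetical tweets for illustration.
sample = [
    "Twitter analytics wishlist is up on the blog",
    "RT @someone: Twitter analytics wishlist is up on the blog",
    "Lunch was great today",
]
pairs = near_duplicates(sample)
print(pairs)
```

Comparing every pair is quadratic, so this is only reasonable for one user's timeline, not for the whole firehose. The same `jaccard` measure over two users' vocabularies would also be a crude first cut at the user-similarity idea in item 12.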


12. What is the similarity between two users - based on the nature of their tweets? A corollary would be: what topics/categories does a user often tweet on?

This, I think, is a natural extension of PeerIndex.


13. Better trend analysis.

Better, unfortunately, is very difficult to quantify. Moreover, trends are prone to being gamed if the analysis is not done properly.

~
musically_ut

Venkat said...

@Utkarsh: the idea of this post was to include all these features 'as part of Twitter' and not as a third-party app. Most of the features are better implemented when Twitter leverages its own platform, and we do not necessarily have to depend on third-party apps; there are tradeoffs with the latter. For example, I have scripts to check #1 and #2 (followers etc.), but it's a pain to maintain them; better if implemented by Twitter. Also, sentiment analysis is a much-studied topic: check out Manning's group at Stanford, where students do many such projects. Patil's team at LinkedIn did an excellent job with graph analysis; the advantage of that kind of analysis is that you can launch it as a product in addition to it being 'cool'. Qwitter (or something similar) already existed, which found out when people unfollow you.

Hope you get the drift ;)

I would imagine this feature set as a Dashboard at Twitter rather than having a zillion apps running all over the place; and personally I am not a great fan of testing each and every API/app that is on the market.

What the Web needs is consolidation, and I think that is happening.