A blog about SQL Server, SSIS, C# and whatever else I happen to be dealing with in my professional life.

Find ramblings

Tuesday, September 15, 2009

Twitter friend or foe

I'm still trying to "get" Twitter. MySpace I never got. FaceBook, I get. One thing I notice about Twitter is the spammers. I had been diligent in blocking them but today I got to wondering if I could programatically determine whether these followers were spammers. In the beginning, all of few months ago, it was easier. If someone started following me with a follower to friend ratio that was just ridiculous, they were probably a spammer. Now I'm seeing more clever approaches. Spam creators are making lots of accounts and linking between them to bring the friend to follow ratio to a more natural balance.

What this code will hopefully do is attempt to determine if a follower is a spammer or not. Beyond simple friend:follower:tweet calculations, I think looking at bio information and actual tweets, looking for overlaps between links in both places would be a good start. Additionally, repeated text in tweets (excluding standard noise like RT, etc) might be a good indicator of a spammer. For now, it just computes the aforementioned ratios between your followers and prints it out.

If nothing else, this code was fun because I've now actually used LINQ, LINQ to XML to be precise. Expert, not in the slightest but it at least seemed a much more pleasant way to iterate through an XML doc. I still have a lot of reading to do on it as this is based on 5 min tutorial I found. Code below was written in VS 2010 beta on Windows 7 RC1 but should work on VS 2008. Obviously, you need to set your userid and password to get your follower info. Link to the Yedda Twitter wrapper



namespace FriendOrFoe
{
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;

using Yedda;

/// <summary>
/// This is a rough attempt to see if I can identify spammers that follow me on
/// Twitter @billinkc.
/// </summary>
public class Program
{
/// <summary>
/// As it's a first pass, I'm just doing everything
/// in Main until I make sure I understand how to work with
/// twitter service calls as well as look for patterns between
/// spammers and non-spammers
/// </summary>
/// <param name="args">list of cammand-line arguments</param>
static void Main(string[] args)
{
Yedda.Twitter t = null;
t = new Twitter();
string username = "yourAcccountHere";
string password = "password";
string results = string.Empty;
results = t.GetFollowers(username, password, Twitter.OutputFormatType.XML);
XDocument doc = null;
doc = XDocument.Parse(results);
var xmlSource = XDocument.Parse(results);

var query = from p in xmlSource.Elements("users").Elements("user")
select p;

Twit twit;
Console.WriteLine("User\tFriend:Follower\tFriend:Tweet\tFollower:Tweet");
foreach (var record in query)
{
twit = new Twit(record.Element("name").Value, Convert.ToInt32(record.Element("id").Value), Convert.ToInt32(record.Element("friends_count").Value), Convert.ToInt32(record.Element("followers_count").Value), Convert.ToInt32(record.Element("statuses_count").Value));
Console.WriteLine(twit);
}
Console.Read();
}
}

/// <summary>
/// A person on twitter and the metrics surrounding them
/// </summary>
struct Twit
{
/// <summary>
/// A users name
/// </summary>
public string userName;

/// <summary>
/// The twitter id for an account
/// </summary>
public int id;

/// <summary>
/// The number of friends a user has
/// </summary>
public int friendCount;

/// <summary>
/// The count of followers for a user
/// </summary>
public int followerCount;

/// <summary>
/// The count of status updates. This should be a bigint BTW
/// </summary>
public int tweetCount;

/// <summary>
/// Non-default constructor, does nothing special
/// </summary>
/// <param name="userName">User name for a twitter account</param>
/// <param name="id">id representing the account</param>
/// <param name="friendCount">count of friends</param>
/// <param name="followerCount">count of followers</param>
/// <param name="tweetCount">count of posts/tweets</param>
public Twit(string userName, int id, int friendCount, int followerCount, int tweetCount)
{
this.userName = userName;
this.id = id;
this.friendCount = friendCount;
this.followerCount = followerCount;
this.tweetCount = tweetCount;
}

/// <summary>
/// Show the user and guess whether they are a spammer.
/// TODO: play with numbers. Get more info to look at bio & content of tweets
/// esp. for repeated crap
/// </summary>
/// <returns></returns>
public override string ToString()
{
double friendToFollower = 0d;
double friendToTweet = 0d;
double followerToTweet = 0d;
friendToFollower = this.friendCount / ((this.followerCount == 0) ? 1 : this.followerCount * 1.0);
friendToTweet = this.friendCount / ((this.tweetCount == 0) ? 1 : this.tweetCount * 1.0);
followerToTweet = this.followerCount / ((this.tweetCount == 0) ? 1 : this.tweetCount * 1.0);
//return string.Format("User: {0} Friend:Follower {1} friend:tweet {2} follower:tweet {3}", this.userName, friendToFollower, friendToTweet, followerToTweet);
return string.Format("User: {0} {1:F} {2:F} {3:F}", this.userName, friendToFollower, friendToTweet, followerToTweet);
}

}
}


Oh and as analysis of my own account, it looks like if the Friend:Tweet is higher than 3, it might be a good candidate for blocking. Next step will be to set up a honeypot account and start posting in trending topics.

No comments: