Harnessing the Twitter API for Sentiment Strategies
In this project we will write an application that downloads tweets from Twitter. We are continuing our journey learning C#, which we started with our Yahoo Finance data downloader.
Twitter has a REST API that allows us to search for tweets, users, timelines, or even post new messages. We will use an incredible C# Twitter Library called Tweetinvi. It has everything you need to start building your own program. There are other alternatives, but we found this was the easiest and most complete. To use this program, you need to have a Twitter developer account, and use your own credentials.
Application Structure
The program is split so that the Twitter interactions, file management and application logic stay separate. The code lives in four files: Program.cs, FileManagement.cs, Twitter.cs and Tweet.cs.
- Program.cs – Loop over Twitter usernames, download their tweets and send them to files to be saved.
- FileManagement.cs – Load usernames, and append new tweets to the end of the tweet files.
- Twitter.cs – Download tweets, manage the rate limit constraints, and log in to the Twitter API.
- Tweet.cs – Define the format of the downloaded tweets.
This allows you to easily change the application logic, the location you save files or even change the twitter library without affecting other parts of your code. The program starts by logging into twitter with the SetCredentials function. This requires 4 keys from the twitter developer website.
Twitter.SetCredentials(accessToken:"xxxxx", accessTokenSecret:"xxxxx", consumerKey:"xxxxx", consumerSecret:"xxxxx");
We retrieve the Twitter usernames from a file, twitterUsernames.csv, which contains the list of accounts to download. For this project we collected a list of 3000 financial and news accounts, scraped from Twitter lists and search results. We also estimate the next best time to update each user’s tweets, based on how frequently they tweet.
var usernames = FileManagement.GetUsernames();
var nextUpdateTime = FileManagement.GetNextUpdateTime();
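The article does not show how the username file is read, so here is a minimal sketch of one way a loader along these lines could work. The names UsernameLoader, ParseUsernames and LoadUsernames are hypothetical, and the file layout (one account per line, username in the first column, optional leading @) is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class UsernameLoader
{
    // Hypothetical helper: take the first CSV column of each line as the username,
    // strip whitespace and a leading '@', skip blanks and case-insensitive duplicates.
    public static List<string> ParseUsernames(IEnumerable<string> lines)
    {
        return lines
            .Select(line => line.Split(',')[0].Trim().TrimStart('@'))
            .Where(name => name.Length > 0)
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToList();
    }

    // Assumed file layout: one account per line, username in the first column.
    public static List<string> LoadUsernames(string path)
    {
        return ParseUsernames(File.ReadLines(path));
    }
}
```

Keeping the parsing separate from the file access makes the cleaning rules easy to check without touching the disk.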
The tweet downloading and rate limiting are entirely managed by the twitter class; the function GetTimeline manages downloading all the historical tweets possible, or only downloading updates.
var tweetList = Twitter.GetTimeline(userName, lastTweet, ref tweetCount);
The freshly downloaded tweets are serialized to JSON by Json.NET and then written to a file – one per twitter username.
Optimizing Tweet Downloads
We want our program to download the maximum historical tweets possible per user, and then recheck accounts for new tweets. Twitter rate limits API requests to 300 requests per 15 minutes, and allows access to a maximum of 3200 historical tweets. Additionally each request can download a maximum of 200 tweets at a time. To maximize the productive use of our requests we will constantly calculate an average time span from the user’s latest tweets, and set the time the program should recheck for new tweets so we’re confident there will be at least 1 new tweet. The following code reads the tweets from the file and calculates the average gap between tweets:
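// Requires: using System; using System.Collections.Generic; using System.Linq;
public static TimeSpan GetAverageTimeSpan(List<Tweet> tweets)
{
    // No history yet: fall back to a default of 500 seconds
    if (tweets.Count == 0)
    {
        return TimeSpan.FromSeconds(500);
    }
    else
    {
        // Collect the creation time of every saved tweet
        List<DateTime> dates = new List<DateTime>();
        foreach (var line in tweets)
        {
            dates.Add(line.Time);
        }
        // Spread the span between the oldest and newest tweet over the tweet count
        var difference = dates.Max().Subtract(dates.Min());
        var averageTimes = TimeSpan.FromMilliseconds(difference.TotalMilliseconds / (dates.Count()));
        return averageTimes;
    }
}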
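To illustrate how an average gap could feed the recheck schedule, here is a minimal sketch. The names UpdateScheduler, AverageGap and NextUpdateTime are hypothetical, and the rule used (latest tweet time plus one average gap, but never in the past) is an assumption for illustration rather than the project’s actual scheduling logic. Unlike GetAverageTimeSpan above, this variant divides by the number of gaps (count minus one) and uses the fallback when fewer than two tweets are available.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UpdateScheduler
{
    // Average spacing between tweets: total span divided by the number of gaps.
    public static TimeSpan AverageGap(IReadOnlyList<DateTime> tweetTimes, TimeSpan fallback)
    {
        if (tweetTimes.Count < 2) return fallback;
        TimeSpan span = tweetTimes.Max() - tweetTimes.Min();
        return TimeSpan.FromTicks(span.Ticks / (tweetTimes.Count - 1));
    }

    // Assumed rule: expect the next tweet one average gap after the latest one,
    // but never schedule a check in the past.
    public static DateTime NextUpdateTime(IReadOnlyList<DateTime> tweetTimes, DateTime now, TimeSpan fallback)
    {
        if (tweetTimes.Count == 0) return now + fallback;
        DateTime expected = tweetTimes.Max() + AverageGap(tweetTimes, fallback);
        return expected > now ? expected : now;
    }
}
```

Passing the current time in as a parameter, instead of reading DateTime.Now inside the method, keeps the calculation deterministic and easy to test.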
Downloading Tweets
When downloading tweets, we first check whether we have already downloaded tweets for this user. If we have historical tweets for this user, we only download the updates. Every tweet has a unique ID number, and Twitter’s API offers two parameters for paging by ID: since_id and max_id. To download updates, we request every tweet newer than a given ID (since_id), which means “download all tweets since the last tweet we got”.
public static long LastSavedTweetID(List<Tweet> getTweets)
{
    var lastLine = getTweets.First();
    long lastTweetID = lastLine.ID;
    return lastTweetID;
}
If we don’t have any historical tweets for this user, the program will download all historical tweets possible. With each request, we’ll attempt to download the last 200 tweets, and the max_id specifies the ID of the most recent tweet we want in this request.
public static List<Tweet> GetTimeline(string userName, List<Tweet> getTweets, ref int tweetCount)
{
    List<Tweet> tweets;
    if (getTweets.Count == 0)
    {
        Console.WriteLine(" First time downloading " + userName + ", creating new file.");
        tweets = TweetsDownload(true, userName, getTweets, ref tweetCount);
    }
    else
    {
        tweets = TweetsDownload(false, userName, getTweets, ref tweetCount);
    }
    return tweets;
}
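The paging with since_id and max_id described above can be sketched without any Twitter library. In this sketch the fetchPage delegate is a hypothetical stand-in for the real API call, and TimelinePager and Download are illustrative names, not part of the project or of Tweetinvi. It assumes each page comes back newest first and contains only IDs greater than since_id and less than or equal to max_id.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TimelinePager
{
    public const int PageSize = 200;     // maximum tweets per request
    public const int MaxHistory = 3200;  // maximum historical tweets available

    // fetchPage(sinceId, maxId, count) is a hypothetical stand-in for the API call.
    // It must return tweet IDs with sinceId < id <= maxId, newest first, at most count of them.
    public static List<long> Download(
        Func<long?, long?, int, List<long>> fetchPage, long? sinceId, out int requests)
    {
        var all = new List<long>();
        long? maxId = null; // first request: start from the newest tweet
        requests = 0;
        while (all.Count < MaxHistory)
        {
            var page = fetchPage(sinceId, maxId, PageSize);
            requests++;
            if (page.Count == 0) break; // nothing older (or newer than sinceId) is left
            all.AddRange(page);
            maxId = page.Min() - 1;     // next page: strictly older than anything seen so far
        }
        return all;
    }
}
```

With these limits a full history costs at most 3200 / 200 = 16 requests per account, so a window of 300 requests covers roughly 18 first-time downloads. The loop stops only on an empty page rather than on a short one, on the assumption that a page can come back with fewer than 200 tweets even when older tweets remain.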
Encoding and Saving Tweets
Each Tweet comes in its own format, containing a lot of information (ID, language, message, date, etc). We save a personalized subset of this information in the Tweet class:
/// Create a new tweet from an original Tweetinvi object
public Tweet(Tweetinvi.Core.Interfaces.ITweet original)
{
    this.ID = original.Id;
    this.Text = original.Text.Replace(",", "");
    this.Time = original.CreatedAt;
    this.Retweets = original.RetweetCount;
    this.Favourites = original.FavouriteCount;
    this.User = original.Creator.Name;
    this.Followers = original.Creator.FollowersCount;
}
The newly encoded tweets are added to a list, which is then written to the user’s “username.txt” file.
//Encode each tweet and add them to a list
public static List<string> Serializer(List<Tweet> tweetList)
{
    var encodedList = new List<string>();
    foreach (var line in tweetList)
    {
        var encodedTweet = Tweet.Serializer(line);
        encodedList.Add(encodedTweet);
    }
    return encodedList;
}

//Open & Write to file only if there are new tweets
FileManagement.Writer(encodedList, file);
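The body of FileManagement.Writer is not shown, so the following is only a sketch of how a "write only if there are new tweets" step could look. TweetFileWriter and AppendEncoded are hypothetical names, and the sketch assumes each encoded tweet is a single line of text.

```csharp
using System.Collections.Generic;
using System.IO;

public static class TweetFileWriter
{
    // Hypothetical helper: append already-encoded tweets, one per line.
    // Returns the number of lines written and leaves the file untouched
    // (not even created) when there is nothing new.
    public static int AppendEncoded(IReadOnlyCollection<string> encodedTweets, string path)
    {
        if (encodedTweets.Count == 0) return 0;
        File.AppendAllLines(path, encodedTweets); // creates the file if it does not exist
        return encodedTweets.Count;
    }
}
```

Appending instead of rewriting the file keeps each update cheap, since the existing history is never read or copied.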
API Restrictions Management
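Finally, we should rate limit the requests we make to the API. The WaitForAPIReady function makes the program sleep until new requests are available, and returns the number of requests left in the current window. It keeps a safety margin by pausing once more than 290 of the 300 requests allowed in the last 15 minutes have been used.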
/// Check if API is ready for new request
private static int WaitForAPIReady()
{
    int count = 0;
    do
    {
        // Count the requests made in the last 15 minutes
        DateTime currentTime = DateTime.Now;
        currentTime = currentTime.AddMinutes(-15);
        count = (from time in timeStamps where time > currentTime select time).Count();
        if (count > 290)
        {
            Console.WriteLine(" Twitter downloading limit reached. Waiting...");
            Thread.Sleep(50000);
        }
    } while (count > 290);
    return (300 - count);
}
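The same sliding-window idea can be written so that it computes how long to wait instead of sleeping for a fixed 50 seconds. This is a sketch of an alternative, not the project’s code: SlidingWindowLimiter and its members are hypothetical names, the clock is passed in as a parameter, and timestamps are assumed to be recorded in chronological order.

```csharp
using System;
using System.Collections.Generic;

public sealed class SlidingWindowLimiter
{
    private readonly Queue<DateTime> _timeStamps = new Queue<DateTime>();
    private readonly int _limit;
    private readonly TimeSpan _window;

    public SlidingWindowLimiter(int limit, TimeSpan window)
    {
        if (limit <= 0) throw new ArgumentOutOfRangeException(nameof(limit));
        _limit = limit;
        _window = window;
    }

    // Record that a request was sent at the given time.
    public void Record(DateTime now) => _timeStamps.Enqueue(now);

    // Requests still available at the given time, after dropping
    // timestamps that have fallen out of the window.
    public int Remaining(DateTime now)
    {
        while (_timeStamps.Count > 0 && _timeStamps.Peek() <= now - _window)
            _timeStamps.Dequeue();
        return Math.Max(0, _limit - _timeStamps.Count);
    }

    // Time until the oldest recorded request leaves the window
    // (zero if a request is already available).
    public TimeSpan WaitTime(DateTime now)
    {
        if (Remaining(now) > 0) return TimeSpan.Zero;
        return _timeStamps.Peek() + _window - now;
    }
}
```

A caller could construct it with a limit of 290 and a 15-minute window to keep the same safety margin as above, then sleep for WaitTime(DateTime.Now) before each request.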
That was a brief explanation of how we handled Twitter’s API limitations using Tweetinvi. The downloader is built, and now the fun part begins: which accounts shall we scan? What can we do with the downloaded data? It would be fun to see an algorithm that uses Twitter sentiment data to make investing decisions!
P.S.: The libraries needed (Json.NET, Tweetinvi) are not included in the file. You can download them from NuGet in Visual Studio, or from the developers’ websites.
