This project stems from a Twitter post by my good friend Andrew Naber, who jokingly suggested creating a bot that sends the ‘Old Man Yells at Cloud’ picture every time a news article about millennials appears on the web. Having just covered web scraping and APIs in my Programming for Data Science class, I happily obliged. Thanks to Noah Diekemper for introducing and demoing the News API.

This Twitter bot, written in Python, takes a search term (‘millennial’ in this case) and queries the News API and Google RSS feed for articles containing that term. Long URLs are passed to the Bitly API for shortening, and a status update with media (e.g., the ‘Old Man Yells at Cloud’ picture) is then posted to the given Twitter profile using the Tweepy Python library.
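The filtering and status-building steps can be sketched with the standard library alone. This is only an illustration, not the code from my repo: the sample feed, the function names `find_articles` and `compose_status`, and the 280-character limit are all assumptions made for the example, and the real network calls (News API, Bitly, Tweepy) are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed standing in for a real RSS response.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>Why Millennials Are Buying Fewer Houses</title>
      <link>https://example.com/news/millennials-houses</link>
    </item>
    <item>
      <title>Local Bakery Wins Award</title>
      <link>https://example.com/news/bakery</link>
    </item>
  </channel>
</rss>"""


def find_articles(rss_text, term):
    """Return (title, link) pairs for feed items whose title contains term."""
    root = ET.fromstring(rss_text)
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        # Case-insensitive substring match, so 'millennial' also hits 'Millennials'.
        if term.lower() in title.lower():
            matches.append((title, link))
    return matches


def compose_status(title, short_url, limit=280):
    """Build the tweet text, trimming the title so the link always fits.

    The limit is an assumed plain character count; Twitter's own
    counting rules for links are not modelled here.
    """
    room = limit - len(short_url) - 1  # one space between title and link
    if len(title) > room:
        title = title[: room - 3].rstrip() + "..."
    return f"{title} {short_url}"


if __name__ == "__main__":
    for title, link in find_articles(SAMPLE_RSS, "millennial"):
        # In the bot itself the link would be shortened via Bitly first.
        print(compose_status(title, link))
```

In the bot itself, the string returned by `compose_status` would be what gets posted alongside the picture.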

Code for this project is included in my twitterbot repo on GitHub. Access keys and tokens have been removed from the script. To get the script to work, developer tokens must be requested from Twitter and Bitly and copied into the script.

An example of a Twitter bot using this script can be found at https://twitter.com/millennialYell.

To run the script on a schedule, I use the hosting service pythonanywhere.com, which supports running Python scripts. Free accounts are currently available and include one daily scheduled task.

The first version of this bot is limited in that it only posts the shortened Bitly link and the article title. Future improvements might include:

  • Code Refactoring - Improve readability and usability of code, specifically by creating a Bot class which can be configured when instantiated with term and persona settings, as well as encapsulating key functions such as scraping and tweeting.
  • Inclusion of Twitter handle for author organization - Lookup tables could be stored and queried via SQLite, mapping domain names (e.g. cnn.com) to Twitter handles (e.g. @CNN). If a mapping does not exist, a record would be entered for the domain and its handle filled in at a later time. At the moment, the Twitter API doesn’t seem to have the capability to search and return user IDs, so the process of adding the Twitter handles would be manual in nature.
  • Automated quotes for bot persona depending on article title - This might be accomplished through some basic machine learning algorithms such as decision trees. Potentially a good project for next semester’s Machine Learning class.
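
As a rough sketch of the lookup-table idea above, the domain-to-handle mapping could look something like this with Python's built-in `sqlite3` module. None of this exists in the bot yet: the table name `handles`, its columns, and the helper names are hypothetical, and the in-memory database is used only to keep the example self-contained.

```python
import sqlite3
from urllib.parse import urlparse


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) handles table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS handles ("
        "domain TEXT PRIMARY KEY, handle TEXT)"
    )
    return conn


def domain_of(url):
    """Extract the domain from an article URL, dropping a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def lookup_handle(conn, url):
    """Return the Twitter handle for the article's domain, or None.

    An unknown domain is recorded with a NULL handle so that it can be
    filled in by hand later.
    """
    domain = domain_of(url)
    row = conn.execute(
        "SELECT handle FROM handles WHERE domain = ?", (domain,)
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO handles (domain, handle) VALUES (?, NULL)", (domain,)
        )
        conn.commit()
        return None
    return row[0]


if __name__ == "__main__":
    conn = open_db()
    conn.execute("INSERT INTO handles VALUES (?, ?)", ("cnn.com", "@CNN"))
    print(lookup_handle(conn, "https://www.cnn.com/some/article"))
    print(lookup_handle(conn, "https://example.org/some/article"))
```

Rows with a NULL handle then double as the to-do list for the manual step: a query for `handle IS NULL` lists every domain still waiting for its handle.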