Twitter Bot Project
This project stems from a Twitter post from my good friend Andrew Naber, who jokingly suggested the creation of a bot that sends the ‘Old Man Yells at Cloud’ picture every time a news article about millennials appears on the web. Having just covered web scraping and APIs in my Programming for Data Science class, I happily obliged. Thanks to Noah Diekemper for introducing and demoing the News API.
I want to write a bot script, where it finds if the keyword of an article is, "Millennial", and it sends the "Old Man Yells at Cloud" pic. pic.twitter.com/8arrCRftTu
— Andrew M Naber (@AndrewMNaber) November 7, 2017
Your wish is my command. I will use https://t.co/mcZDHOZiQW & twitter for my programming for Data Science class to create this.
— Anthony Lipphardt (@a_lipphardt) November 10, 2017
This Twitter bot, written in Python, is configured with a given search term (‘millennial’ in this case) and scrapes the News API and Google RSS feed for articles containing that term. Long URLs are passed to the Bitly API for shortening, and a status update with media (e.g., the ‘Old Man Yells at Cloud’ picture) is then posted to the given Twitter profile using the Tweepy Python library.
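The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from the repo: the function names (`matching_articles`, `build_status`, `run_bot`) are hypothetical, and I am assuming each article arrives as a dict with `title` and `url` keys. The Bitly and Tweepy calls are passed in as plain callables (`shorten`, `post`) so the sketch runs without any access tokens.

```python
from typing import Callable, Dict, Iterable, List


def matching_articles(articles: Iterable[Dict[str, str]], term: str) -> List[Dict[str, str]]:
    """Keep articles whose title mentions the term, skipping repeated URLs."""
    seen = set()
    matches = []
    for article in articles:
        title = article.get("title") or ""
        url = article.get("url") or ""
        # Case-insensitive match, so 'millennial' also catches 'Millennials'.
        if term.lower() in title.lower() and url and url not in seen:
            seen.add(url)
            matches.append(article)
    return matches


def build_status(short_url: str, title: str) -> str:
    """Compose the tweet text: shortened link followed by the article title."""
    return f"{short_url} {title}"


def run_bot(
    articles: Iterable[Dict[str, str]],
    term: str,
    shorten: Callable[[str], str],  # assumption: wraps the Bitly request
    post: Callable[[str], None],    # assumption: wraps the Tweepy status update
) -> List[str]:
    """Shorten each matching article's URL and post it; return what was posted."""
    statuses = []
    for article in matching_articles(articles, term):
        status = build_status(shorten(article["url"]), article["title"])
        post(status)
        statuses.append(status)
    return statuses
```

In the real script, `shorten` would call the Bitly API and `post` would use Tweepy to send the status together with the picture. For a dry run, something like `run_bot(articles, "millennial", lambda u: u, print)` prints the statuses instead of tweeting them.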
Code for this project is included in my twitterbot repo on GitHub. Access keys/tokens have been removed from the script. In order to get the script to work, developer tokens must be requested from Twitter and Bitly and copied into the script.
An example of a Twitter bot using this script can be found at https://twitter.com/millennialYell.
https://t.co/6OL1w0w9gu Millennials Plan to Redefine the C-Suite pic.twitter.com/fEY6byHQUU
— Grandpa Simpson (@millennialYell) November 30, 2017
To run the script on a schedule, I use the hosting service pythonanywhere.com, which supports running Python scripts. Free accounts are currently available and include one daily scheduled task.
The first version of this bot is limited in that it only posts the shortened Bitly link and the article title. Future improvements might include:
- Code Refactoring - Improve readability and usability of code, specifically by creating a Bot class which can be configured when instantiated with term and persona settings, as well as encapsulating key functions such as scraping and tweeting.
- Inclusion of Twitter handle for author organization - Lookup tables could be stored and queried via SQLite, mapping domain names (e.g. cnn.com) to Twitter handles (e.g. @CNN). If a mapping does not exist, the bot would enter a record for the domain, to be filled in at a later time. At the moment, the Twitter API doesn’t seem to have the capability to search and return user IDs, so the process of adding the Twitter handles would be manual in nature.
- Automated quotes for bot persona depending on article title - This might be accomplished through some basic machine learning algorithms such as decision trees. Potentially a good project for next semester’s Machine Learning class.
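As a rough idea of how the domain-to-handle lookup might work, here is a small sketch using Python's built-in `sqlite3` module. None of this exists in the bot yet: the table name `handles`, its columns, and the function names are all assumptions made for illustration. An unknown domain is stored with an empty handle so it can be filled in by hand later.

```python
import sqlite3
from typing import Optional
from urllib.parse import urlparse


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) handles table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS handles ("
        "domain TEXT PRIMARY KEY, handle TEXT)"
    )
    return conn


def domain_of(url: str) -> str:
    """Reduce an article URL to its domain, e.g. https://www.cnn.com/x -> cnn.com."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def lookup_handle(conn: sqlite3.Connection, url: str) -> Optional[str]:
    """Return the Twitter handle for the article's domain, or None if unknown.

    A domain seen for the first time is recorded with a NULL handle so it
    can be filled in manually later.
    """
    domain = domain_of(url)
    row = conn.execute(
        "SELECT handle FROM handles WHERE domain = ?", (domain,)
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO handles (domain, handle) VALUES (?, NULL)", (domain,)
        )
        conn.commit()
        return None
    return row[0]
```

Domains still waiting for a handle could then be listed with `SELECT domain FROM handles WHERE handle IS NULL`, which gives a simple to-do list for the manual step.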