This website was started in 2017 as a portfolio for data science projects developed by Anthony Lipphardt and hosted on GitHub. It contains projects and code samples on data science topics, including data visualization, programming, and web development.


Posts

  • Clumsy Movie Nights with a Discord Bot

    The process of developing a Discord bot in Python for movie polling and IMDb integration, built for bad movie nights during the COVID-19 pandemic. The selection eventually expanded from the bad to the weird, resulting in the rebranded Clumsy Movie Night.
  • Automated Identification and Coding of Drug Overdose Records

    Presentation of methods and findings for a data science capstone project, using historical mortality and literal text records from 2005-2017.
  • Word Embeddings Demo

    This R Markdown report, published to RPubs, demonstrates several machine learning and natural language processing techniques related to word embeddings. Word embeddings are a form of language modeling that uses natural language processing and machine learning techniques to map each word in a selected vocabulary to a vector of numbers. Representing words as vectors lets us treat each word as a point in a high-dimensional geometric space. The goal of a word embedding model is to place words in this space so that words with high semantic similarity are close together. This project examines how terms within a corpus of death records may be represented with word embedding models.
  • Analysis of Literal Text from the Death Certificate

    This R Markdown report, published to RPubs, performs text mining analysis on a representative sample of death records from public data available for purchase from the Washington State Department of Health. The purpose of this project is to perform exploratory analyses prior to a larger examination of records for the United States from 2003-2017. The analysis focuses primarily on records pertaining to drug overdose deaths with the following ICD-10 codes: X40-X44, X60-X64, X85, Y10-Y14.
  • 2013 Baltimore Speed Camera Citations

    This R Markdown report, published to RPubs, focuses on data cleaning and exploratory analyses of 2.4 million records for speed camera citations in the City of Baltimore from 2013 to the present.
  • Face Detection using OpenCV

    The Ring Doorbell provides video surveillance in the event of motion detection or a doorbell ring, allowing the customer to download recorded MP4 videos. This project applies face detection and segmentation with OpenCV in Python, using pre-trained models.
  • Classifying Drug Mention with Involvement (DMI) Deaths

    This project asks whether machine learning techniques can be used to train a classifier that predicts, with high precision and specificity, whether a death record is a drug mention with involvement (DMI) death. Pre-processing and experimental scripts are provided on GitHub.
  • Howard County Police Department Calls for Service, 2014-2016

    Tableau data visualization using data from the Howard County open data portal, implemented with a custom geospatial data file built from Google Earth-generated KML.
  • Twitter Bot Project

    Bot script, written in Python, that scrapes the News API and a Google RSS feed for articles containing a specified term. The post details the process and provides a GitHub repo with sample code.
  • CDC WONDER API - Detailed Mortality Database Example

    Includes Python examples of creating and sending requests to the CDC WONDER API and processing the responses, as well as some basic data visualizations. This example focuses on the Detailed Mortality database, which provides the number of deaths and death rates (crude or age-adjusted) for underlying cause of death at the national level.
  • Dynamic Pivot Tables in JavaScript

    Example of generating a large pivot table from a CSV dataset using the PivotTable JavaScript library.
  • NVSS Quarterly Provisional Estimates

    Several sample data visualizations created in D3 and Tableau for the NVSS Rapid Release of Quarterly Provisional Estimates.
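
The Word Embeddings Demo entry above describes placing words in a geometric space so that semantically similar words end up close together. As a rough illustration of that idea only, the sketch below compares words by cosine similarity. It is not the method used in the report: the three-dimensional vectors, the word list, and the helper names `cosine_similarity` and `most_similar` are all made up for this example, and a real model would learn vectors with many more dimensions from a corpus.

```python
import math

# Hypothetical 3-dimensional vectors, hand-picked for illustration only.
# A real embedding model learns its vectors from a corpus.
toy_embeddings = {
    "heroin":   [0.9, 0.8, 0.1],
    "fentanyl": [0.8, 0.9, 0.2],
    "fracture": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, embeddings):
    """Return the other word whose vector is closest to `word` by cosine."""
    others = (w for w in embeddings if w != word)
    return max(
        others,
        key=lambda w: cosine_similarity(embeddings[word], embeddings[w]),
    )


print(most_similar("heroin", toy_embeddings))
```

With these toy vectors, the nearest neighbor of "heroin" is "fentanyl", because their vectors point in nearly the same direction while the vector for "fracture" does not.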