RPubs - Analysis of Literal Text from the Death Certificate

This R Markdown report, published on RPubs, applies text mining to a representative sample of death records drawn from public data that can be purchased from the Washington State Department of Health. It is an exploratory analysis that precedes a larger examination of United States records from 2003 to 2017. The analysis focuses primarily on records of drug overdose deaths, which carry the following ICD-10 codes: X40-X44, X60-X64, X85, and Y10-Y14.
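As an illustration of how the overdose code list can be turned into a record flag, here is a minimal sketch in Python. It is not the report's own code (the report is written in R Markdown); the function name `is_overdose_code` and the normalization rules (uppercasing, dropping the decimal point) are assumptions made for this example.

```python
import re

# Overdose ranges listed in the text: X40-X44, X60-X64, X85, Y10-Y14.
# Each entry is (letter, first two-digit category, last two-digit category).
OVERDOSE_RANGES = [("X", 40, 44), ("X", 60, 64), ("X", 85, 85), ("Y", 10, 14)]


def is_overdose_code(icd10: str) -> bool:
    """Return True if an ICD-10 code falls in one of the overdose ranges.

    Hypothetical helper: only the letter and the first two digits are
    compared, so "X42", "X42.0", and "x420" are all treated alike.
    """
    # Normalize: uppercase, then drop whitespace and the decimal point.
    code = re.sub(r"[\s.]", "", icd10.upper())
    match = re.match(r"^([A-Z])(\d{2})", code)
    if not match:
        return False
    letter, number = match.group(1), int(match.group(2))
    return any(
        letter == range_letter and low <= number <= high
        for range_letter, low, high in OVERDOSE_RANGES
    )
```

A flag like this would let each record be labeled as overdose or non-overdose before any word counts are compared across the two groups.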

Sample Analyses

  • Top words among all death records.
  • Top words by ICD code and by drug overdose status, comparing raw counts with term frequency-inverse document frequency (TF-IDF).
  • Top bigrams, including network graph visualizations depicting the relationship of common word pairs.
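The three analyses above can be sketched with the Python standard library alone. This is an illustrative outline, not the report's R implementation: the function names are hypothetical, each group (for example overdose versus non-overdose) is treated as a single document, and the idf term uses the natural-log convention ln(number of groups / number of groups containing the word), which is one common choice assumed here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a literal-text field and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_words(texts, n=10):
    """Return the n most frequent words across all records (raw counts)."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common(n)


def tf_idf(groups):
    """Compute TF-IDF per group, treating each group as one document.

    groups: dict mapping a group label to a list of text strings.
    tf  = count of the word in the group / total words in the group
    idf = ln(number of groups / number of groups containing the word)
    """
    group_counts = {
        label: Counter(word for text in texts for word in tokenize(text))
        for label, texts in groups.items()
    }
    n_groups = len(group_counts)
    # Number of groups in which each word appears at least once.
    doc_freq = Counter()
    for counts in group_counts.values():
        doc_freq.update(counts.keys())
    scores = {}
    for label, counts in group_counts.items():
        total = sum(counts.values())
        scores[label] = {
            word: (count / total) * math.log(n_groups / doc_freq[word])
            for word, count in counts.items()
        }
    return scores


def bigram_counts(texts):
    """Count adjacent word pairs within each record."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Made-up strings for illustration only; these are not real death records.
example_groups = {
    "overdose": ["acute heroin intoxication", "acute methamphetamine intoxication"],
    "other": ["acute myocardial infarction", "lung cancer"],
}
all_texts = [text for texts in example_groups.values() for text in texts]
print(top_words(all_texts, n=3))
print(tf_idf(example_groups)["overdose"])
print(bigram_counts(all_texts).most_common(3))
```

A word that appears in every group gets a TF-IDF of zero under this definition, which is why TF-IDF surfaces group-specific terms that raw counts bury. The bigram counts map directly onto a network graph: each word is a node, and each pair is an edge weighted by its count.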

Refer to RPubs Report for Analyses

  • Top words
  • Top words by overdose status
  • Top words by ICD code
  • Top bigrams, represented by network graph