Text analysis: topic modeling and sentiment analysis

Overview

  • Introduce supervised and unsupervised text classification
  • Define sentiment analysis and demonstrate its use (Chapter 2)
  • Define topic modeling with Latent Dirichlet allocation and demonstrate its use (Chapter 6)

Before class

Class materials

  • Run the following code in your console to download today’s in-class materials: usethis::use_course("CFSS-MACSS/text-analysis-fundamentals-and-sentiment-analysis-and-tm")

Additional resources

  • See additional resources for the previous lecture on text analysis and regular expressions
  • Original Topic Modeling (LDA) article by Blei, David M., Andrew Y. Ng, and Michael I. Jordan. 2003. “Latent Dirichlet Allocation.”
  • For an introduction to supervised classification with text data, read the “Classification” chapter in Supervised Machine Learning for Text Analysis in R
  • Two blog posts by David Robinson (co-author of tidytext) analyzing Donald J. Trump’s Twitter account. Regardless of your political affiliations, these are excellent examples of the key principles of reproducible research that we’ve learned in this course (e.g., R Markdown documents and knitting code with output, retrieving data from APIs, textual analysis with tidytext, and visualizations with `ggplot2`)