Overview
- Introduce supervised and unsupervised text classification
- Define sentiment analysis and demonstrate its use (Chapter 2)
- Define topic modeling with Latent Dirichlet allocation and demonstrate
its use (Chapter 6)
Before class
Class materials
- Run the code below in your console to download today’s in-class
materials:
usethis::use_course("CFSS-MACSS/text-analysis-fundamentals-and-sentiment-analysis-and-tm")
Additional resources
- See additional resources for the previous lecture on text analysis and
regular expressions
- The original topic modeling (LDA) article: Blei, David M., Andrew Y.
Ng, and Michael I. Jordan. 2003. “Latent Dirichlet Allocation.”
- For an introduction to supervised classification with text data, read
the Classification chapter in
Supervised Machine Learning for Text Analysis in R
- Two blog posts by David Robinson (co-author of
tidytext) analyzing
Donald J. Trump’s Twitter account. Regardless of your political
affiliations, these are excellent examples of the key
principles of reproducible research that we’ve learned in this course
(e.g., R Markdown documents and knitting code with output; retrieving
data from APIs; textual analysis with tidytext; visualizations with
`ggplot2`).
© 2025 Jean Clipperton (materials adapted from Benjamin Soltoff and Sabrina Nardin) All Rights Reserved