Overview
- Introduce regular expressions
- Identify the basic workflow for conducting text analysis
- Define the tidy text format (Chapter 1)
- Compute word frequencies and tf-idf (Chapters 1 and 3)
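As a rough preview of the workflow listed above, the following is a minimal sketch, assuming the dplyr and tidytext packages are installed. The two-document corpus and all object names (`docs`, `tidy_words`, `word_counts`, `word_tf_idf`) are hypothetical and are not part of the class materials.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy corpus: two short "documents" (not from the course data)
docs <- tibble(
  doc  = c("a", "b"),
  text = c("The cat sat on the mat.", "The dog chased the cat.")
)

# Tidy text format: one token (here, one lowercased word) per row
tidy_words <- docs %>%
  unnest_tokens(output = word, input = text)

# Word frequencies: count how often each word occurs in each document
word_counts <- tidy_words %>%
  count(doc, word, sort = TRUE)

# tf-idf: term frequency weighted by inverse document frequency
word_tf_idf <- word_counts %>%
  bind_tf_idf(term = word, document = doc, n = n)

word_tf_idf
```

Words that occur in every document (here "the" and "cat") get an inverse document frequency of zero, so their tf-idf is zero; words unique to one document score higher.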
Before class
For this lecture and the next, we use the book Text Mining with R. Before this class: read Chapters 1, 3, and 4. Before next class: read Chapters 2 and 6.
Class materials
- Run the code below in your console to download today’s in-class materials:
usethis::use_course("CFSS-MACSS/text-analysis-fundamentals")
Additional resources
Basic text analysis:
- The three case studies included in the book “Text Mining with R” (see assigned readings) provide in-depth examples of how to perform text analysis from A to Z. I recommend Chapter 9, “Case study: analyzing usenet text”.
- For an excellent theoretical explanation of these topics, see Speech and Language Processing by Daniel Jurafsky & James H. Martin, especially Chapters 2, 3, and 4.
Regular Expressions:
- Chapter 14 of our textbook R for Data Science
- Chapter 15, by Rochelle Terman, explains stringr (using the R for Data Science textbook)
- For an overview of regular expressions in R, see Chapter 17 of R Programming for Data Science. This book covers the entire range of regular expression packages and functions; in class we focus only on the stringr package
- stringr documentation and cheatsheet
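
As a quick taste of what the stringr resources above cover, here is a minimal sketch, assuming the stringr package is installed. The example strings and object names are hypothetical and are not taken from the class materials.

```r
library(stringr)

# Hypothetical example strings (not from the course materials)
x <- c("apple pie", "banana split", "cherry tart 2024")

# Detect: does each string contain at least one digit?
has_digit <- str_detect(x, "\\d")

# Extract: the first run of one or more digits (NA when nothing matches)
first_number <- str_extract(x, "\\d+")

# Replace: substitute every lowercase vowel with an underscore
no_vowels <- str_replace_all(x, "[aeiou]", "_")

# Anchor: keep only the strings that start with "b"
starts_with_b <- str_subset(x, "^b")
```

In R string literals the backslash itself must be escaped, which is why the regular expression `\d` is written as `"\\d"`.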