HW02: Exploring and visualizing data

Homework 2 due XXX

Overview

Due by 11:59 pm on Friday, October 14th.

Now that you’ve demonstrated knowledge of your software is setup, the goal of this assignment is to practice transforming and visually exploring data.

Accessing your `hw02` repository

Go at this link to accept and create your private hw02 repository on GitHub. Once you do so, your repository will be built in a few seconds. It follows the naming convention hw02-<USERNAME>
Once the your repository has been created, click on the link you see, which will take you to your repository.
Finally, clone the repository to your computer (or R workbench) following the process below.

Cloning your `hw02` repository

After you have accessed the hw02 repository (see above), follow the same steps you completed for hw01 to clone the repository.

General workflow

Your general workflow will be:

Accept the repo and clone it (see above)
Make changes locally to the files in RStudio
Save your changes
Stage-Commit-Push: stage and commit your changes to your local Git repo; then push them online to GitHub. You can complete these steps using the Git GUI integrated into RStudio. In general, you do not want to directly modify your online GitHub repo (if you do so, remember to pull first); instead modify your local Git repo, then stage-commit-push your changes up to your online GitHub repo.

Exploring the mass shooting dataset

Your goal for this assignment is to apply what you have learned about data visualization and data exploration on a cleaned dataset that we provide: the “mass shooting” dataset.

The United States experiences far more mass shooting events than any other developed country in the world. While policymakers, politicians, the media, activists, and the general public recognize the widespread prevalence of these tragic events, policies intended to stop these events should be grounded in evidence and empirical data.

In July 2012, in the aftermath of a mass shooting in a movie theater in Aurora, Colorado, Mother Jones published a report on mass shootings in the United States since 1982. Importantly, they provided the underlying data set as an open-source database for anyone interested in studying and understanding this criminal behavior.

Obtain the data

I have included this dataset in the c3s2 library on GitHub.

If you are using R on your local computer, you first need to install the c3s2 package. Type in your console remotes::install_github("MACS40700/c3s2"). If you don’t already have the remotes library installed, you will get an error. Go back and install this first using install.packages(), then install remotes``. Finally, the mass shootings dataset can be loaded usingdata(“mass_shootings”). Use the help function in R?mass_shootings` to get detailed information on the variables and coding information.

Explore the data

Very specific questions

Generate a data frame that summarizes the number of mass shootings per year. Print the data frame as a formatted kable() table (see below).
Generate a bar chart that identifies the number of mass shooters associated with each race category. The bars should be sorted from highest to lowest.
Generate a boxplot visualizing the number of total victims, by type of location. Then, redraw the same plot, but remove the Las Vegas Strip massacre from the dataset.

Very open-ended question

Answer the following question. Generate appropriate figures/tables to support your conclusions. Provide 1-2 paragraphs of written interpretation of your results. Graphs and/or tables alone will not be sufficient to answer this question.

Are mass shootings with shooters suffering from mental illness different from mass shootings with no signs of mental illness in the shooter? Your graphs should assess the relationship between mental illness and total victims, between mental illness and location type, and intersection of all three variables.

Formatting Guide

Formatting graphs

While you are practicing exploratory data analysis, your final graphs should be appropriate for sharing with outsiders. That means your graphs should have:

A title
Labels on the axes (see ?labs for details)

This is just a starting point. Consider adopting your own color scales, taking control of your legends (if any), playing around with themes, etc.

Formatting tables

When presenting tabular data (aka dplyr::summarize()), make sure you format it correctly. Use the kable() function from the knitr package to format the table for the final document. For instance, this is a poorly presented table summarizing where gun deaths occurred:

# calculate total gun deaths by location
#count(mass_shootings, location_type)

Instead, use kable() to format the table, add a caption, and label the columns:

# count(mass_shootings, location_type) %>%
#   kable(
#     caption = "Mass shootings in the United States by location",
#     col.names = c("Location", "Number of incidents")
#  )

Run ?kable in the console to see how additional options.

Submit the assignment

To submit the assignment, simply push to your repository the last version of your assignment before the deadline. Then copy your repository URL (e.g., https://github.com/css-fall22/hw2-brinasab) and submit it to Canvas under HW02 before the deadline.

Your assignment should be submitted as a R Markdown document .Rmd. Need a refresher on R Markdown? Read this or this.

Make sure to stage-commit-push: * mass-shootings.Rmd (the main file you will add your code to) * mass-shootings.md (you will generate this from the .Rmd by simply knitting it) * mass-shootings_files/ (this folder contains all the graphs you generated in your .Rmd)

As part of your submission, write 1-2 paragraphs of reflections on what was hard/easy about this homework, problems you solved and how, helpful resources you consulted, etc. Please, list the first and last name of eventual collaborators with whom you worked with to complete this assignment.

Rubric

Needs improvement: Displays minimal effort. Doesn’t complete all components. Code is poorly written and not documented. Uses the same type of plot for each graph, or doesn’t use plots appropriate for the variables being analyzed. Shows incomplete understanding of the packages needed for the assignment. No record of commits other than the final push to GitHub.

Satisfactory: Solid effort. Hits all the elements. Minor omissions but no clear mistakes. Easy to follow (both the code and the output). Shows sufficient understanding of the packages needed for the assignment.

Excellent: Finished all components of the assignment correctly. Code is well-documented (both self-documented and with additional comments as necessary). Graphs and tables are properly labeled. Uses multiple commits to back up and show a progression in the work. Analysis is clear and easy to follow, either because graphs are labeled clearly or you’ve written additional text to describe how you interpret the output. Shows solid understanding of the packages needed for the assignment.

For further details, see the general rubric we adopt for grading.