RStudio.Cloud: Tour a Data Cleaning Workflow
- Due: No due date
- Points: 0
- Submitting: a text entry box
Though Stata is still my primary tool for statistical analysis, I strongly prefer R for data cleaning and wrangling. This lab is adapted from a workshop I taught to working professionals, and it uses the Census Household Pulse published tables (not the microdata) available here. Because you do not have a discussion board for sharing questions or problems you run into during data cleaning that this exercise does not cover, please add any questions or comments in the text entry box for this assignment.
Although participants in the original workshop had a little lead-up training in RStudio before this exercise, I start from a blank slate so you can see how I create the RMarkdown document and outline the workflow for a data cleaning routine in RStudio. It should be accessible even if you're brand new to RMarkdown. You can get your own copy of my workspace by joining our RStudio.Cloud group: to access the resources from the videos below, either sign in to RStudio.Cloud (returning users) or use this link to join for the first time.
As you watch, you'll see me identify places where we need to accomplish the following tasks:
- Fix subheaders in rows
- Create new columns using formulas
- Wrangle, reshape, and join data across different worksheets
- Automate the process as a function that reproducibly cleans the entire workbook in seconds
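For readers new to R, here is a minimal base R sketch of three of the tasks listed above: carrying row subheaders down into their own column, adding a column computed from a formula, and wrapping both steps in a function that is applied to every worksheet. It is not the code from the video or from Solutions.Rmd. The toy tables, the column names (`label`, `yes`, `no`), and the helper `clean_table()` are hypothetical, and the sketch assumes that a row with no numbers is a subheader and that each table begins with one.

```r
# Hypothetical helper (not from Solutions.Rmd): clean one table whose
# category subheaders sit in their own rows.
clean_table <- function(raw) {
  # Assumption: a row whose numeric columns are all NA is a subheader,
  # and the table starts with a subheader row.
  is_header <- is.na(raw$yes) & is.na(raw$no)
  # Fix subheaders in rows: cumsum() numbers each block of rows, so every
  # detail row picks up the subheader that precedes it.
  raw$group <- raw$label[is_header][cumsum(is_header)]
  out <- raw[!is_header, c("group", "label", "yes", "no")]
  # Create a new column using a formula.
  out$pct_yes <- round(100 * out$yes / (out$yes + out$no), 1)
  rownames(out) <- NULL
  out
}

# Toy stand-ins for two worksheets (made-up numbers, not Census estimates).
# With a real workbook, each element could instead come from
# readxl::read_excel(path, sheet = s) (assumes readxl is installed).
sheets <- list(
  week1 = data.frame(label = c("Age", "18 - 24", "25 - 39"),
                     yes = c(NA, 120, 340), no = c(NA, 80, 160),
                     stringsAsFactors = FALSE),
  week2 = data.frame(label = c("Age", "18 - 24", "25 - 39"),
                     yes = c(NA, 130, 320), no = c(NA, 70, 180),
                     stringsAsFactors = FALSE)
)

# Automate: run the same routine on every sheet and stack the results,
# keeping the sheet name as a column so sheets can be compared later.
cleaned <- do.call(rbind, lapply(names(sheets), function(s) {
  data.frame(sheet = s, clean_table(sheets[[s]]), stringsAsFactors = FALSE)
}))
rownames(cleaned) <- NULL
print(cleaned)
```

Because every sheet passes through the same function, adding another worksheet to `sheets` requires no new cleaning code; reshaping and joining across worksheets would build on the stacked `cleaned` table.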
You can work through this same example with me as you watch, apply what you've learned to your own data cleaning workflow by uploading a file to the same workspace and starting your own routine, or both. Note that this video does NOT show you how to do all of the tasks listed above.
To see this through to the final steps, you can either view Solutions.Rmd in the Canvas workspace or watch a recorded demonstration in which I walk through the process at a Live Project Day event I held in May 2021. Apologies for the video quality: because it was recorded live, it's a little fuzzy. If needed, keep Solutions.Rmd open while you watch so you can read the code more clearly.