This document accompanies the tutorial "An introduction to data wrangling with R", given at DH Downunder 2019 at the University of Newcastle, Australia, 9-13 December.

I am a speech scientist working on cross-language lexical tone perception and production. I have extensive experience with experimental data and am keen to help others with data wrangling, data visualization and statistical modelling problems. I hope to promote a streamlined workflow built on R packages that makes quantitative analysis in the social sciences and linguistics more efficient.

If you have any questions about the tutorial, please e-mail me at: j.chen2@westernsydney.edu.au

Good data are somewhat alike, but messy data are messy in different ways. This workshop walks the audience through a streamlined workflow for data wrangling (importing, cleaning and transforming data) using popular R packages such as dplyr and tidyr. It begins with basic concepts in data analysis: variables vs. observations, categorical vs. continuous variables, and long vs. wide data. Participants will then learn how to import datasets (singly or in batches), select and rename rows and columns, deal with missing data, compute new columns from existing ones, and combine data frames. The pipe operator will be introduced to make code clearer and more efficient to write. Participants will also learn to write their own functions for data wrangling. Exercises and challenges are based on real-life research problems. Some prior experience with R will be helpful but is not required. Participants are required to download and install R and RStudio before the workshop. Datasets for the workshop will be available online beforehand. Participants are welcome to bring their own data and apply what they learn on the spot.
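As a rough taste of that workflow, here is a minimal sketch in R. It assumes dplyr and tidyr (version 1.0.0 or later, for pivot_longer) are installed. The data frames, column names, values and the to_percent helper are all made up for illustration; they are not the workshop datasets.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per participant, one column per tone
wide <- data.frame(
  participant = c("p01", "p02", "p03"),
  group       = c("native", "learner", "learner"),
  tone1       = c(0.91, 0.75, NA),
  tone2       = c(0.84, 0.62, 0.70)
)

# A small user-written helper (hypothetical) that turns proportions into percentages
to_percent <- function(x) x * 100

# The pipe passes the result of each step on to the next one
long <- wide %>%
  rename(speaker = participant) %>%             # rename a column
  pivot_longer(cols = c(tone1, tone2),          # wide -> long
               names_to = "tone",
               values_to = "accuracy") %>%
  drop_na(accuracy) %>%                         # drop rows with missing accuracy
  mutate(percent = to_percent(accuracy))        # new column from an existing one

# Hypothetical second table, combined with the first by the shared key
ages <- data.frame(
  speaker = c("p01", "p02", "p03"),
  age     = c(23, 31, 27)
)

combined <- left_join(long, ages, by = "speaker")
```

In the long format each row is one observation (one speaker on one tone), which is the shape most dplyr verbs and modelling functions expect.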

Here are the workshop materials.