Chapter 1 Prerequisites

This document is to accompany An introduction to data wrangling with R tutorial for DH Downunder 2019 at the University of Newcastle, Australia, from 9-13 December.

I am a speech scientist working on cross-language lexical tone perception and production. I have rich experience dealing with experimental data and I am keen to help others with data wrangling, data visualization and statistical modelling problems. I aspire to promote a streamlined workflow with R packages to improve data analysis efficiency in quantitative analysis in the field of social science and linguistics.

If you have any questions about the tutorial, please e-mail me at: j.chen2@westernsydney.edu.au

Good data are somewhat alike but messy data are messy in different ways. This workshop aims to walk the audience through a streamlined workflow of data wrangling (importing data, cleaning data, transforming data) using popular R packages, such as dplyr and tidyr. It involves an introduction to basic concepts in data analysis, such as variables vs. observations, categorical vs. continuous variables, long vs. wide data. In addition, participants will learn how to (batch) import datasets, select and rename rows and columns, deal with missing data, generate new columns by computing the existing ones, and combine data frames. The pipe operator will be introduced to improve the efficiency and clarity of coding. Participants will also learn to write their own functions for data wrangling. Exercises and challenges involve real life research problems. Preliminary experience with R will be helpful, though not required. Participants are required to download and install R and R studio before the workshop. Datasets for the workshop are available online before the workshop. Participants are welcomed to bring their own data and apply what they learn on the spot.

Before we start our journey of data wrangling with R, you will need to install R on your laptop. R is multi-platform, which means you can install R on your PC or MAC.

1.1 Installing R and R studio

Use this link [https://cloud.r-project.org/] to download R and select the proper version for your laptop.
Download R

Figure 1.1: Download R

1.2 Installing libraries in R

RStudio is an integrated development environment, or IDE, for R programming. Download and install it from [http://www.rstudio.com/download.]

The free version is poweful enough.

1.3 Install packages and library packages

Packages are collections of R functions, data, and compiled code in a well-defined format. The directory where packages are stored is called the library. R comes with a standard set of packages. Others are available for download and installation.

  • install.packages(“package_name”) Please type in:

install.packages(“tidyverse”)

tidyverse is a set of R packages for data science in R and it take a few minutes to download.

Once installed, they have to be loaded into the session to be used.

  • library(package_name)

Please type in

library(tidyverse)

If you have problems with installing R, please create an account for R studion clould.

https://rstudio.cloud/

and you can use this link to access the project

https://rstudio.cloud/project/782268

In this way you can follow the workshop on the cloud without having to download R locally.

1.4 Data and workbook

Please download the data and workbook with this link.

https://drive.google.com/open?id=18xwfD7f-eekQ60W-2IZ2wBkCX2GL08_u

百度网盘链接 https://pan.baidu.com/s/19lo6hL6UznT7np7m7K7igA

1.5 Worshop survey

Please let me know how do you feel about the workshop by doing this SHORT survey.

https://surveyswesternsydney.au1.qualtrics.com/jfe/form/SV_1C8fmEW2lRzMAF7