A Introduction to Eploratory Data Analysis with R
2019-12-19
Chapter 1 Prerequisites
This document is to accompany Eploratory Data Analysis with R tutorial for DH Downunder 2019 at the University of Newcastle, Australia, from 9-13 December.
I am a speech scientist working on cross-language lexical tone perception and production. I have rich experience dealing with experimental data and I am keen to help others with data wrangling, data visualization and statistical modelling problems. I aspire to promote a streamlined workflow with R packages to improve data analysis efficiency in quantitative analysis in the field of social science and linguistics.
If you have any questions about the tutorial, please e-mail me at: j.chen2@westernsydney.edu.au
This workshop will show how to use data transformation and visualization to explore your data in a systematic way, or in a statistical term, exploratory data analysis. Participants will learn to generate questions about the data, search for answers by transforming, visualizing and modeling the dataset, and use what they learn to further refine the questions and/or generate new questions. The workshop will start by exploring variations in (categorical and continuous) one variable and move on to investigate covariations among two or three variables. Participants will learn to produce summary tables (calculating mean or standard deviation etc. of one or multiple variables by one or more variables) and will also learn to draw figures with ggplot2. This workshop builds on some knowledge of data wrangling. Therefore, it is desirable that participants should take the Introduction to data wrangling with R, if they have no such knowledge. Participants are welcomed to bring their own data and apply what they learn on the spot.
Before we start our journey of data wrangling with R, you will need to install R on your laptop. R is multi-platform, which means you can install R on your PC or MAC.
1.1 Installing R and R studio
Use this link [https://cloud.r-project.org/] to download R and select the proper version for your laptop.
knitr::include_graphics("img/installr.jpg")
RStudio is an integrated development environment, or IDE, for R programming. Download and install it from [http://www.rstudio.com/download.]
The free version is poweful enough.
1.2 Install packages and library packages
Packages are collections of R functions, data, and compiled code in a well-defined format. The directory where packages are stored is called the library. R comes with a standard set of packages. Others are available for download and installation.
- install.packages(“package_name”) Please type in:
install.packages(“tidyverse”)
tidyverse is a set of R packages for data science in R and it takes a few minutes to download.
Once installed, they have to be loaded into the session to be used.
- library(package_name)
Please type in
library(tidyverse)
If you have problems with installing R, please create an account for R studion clould.
and you can use this link to access the project
https://rstudio.cloud/project/782390
In this way you can follow the workshop on the cloud without having to download R locally.
1.3 Data and workbook
Please download the data and workbook with this link.
https://drive.google.com/open?id=18xwfD7f-eekQ60W-2IZ2wBkCX2GL08_u
1.4 Worshop survey
Please let me know how do you feel about the workshop by doing this SHORT survey.
https://surveyswesternsydney.au1.qualtrics.com/jfe/form/SV_1C8fmEW2lRzMAF7