Chapter 7 Build your own function
Combining the pipeline with different functions offered by tidyverse, you can have chunks of codes that fit into your specific needs of data analysis. But you may want to use these chunks again and again in the script you write. This will result in very long script. It is too bad for your eyes!!
Thus we need to package these chunks into functions, so that every time we use them, we only need to refer to their names and put the dataframe in the parenthesis like we use other functions.
That sound very cool! But is that difficult?
noooooo! Not at all!
Basically you only need to know the function structure and think of a name for your function.
# function_name = function(input){
# do thing 1
# do thing 1
# return(output)
# }
# let's make a function for data cleaning
data_clean = function(input){
#copy and paste the chunk we wrote earlier
# change this line "df_final = df %>% to
input %>%
# selecting the columns we need
select(., subject = "Subject",
stimuli = "tone.Trial.",
response = "insex1.RESP",
response_rt = "insex1.RT",
block = "Procedure.Block.",
exp = ExperimentName)%>%
#filtering out useless data
filter(block != "pracproc" & !is.na(response) & response != "")%>%
# generating new variables based on old variables
mutate(ISI = str_extract(exp, "2000|500"),
block = recode(block,
block1 = "ss", block2 = "ss",
block3 = "sd", block4 = "sd",
block5 = "ds", block6 = "ds",
block7 = "dd")) -> output
return(output)
}
# done!!
Once you run the code, you will see a new function named data_clean appear in the environment tab on the right side of R-studio. This means a new function has been made.
Now we can use our new function.
library(tidyverse)
participant_1 <- read.csv("data/participant_1.csv")
participant_2 <- read.csv("data/participant_2.csv")
#head(participant_1)
#use the function
participant_1_clean = data_clean(participant_1)
head(participant_1_clean)
## subject stimuli response response_rt block exp ISI
## 1 402 33 f 672 ds Assim_main_chinese_500ms 500
## 2 402 315 j 2831 ds Assim_main_chinese_500ms 500
## 3 402 33 f 1041 ds Assim_main_chinese_500ms 500
## 4 402 315 j 363 ds Assim_main_chinese_500ms 500
## 5 402 21 f 1234 ds Assim_main_chinese_500ms 500
## 6 402 45 j 322 ds Assim_main_chinese_500ms 500
# do the same thing for participant_2
participant_2_clean = data_clean(participant_2)
head(participant_2_clean )
## subject stimuli response response_rt block exp
## 1 418 241 d 968 sd assim_main_vietnamese_500ms
## 2 418 315 h 1036 sd assim_main_vietnamese_500ms
## 3 418 241 g 1171 sd assim_main_vietnamese_500ms
## 4 418 45 j 610 sd assim_main_vietnamese_500ms
## 5 418 21 g 214 sd assim_main_vietnamese_500ms
## 6 418 45 j 1945 sd assim_main_vietnamese_500ms
## ISI
## 1 500
## 2 500
## 3 500
## 4 500
## 5 500
## 6 500
We can make funtions that meet our specific requirements and reuse them later. For example I want to generate a percentage of choice table for each participant. Since the requirements are too specific, I may not find a ready-to-use package with this function. Thus I can make one like this.
# we reuse the chunk of codes we just made
choice_table = function(input){
input%>%
group_by(stimuli,response)%>%
mutate(counter = 1)%>%
summarize(counter = sum(counter))%>%
mutate( percentage = round(counter/sum(counter),2),
sum = sum(counter))%>%
select(stimuli,percentage, response)%>%
spread(stimuli, value = percentage) -> output
# do not forget this line
return(output)
}
prt_2 = choice_table(participant_2_clean)
prt_2
## # A tibble: 4 x 6
## response `21` `33` `45` `241` `315`
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 d NA 0.11 NA 0.82 NA
## 2 g 1 0.89 NA 0.14 NA
## 3 h NA NA NA NA 0.96
## 4 j NA NA 1 0.04 0.04
Now the question is: do we need two function data_clean and choice_table?
The answer is that it depends on your data and your workflow. Sometime you can put the two functions together and get the results straight away. But you may want to have the clean data for other purposes. In that case, you may want to keep the two functions seperate.