library(data.table)
library(tidyverse)
library(magrittr)
library(lme4)
library(lmerTest)
library(emmeans)
library(fda)
library(gridExtra)
load("data.RData")
The loaded .RData contains all data needed for the analyses and plots:
ls()
## [1] "ci.df" "D.pcafd.e" "D.pcafd.o" "D_MZhigh" "e.df" "met.df"
## [7] "o.df"
‘met.df’ is the main dataframe including data for both stem vowels and both deleted and realised suffix vowels.
‘ci.df’ is the dataframe that includes only tokens with phonetically realised suffix vowels.
‘e.df’ and ‘o.df’ are dataframes including data for stem-/e/ and stem-/o/ respectively.
‘D_MZhigh’ is a dataframe of mid and high stem vowel tokens produced by the speakers from the East.
‘D.pcafd.e’ and ‘D.pcafd.o’ are functional data objects necessary for plotting Principal Components curves.
The FPCA-based analysis follows the procedure exemplified in the scripts by M. Gubian available at this GitHub repository.
The following code lines refer to Figs. 4 and 5 showing the first three Principal Components for stem-/e/ and stem-/o/ separately. Each panel isolates the effect of one PC, say PCk, by displaying several colour-coded curves, each one obtained by substituting a different value of the corresponding score sk into equations (2a) and (2b) (see paper), setting all other scores to zero. The value sk = 0 corresponds to the mean curve across the entire data for that vowel (thick black lines), and is therefore the same across panels of the same stem vowel and formant.
<- seq(0, 1, length.out = 35)
tx
<- CJ(time = tx,
curves PC = 1:3,
Formant = 1:2,
perturbation = seq(-1, 1, by=.25))
%>% setDT()
e.df
<- e.df[, lapply(.SD, sd), .SDcols = str_c('s', 1:3)] %>% as.numeric
scores.sd.e
%>%
curves := (D.pcafd.e$meanfd$coefs[, 1, Formant] +
.[, value * scores.sd.e[PC] *
perturbation $harmonics$coefs[, PC, Formant]) %>%
D.pcafd.efd(D.pcafd.e$meanfd$basis) %>%
eval.fd(tx, .),
= .(PC, Formant, perturbation)]
by
:= factor(Formant, levels = 2:1)] # make F2 appear on top
curves[, Formant <- as_labeller(function(x) paste0('PC', x))
PC_labeller <- as_labeller(function(x) paste0('F', x))
Formant_labeller ggplot(curves) +
aes(x = time, y = value, group = perturbation, color = perturbation) +
geom_line() +
scale_color_gradient2(low = "blue", mid = "grey", high = "orangered") +
facet_grid(Formant ~ PC,
scales = "free_y",
labeller = labeller(PC = PC_labeller, Formant = Formant_labeller)) +
labs(color = expression(s[k]/sigma[k])) +
geom_line(data = curves[perturbation == 0], color = 'black', size = 1.5) +
xlab("Normalised time") +
ylab("Normalised frequency") +
theme_light() +
theme(strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black"),
text = element_text(size = 16), legend.position = "bottom")
<- CJ(time = tx,
curves PC = 1:3,
Formant = 1:2,
perturbation = seq(-1, 1, by=.25)
)
%>% setDT
o.df
<- o.df[, lapply(.SD, sd), .SDcols = str_c('s', 1:3)] %>% as.numeric
scores.sd.o
%>%
curves := (D.pcafd.o$meanfd$coefs[, 1, Formant] +
.[, value * scores.sd.o[PC] *
perturbation $harmonics$coefs[, PC, Formant]) %>%
D.pcafd.ofd(D.pcafd.o$meanfd$basis) %>%
eval.fd(tx, .),
= .(PC, Formant, perturbation)]
by
:= factor(Formant, levels = 2:1)] # make F2 appear on top
curves[, Formant <- as_labeller(function(x) paste0('PC', x))
PC_labeller <- as_labeller(function(x) paste0('F', x))
Formant_labeller ggplot(curves) +
aes(x = time, y = value, group = perturbation, color = perturbation) +
geom_line() +
scale_color_gradient2(low = "blue", mid = "grey", high = "orangered") +
facet_grid(Formant ~ PC,
scales = "free_y",
labeller = labeller(PC = PC_labeller, Formant = Formant_labeller)) +
labs(color = expression(s[k]/sigma[k])) +
geom_line(data = curves[perturbation == 0], color = 'black', size = 1.5) +
xlab("Normalised time") +
ylab("Normalised frequency") +
theme_light() +
theme(strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black"),
text = element_text(size = 16), legend.position = "bottom")