You can access the material used for this study here. This online repository contains all the data and the script used to generate every analysis reported in the paper and carried out for the revisions. This document presents these analyses in a more readable format.

1. Preliminaries

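library(data.table)  # CJ(), setDT(), := and grouped operations
library(tidyverse)   # ggplot2 for plotting, stringr::str_c()
library(magrittr)    # %>% pipe
library(lme4)        # mixed-effects models
library(lmerTest)    # p-values for lme4 models
library(emmeans)     # estimated marginal means and post-hoc comparisons
library(fda)         # functional data objects: fd(), eval.fd()
library(gridExtra)   # arranging several plots in one figure

load("data.RData")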

The loaded .RData contains all data needed for the analyses and plots:

ls()
## [1] "ci.df"     "D.pcafd.e" "D.pcafd.o" "D_MZhigh"  "e.df"      "met.df"   
## [7] "o.df"

‘met.df’ is the main dataframe; it includes data for both stem vowels (/e/ and /o/) and for suffix vowels that were either deleted or phonetically realised.

‘ci.df’ is the dataframe that includes only tokens with phonetically realised suffix vowels.

‘e.df’ and ‘o.df’ are dataframes including data for stem-/e/ and stem-/o/ respectively.

‘D_MZhigh’ is a dataframe of mid and high stem vowel tokens produced by the speakers from the East.

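‘D.pcafd.e’ and ‘D.pcafd.o’ are the functional PCA objects for stem-/e/ and stem-/o/ respectively; their ‘meanfd’ (mean curve) and ‘harmonics’ (Principal Component curves) components are used to plot the Principal Component curves.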

The FPCA-based analysis follows the procedure exemplified in the scripts by M. Gubian available at this GitHub repository.
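For readers unfamiliar with that procedure, here is a minimal sketch of how objects like ‘D.pcafd.e’ and score columns like ‘s1’, ‘s2’ and ‘s3’ can be obtained with the fda package. It is an illustration under stated assumptions, not the authors' pipeline: the trajectories are synthetic, and the number of tokens (40), the basis (10 cubic B-splines), the roughness penalty (lambda = 1e-4 on the second derivative) and the summing of per-formant scores into one score per PC are all choices made here for illustration only.

```r
library(fda)

set.seed(1)
n_tok <- 40                        # hypothetical number of vowel tokens
tt <- seq(0, 1, length.out = 35)   # normalised time
n_t <- length(tt)

# Synthetic normalised trajectories: array of time x token x formant (F1, F2)
y <- array(NA_real_, dim = c(n_t, n_tok, 2))
for (i in seq_len(n_tok)) {
  y[, i, 1] <- 0.5 * sin(pi * tt) * (1 + 0.3 * rnorm(1)) + rnorm(n_t, sd = 0.02)
  y[, i, 2] <- -0.5 * tt * (1 + 0.3 * rnorm(1)) + rnorm(n_t, sd = 0.02)
}

# Smooth every F1 and F2 trajectory on one shared cubic B-spline basis
basis <- create.bspline.basis(rangeval = c(0, 1), nbasis = 10, norder = 4)
fdpar <- fdPar(basis, Lfdobj = 2, lambda = 1e-4)
traj_fd <- smooth.basis(argvals = tt, y = y, fdParobj = fdpar)$fd

# Joint FPCA of F1 and F2, keeping the first three PCs
pca <- pca.fd(traj_fd, nharm = 3)

# For a two-formant fd object, pca$scores is a token x PC x formant array;
# summing over formants gives one score per token and PC (assumed convention)
s <- apply(pca$scores, c(1, 2), sum)
colnames(s) <- paste0("s", 1:3)

dim(pca$meanfd$coefs)     # basis x 1 x formant, as indexed in the plotting code
dim(pca$harmonics$coefs)  # basis x PC x formant
round(pca$varprop, 3)     # proportion of variance explained by each PC
```

The array layout of ‘meanfd$coefs’ and ‘harmonics$coefs’ is what makes the indexing ‘[, 1, Formant]’ and ‘[, PC, Formant]’ in the plotting code work.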

2. Acoustic analysis of stem vowels

2.1. Formant trajectory shapes (plots)

The following code reproduces Figs. 4 and 5, which show the first three Principal Components (PCs) for stem-/e/ and stem-/o/ separately. Each panel isolates the effect of one PC, say PCk, by displaying several colour-coded curves, each obtained by substituting a different value of the corresponding score sk into equations (2a) and (2b) (see paper) while setting all other scores to zero. The values of sk range from -σk to +σk in steps of 0.25σk, where σk is the standard deviation of sk across the tokens of that vowel. The value sk = 0 yields the mean curve across the entire data for that vowel (thick black lines), which is therefore the same across all panels for a given stem vowel and formant.
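In formula form, the curves computed by the code that follows can be written as below. This is reconstructed from the code, not copied from equations (2a) and (2b): μ_j is the mean curve of formant j, φ_kj the formant-j component of PCk, σ_k the standard deviation of the scores s_k, and c the value on the colour scale (s_k/σ_k). The second line shows why the code can combine B-spline coefficients before evaluating, assuming (as the code does by building every curve on ‘meanfd$basis’) that mean and PC curves share one basis B_1, ..., B_N; here a_bj corresponds to ‘meanfd$coefs[b, 1, j]’ and h_bkj to ‘harmonics$coefs[b, k, j]’.

```latex
\begin{aligned}
\hat{F}_j^{(k,c)}(t) &= \mu_j(t) + c\,\sigma_k\,\phi_{kj}(t),
  \qquad c \in \{-1, -0.75, \ldots, 1\},\; k \in \{1,2,3\},\; j \in \{1,2\} \\
&= \sum_{b=1}^{N} \bigl(a_{bj} + c\,\sigma_k\,h_{bkj}\bigr)\,B_b(t),
  \qquad \text{with } \mu_j(t) = \sum_{b=1}^{N} a_{bj}\,B_b(t),\;
  \phi_{kj}(t) = \sum_{b=1}^{N} h_{bkj}\,B_b(t)
\end{aligned}
```

Setting c = 0 leaves only μ_j(t), which is why the thick black mean curve is identical in every PC panel of a given formant.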

Principal components for stem-/e/:

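tx <- seq(0, 1, length.out = 35)  # evaluation grid on normalised time

# One row per combination of time point, PC, formant and perturbation (s_k / sigma_k)
curves <- CJ(time = tx,
             PC = 1:3,
             Formant = 1:2,
             perturbation = seq(-1, 1, by = .25))

setDT(e.df)  # convert to data.table by reference

# Standard deviation of the first three PC scores (s1, s2, s3) across tokens
scores.sd.e <- e.df[, lapply(.SD, sd), .SDcols = str_c('s', 1:3)] %>% as.numeric

# For each PC, formant and perturbation: combine the B-spline coefficients of the
# mean curve and of the PC curve, then evaluate the resulting curve on tx
curves[, value := (D.pcafd.e$meanfd$coefs[, 1, Formant] +
                     perturbation * scores.sd.e[PC] *
                     D.pcafd.e$harmonics$coefs[, PC, Formant]) %>%
         fd(D.pcafd.e$meanfd$basis) %>%
         eval.fd(tx, .) %>%
         as.vector,
       by = .(PC, Formant, perturbation)]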

curves[, Formant := factor(Formant, levels = 2:1)]  # make F2 appear on top
PC_labeller <- as_labeller(function(x) paste0('PC', x))
Formant_labeller <- as_labeller(function(x) paste0('F', x))
ggplot(curves) +
  aes(x = time, y = value, group = perturbation, color = perturbation) +
  geom_line() +
  scale_color_gradient2(low = "blue", mid = "grey", high = "orangered") +
  facet_grid(Formant ~ PC,
             scales = "free_y",
             labeller = labeller(PC = PC_labeller, Formant = Formant_labeller)) +
  labs(color = expression(s[k]/sigma[k])) +
  # Thick black line: mean curve (perturbation = 0); `linewidth` replaces the
  # deprecated `size` for lines since ggplot2 3.4.0
  geom_line(data = curves[perturbation == 0], color = 'black', linewidth = 1.5) +
  xlab("Normalised time") +
  ylab("Normalised frequency") +
  theme_light() +
  theme(strip.text.x = element_text(color = "black"),
        strip.text.y = element_text(color = "black"),
        text = element_text(size = 16), legend.position = "bottom")
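The ‘value’ column above is built in coefficient space (mean coefficients plus a scaled PC coefficient vector) and only then evaluated on the time grid. The sketch below, on made-up coefficients rather than the study's data, checks that this gives the same curves as evaluating the mean and PC curves separately and combining them pointwise; the basis size, the coefficients and the σ value are hypothetical.

```r
library(fda)

# Hypothetical one-formant example: 8 cubic B-splines on normalised time
basis <- create.bspline.basis(rangeval = c(0, 1), nbasis = 8)
set.seed(2)
mean_coef <- rnorm(8)  # stands in for meanfd$coefs[, 1, Formant]
harm_coef <- rnorm(8)  # stands in for harmonics$coefs[, PC, Formant]
sigma_k <- 0.7         # stands in for the score standard deviation of PCk
tgrid <- seq(0, 1, length.out = 35)

max_diff <- sapply(seq(-1, 1, by = .25), function(pert) {
  # Route used in the script: combine coefficients first, then evaluate
  via_coefs <- eval.fd(tgrid, fd(mean_coef + pert * sigma_k * harm_coef, basis))
  # Alternative route: evaluate each curve, then combine pointwise
  via_curves <- eval.fd(tgrid, fd(mean_coef, basis)) +
    pert * sigma_k * eval.fd(tgrid, fd(harm_coef, basis))
  max(abs(via_coefs - via_curves))
})

max(max_diff)  # differences are at floating-point rounding level
```

Because evaluating an fd object is linear in its coefficients, both routes agree for every perturbation value; combining coefficients simply means each of the nine perturbed curves needs a single evaluation.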

Principal components for stem-/o/:

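# One row per combination of time point, PC, formant and perturbation (s_k / sigma_k)
curves <- CJ(time = tx,
             PC = 1:3,
             Formant = 1:2,
             perturbation = seq(-1, 1, by = .25))

setDT(o.df)  # convert to data.table by reference

# Standard deviation of the first three PC scores (s1, s2, s3) across tokens
scores.sd.o <- o.df[, lapply(.SD, sd), .SDcols = str_c('s', 1:3)] %>% as.numeric

# For each PC, formant and perturbation: combine the B-spline coefficients of the
# mean curve and of the PC curve, then evaluate the resulting curve on tx
curves[, value := (D.pcafd.o$meanfd$coefs[, 1, Formant] +
                     perturbation * scores.sd.o[PC] *
                     D.pcafd.o$harmonics$coefs[, PC, Formant]) %>%
         fd(D.pcafd.o$meanfd$basis) %>%
         eval.fd(tx, .) %>%
         as.vector,
       by = .(PC, Formant, perturbation)]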

curves[, Formant := factor(Formant, levels = 2:1)]  # make F2 appear on top
PC_labeller <- as_labeller(function(x) paste0('PC', x))
Formant_labeller <- as_labeller(function(x) paste0('F', x))
ggplot(curves) +
  aes(x = time, y = value, group = perturbation, color = perturbation) +
  geom_line() +
  scale_color_gradient2(low = "blue", mid = "grey", high = "orangered") +
  facet_grid(Formant ~ PC,
             scales = "free_y",
             labeller = labeller(PC = PC_labeller, Formant = Formant_labeller)) +
  labs(color = expression(s[k]/sigma[k])) +
  # Thick black line: mean curve (perturbation = 0); `linewidth` replaces the
  # deprecated `size` for lines since ggplot2 3.4.0
  geom_line(data = curves[perturbation == 0], color = 'black', linewidth = 1.5) +
  xlab("Normalised time") +
  ylab("Normalised frequency") +
  theme_light() +
  theme(strip.text.x = element_text(color = "black"),
        strip.text.y = element_text(color = "black"),
        text = element_text(size = 16), legend.position = "bottom")