You can access the material used for this study here. The online repository contains all data and the script used to generate the analyses reported in the paper and those added for the revisions. This document presents these analyses in a more readable format.

1. Preliminaries

library(data.table)
library(tidyverse)
library(magrittr)
library(lme4)
library(lmerTest)
library(emmeans)
library(fda)
library(gridExtra)

load("data.RData")

The loaded .RData contains all data needed for the analyses and plots:

ls()
## [1] "ci.df"     "D.pcafd.e" "D.pcafd.o" "D_MZhigh"  "e.df"      "met.df"   
## [7] "o.df"

‘met.df’ is the main dataframe including data for both stem vowels and both deleted and realised suffix vowels.

‘ci.df’ is the dataframe that includes only tokens with phonetically realised suffix vowels.

‘e.df’ and ‘o.df’ are dataframes including data for stem-/e/ and stem-/o/ respectively.

‘D_MZhigh’ is a dataframe of mid and high stem vowel tokens produced by the speakers from the East.

‘D.pcafd.e’ and ‘D.pcafd.o’ are functional data objects necessary for plotting Principal Components curves.

The FPCA-based analysis follows the procedure exemplified in the scripts by M. Gubian available at this GitHub repository.

2. Acoustic analysis of stem vowels

2.1. Formant trajectory shapes (plots)

The following code lines refer to Figs. 4 and 5 showing the first three Principal Components for stem-/e/ and stem-/o/ separately. Each panel isolates the effect of one PC, say PCk, by displaying several colour-coded curves, each one obtained by substituting a different value of the corresponding score sk into equations (2a) and (2b) (see paper), setting all other scores to zero. The value sk = 0 corresponds to the mean curve across the entire data for that vowel (thick black lines), and is therefore the same across panels of the same stem vowel and formant.
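
As a rough illustration of this construction, the sketch below builds one such family of curves from plain numeric vectors rather than from the functional data objects used in the study. The mean curve, PC shape and score standard deviation (`mean_curve`, `pc_curve`, `sigma_k`) are invented for the example; only the grid of perturbation values mirrors the study code.

```r
# Toy example: all curve shapes and sigma_k are hypothetical, not study data
tx <- seq(0, 1, length.out = 35)            # normalised time grid
mean_curve <- 1 + 0.2 * sin(pi * tx)        # stands in for the mean curve
pc_curve <- cos(pi * tx)                    # stands in for the shape of PCk
sigma_k <- 0.5                              # stands in for the SD of the scores s_k
perturbation <- seq(-1, 1, by = 0.25)       # s_k / sigma_k, as in the plot legend

# One column per perturbation value: mean + s_k * PCk, with s_k = perturbation * sigma_k
curves_toy <- sapply(perturbation, function(p) mean_curve + p * sigma_k * pc_curve)

# The column for perturbation == 0 is the mean curve itself
zero_col <- which(perturbation == 0)
all.equal(curves_toy[, zero_col], mean_curve)
```

Each column of `curves_toy` corresponds to one colour-coded line in a panel, and the zero-perturbation column plays the role of the thick black line.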

Principal components for stem-/e/:

tx <- seq(0, 1, length.out = 35) 

curves <- CJ(time = tx, 
             PC = 1:3,
             Formant = 1:2,
             perturbation = seq(-1, 1, by=.25))

e.df%>% setDT()

scores.sd.e <- e.df[, lapply(.SD, sd), .SDcols = str_c('s', 1:3)] %>% as.numeric

curves %>%
  .[, value := (D.pcafd.e$meanfd$coefs[, 1, Formant] + 
                  perturbation * scores.sd.e[PC] * 
                  D.pcafd.e$harmonics$coefs[, PC, Formant]) %>% 
      fd(D.pcafd.e$meanfd$basis) %>% 
      eval.fd(tx, .), 
    by = .(PC, Formant, perturbation)]

curves[, Formant := factor(Formant, levels = 2:1)] # make F2 appear on top
PC_labeller <- as_labeller(function(x) paste0('PC', x))
Formant_labeller <- as_labeller(function(x) paste0('F', x))
ggplot(curves) +
  aes(x = time, y = value, group = perturbation, color = perturbation) +
  geom_line() +
  scale_color_gradient2(low = "blue", mid = "grey", high = "orangered") +
  facet_grid(Formant ~ PC,
             scales = "free_y",
             labeller = labeller(PC = PC_labeller, Formant = Formant_labeller)) +
  labs(color = expression(s[k]/sigma[k])) +
  geom_line(data = curves[perturbation == 0], color = 'black', size = 1.5) +
  xlab("Normalised time") +
  ylab("Normalised frequency") +
  theme_light() +
  theme(strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black"), 
        text = element_text(size = 16), legend.position = "bottom")

Principal components for stem-/o/:

curves <- CJ(time = tx, 
             PC = 1:3,
             Formant = 1:2,
             perturbation = seq(-1, 1, by=.25) 
)

o.df %>% setDT

scores.sd.o <- o.df[, lapply(.SD, sd), .SDcols = str_c('s', 1:3)] %>% as.numeric

curves %>%
  .[, value := (D.pcafd.o$meanfd$coefs[, 1, Formant] + 
                  perturbation * scores.sd.o[PC] * 
                  D.pcafd.o$harmonics$coefs[, PC, Formant]) %>% 
      fd(D.pcafd.o$meanfd$basis) %>% 
      eval.fd(tx, .), 
    by = .(PC, Formant, perturbation)]

curves[, Formant := factor(Formant, levels = 2:1)] # make F2 appear on top
PC_labeller <- as_labeller(function(x) paste0('PC', x))
Formant_labeller <- as_labeller(function(x) paste0('F', x))
ggplot(curves) +
  aes(x = time, y = value, group = perturbation, color = perturbation) +
  geom_line() +
  scale_color_gradient2(low = "blue", mid = "grey", high = "orangered") +
  facet_grid(Formant ~ PC,
             scales = "free_y",
             labeller = labeller(PC = PC_labeller, Formant = Formant_labeller)) +
  labs(color = expression(s[k]/sigma[k])) +
  geom_line(data = curves[perturbation == 0], color = 'black', size = 1.5) +
  xlab("Normalised time") +
  ylab("Normalised frequency") +
  theme_light() +
  theme(strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black"), 
        text = element_text(size = 16), legend.position = "bottom")

For both stem vowels, PC1 is associated with simultaneous variations in phonetic height and frontness/backness, either between high front and low-mid front in the case of /e/, or between high back and low-mid back in the case of /o/. The phonetic interpretation of PC2, instead, might be related to a variation in lip rounding for /e/ and in phonetic backness for /o/. PC3 for both /e/ and /o/ encodes variations between phonetically closing and opening diphthongs.

2.2. Regional variation

Below you can find the code used to generate the violin plots shown in Figs. 5, 6, 8, 9. These show the distribution of PC-score values separately by region and suffix vowel.

PC-score 1 (s1), stem-/e/:

Higher s1 values are associated with increasing vowel lowering, while lower s1 values correspond to increasing vowel raising.

cols.curves = c("red", "darkgrey", "orange","darkgreen")

ggplot(e.df%>% mutate(Suffix_vowel = factor(Suffix_vowel, levels= c("a", "e", "i", "u")))) +
  aes(Suffix_vowel, s1, fill = Suffix_vowel) +
  geom_violin() +
  facet_grid(. ~ Region) +
  ylab(expression(s[1]))+
  theme_light()+
  theme(strip.text.x = element_text(color = "black"), 
        strip.text.y = element_text(color = "black"), text = element_text(size = 16), 
        legend.position = "top")+
  xlab("Suffix vowel")+
  scale_fill_manual(values = cols.curves, 
                    name = "Suffix vowel")+
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")

PC-score 1 (s1), stem-/o/:

ggplot(o.df %>% filter (s1 < 2.9) %>% mutate(Suffix_vowel = factor(Suffix_vowel, levels= c("a", "e", "i", "u")))) +
  aes(Suffix_vowel, s1, fill = Suffix_vowel) +
  geom_violin() +
  facet_grid(. ~ Region) +
  ylab(expression(s[1]))+
  theme_light()+
  theme(strip.text.x = element_text(color = "black"), 
        strip.text.y = element_text(color = "black"), text = element_text(size = 16), 
        legend.position = "top")+
  xlab("Suffix vowel")+
  scale_fill_manual(values = cols.curves, 
                    name = "Suffix vowel")+
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")

PC-score 3 (s3), stem-/e/:

Higher s3 values suggest greater opening diphthongisation.

ggplot(e.df%>% mutate(Suffix_vowel = factor(Suffix_vowel, levels= c("a", "e", "i", "u")))) +
  aes(Suffix_vowel, s3, fill = Suffix_vowel) +
  geom_violin() +
  facet_grid(. ~ Region) +
  ylab(expression(s[3]))+
  theme_light()+
  theme(strip.text.x = element_text(color = "black"), 
        strip.text.y = element_text(color = "black"), text = element_text(size = 16), 
        legend.position = "top")+
  xlab("Suffix vowel")+
  scale_fill_manual(values = cols.curves, 
                    name = "Suffix vowel")+
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")

PC-score 3 (s3), stem-/o/:

ggplot(o.df%>% mutate(Suffix_vowel = factor(Suffix_vowel, levels= c("a", "e", "i", "u")))) +
  aes(Suffix_vowel, s3, fill = Suffix_vowel) +
  geom_violin() +
  facet_grid(. ~ Region) +
  ylab(expression(s[3]))+
  theme_light()+
  theme(strip.text.x = element_text(color = "black"), 
        strip.text.y = element_text(color = "black"), text = element_text(size = 16), 
        legend.position = "top")+
  xlab("Suffix vowel")+
  scale_fill_manual(values = cols.curves, 
                    name = "Suffix vowel")+
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")

Statistics

The following LMER models analyse s1, s2 (not analysed in the paper), and s3 in stem-/e/ data.

The results show a significant influence on s1 of the suffix vowel, of region, and a significant interaction between these factors. The results of the mixed model with s3 as the dependent variable show a significant influence of the suffix, a not quite significant influence of region, and a significant interaction between these factors.

m.e <- list()
m.e[[1]] <- lmer(s1 ~ Suffix_vowel * Region +
                   (1 + Region|Stem) + (1|speaker),
                 data = e.df,
                 control=lmerControl(check.conv.singular = .makeCC(action = "ignore", 
                                                                   tol = 1e-4)))

m.e[[2]] <- lmer(s2 ~ Suffix_vowel * Region +
                   (1 + Region|Stem) + (1|speaker),
                 data = e.df,
                 control=lmerControl(check.conv.singular = .makeCC(action = "ignore", 
                                                                   tol = 1e-4)))
m.e[[3]] <- lmer(s3 ~ Suffix_vowel * Region +
                   (1 + Region|Stem) + (1|speaker),
                 data = e.df,
                 control=lmerControl(check.conv.singular = .makeCC(action = "ignore", 
                                                                   tol = 1e-4)))
# F-statistics: 

anova(m.e[[1]])
## Type III Analysis of Variance Table with Satterthwaite's method
##                     Sum Sq Mean Sq NumDF   DenDF  F value    Pr(>F)    
## Suffix_vowel        71.085 23.6951     3 2493.58 122.1552 < 2.2e-16 ***
## Region               3.786  1.8928     2   49.07   9.7577 0.0002705 ***
## Suffix_vowel:Region 41.135  6.8559     6 1120.75  35.3441 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m.e[[3]])
## Type III Analysis of Variance Table with Satterthwaite's method
##                      Sum Sq Mean Sq NumDF  DenDF F value    Pr(>F)    
## Suffix_vowel        0.87698 0.29233     3 2632.7 11.9000 9.708e-08 ***
## Region              0.15161 0.07581     2   48.8  3.0859   0.05471 .  
## Suffix_vowel:Region 2.16552 0.36092     6 2068.1 14.6922 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Post-hoc tests for stem-/e/: s1, differences between regions

The post-hoc tests show significant differences between all pairs of regions for suffix-/i/ and for suffix-/u/, but no differences between the regions for suffix-/a/, and only one pairwise difference (MM vs West) for suffix-/e/.

m1.e <- m.e[[1]]

emmeans(m1.e, pairwise ~ Region | Suffix_vowel)$contrasts
## Suffix_vowel = e:
##  contrast    estimate     SE   df t.ratio p.value
##  MM - West     0.2330 0.0949 74.9   2.456  0.0428
##  MM - East     0.1615 0.1151 57.2   1.403  0.3462
##  West - East  -0.0715 0.1307 57.6  -0.547  0.8483
## 
## Suffix_vowel = a:
##  contrast    estimate     SE   df t.ratio p.value
##  MM - West     0.0835 0.0999 88.9   0.836  0.6819
##  MM - East    -0.1024 0.1185 63.8  -0.864  0.6647
##  West - East  -0.1859 0.1348 64.6  -1.379  0.3576
## 
## Suffix_vowel = u:
##  contrast    estimate     SE   df t.ratio p.value
##  MM - West     0.3388 0.0940 73.2   3.604  0.0016
##  MM - East     0.7975 0.1133 53.8   7.039  <.0001
##  West - East   0.4587 0.1298 55.5   3.535  0.0024
## 
## Suffix_vowel = i:
##  contrast    estimate     SE   df t.ratio p.value
##  MM - West     0.3379 0.0902 63.3   3.748  0.0011
##  MM - East     0.7545 0.1122 51.8   6.728  <.0001
##  West - East   0.4167 0.1272 51.5   3.276  0.0053
## 
## Degrees-of-freedom method: satterthwaite 
## P value adjustment: tukey method for comparing a family of 3 estimates
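
To work with such contrasts programmatically, for instance to list only the significant pairs, the contrast table can be converted to a data frame. The sketch below assumes the `emmeans` package and uses R's built-in `warpbreaks` data with an ordinary linear model as a stand-in for the study's mixed models; the names `toy.lm`, `toy.con` and `toy.df` and the 0.05 threshold are purely illustrative.

```r
library(emmeans)

# Stand-in model on a built-in dataset (not the study data)
toy.lm <- lm(breaks ~ wool * tension, data = warpbreaks)

# Pairwise comparisons of tension levels within each wool type (Tukey-adjusted)
toy.con <- emmeans(toy.lm, pairwise ~ tension | wool)$contrasts
toy.df <- as.data.frame(toy.con)

# Keep only the contrasts below the illustrative threshold
subset(toy.df, p.value < 0.05, select = c(wool, contrast, estimate, p.value))
```

The same pattern should carry over to the `lmer` models in this document, with `Region | Suffix_vowel` in place of `tension | wool`.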

Post-hoc tests for stem-/e/: s3, differences between regions

The post-hoc tests show a significant difference between the West and the other two regions for suffix-/i/ and for suffix-/u/. There were no significant differences between any of the regions for suffixes-/e, a/.

m3.e <- m.e[[3]]

emmeans(m3.e, pairwise ~ Region | Suffix_vowel)$contrasts
## Suffix_vowel = e:
##  contrast     estimate     SE   df t.ratio p.value
##  MM - West   -0.064126 0.0629 63.9  -1.020  0.5672
##  MM - East    0.030996 0.0466 45.5   0.665  0.7846
##  West - East  0.095122 0.0606 64.3   1.570  0.2657
## 
## Suffix_vowel = a:
##  contrast     estimate     SE   df t.ratio p.value
##  MM - West    0.023828 0.0641 68.7   0.372  0.9267
##  MM - East    0.024626 0.0476 49.4   0.518  0.8632
##  West - East  0.000799 0.0618 69.2   0.013  0.9999
## 
## Suffix_vowel = u:
##  contrast     estimate     SE   df t.ratio p.value
##  MM - West   -0.241524 0.0627 62.9  -3.854  0.0008
##  MM - East    0.051250 0.0460 43.4   1.113  0.5111
##  West - East  0.292774 0.0603 63.0   4.858  <.0001
## 
## Suffix_vowel = i:
##  contrast     estimate     SE   df t.ratio p.value
##  MM - West   -0.172545 0.0618 59.7  -2.791  0.0190
##  MM - East    0.010344 0.0457 42.2   0.226  0.9722
##  West - East  0.182889 0.0595 60.0   3.073  0.0088
## 
## Degrees-of-freedom method: satterthwaite 
## P value adjustment: tukey method for comparing a family of 3 estimates

The following LMER models analyse s1, s2 (not analysed in the paper), and s3 in stem-/o/ data.

The results show for both s1 and s3 a significant influence of the suffix vowel, of region, and a significant interaction between these factors.

m.o <- list()
m.o[[1]] <- lmer(s1 ~  Suffix_vowel * Region +
                   (1 + Region|Stem) + (1 |speaker),
                 data = o.df, 
                 control=lmerControl(check.conv.singular = .makeCC(action = "ignore", 
                                                                   tol = 1e-4)))

m.o[[2]] <- lmer(s2 ~ Suffix_vowel * Region +
                   (1 + Region|Stem) + (1|speaker),
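                 # o.df rather than e.df: a fit on e.df just duplicates m.e[[2]],
                 # which is what the emm.o[[2]] printout further down shows
                 data = o.df,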
                 control=lmerControl(check.conv.singular = .makeCC(action = "ignore", 
                                                                   tol = 1e-4)))
m.o[[3]] <- lmer(s3 ~  Suffix_vowel * Region +
                   (1 + Region|Stem) + (1 |speaker),
                 data = o.df,
                 control=lmerControl(check.conv.singular = .makeCC(action = "ignore", 
                                                                   tol = 1e-4)))
# F-statistics: 

anova(m.o[[1]])
## Type III Analysis of Variance Table with Satterthwaite's method
##                      Sum Sq Mean Sq NumDF   DenDF  F value    Pr(>F)    
## Suffix_vowel        168.874  56.291     3 1995.62 253.6305 < 2.2e-16 ***
## Region                2.819   1.410     2   44.22   6.3509  0.003762 ** 
## Suffix_vowel:Region  50.241   8.374     6  889.03  37.7285 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m.o[[3]])
## Type III Analysis of Variance Table with Satterthwaite's method
##                     Sum Sq Mean Sq NumDF   DenDF F value    Pr(>F)    
## Suffix_vowel        1.0308 0.34360     3 2504.73 17.9279 1.642e-11 ***
## Region              0.1937 0.09685     2   42.72  5.0533   0.01072 *  
## Suffix_vowel:Region 2.0332 0.33886     6  465.58 17.6807 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Post-hoc tests for stem-/o/: s1, differences between regions

The results show a significant difference between MM and the East in the context of all four suffix vowels. There were significant differences between MM and the West in three suffix vowel contexts but not in /a/. There were differences between the West and the East in the context of suffix-/u/ and suffix-/a/ but not in the context of the other two suffix vowels.

m1.o <- m.o[[1]]

emmeans(m1.o, pairwise ~ Region | Suffix_vowel)$contrasts
## Suffix_vowel = e:
##  contrast    estimate     SE   df t.ratio p.value
##  MM - West     0.3531 0.0998 73.5   3.539  0.0020
##  MM - East     0.3053 0.1074 85.9   2.843  0.0152
##  West - East  -0.0478 0.1116 95.4  -0.428  0.9041
## 
## Suffix_vowel = a:
##  contrast    estimate     SE   df t.ratio p.value
##  MM - West     0.0386 0.0932 55.8   0.415  0.9098
##  MM - East    -0.2436 0.0996 65.2  -2.447  0.0446
##  West - East  -0.2822 0.1028 72.4  -2.746  0.0205
## 
## Suffix_vowel = u:
##  contrast    estimate     SE   df t.ratio p.value
##  MM - West     0.2574 0.0916 52.2   2.811  0.0187
##  MM - East     0.5651 0.0979 61.2   5.771  <.0001
##  West - East   0.3077 0.1010 67.9   3.048  0.0091
## 
## Suffix_vowel = i:
##  contrast    estimate     SE   df t.ratio p.value
##  MM - West     0.3552 0.0939 57.8   3.784  0.0011
##  MM - East     0.4742 0.1011 68.7   4.692  <.0001
##  West - East   0.1190 0.1037 74.4   1.147  0.4886
## 
## Degrees-of-freedom method: satterthwaite 
## P value adjustment: tukey method for comparing a family of 3 estimates

Post-hoc tests for stem-/o/: s3, differences between regions

The results show significant differences between the West and the other two regions in the context of suffix-/i/ and suffix-/u/. For suffix-/e/, only the West vs East difference was significant, and there were no significant differences for suffix-/a/. There were no significant differences between MM and the East in any context.

m3.o = m.o[[3]]

emmeans(m3.o, pairwise ~ Region | Suffix_vowel)$contrasts
## Suffix_vowel = e:
##  contrast    estimate     SE   df t.ratio p.value
##  MM - West   -0.06560 0.0318 90.5  -2.060  0.1040
##  MM - East    0.04753 0.0262 83.1   1.815  0.1710
##  West - East  0.11313 0.0388 69.4   2.915  0.0131
## 
## Suffix_vowel = a:
##  contrast    estimate     SE   df t.ratio p.value
##  MM - West    0.03459 0.0292 70.5   1.183  0.4674
##  MM - East    0.00791 0.0238 60.8   0.332  0.9411
##  West - East -0.02667 0.0358 50.9  -0.746  0.7376
## 
## Suffix_vowel = u:
##  contrast    estimate     SE   df t.ratio p.value
##  MM - West   -0.16418 0.0287 66.5  -5.726  <.0001
##  MM - East   -0.01035 0.0233 56.2  -0.444  0.8974
##  West - East  0.15382 0.0352 47.8   4.375  0.0002
## 
## Suffix_vowel = i:
##  contrast    estimate     SE   df t.ratio p.value
##  MM - West   -0.10665 0.0296 73.3  -3.603  0.0016
##  MM - East    0.04783 0.0243 64.2   1.969  0.1283
##  West - East  0.15448 0.0362 53.4   4.272  0.0002
## 
## Degrees-of-freedom method: satterthwaite 
## P value adjustment: tukey method for comparing a family of 3 estimates

Reconstructed formants from emmeans (Figs. 7 and 10)

The results presented above show that suffix vowels influenced the phonetic height of stem vowels. This effect is most clearly seen in the reconstructed formant plots for F1 in MM and the East in Figs. 7 and 10, in which the stem vowels, especially for the East, had a progressively higher F1 as the suffix vowel changed from high /i, u/ to mid /e/ to low /a/.

Stem-/e/

Preparatory data for /e/-reconstruction:

emm.e <- lapply(m.e, function(m_) {
  emmeans(m_, pairwise ~ Suffix_vowel | Region)$emmeans %>%
    as.data.table
})

METSuffVowel <- emm.e[[1]][, .(Suffix_vowel)] %>% unique

# join it to all tables in emm.e
lapply(emm.e, function(emm_) {emm_[METSuffVowel,
                                   on = "Suffix_vowel",
                                   Suffix_vowel := i.Suffix_vowel]})
## [[1]]
##     Suffix_vowel Region      emmean         SE       df    lower.CL    upper.CL
##  1:            e     MM  0.27107290 0.08765786 58.38737  0.09563142  0.44651438
##  2:            a     MM  0.39999056 0.08990493 64.28029  0.22039990  0.57958122
##  3:            u     MM  0.19614030 0.08713844 56.98417  0.02164754  0.37063306
##  4:            i     MM  0.12837604 0.08601750 54.28678 -0.04405779  0.30080987
##  5:            e   West  0.03803987 0.09428073 67.44968 -0.15012219  0.22620193
##  6:            a   West  0.31650274 0.09778796 76.76066  0.12177234  0.51123313
##  7:            u   West -0.14265457 0.09401509 67.07912 -0.33030541  0.04499627
##  8:            i   West -0.20949639 0.09086488 59.20038 -0.39130371 -0.02768906
##  9:            e   East  0.10957609 0.12295982 34.82171 -0.14009132  0.35924349
## 10:            a   East  0.50242133 0.12456308 36.63796  0.24994834  0.75489433
## 11:            u   East -0.60134577 0.12219746 33.94505 -0.84969570 -0.35299584
## 12:            i   East -0.62617307 0.12186903 33.59317 -0.87395125 -0.37839488
## 
## [[2]]
##     Suffix_vowel Region       emmean         SE       df     lower.CL
##  1:            e     MM -0.007564794 0.06457961 59.34967 -0.136772367
##  2:            a     MM -0.005318088 0.06582758 63.88305 -0.136828438
##  3:            u     MM -0.079284466 0.06430372 58.31882 -0.207987371
##  4:            i     MM  0.008804732 0.06366645 56.14950 -0.118726999
##  5:            e   West  0.065658119 0.05727243 70.82907 -0.048544641
##  6:            a   West  0.096894564 0.05956377 81.18601 -0.021614544
##  7:            u   West -0.041372364 0.05702698 70.24094 -0.155102232
##  8:            i   West  0.124161930 0.05498142 61.47028  0.014236800
##  9:            e   East -0.039033083 0.05107613 31.00730 -0.143202546
## 10:            a   East  0.107544881 0.05240172 34.26432  0.001082016
## 11:            u   East -0.130047894 0.05035181 29.20124 -0.232998084
## 12:            i   East -0.049532444 0.05008367 28.61506 -0.152024929
##        upper.CL
##  1:  0.12164278
##  2:  0.12619226
##  3:  0.04941844
##  4:  0.13633646
##  5:  0.17986088
##  6:  0.21540367
##  7:  0.07235750
##  8:  0.23408706
##  9:  0.06513638
## 10:  0.21400775
## 11: -0.02709770
## 12:  0.05296004
## 
## [[3]]
##     Suffix_vowel Region       emmean         SE       df    lower.CL
##  1:            e     MM -0.012748327 0.03885466 57.38445 -0.09054214
##  2:            a     MM -0.008875929 0.03950104 61.15779 -0.08785901
##  3:            u     MM -0.027134288 0.03870493 56.49105 -0.10465470
##  4:            i     MM -0.007483216 0.03838218 54.71451 -0.08441185
##  5:            e   West  0.051377269 0.05427919 59.02585 -0.05723415
##  6:            a   West -0.032703569 0.05518453 62.90279 -0.14298439
##  7:            u   West  0.214389721 0.05417656 58.57174  0.10596608
##  8:            i   West  0.165061880 0.05342256 55.45074  0.05802022
##  9:            e   East -0.043744542 0.03974888 62.82396 -0.12318069
## 10:            a   East -0.033502211 0.04033936 66.48947 -0.11403127
## 11:            u   East -0.078383925 0.03945129 60.93106 -0.15727346
## 12:            i   East -0.017827177 0.03933185 60.23271 -0.09649633
##         upper.CL
##  1: 0.0650454898
##  2: 0.0701071557
##  3: 0.0503861251
##  4: 0.0694454137
##  5: 0.1599886895
##  6: 0.0775772478
##  7: 0.3228133634
##  8: 0.2721035345
##  9: 0.0356916038
## 10: 0.0470268455
## 11: 0.0005056062
## 12: 0.0608419776
tx.e <- seq(0, 1, length.out = 35)
curves.e <- CJ(time = tx.e,
               Formant = 1:2,
               Vowel_ =  c("a", "e", "i", "u"),
               Region_ = c("MM", "East", "West")
)
curves.e[, value := (D.pcafd.e$meanfd$coefs[,1,Formant] +
                       sapply(c(1, 3), function(PC) {
                         (emm.e[[PC]] %>%
                            .[Suffix_vowel == Vowel_ & Region == Region_, emmean] %>%
                            as.numeric) *
                           D.pcafd.e$harmonics$coefs[,PC, Formant]}) %>% apply(1, sum)) %>%
           fd(., D.pcafd.e$meanfd$basis) %>%
           eval.fd(tx.e, .),
         by = .(Formant, Vowel_, Region_)]

curves.e[, Region_ := factor (Region_, levels = c("MM", "West", "East"))]
# change 1, 2 to F1, F2
curves.e[, Formant := paste0("F", Formant)]
# order formants F2, F1, so that F2 is on top in panels (optional)
curves.e[, Formant := factor(Formant, levels = c("F2", "F1"))]
# order vowels a, e, i, u so that they line up with the colours
# in cols.curves (optional)
curves.e[, Vowel_ := factor(Vowel_, levels = c("a", "e", "i", "u"))]
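
The `sapply(...) %>% apply(1, sum)` step above adds the contributions of PC1 and PC3, each weighted by its estimated marginal mean, to the mean coefficients. A stripped-down version of the same arithmetic, with invented coefficient values and weights (`mean_coefs`, `harm_coefs`, `emm_scores`) in place of the `fda` objects and `emmeans` tables, might look like this:

```r
# Hypothetical basis coefficients (5 per curve) and weights, for illustration only
mean_coefs <- c(1.0, 1.2, 1.5, 1.3, 1.1)
harm_coefs <- cbind(PC1 = c(0.5, 0.4, 0.3, 0.2, 0.1),
                    PC2 = c(0.1, -0.1, 0.1, -0.1, 0.1),
                    PC3 = c(-0.2, -0.1, 0.0, 0.1, 0.2))
emm_scores <- c(PC1 = 0.5, PC2 = 0.1, PC3 = -0.1)   # stand-ins for the emmeans

# One column per retained PC (1 and 3), each scaled by its score ...
contrib <- sapply(c(1, 3), function(PC) emm_scores[PC] * harm_coefs[, PC])
# ... then summed row-wise and added to the mean coefficients
recon_coefs <- mean_coefs + apply(contrib, 1, sum)
recon_coefs
```

In the study code the resulting coefficients are then wrapped with `fd()` and evaluated on the time grid with `eval.fd()`; PC2 is left out of the sum, as in the `c(1, 3)` index above.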

Plot of reconstructed stem-/e/ formants:

cols.curves = c("red", "darkgrey", "orange", "darkgreen")

ggplot(curves.e) +
  aes(x = time,  col = Vowel_, group = Vowel_) +
  geom_line(aes(y = value), size=1.3) +
  facet_grid(Formant ~ Region_) +
  scale_colour_manual(values = cols.curves) +
  theme_light()+
  theme(axis.text = element_text(size=14), axis.title.x = element_text(size=14), 
        strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black"),
        axis.title.y = element_text(size=14), text = element_text(size=16),  
        legend.title=element_blank(), legend.position = "top") +
  xlab("Normalised time") +
  ylab("Normalised frequency")

Stem-/o/

Preparatory data for /o/-reconstruction:

emm.o <- lapply(m.o, function(m_) {
  emmeans(m_, pairwise ~ Suffix_vowel | Region)$emmeans %>%
    as.data.table
})

METSuffVowel <- emm.o[[1]][, .(Suffix_vowel)] %>% unique

# join it to all tables in emm.o
lapply(emm.o, function(emm_) {emm_[METSuffVowel,
                                   on = "Suffix_vowel",
                                   Suffix_vowel := i.Suffix_vowel]})
## [[1]]
##     Suffix_vowel Region      emmean         SE        df   lower.CL    upper.CL
##  1:            e     MM  0.26234992 0.08015804  79.18263  0.1028051  0.42189478
##  2:            a     MM  0.40039930 0.07639553  66.70325  0.2479007  0.55289789
##  3:            u     MM -0.01732291 0.07530550  63.35683 -0.1677924  0.13314654
##  4:            i     MM  0.10319980 0.07696596  68.90664 -0.0503468  0.25674640
##  5:            e   West -0.09070773 0.08791436 101.10362 -0.2651040  0.08368854
##  6:            a   West  0.36177541 0.08195117  79.28798  0.1986649  0.52488591
##  7:            u   West -0.27474320 0.08055480  74.86802 -0.4352212 -0.11426517
##  8:            i   West -0.25202974 0.08225470  80.76833 -0.4156979 -0.08836157
##  9:            e   East -0.04292811 0.09829713  56.53952 -0.2397995  0.15394324
## 10:            a   East  0.64400541 0.09327655  45.93938  0.4562428  0.83176805
## 11:            u   East -0.58241246 0.09236719  44.13573 -0.7685501 -0.39627478
## 12:            i   East -0.37099514 0.09393386  47.17095 -0.5599476 -0.18204267
## 
## [[2]]
##     Suffix_vowel Region       emmean         SE       df     lower.CL
##  1:            e     MM -0.007564794 0.06457961 59.34967 -0.136772367
##  2:            a     MM -0.005318088 0.06582758 63.88305 -0.136828438
##  3:            u     MM -0.079284466 0.06430372 58.31882 -0.207987371
##  4:            i     MM  0.008804732 0.06366645 56.14950 -0.118726999
##  5:            e   West  0.065658119 0.05727243 70.82907 -0.048544641
##  6:            a   West  0.096894564 0.05956377 81.18601 -0.021614544
##  7:            u   West -0.041372364 0.05702698 70.24094 -0.155102232
##  8:            i   West  0.124161930 0.05498142 61.47028  0.014236800
##  9:            e   East -0.039033083 0.05107613 31.00730 -0.143202546
## 10:            a   East  0.107544881 0.05240172 34.26432  0.001082016
## 11:            u   East -0.130047894 0.05035181 29.20124 -0.232998084
## 12:            i   East -0.049532444 0.05008367 28.61506 -0.152024929
##        upper.CL
##  1:  0.12164278
##  2:  0.12619226
##  3:  0.04941844
##  4:  0.13633646
##  5:  0.17986088
##  6:  0.21540367
##  7:  0.07235750
##  8:  0.23408706
##  9:  0.06513638
## 10:  0.21400775
## 11: -0.02709770
## 12:  0.05296004
## 
## [[3]]
##     Suffix_vowel Region       emmean         SE       df     lower.CL
##  1:            e     MM  0.004680716 0.02900527 52.06666 -0.053520836
##  2:            a     MM -0.014871485 0.02819251 46.71951 -0.071596485
##  3:            u     MM -0.026498067 0.02796608 45.29874 -0.082814389
##  4:            i     MM  0.001042657 0.02830865 47.52519 -0.055890383
##  5:            e   West  0.070279037 0.03420223 68.92516  0.002046129
##  6:            a   West -0.049457514 0.03214423 54.84021 -0.113880210
##  7:            u   West  0.137677963 0.03172375 52.25253  0.074026918
##  8:            i   West  0.107690021 0.03239696 56.51250  0.042804088
##  9:            e   East -0.042847944 0.03402426 40.55186 -0.111584393
## 10:            a   East -0.022785194 0.03286399 35.33781 -0.089479859
## 11:            u   East -0.016146517 0.03264080 34.38916 -0.082452961
## 12:            i   East -0.046788874 0.03298057 35.85085 -0.113686235
##       upper.CL
##  1: 0.06288227
##  2: 0.04185351
##  3: 0.02981826
##  4: 0.05797570
##  5: 0.13851194
##  6: 0.01496518
##  7: 0.20132901
##  8: 0.17257596
##  9: 0.02588850
## 10: 0.04390947
## 11: 0.05015993
## 12: 0.02010849
tx.o <- seq(0, 1, length.out = 35)
curves.o <- CJ(time = tx.o,
               Formant = 1:2,
               Vowel_ =  c("a", "e", "i", "u"),
               Region_ = c("MM", "East", "West")
)
curves.o[, value := (D.pcafd.o$meanfd$coefs[,1,Formant] +
                       sapply(c(1, 3), function(PC) {
                         (emm.o[[PC]] %>%
                            .[Suffix_vowel == Vowel_ & Region == Region_, emmean] %>%
                            as.numeric) *
                           D.pcafd.o$harmonics$coefs[,PC, Formant]}) %>% apply(1, sum)) %>%
           fd(., D.pcafd.o$meanfd$basis) %>%
           eval.fd(tx.o, .),
         by = .(Formant, Vowel_, Region_)]

curves.o[, Region_ := factor (Region_, levels = c("MM", "West", "East"))]
# change 1, 2 to F1, F2
curves.o[, Formant := paste0("F", Formant)]
# order formants F2, F1, so that F2 is on top in panels (optional)
curves.o[, Formant := factor(Formant, levels = c("F2", "F1"))]
# order vowels a, e, i, u so that they line up with the colours in cols.curves (optional)
curves.o[, Vowel_ := factor(Vowel_, levels = c("a", "e", "i", "u"))]

Plot of reconstructed stem-/o/ formants:

cols.curves = c("red", "darkgrey", "orange", "darkgreen")

ggplot(curves.o) +
  aes(x = time,  col = Vowel_, group = Vowel_) +
  geom_line(aes(y = value), size=1.3) +
  facet_grid(Formant ~ Region_) +
  scale_colour_manual(values = cols.curves) +
  theme_light()+
  theme(axis.text = element_text(size=14), axis.title.x = element_text(size=14), 
        strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black"),
        axis.title.y = element_text(size=14), text = element_text(size=16),  
        legend.title=element_blank(), legend.position = "top") +
  xlab("Normalised time") +
  ylab("Normalised frequency")

3. Analysis of suffix erosion

3.1. Suffix deletion

Plot of proportions of deleted vs realised suffix vowels, separately by region, stem vowel, and suffix vowel (Fig. 12):

Stem_vowel.labs <- c("/e/","/o/")
names(Stem_vowel.labs) <- c("e","o")
cols = c("black", "lightblue")
ggplot(met.df%>% mutate(Suffix_vowel = factor(Suffix_vowel, levels= c("a", "e", "i", "u")))) + 
  aes(fill = Suffix, x = Region) +
  geom_bar(position="fill") + 
  facet_grid(Suffix_vowel ~ Stem_vowel, labeller=labeller(Stem_vowel=Stem_vowel.labs)) +
  theme(axis.text = element_text(size=12), axis.title.x = element_text(size=16), 
        axis.title.y = element_text(size=16),
        text = element_text(size=16), legend.position = "top") +
  ylab("Proportion")  + xlab ("Region") + scale_fill_manual(values = cols)

This shows that the extent of suffix deletion was greatest in the East and smallest in MM, with the West between the two.
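
The bar heights in a `position = "fill"` plot are within-group proportions, which can also be tabulated directly. The sketch below uses a small invented data frame (`toy.df`, with made-up tokens and assumed level names "realised" and "deleted") rather than `met.df`, and relies only on base R.

```r
# Invented tokens: region and whether the suffix vowel was realised or deleted
toy.df <- data.frame(
  Region = rep(c("MM", "West", "East"), times = c(4, 4, 4)),
  Suffix = c("realised", "realised", "realised", "deleted",   # MM: 1 of 4 deleted
             "realised", "realised", "deleted", "deleted",    # West: 2 of 4 deleted
             "realised", "deleted", "deleted", "deleted")     # East: 3 of 4 deleted
)

# Row-wise proportions: each region's tokens sum to 1, as in a filled bar
counts <- table(toy.df$Region, toy.df$Suffix)
props <- prop.table(counts, margin = 1)
props
```

Applied to the real data, the same two calls on `Region` and `Suffix` (optionally split by stem and suffix vowel) would give the numbers behind the plot.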

Statistics

This is the GLMER model testing whether the differences observed in the plot above are significant. An extra column, suffix_del, encodes suffix deletion as a logical variable: it is TRUE when ci is missing (is.na(ci)), which here marks a deleted suffix vowel, and FALSE otherwise.

suffix.glmer = glmer(suffix_del ~ Region + Suffix_vowel + Stem_vowel + (1 | speaker) +  
                       (0 + Region | Stem) + (1|Stem) + Region:Suffix_vowel + 
                       Region:Stem_vowel, family = binomial, 
                     control=glmerControl(optimizer="bobyqa"),
                     data = met.df %>% mutate(suffix_del = is.na(ci)))

# F-statistics: 

joint_tests(suffix.glmer) 
##  model term          df1 df2 F.ratio  Chisq p.value
##  Region                2 Inf   6.497 12.994  0.0015
##  Suffix_vowel          3 Inf   4.238 12.714  0.0053
##  Stem_vowel            1 Inf   0.893  0.893  0.3448
##  Region:Suffix_vowel   6 Inf   1.982 11.892  0.0645
##  Region:Stem_vowel     2 Inf   5.668 11.336  0.0035

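The GLMM analysis confirms that the degree of suffix deletion was significantly influenced by both region and suffix vowel. The interaction between region and stem vowel was also significant, whereas the interaction between region and suffix vowel was not (p = 0.0645).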
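
Because the model is fitted with glmer, joint_tests() reports asymptotic tests (df2 = Inf). In that case the F.ratio column is simply the Chisq column divided by df1, and the p-value can be recovered from the chi-square distribution. The following is a minimal check in base R that uses only values printed in the table above; the variable names are ours and are not part of the original script.

```r
# Values copied from the joint_tests() output above
chisq <- c(Region = 12.994, "Region:Stem_vowel" = 11.336)
df1   <- c(Region = 2,      "Region:Stem_vowel" = 2)

# With df2 = Inf, the F ratio is the chi-square statistic divided by df1
f_ratio <- chisq / df1

# The upper-tail chi-square probability gives the p-value
p_value <- pchisq(chisq, df = df1, lower.tail = FALSE)

print(round(f_ratio, 3))
print(round(p_value, 4))
```

The recomputed values should agree with the F.ratio and p.value columns up to rounding.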

Post-hoc tests: differences between suffixes

The only region showing differences between suffix vowels is the West, where deletion was greater for /i, u/ than for /a/ suffix vowels.

emmeans(suffix.glmer, pairwise ~ Suffix_vowel | Region)$contrasts
## Region = MM:
##  contrast estimate    SE  df z.ratio p.value
##  e - a      0.5711 0.455 Inf   1.256  0.5912
##  e - u     -0.2260 0.398 Inf  -0.568  0.9415
##  e - i     -0.3247 0.361 Inf  -0.900  0.8048
##  a - u     -0.7971 0.376 Inf  -2.117  0.1476
##  a - i     -0.8958 0.409 Inf  -2.191  0.1257
##  u - i     -0.0987 0.319 Inf  -0.309  0.9898
## 
## Region = West:
##  contrast estimate    SE  df z.ratio p.value
##  e - a      0.8693 0.460 Inf   1.891  0.2318
##  e - u     -0.2858 0.401 Inf  -0.712  0.8923
##  e - i     -0.3539 0.358 Inf  -0.988  0.7564
##  a - u     -1.1552 0.385 Inf  -2.998  0.0144
##  a - i     -1.2232 0.407 Inf  -3.003  0.0142
##  u - i     -0.0680 0.315 Inf  -0.216  0.9964
## 
## Region = East:
##  contrast estimate    SE  df z.ratio p.value
##  e - a     -0.0318 0.221 Inf  -0.144  0.9989
##  e - u      0.0608 0.222 Inf   0.274  0.9928
##  e - i     -0.0870 0.201 Inf  -0.433  0.9728
##  a - u      0.0926 0.182 Inf   0.508  0.9573
##  a - i     -0.0552 0.214 Inf  -0.258  0.9940
##  u - i     -0.1478 0.185 Inf  -0.798  0.8555
## 
## Results are averaged over the levels of: Stem_vowel 
## Results are given on the log odds ratio (not the response) scale. 
## P value adjustment: tukey method for comparing a family of 4 estimates
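
The estimates above are on the log odds ratio scale. Exponentiating them gives odds ratios, which may be easier to interpret. The sketch below does this for the two significant West contrasts, using the rounded estimates printed above (the variable names are ours).

```r
# Log odds ratios copied from the West contrasts above
log_or <- c("a - u" = -1.1552, "a - i" = -1.2232)

# Exponentiate to move from the log odds ratio scale to the odds ratio scale
odds_ratio <- exp(log_or)

print(round(odds_ratio, 3))
```

Both odds ratios are roughly 0.3, i.e. in the West the odds of deletion for suffix-/a/ are about a third of those for suffix-/u/ or suffix-/i/.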

Post-hoc tests: differences between regions

These show that MM–East contrasts were significant for all stem-suffix vowel combinations. Conversely, contrasts between either MM and the West or the West and the East were only sporadically significant.

emmeans(suffix.glmer, pairwise ~ Region | Stem_vowel * Suffix_vowel)$contrasts
## Stem_vowel = e, Suffix_vowel = e:
##  contrast     estimate    SE  df z.ratio p.value
##  MM - West   -0.851423 0.802 Inf  -1.062  0.5376
##  MM - East   -2.429193 0.742 Inf  -3.274  0.0030
##  West - East -1.577770 0.753 Inf  -2.096  0.0907
## 
## Stem_vowel = o, Suffix_vowel = e:
##  contrast     estimate    SE  df z.ratio p.value
##  MM - West   -1.854860 0.787 Inf  -2.356  0.0485
##  MM - East   -2.202240 0.747 Inf  -2.947  0.0090
##  West - East -0.347380 0.743 Inf  -0.467  0.8866
## 
## Stem_vowel = e, Suffix_vowel = a:
##  contrast     estimate    SE  df z.ratio p.value
##  MM - West   -0.553199 0.856 Inf  -0.646  0.7946
##  MM - East   -3.032131 0.768 Inf  -3.949  0.0002
##  West - East -2.478932 0.784 Inf  -3.161  0.0045
## 
## Stem_vowel = o, Suffix_vowel = a:
##  contrast     estimate    SE  df z.ratio p.value
##  MM - West   -1.556636 0.819 Inf  -1.900  0.1386
##  MM - East   -2.805178 0.751 Inf  -3.736  0.0005
##  West - East -1.248541 0.752 Inf  -1.660  0.2206
## 
## Stem_vowel = e, Suffix_vowel = u:
##  contrast     estimate    SE  df z.ratio p.value
##  MM - West   -0.911242 0.778 Inf  -1.171  0.4708
##  MM - East   -2.142405 0.722 Inf  -2.968  0.0084
##  West - East -1.231162 0.736 Inf  -1.672  0.2161
## 
## Stem_vowel = o, Suffix_vowel = u:
##  contrast     estimate    SE  df z.ratio p.value
##  MM - West   -1.914679 0.747 Inf  -2.562  0.0281
##  MM - East   -1.915451 0.709 Inf  -2.700  0.0190
##  West - East -0.000772 0.710 Inf  -0.001  1.0000
## 
## Stem_vowel = e, Suffix_vowel = i:
##  contrast     estimate    SE  df z.ratio p.value
##  MM - West   -0.880592 0.762 Inf  -1.156  0.4795
##  MM - East   -2.191537 0.711 Inf  -3.081  0.0058
##  West - East -1.310944 0.725 Inf  -1.807  0.1672
## 
## Stem_vowel = o, Suffix_vowel = i:
##  contrast     estimate    SE  df z.ratio p.value
##  MM - West   -1.884029 0.744 Inf  -2.531  0.0306
##  MM - East   -1.964583 0.713 Inf  -2.756  0.0161
##  West - East -0.080554 0.712 Inf  -0.113  0.9930
## 
## Results are given on the log odds ratio (not the response) scale. 
## P value adjustment: tukey method for comparing a family of 3 estimates

3.2. Suffix centralisation

Violin plots showing centralisation (c) index values (“ci” in the script) for suffix vowels, separately for the three regions, stem vowel, and suffix vowel type (Fig. 13). Higher values and values around 0 are indicative of greater centralisation.

Stem_vowel.labs <- c("/e/","/o/")
names(Stem_vowel.labs) <- c("e","o")

cols=c("brown1", "chartreuse3", "deepskyblue3")
legend_title <- "Region"
ggplot(ci.df %>% mutate(Suffix_vowel = factor(Suffix_vowel, levels= c("a", "e", "i", "u")))) +
  aes(y = ci, x = Region, fill=Region) + geom_violin(trim=F)+
  facet_grid(Stem_vowel ~ Suffix_vowel, 
             labeller=labeller(Stem_vowel=Stem_vowel.labs))+
  ylab("c")+
  theme_light()+
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")+
  scale_fill_manual(legend_title, values = cols) +
  theme(strip.text.x = element_text(color = "black"), 
        strip.text.y = element_text(color = "black"), text = element_text(size = 16), 
        legend.position = "top")+
  xlab("Region")
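
The point ranges added by stat_summary(fun.data = mean_sdl) summarise each violin. As far as we are aware, ggplot2's mean_sdl() wraps Hmisc::smean.sdl(), so the Hmisc package must be installed, and with the default mult = 2 the range spans the mean plus or minus two standard deviations. The following base-R sketch computes the same kind of summary; the helper name and the toy values are hypothetical.

```r
# Hypothetical helper mirroring what mean_sdl() is assumed to return:
# the mean and a range of +/- mult standard deviations
mean_sd_range <- function(x, mult = 2) {
  m <- mean(x)
  s <- sd(x)
  data.frame(y = m, ymin = m - mult * s, ymax = m + mult * s)
}

toy_ci <- c(-0.4, 0.1, 0.3, 0.8, 1.2)  # toy values, not from the study
print(mean_sd_range(toy_ci))
```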

Statistics

The mixed model showed significant effects of region and of suffix vowel on the centralisation index. There was also a significant interaction between these two fixed factors, and a significant three-way interaction between region, stem vowel, and suffix vowel.

ci.lmer = lmer(ci ~ Region * Stem_vowel * Suffix_vowel + (1|Stem) + (1|speaker), data = ci.df)

# F-statistics:
anova(ci.lmer)
## Type III Analysis of Variance Table with Satterthwaite's method
##                                 Sum Sq Mean Sq NumDF  DenDF F value    Pr(>F)
## Region                         13.1195  6.5597     2   32.3 20.8807 1.519e-06
## Stem_vowel                      0.0141  0.0141     1   47.3  0.0450    0.8330
## Suffix_vowel                   26.4046  8.8015     3 3124.1 28.0167 < 2.2e-16
## Region:Stem_vowel               0.3193  0.1597     2 4618.2  0.5082    0.6016
## Region:Suffix_vowel            20.2697  3.3783     6 4590.5 10.7536 6.584e-12
## Stem_vowel:Suffix_vowel         1.9632  0.6544     3 3124.8  2.0831    0.1003
## Region:Stem_vowel:Suffix_vowel 10.2801  1.7134     6 4590.6  5.4539 1.240e-05
##                                   
## Region                         ***
## Stem_vowel                        
## Suffix_vowel                   ***
## Region:Stem_vowel                 
## Region:Suffix_vowel            ***
## Stem_vowel:Suffix_vowel           
## Region:Stem_vowel:Suffix_vowel ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Post-hoc tests: differences between regions

The results show that there was significantly greater centralisation in the suffix vowel for the East than MM for all stem-suffix vowel combinations. There was also greater suffix vowel centralisation for the West than MM for /i, a/-suffixes and for /e/-suffixes preceded by /o/-stems. The extent of suffix vowel centralisation was also greater for the East than the West for /i, u, a/-suffixes.

emm_options(pbkrtest.limit = nrow(ci.df),lmerTest.limit = nrow(ci.df))

emmeans(ci.lmer, pairwise ~ Region | Suffix_vowel * Stem_vowel)$contrasts
## Suffix_vowel = e, Stem_vowel = e:
##  contrast    estimate    SE   df t.ratio p.value
##  MM - West     -0.309 0.161 43.2  -1.926  0.1437
##  MM - East     -0.536 0.146 40.5  -3.676  0.0019
##  West - East   -0.226 0.154 45.1  -1.471  0.3142
## 
## Suffix_vowel = a, Stem_vowel = e:
##  contrast    estimate    SE   df t.ratio p.value
##  MM - West     -0.587 0.164 47.3  -3.575  0.0023
##  MM - East     -1.104 0.149 44.5  -7.402  <.0001
##  West - East   -0.517 0.158 50.1  -3.273  0.0054
## 
## Suffix_vowel = u, Stem_vowel = e:
##  contrast    estimate    SE   df t.ratio p.value
##  MM - West     -0.275 0.161 43.6  -1.710  0.2129
##  MM - East     -1.029 0.144 39.0  -7.123  <.0001
##  West - East   -0.753 0.154 45.0  -4.896  <.0001
## 
## Suffix_vowel = i, Stem_vowel = e:
##  contrast    estimate    SE   df t.ratio p.value
##  MM - West     -0.518 0.158 40.6  -3.277  0.0060
##  MM - East     -0.935 0.143 37.9  -6.518  <.0001
##  West - East   -0.416 0.151 42.1  -2.753  0.0231
## 
## Suffix_vowel = e, Stem_vowel = o:
##  contrast    estimate    SE   df t.ratio p.value
##  MM - West     -0.440 0.167 51.0  -2.630  0.0298
##  MM - East     -0.819 0.150 45.8  -5.452  <.0001
##  West - East   -0.379 0.162 54.8  -2.345  0.0580
## 
## Suffix_vowel = a, Stem_vowel = o:
##  contrast    estimate    SE   df t.ratio p.value
##  MM - West     -0.506 0.160 42.8  -3.159  0.0080
##  MM - East     -0.982 0.145 39.4  -6.784  <.0001
##  West - East   -0.475 0.153 44.3  -3.103  0.0092
## 
## Suffix_vowel = u, Stem_vowel = o:
##  contrast    estimate    SE   df t.ratio p.value
##  MM - West     -0.208 0.160 42.5  -1.301  0.4027
##  MM - East     -0.725 0.144 38.1  -5.049  <.0001
##  West - East   -0.517 0.153 43.7  -3.385  0.0042
## 
## Suffix_vowel = i, Stem_vowel = o:
##  contrast    estimate    SE   df t.ratio p.value
##  MM - West     -0.455 0.163 45.6  -2.796  0.0202
##  MM - East     -0.923 0.146 40.5  -6.333  <.0001
##  West - East   -0.468 0.156 47.2  -3.004  0.0116
## 
## Degrees-of-freedom method: satterthwaite 
## P value adjustment: tukey method for comparing a family of 3 estimates
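
The p-values above are Tukey-adjusted for a family of three regional means. To our understanding, emmeans obtains them from the studentized range distribution, evaluated at sqrt(2) times the absolute t-ratio. The sketch below applies this to the first MM - East contrast; it relies on the printed (rounded) t-ratio and df, so the result may differ slightly from the table.

```r
t_ratio <- -3.676   # MM - East, Suffix_vowel = e, Stem_vowel = e
df      <- 40.5
k       <- 3        # family size: three regions

# Unadjusted two-sided p-value from the t distribution
p_raw <- 2 * pt(-abs(t_ratio), df = df)

# Tukey-adjusted p-value from the studentized range distribution
p_tukey <- ptukey(sqrt(2) * abs(t_ratio), nmeans = k, df = df,
                  lower.tail = FALSE)

print(c(unadjusted = p_raw, tukey = p_tukey))
```

The adjusted value is larger than the unadjusted one, which is the price paid for making three comparisons within each panel.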

Post-hoc tests: differences between suffixes

Across the three regions, the suffix vowel /e/ was more centralised than /a, i, u/-suffixes.

emmeans(ci.lmer, pairwise ~ Suffix_vowel)$contrasts
##  contrast estimate     SE   df t.ratio p.value
##  e - a     0.21991 0.0314 3194   7.002  <.0001
##  e - u     0.22733 0.0324 2413   7.012  <.0001
##  e - i     0.24833 0.0293 3663   8.478  <.0001
##  a - u     0.00742 0.0288 3346   0.258  0.9940
##  a - i     0.02842 0.0313 2562   0.907  0.8010
##  u - i     0.02100 0.0264 3950   0.795  0.8566
## 
## Results are averaged over the levels of: Region, Stem_vowel 
## Degrees-of-freedom method: satterthwaite 
## P value adjustment: tukey method for comparing a family of 4 estimates

3.3. Correlation between suffix centralisation and metaphony within regions

This analysis needs a separate dataframe. The following code lines explain step-by-step how this was created.

First, we recode “u” in Suffix_vowel as “i”, because there are too few tokens for distances between stems with suffix-/e/ and suffix-/u/ to be calculated. The level “i” therefore stands for a high vowel (either “i” or “u”).

h = as.character(ci.df$Suffix_vowel)
h[h == "u"] = "i"
ci.df$Suffix_vowel = factor(h)
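
The effect of this recoding can be checked on a toy factor (not the study data): the level “u” disappears and its tokens are counted under “i”.

```r
# Toy factor with the four suffix vowels (not the study data)
suffix <- factor(c("a", "e", "i", "u", "u", "i"))

# Same three steps as above: to character, recode, back to factor
h_toy <- as.character(suffix)
h_toy[h_toy == "u"] <- "i"
suffix_merged <- factor(h_toy)

print(levels(suffix_merged))
print(table(suffix_merged))
```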

Then we calculate the mean s1, mean s3, mean F1n (“n” = normalised), and mean F2n for each speaker-Stem-Suffix vowel combination. These are \(\bar{x}_{s.w.k}\) and \(\bar{y}_{s.w.k}\) in equation (7) in the paper.

word.df = 
  ci.df %>%
  group_by(speaker, Stem, Suffix_vowel, Stem_vowel, Region) %>%
  summarise(s1mean = mean(s1), 
            s3mean = mean(s3), 
            F1mean = mean(F1n), 
            F2mean = mean(F2n)) %>%
  ungroup()

word.df %<>% rename(Suffix_vowel2 = Suffix_vowel)

# add a unique identifier
word.df = data.frame(word.df, indx = seq_len(nrow(word.df)))

We can now create a new dataframe from “ci.df” with the columns “speaker”, “Stem”, “Suffix_vowel”, “Stem_vowel”, “Region”, “s1”, “s3”, “F1n”, “F2n”, and “ci”. We also add the column “bundle” as an identifier to keep track of the unique segments.

orig.df =
  ci.df %>%
  dplyr::select(speaker, Stem, Suffix_vowel, 
         Stem_vowel, Region, 
         s1, s3, F1n, F2n, ci, bundle)

We then join the two dataframes. If e.g. the lexical stem ‘bon’ occurs before suffixed -i, -a, -e, then each observation of ‘bon’ will be repeated 3 times in the new “join.df”: once in the context of aggregated s1 and s3 in ‘boni’, once in the context of aggregated s1 and s3 ‘bone’, once in the context of aggregated s1 and s3 ‘bona’. For this reason, “join.df” has many more observations than the original “ci.df”.

“Suffix_vowel” and “Suffix_vowel2” are \(j\) and \(k\) respectively in equation (7) in the paper. “s1”, “s3”, “s1mean”, “s3mean” are \(x\), \(y\), \(\bar{x}\), \(\bar{y}\) in equation (7). “F1n”, “F2n”, “F1mean”, “F2mean” are also \(x\), \(y\), \(\bar{x}\), \(\bar{y}\) in equation (7).

join.df = left_join(orig.df, word.df, by=c("speaker", "Stem", "Stem_vowel","Region"))
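
The row multiplication described above can be seen in a small toy example. This is only a sketch: the values are invented, only s1 is included, and base-R merge() is used as a stand-in for left_join() so that the block has no package dependencies.

```r
# Toy token-level data for one speaker and one stem (not the study data)
orig_toy <- data.frame(
  speaker      = "s01",
  Stem         = "bon",
  Suffix_vowel = c("i", "i", "a", "e"),
  s1           = c(0.9, 1.1, -0.2, -0.4),
  stringsAsFactors = FALSE
)

# Toy aggregated data: one mean per suffix vowel context
word_toy <- data.frame(
  speaker       = "s01",
  Stem          = "bon",
  Suffix_vowel2 = c("i", "a", "e"),
  s1mean        = c(1.0, -0.2, -0.4),
  stringsAsFactors = FALSE
)

# Base-R stand-in for left_join(): every token meets every context mean
join_toy <- merge(orig_toy, word_toy, by = c("speaker", "Stem"), all.x = TRUE)

print(nrow(orig_toy))  # 4 tokens
print(nrow(join_toy))  # 4 tokens x 3 contexts = 12 rows
```

Of these 12 rows, 8 pair a token with the mean of a different suffix vowel context.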

We now calculate Euclidean distances in the stem (“edist”) and Euclidean distances in the suffix formants (“fdist”). These distances are \(d_{s.w.j.k}\) in equation (7) in the paper.

# create function:
euc = 
  function(a, b) {
    sqrt(sum((a - b)^2))
  }

join.df %<>%
  rowwise() %>%
  mutate(edist = euc(c(s1, s3), c(s1mean, s3mean))) %>%
  mutate(fdist = euc(c(F1n, F2n), c(F1mean, F2mean))) %>%
  ungroup()
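
As a sanity check on the distance calculation, the sketch below compares the row-by-row approach with a vectorised expression on toy values (not the study data). The vectorised form is a possible alternative to rowwise() and should give identical distances.

```r
# Same distance function as above
euc <- function(a, b) sqrt(sum((a - b)^2))

# Toy rows (not the study data)
toy <- data.frame(s1 = c(0, 1), s3 = c(0, 1),
                  s1mean = c(3, 1), s3mean = c(4, 3))

# Row by row, as in the rowwise() pipeline
d_rowwise <- mapply(function(x1, y1, x2, y2) euc(c(x1, y1), c(x2, y2)),
                    toy$s1, toy$s3, toy$s1mean, toy$s3mean)

# Vectorised equivalent that avoids rowwise()
d_vector <- with(toy, sqrt((s1 - s1mean)^2 + (s3 - s3mean)^2))

print(d_rowwise)  # 5 2
print(d_vector)   # 5 2
```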

We only want to retain those distances for which the two suffix vowels differ, i.e. we exclude distances such as that of ‘boni’ to the mean of ‘boni’.

other.df = join.df %>% filter(Suffix_vowel != Suffix_vowel2)

The violin plots below show log(edist) (top row) and log(fdist) (bottom row). The former show that there is progressively more information in the stem from MM to the West to the East (Fig. 14 in the paper).

Stem_vowel.labs <- c("/e/","/o/")
names(Stem_vowel.labs) <- c("e","o")

cols=c("brown1", "chartreuse3", "deepskyblue3")
legend_title <- "Region"

a=other.df %>%
  ggplot +
  aes(y = log(edist), x = Region, fill=Region) +
  #aes(y = edist, x = Suffix_vowel, fill=Suffix_vowel) +
  geom_violin() +
  ylim(-4.5, 2)+
  facet_grid(~Stem_vowel, labeller=labeller(Stem_vowel=Stem_vowel.labs)) +
  ylab(expression(d["stem"])) +
  xlab("") +
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")+
  scale_fill_manual(legend_title, values = cols) +
  theme_light()+
  theme(text = element_text(size = 16), legend.position = "top",
        strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black"))


b=other.df %>%
  ggplot +
  aes(y = log(fdist), x = Region, fill=Region) +
  #aes(y = fdist, x = Suffix_vowel, fill=Suffix_vowel) +
  geom_violin() +
  ylim(-4, 2)+
  facet_wrap(~Stem_vowel, labeller=labeller(Stem_vowel=Stem_vowel.labs)) +
  ylab(expression(d["suffix"])) +
  xlab("Region") +
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")+
  scale_fill_manual(legend_title, values = cols) +
  theme_light()+
  theme(text = element_text(size = 16), legend.position = "none", 
        strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black"))

grid.arrange(a, b, nrow=2)

We now need to reduce the number of levels in the Suffix_vowel-Suffix_vowel2 combinations. We do this by treating a distance a to b and b to a as the same.

h = with(other.df, paste0(as.character(Suffix_vowel), as.character(Suffix_vowel2)))

Thus, for example, “ea” includes distances of ‘bone’ tokens to aggregated ‘bona’, as well as distances of ‘bona’ tokens to aggregated ‘bone’.

h[h=="ae"] = "ea"
h[h=="ai"] = "ia"
h[h=="ei"] = "ie"

# the above reduces everything to 3 levels
table(h)
## h
##   ea   ia   ie 
##  788 1311 1165
other.df = data.frame(other.df, H = factor(h))

# convert fdist and edist to logs to make it easier to read

other.df$edist = log(other.df$edist)
other.df$fdist = log(other.df$fdist)
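
The manual recoding of “ae”, “ai”, and “ei” above could also be written in an order-independent way. The sketch below, on toy pairs rather than the study data, places the alphabetically later vowel first, which happens to reproduce the labels “ea”, “ia”, and “ie”.

```r
# Toy pairs of suffix vowels (not the study data)
j <- c("a", "e", "a", "i", "e", "i")
k <- c("e", "a", "i", "a", "i", "e")

# Order-independent label: alphabetically later vowel first
pair <- paste0(pmax(j, k), pmin(j, k))

print(table(pair))
```

This variant would not need to be edited if further suffix vowel levels were added, at the cost of being less explicit than the three assignments used in the script.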

Statistics

LMER model for stem-/e/:

cor.lmer.e = lmer(edist ~ fdist * Region * H 
                  + (fdist|Stem) + (fdist|speaker), 
                  data = other.df %>% filter(Stem_vowel == "e") )
# F-statistics

anova(cor.lmer.e)
## Type III Analysis of Variance Table with Satterthwaite's method
##                 Sum Sq Mean Sq NumDF   DenDF F value    Pr(>F)    
## fdist           0.0973  0.0973     1   24.43  0.1379  0.713568    
## Region         24.5677 12.2839     2   28.26 17.4146 1.179e-05 ***
## H               9.6190  4.8095     2   16.67  6.8183  0.006856 ** 
## fdist:Region    1.4645  0.7322     2   28.97  1.0381  0.366932    
## fdist:H         1.7764  0.8882     2   22.95  1.2592  0.302774    
## Region:H       26.1995  6.5499     4 1412.56  9.2856 2.088e-07 ***
## fdist:Region:H 11.6100  2.9025     4  800.62  4.1148  0.002627 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Post-hoc tests for stem-/e/:

emtrends(cor.lmer.e, pairwise ~ Region|H,  var = 'fdist')
## $emtrends
## H = ea:
##  Region fdist.trend     SE    df lower.CL upper.CL
##  MM         -0.0737 0.1369 167.2  -0.3440   0.1966
##  West       -0.1860 0.1411 144.7  -0.4649   0.0929
##  East        0.1373 0.0956  36.1  -0.0566   0.3311
## 
## H = ia:
##  Region fdist.trend     SE    df lower.CL upper.CL
##  MM          0.0913 0.1266  85.1  -0.1603   0.3430
##  West       -0.1230 0.1595 133.9  -0.4385   0.1925
##  East       -0.1989 0.1089  48.6  -0.4178   0.0200
## 
## H = ie:
##  Region fdist.trend     SE    df lower.CL upper.CL
##  MM          0.1839 0.0903  40.0   0.0014   0.3665
##  West        0.1143 0.1050  57.9  -0.0960   0.3245
##  East       -0.1112 0.0821  37.1  -0.2775   0.0551
## 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95 
## 
## $contrasts
## H = ea:
##  contrast    estimate    SE    df t.ratio p.value
##  MM - West     0.1123 0.189 256.1   0.593  0.8239
##  MM - East    -0.2109 0.157 177.7  -1.340  0.3750
##  West - East  -0.3232 0.157 171.1  -2.057  0.1021
## 
## H = ia:
##  contrast    estimate    SE    df t.ratio p.value
##  MM - West     0.2143 0.188 172.2   1.138  0.4920
##  MM - East     0.2903 0.157 163.3   1.852  0.1561
##  West - East   0.0760 0.183 175.2   0.416  0.9092
## 
## H = ie:
##  contrast    estimate    SE    df t.ratio p.value
##  MM - West     0.0697 0.126  60.5   0.552  0.8460
##  MM - East     0.2951 0.108  50.7   2.725  0.0235
##  West - East   0.2254 0.123  67.7   1.835  0.1659
## 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: tukey method for comparing a family of 3 estimates

The only significant slope is H = ie for MM, which shows a positive trend (its confidence interval excludes zero). This suggests that, for MM, the bigger the separation between suffix /i, e/, the bigger the difference between stem-/e/ in these two contexts. The only significant contrast is also for H = ie, between MM and the East: df = 50.7, t-ratio = 2.7, p = 0.02.
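
The significance of the MM slope for H = ie can be reconstructed from the printed estimate, standard error, and degrees of freedom. The following is a rough check using rounded values, so small discrepancies from the emtrends output are expected; the variable names are ours.

```r
# MM, H = ie: values printed in the emtrends table above
slope <- 0.1839
se    <- 0.0903
df    <- 40.0

t_ratio <- slope / se
p_value <- 2 * pt(-abs(t_ratio), df = df)

# 95% confidence interval, for comparison with lower.CL / upper.CL
conf_int <- slope + c(-1, 1) * qt(0.975, df = df) * se

print(round(c(t = t_ratio, p = p_value), 3))
print(round(conf_int, 4))
```

The lower bound lies just above zero, so the slope is only marginally significant at the 0.05 level.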

LMER model for stem-/o/:

cor.lmer.o = lmer(edist ~ fdist * Region * H 
                  + (fdist|Stem) + (fdist|speaker), 
                  data = other.df %>% filter(Stem_vowel == "o") )

#F-statistics

anova(cor.lmer.o)
## Type III Analysis of Variance Table with Satterthwaite's method
##                 Sum Sq Mean Sq NumDF   DenDF F value    Pr(>F)    
## fdist           2.1042  2.1042     1   31.82  3.5747   0.06780 .  
## Region          8.0332  4.0166     2   35.74  6.8234   0.00309 ** 
## H              16.7843  8.3921     2  252.89 14.2567 1.359e-06 ***
## fdist:Region    0.8464  0.4232     2   29.10  0.7189   0.49574    
## fdist:H         0.2427  0.1214     2   30.28  0.2062   0.81484    
## Region:H       10.5774  2.6443     4 1688.70  4.4923   0.00130 ** 
## fdist:Region:H  4.7315  1.1829     4 1368.88  2.0095   0.09084 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Post-hoc tests for stem-/o/:

emtrends(cor.lmer.o, pairwise ~ Region|H,  var = 'fdist')
## $emtrends
## H = ea:
##  Region fdist.trend     SE    df lower.CL upper.CL
##  MM          0.2303 0.1697 195.1  -0.1044    0.565
##  West       -0.1191 0.2111 140.1  -0.5364    0.298
##  East        0.2794 0.1601  75.7  -0.0394    0.598
## 
## H = ia:
##  Region fdist.trend     SE    df lower.CL upper.CL
##  MM          0.1727 0.1060  62.7  -0.0391    0.385
##  West        0.1571 0.1373  67.2  -0.1170    0.431
##  East       -0.0486 0.0908  42.8  -0.2317    0.135
## 
## H = ie:
##  Region fdist.trend     SE    df lower.CL upper.CL
##  MM          0.2411 0.1148  61.0   0.0115    0.471
##  West        0.1259 0.1492  74.8  -0.1713    0.423
##  East        0.0968 0.1140  53.6  -0.1318    0.325
## 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95 
## 
## $contrasts
## H = ea:
##  contrast    estimate    SE    df t.ratio p.value
##  MM - West     0.3494 0.252 301.1   1.387  0.3492
##  MM - East    -0.0491 0.204 294.0  -0.241  0.9686
##  West - East  -0.3985 0.232 280.3  -1.716  0.2010
## 
## H = ia:
##  contrast    estimate    SE    df t.ratio p.value
##  MM - West     0.0156 0.158  57.1   0.099  0.9946
##  MM - East     0.2213 0.119  46.2   1.853  0.1639
##  West - East   0.2056 0.148  50.8   1.393  0.3521
## 
## H = ie:
##  contrast    estimate    SE    df t.ratio p.value
##  MM - West     0.1151 0.167  67.6   0.691  0.7697
##  MM - East     0.1442 0.135  67.3   1.072  0.5349
##  West - East   0.0291 0.162  72.0   0.180  0.9823
## 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: tukey method for comparing a family of 3 estimates

Here the results are very similar to those for stem-/e/ (H = ie shows a positive trend for MM), although in this case there are no significant contrasts.

The following plot confirms the above graphically, i.e. for MM, the bigger the separation between suffix-/i/ and suffix-/e/, the bigger the difference between stem-/e/ (left) and between stem-/o/ (right) in these two contexts.

other.df %>% 
  filter(Region=="MM" & H == "ie") %>%
  ggplot() +
  aes(x = fdist, y = edist) + 
  geom_point(size=2) +
  geom_smooth(method='lm', formula= y~x)+
  facet_wrap(~Stem_vowel, labeller=labeller(Stem_vowel=Stem_vowel.labs)) + 
  ylab(expression(d["stem"])) +
  xlab(expression(d["suffix"])) +
  theme_light()+
  theme(axis.text = element_text(size=16), axis.title.x = element_text(size=18),
        axis.title.y = element_text(size=18), text = element_text(size=24),  legend.title=element_blank(),
        strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black"))

APPENDIX E: Absence of relationship between stem vowel duration and suffix vowel duration

The plots below compare vowel duration in stem and suffix vowels between regions and separately by suffix vowel type. If suffix vowel loss were compensated by stem vowel lengthening, then the Eastern region with its high degree of reduction in suffix vowel quality and duration should have greater stem vowel duration than regions like MM: but as the plot below shows, this is evidently not the case (Fig. 19).

# we isolate the tokens for which the suffix vowel was phonetically realised.
realised=met.df %>% filter(Suffix == "Realised")

cols=c("brown1", "chartreuse3", "deepskyblue3")

a=ggplot(realised %>% mutate(Suffix_vowel = factor(Suffix_vowel, levels= c("a", "e", "i", "u")))) + 
  aes(y = StemDuration, x = Region, fill=Region) +
  geom_violin() + 
  theme_light()+ 
  ylim(0, 400)+
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")+
  scale_fill_manual(values = cols) +
  facet_grid(~Suffix_vowel) +
  theme(axis.text = element_text(size=12, color="black"), axis.title.x = element_text(size=14),
       legend.position = "top",
        strip.text.x = element_text(color = "black", size=12),
        axis.title.y = element_text(size=14), text = element_text(size=16)) +
  ylab("Stem vowel duration (ms)")  + xlab ("") 

b=ggplot(realised %>% mutate(Suffix_vowel = factor(Suffix_vowel, levels= c("a", "e", "i", "u")))) +  
  aes(y = SuffDuration, x = Region, fill=Region) +
  geom_violin() + 
  theme_light()+ 
  ylim(0, 400)+
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")+
  scale_fill_manual(values = cols) +
  facet_grid(~Suffix_vowel) +
  theme(axis.text = element_text(size=12, color="black"), axis.title.x = element_text(size=14),
        strip.text.x = element_text(color = "black", size=12),
         legend.position = "none",
        axis.title.y = element_text(size=14), text = element_text(size=16)) +
  ylab("Suffix vowel duration (ms)")  + xlab ("Region") 

grid.arrange(a, b, nrow=2)

APPENDIX F: Comparison between high and metaphonically raised vowels in the East

For this appendix, we used Lobanov-normalised formant values (taken at the vowels’ temporal midpoint) of the East. The dataframe we are using is “D_MZhigh” (MZ=“Mittelzone”, i.e. the East).

To create the plots in Appendix F (Fig. 20), we separate /e/~/i/ from /o/~/u/ stem vowels into two distinct groups.

eMZ=D_MZhigh %>% filter (Stem_vowel %in% c( "/i/", "Raised /e/", "Non-raised /e/"))
eMZ$whichvowel<-"/e/~/i/"

oMZ=D_MZhigh %>% filter (Stem_vowel %in% c( "/u/", "Raised /o/", "Non-raised /o/"))
oMZ$whichvowel<-"/o/~/u/"

In these plots, we do not include three-syllable words, since the syllable separating stem and suffix vowel might cause raising even if the suffix is a mid or low vowel.

a2 = ggplot(eMZ %>% filter (!Word %in% c("donna", "pecora", "pettine", "prete", "topo", 
                                         "donne", "pecore", "pettini", "preti", "topi"))) +
  aes(y = F1n, x= Stem_vowel) + 
  geom_violin() +
  #facet_grid(~whichvowel)+
  theme_light()+ 
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")+
  theme(axis.text = element_text(size=12, color="black"), axis.title.x = element_text(size=14),
        axis.title.y = element_text(size=14), text = element_text(size=12)) +
  xlab("") +
  ylab("Normalised F1")

a1 = ggplot(eMZ %>% filter (!Word %in% c("donna", "pecora", "pettine", "prete", "topo",
                                         "donne", "pecore", "pettini", "preti", "topi")))+
  aes(y = F2n, x= Stem_vowel) + 
  geom_violin() +
  facet_grid(~whichvowel)+
  theme_light()+
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")+
  ylab("Normalised F2") +
  xlab("")+
  theme(axis.text = element_text(size=12, color="black"), axis.text.x = element_blank(), 
        strip.text.x = element_text(color = "black", size=12),
        axis.title.y = element_text(size=14), text = element_text(size=12),legend.position="none")

b2 = ggplot(oMZ %>% filter (!Word %in% c("donna", "pecora", "pettine", "prete", "topo",
                                         "donne", "pecore", "pettini", "preti", "topi"))) +
  aes(y = F1n, x= Stem_vowel) + 
  geom_violin() +
  #facet_grid(~whichvowel)+
  theme_light()+ 
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")+
  theme(axis.text = element_text(size=12, color="black"), axis.title.x = element_text(size=14),
        axis.title.y = element_text(size=14), text = element_text(size=12)) +
  xlab("") +
  ylab("")

b1 = ggplot(oMZ %>% filter (!Word %in% c("donna", "pecora", "pettine", "prete", "topo",
                                         "donne", "pecore", "pettini", "preti", "topi")))+
  aes(y = F2n, x= Stem_vowel) + 
  geom_violin() +
  facet_grid(~whichvowel)+
  theme_light()+
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")+
  ylab("") +
  xlab("")+
  theme(axis.text = element_text(size=12, color="black"), axis.text.x = element_blank(), 
        strip.text.x = element_text(color = "black", size=12),
        axis.title.y = element_text(size=14), text = element_text(size=12),legend.position="none")

one=grid.arrange(a1, a2, nrow=2)
two=grid.arrange(b1, b2, nrow=2)
#remove the objects that you do not need anymore

rm(a1,a2,b1,b2)

Below are the violin plots showing metaphonic and non-metaphonic mid vowels compared to lexical high vowels in the East (Fig. 20). Higher/lower Lobanov-normalised F1 values correspond to increasing vowel lowering/raising, while higher/lower normalised F2 values indicate increasing vowel fronting/retraction. These plots show that raised (metaphonic) /e, o/ have formant positions similar to, or even more extreme than, those of lexical /i, u/ (i.e. indicating an even more peripheral vowel).

grid.arrange(one, two, nrow=1)

4. Extra analyses (Revisions)

4.1. Are there more (or less) deleted suffix vowels after specific consonants?

The plot below shows the proportion of final vowel deletion according to the type of preceding consonant, separately by region and stem vowel. Some consonants were grouped for convenience into categories: “rN” = /r/ + nasal, either /n/ or /m/; “nC” = nasal + stop, “ll” = geminate lateral; “CC” = geminate stop (/pp, kk, tt/ etc); “C” = singleton stop like /p, t, k/; “Affr.” = affricates /ddʒ, tts, tʃ/.

Stem_vowel.labs <- c("/e/","/o/")
names(Stem_vowel.labs) <- c("e","o")

cols = c("black", "lightblue")
ggplot(met.df) + aes(fill = Suffix, x = Consonant) +
  geom_bar(position="fill") + 
  facet_grid(Region ~ Stem_vowel, labeller=labeller(Stem_vowel=Stem_vowel.labs)) +
  theme(axis.text = element_text(size=12), axis.title.x = element_text(size=16), 
        axis.title.y = element_text(size=16),
        text = element_text(size=16), legend.position = "top") +
  ylab("Proportion") + xlab("Consonant") + scale_fill_manual(values = cols)

The following plots instead group specific consonant classes.

The bar charts below show the proportion of final vowel deletion according to the sonority type of the preceding consonant cluster, separately by region. Affricates were also treated as clusters here. These plots show that, in our data, vowel deletion is slightly greater after clusters with falling sonority.

ggplot(met.df %>% filter(ClusterSonority %in% c("rising","falling"))) + aes(fill = Suffix, x = ClusterSonority) +
  geom_bar(position="fill") + 
  facet_grid(~Region) +
  theme(axis.text = element_text(size=12), axis.title.x = element_text(size=16), 
        axis.title.y = element_text(size=16),
        text = element_text(size=16), legend.position = "top") +
  ylab("Proportion") + xlab("Cluster sonority") + scale_fill_manual(values = cols)

The figure below shows slightly more vowel deletion after geminate stops (/pp, tt, kk/) than after singleton ones (/p, t, k, d/).

ggplot(met.df %>% filter(Stops %in% c("singleton","geminate"))) + aes(fill = Suffix, x = Stops) +
  geom_bar(position="fill") + 
  facet_grid(~Region) +
  theme(axis.text = element_text(size=12), axis.title.x = element_text(size=16), 
        axis.title.y = element_text(size=16),
        text = element_text(size=16), legend.position = "top") +
  ylab("Proportion") + xlab("Stops") + scale_fill_manual(values = cols)

4.2. Analysis of differences between etymologically Latin long (Proto-Romance mid-high) and Latin short (Proto-Romance mid-low) vowels

We add in information about whether the stem derives historically from a long or short vowel:

stems.long.e = c("mes", "femmin", "mel",  "stell")    
stems.long.o = c("nipot",   "sol", "spos", "soritS")
stems.long = c(stems.long.e, stems.long.o)

met.df = met.df %>%
  mutate(Length =
           case_when(Stem %in%  stems.long ~ "L",
                     TRUE ~ "S"))
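
The two-way coding above can be checked on a few stems without loading the data. The sketch below uses base-R ifelse() as an equivalent of the case_when() call; “bon” is included only as an illustrative stem that is not in the long-vowel list.

```r
stems_long <- c("mes", "femmin", "mel", "stell",
                "nipot", "sol", "spos", "soritS")

# Toy stems: "bon" stands for any stem not listed as long
toy_stems <- c("mes", "bon", "sol")

# Base-R equivalent of the case_when() call above
toy_length <- ifelse(toy_stems %in% stems_long, "L", "S")

print(data.frame(Stem = toy_stems, Length = toy_length))
```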

s1

Stem-/e/

Two models are fitted, one that includes ‘Length’ as a fixed factor (together with its interaction with Region) and one that does not, and the two are then compared with a likelihood-ratio test. The difference between the models falls just short of significance: \(\chi^2(3)\) = 6.5088, p = 0.08931. This marginal result reflects a Region * Length interaction, which is significant in the model with Length (p = 0.04486), whereas the main effect of Length is not.

# Model with Length
e.s1.lmer1 = met.df %>%
  filter(Stem_vowel == "e") %>%
  mutate(Stem = factor(Stem)) %>%
  lmer(s1 ~ Suffix_vowel * Region +  Length * Region +
         (Region|Stem) + 
         (Suffix_vowel|speaker),
       data = .,
       control=lmerControl(check.conv.singular = 
                             .makeCC(action = "ignore",
                                     tol = 1e-4))) 

# Model without Length
e.s1.lmer2 = 
  met.df %>%
  filter(Stem_vowel == "e") %>%
  mutate(Stem = factor(Stem)) %>%
  lmer(s1 ~ Suffix_vowel * Region +
         (Region|Stem) + 
         (Suffix_vowel|speaker),
       data = .,
       control=lmerControl(check.conv.singular = 
                             .makeCC(action = "ignore",
                                     tol = 1e-4))) 

# compare: Length has a marginal but non-significant effect
anova(e.s1.lmer1, e.s1.lmer2)
## Data: .
## Models:
## e.s1.lmer2: s1 ~ Suffix_vowel * Region + (Region | Stem) + (Suffix_vowel | speaker)
## e.s1.lmer1: s1 ~ Suffix_vowel * Region + Length * Region + (Region | Stem) + (Suffix_vowel | speaker)
##            npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)  
## e.s1.lmer2   29 3623.6 3795.2 -1782.8   3565.6                       
## e.s1.lmer1   32 3623.0 3812.5 -1779.5   3559.0 6.5088  3    0.08931 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(e.s1.lmer1)
## Type III Analysis of Variance Table with Satterthwaite's method
##                     Sum Sq Mean Sq NumDF  DenDF F value    Pr(>F)    
## Suffix_vowel        50.962 16.9872     3 87.182 89.2785 < 2.2e-16 ***
## Region               1.485  0.7425     2 43.636  3.9024   0.02759 *  
## Length               0.019  0.0189     1 25.923  0.0994   0.75511    
## Suffix_vowel:Region 25.776  4.2960     6 78.155 22.5783 2.934e-15 ***
## Region:Length        1.330  0.6648     2 26.776  3.4937   0.04486 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(e.s1.lmer2)
## Type III Analysis of Variance Table with Satterthwaite's method
##                     Sum Sq Mean Sq NumDF  DenDF F value    Pr(>F)    
## Suffix_vowel        51.010 17.0033     3 87.663 89.4134 < 2.2e-16 ***
## Region               3.692  1.8462     2 49.726  9.7086 0.0002757 ***
## Suffix_vowel:Region 27.486  4.5810     6 77.864 24.0898 6.375e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
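
The model comparison reported by anova() is a likelihood-ratio test, and its p-value can be recovered from the printed table. The following minimal sketch uses only the rounded values shown in the output above, so the deviance difference only approximates the reported statistic; the variable names are chosen here for illustration.

```r
# Values copied from the anova(e.s1.lmer1, e.s1.lmer2) output above
npar.reduced = 29       # model without Length
npar.full    = 32       # model with Length
dev.reduced  = 3565.6   # deviance, model without Length (rounded)
dev.full     = 3559.0   # deviance, model with Length (rounded)
chisq        = 6.5088   # reported likelihood-ratio statistic

# The statistic is the drop in deviance; rounded inputs give roughly 6.6
dev.diff = dev.reduced - dev.full

# Degrees of freedom = number of extra parameters in the fuller model
lrt.df = npar.full - npar.reduced

# Upper-tail probability of the chi-squared distribution
lrt.p = pchisq(chisq, df = lrt.df, lower.tail = FALSE)

print(round(dev.diff, 1))
print(lrt.df)
print(round(lrt.p, 5))
```

Up to rounding, the result should match the Pr(>Chisq) column. The three degrees of freedom correspond to the main effect of Length plus the two Region:Length terms.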

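The post-hoc tests for Region * Length show that, on the short vowels, MM has higher s1 than both the West (p = 0.0139) and the East (p = 0.0001). On the long vowels, no regional contrast is significant. Within each region, long and short vowels do not differ significantly.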

emmeans(e.s1.lmer1, pairwise ~ Length | Region)$contrasts
## Region = MM:
##  contrast estimate    SE   df t.ratio p.value
##  L - S      -0.140 0.199 26.8  -0.706  0.4864
## 
## Region = West:
##  contrast estimate    SE   df t.ratio p.value
##  L - S      -0.167 0.201 28.9  -0.833  0.4116
## 
## Region = East:
##  contrast estimate    SE   df t.ratio p.value
##  L - S       0.516 0.354 29.0   1.458  0.1555
## 
## Results are averaged over the levels of: Suffix_vowel 
## Degrees-of-freedom method: kenward-roger
emmeans(e.s1.lmer1, pairwise ~ Region | Length)$contrasts
## Length = L:
##  contrast    estimate     SE   df t.ratio p.value
##  MM - West      0.277 0.1491 47.9   1.859  0.1619
##  MM - East     -0.186 0.2463 31.3  -0.754  0.7337
##  West - East   -0.463 0.2896 32.6  -1.598  0.2610
## 
## Length = S:
##  contrast    estimate     SE   df t.ratio p.value
##  MM - West      0.250 0.0851 46.7   2.939  0.0139
##  MM - East      0.471 0.1040 45.7   4.525  0.0001
##  West - East    0.221 0.1206 42.5   1.829  0.1726
## 
## Results are averaged over the levels of: Suffix_vowel 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: tukey method for comparing a family of 3 estimates
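
The Tukey-adjusted p-values in the second table can be approximated from the printed t-ratios and degrees of freedom. The sketch below assumes the usual studentized-range form of the adjustment for a family of three estimates, and it starts from rounded values, so small discrepancies from the output are to be expected.

```r
# Values copied from the 'Length = S' block of the emmeans output above
t.ratio  = c("MM - West" = 2.939, "MM - East" = 4.525, "West - East" = 1.829)
df       = c(46.7, 45.7, 42.5)
n.levels = 3   # three regions are compared within each Length

# Tukey adjustment: refer sqrt(2) * |t| to the studentized range distribution
p.tukey = ptukey(sqrt(2) * abs(t.ratio), nmeans = n.levels, df = df,
                 lower.tail = FALSE)

# Unadjusted two-sided p-values, for comparison
p.raw = 2 * pt(abs(t.ratio), df = df, lower.tail = FALSE)

print(round(cbind(p.raw, p.tukey), 4))
```

The Length | Region contrasts in the first table involve a single comparison per region, so no adjustment is applied there and the printed p-values correspond to the unadjusted form.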

Stem-/o/

For stem-/o/, two models are again fitted, one that includes ‘Length’ as a fixed factor and one that does not, and the two are compared. The difference between the models is again non-significant (\(\chi^2\)[3] = 2.2652, p = 0.5192).

# Model with length
o.s1.lmer1 = met.df %>%
  filter(Stem_vowel == "o") %>%
  mutate(Stem = factor(Stem)) %>%
  lmer(s1 ~ Suffix_vowel * Region +  Length * Region +
         (1|Stem) + 
         (Suffix_vowel|speaker),
       data = .,
       control=lmerControl(check.conv.singular = 
                             .makeCC(action = "ignore",
                                     tol = 1e-4))) 

# Model without Length
o.s1.lmer2 = 
  met.df %>%
  filter(Stem_vowel == "o") %>%
  mutate(Stem = factor(Stem)) %>%
  lmer(s1 ~ Suffix_vowel * Region +
         (1|Stem) + 
         (Suffix_vowel|speaker),
       data = .,
       control=lmerControl(check.conv.singular = 
                             .makeCC(action = "ignore",
                                     tol = 1e-4))) 
# compare: Length has no effect
anova(o.s1.lmer1, o.s1.lmer2)
## Data: .
## Models:
## o.s1.lmer2: s1 ~ Suffix_vowel * Region + (1 | Stem) + (Suffix_vowel | speaker)
## o.s1.lmer1: s1 ~ Suffix_vowel * Region + Length * Region + (1 | Stem) + (Suffix_vowel | speaker)
##            npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## o.s1.lmer2   24 3683.0 3823.9 -1817.5   3635.0                     
## o.s1.lmer1   27 3686.8 3845.3 -1816.4   3632.8 2.2652  3     0.5192
anova(o.s1.lmer1)
## Type III Analysis of Variance Table with Satterthwaite's method
##                      Sum Sq Mean Sq NumDF   DenDF F value    Pr(>F)    
## Suffix_vowel        27.2114  9.0705     3   57.75 42.4804 1.253e-14 ***
## Region               2.9238  1.4619     2   42.67  6.8466  0.002637 ** 
## Length               0.3778  0.3778     1   26.27  1.7692  0.194918    
## Suffix_vowel:Region 10.2158  1.7026     6   48.16  7.9741 5.372e-06 ***
## Region:Length        0.0908  0.0454     2 2509.40  0.2125  0.808560    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(o.s1.lmer2)
## Type III Analysis of Variance Table with Satterthwaite's method
##                     Sum Sq Mean Sq NumDF  DenDF F value    Pr(>F)    
## Suffix_vowel        27.362  9.1205     3 57.289 42.7442 1.235e-14 ***
## Region               3.048  1.5240     2 32.656  7.1423  0.002673 ** 
## Suffix_vowel:Region 10.360  1.7266     6 44.311  8.0918 6.286e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
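
The AIC column of the comparison table points the same way as the likelihood-ratio test. As a small worked check, the sketch below rebuilds the AIC values from the printed deviances and parameter counts; the data frame `fits` is introduced here for illustration only.

```r
# Values copied from the anova(o.s1.lmer1, o.s1.lmer2) output above
fits = data.frame(model    = c("o.s1.lmer2", "o.s1.lmer1"),
                  npar     = c(24, 27),
                  deviance = c(3635.0, 3632.8))

# AIC = deviance + 2 * number of parameters
fits$AIC = fits$deviance + 2 * fits$npar

# Positive difference: the model with Length has the higher (worse) AIC
delta.AIC = fits$AIC[2] - fits$AIC[1]

print(fits)
print(round(delta.AIC, 1))
```

The three additional parameters reduce the deviance by only about 2.2, which is less than the penalty of 6 they incur, so the model without Length is preferred on AIC as well.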

s3

Stem-/e/

Two models are fitted, one that includes ‘Length’ as a fixed factor and one that does not, and the two are compared. The results do not show any significant difference (\(\chi^2\)[3] = 1.5351, p = 0.6742).

# Model with Length
e.s3.lmer1 = met.df %>%
  filter(Stem_vowel == "e") %>%
  mutate(Stem = factor(Stem)) %>%
  lmer(s3 ~ Suffix_vowel * Region +  Length * Region +
         (Region|Stem) + 
         (Suffix_vowel|speaker),
       data = .,
       control=lmerControl(check.conv.singular = 
                             .makeCC(action = "ignore",
                                     tol = 1e-4))) 

# Model without Length
e.s3.lmer2 = 
  met.df %>%
  filter(Stem_vowel == "e") %>%
  mutate(Stem = factor(Stem)) %>%
  lmer(s3 ~ Suffix_vowel * Region +
         (Region|Stem) + 
         (Suffix_vowel|speaker),
       data = .,
       control=lmerControl(check.conv.singular = 
                             .makeCC(action = "ignore",
                                     tol = 1e-4))) 
# compare: Length has no effect
anova(e.s3.lmer1, e.s3.lmer2)
## Data: .
## Models:
## e.s3.lmer2: s3 ~ Suffix_vowel * Region + (Region | Stem) + (Suffix_vowel | speaker)
## e.s3.lmer1: s3 ~ Suffix_vowel * Region + Length * Region + (Region | Stem) + (Suffix_vowel | speaker)
##            npar     AIC     BIC logLik deviance  Chisq Df Pr(>Chisq)
## e.s3.lmer2   29 -2031.9 -1860.2 1045.0  -2089.9                     
## e.s3.lmer1   32 -2027.5 -1838.0 1045.7  -2091.4 1.5351  3     0.6742
anova(e.s3.lmer1)
## Type III Analysis of Variance Table with Satterthwaite's method
##                      Sum Sq  Mean Sq NumDF  DenDF F value    Pr(>F)    
## Suffix_vowel        0.49679 0.165598     3 66.242  6.9580 0.0003854 ***
## Region              0.11402 0.057012     2 53.447  2.3955 0.1008467    
## Length              0.00522 0.005219     1 27.626  0.2193 0.6432602    
## Suffix_vowel:Region 0.92288 0.153813     6 60.217  6.4628 2.676e-05 ***
## Region:Length       0.01772 0.008861     2 26.126  0.3723 0.6927437    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(e.s3.lmer2)
## Type III Analysis of Variance Table with Satterthwaite's method
##                      Sum Sq  Mean Sq NumDF  DenDF F value    Pr(>F)    
## Suffix_vowel        0.49021 0.163405     3 65.861  6.8654 0.0004289 ***
## Region              0.14693 0.073466     2 48.910  3.0867 0.0546474 .  
## Suffix_vowel:Region 0.92972 0.154954     6 59.504  6.5104  2.55e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The factor ‘Length’ has no effect.
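
The p-values in the Type III tables follow from the F value together with the numerator and (Satterthwaite-approximated, hence fractional) denominator degrees of freedom. The sketch below recomputes them for the two terms involving Length from the rounded values printed above; the data frame `ftab` is an illustrative name.

```r
# Rows copied from the anova(e.s3.lmer1) table above
ftab = data.frame(term  = c("Length", "Region:Length"),
                  NumDF = c(1, 2),
                  DenDF = c(27.626, 26.126),
                  Fval  = c(0.2193, 0.3723))

# Upper-tail probability of the F distribution; DenDF may be fractional
ftab$p = pf(ftab$Fval, df1 = ftab$NumDF, df2 = ftab$DenDF,
            lower.tail = FALSE)

print(ftab)
```

Both values should agree with the Pr(>F) column up to rounding, and neither approaches the 0.05 threshold.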

Stem-/o/

A similar comparison was made for /o/ stems.

# Model with Length
o.s3.lmer1 = met.df %>%
  filter(Stem_vowel == "o") %>%
  mutate(Stem = factor(Stem)) %>%
  lmer(s3 ~ Suffix_vowel * Region +  Length * Region +
         (Region|Stem) + 
         (1|speaker),
       data = .,
       control=lmerControl(check.conv.singular = 
                             .makeCC(action = "ignore",
                                     tol = 1e-4))) 

# Model without Length
o.s3.lmer2 = 
  met.df %>%
  filter(Stem_vowel == "o") %>%
  mutate(Stem = factor(Stem)) %>%
  lmer(s3 ~ Suffix_vowel * Region +
         (Region|Stem) + 
         (1|speaker),
       data = .,
       control=lmerControl(check.conv.singular = 
                             .makeCC(action = "ignore",
                                     tol = 1e-4))) 
# compare: Length has no effect
anova(o.s3.lmer1, o.s3.lmer2)
## Data: .
## Models:
## o.s3.lmer2: s3 ~ Suffix_vowel * Region + (Region | Stem) + (1 | speaker)
## o.s3.lmer1: s3 ~ Suffix_vowel * Region + Length * Region + (Region | Stem) + (1 | speaker)
##            npar     AIC     BIC logLik deviance  Chisq Df Pr(>Chisq)
## o.s3.lmer2   20 -2652.3 -2534.9 1346.2  -2692.3                     
## o.s3.lmer1   23 -2649.3 -2514.3 1347.7  -2695.3 2.9965  3     0.3922
anova(o.s3.lmer1)
## Type III Analysis of Variance Table with Satterthwaite's method
##                      Sum Sq Mean Sq NumDF   DenDF F value    Pr(>F)    
## Suffix_vowel        1.03647 0.34549     3 2512.76 18.0373 1.401e-11 ***
## Region              0.04517 0.02259     2   42.02  1.1792    0.3175    
## Length              0.00003 0.00003     1   25.95  0.0018    0.9667    
## Suffix_vowel:Region 2.01447 0.33574     6  598.03 17.5286 < 2.2e-16 ***
## Region:Length       0.05684 0.02842     2   24.66  1.4836    0.2464    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(o.s3.lmer2)
## Type III Analysis of Variance Table with Satterthwaite's method
##                     Sum Sq Mean Sq NumDF   DenDF F value    Pr(>F)    
## Suffix_vowel        1.0308 0.34360     3 2504.73 17.9279 1.642e-11 ***
## Region              0.1937 0.09685     2   42.72  5.0533   0.01072 *  
## Suffix_vowel:Region 2.0332 0.33886     6  465.58 17.6807 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Once again, the factor ‘Length’ has no effect (\(\chi^2\)[3] = 2.9965, p = 0.3922).