1. Preliminaries
2. Acoustic analysis of stem vowels
- 2.1. Formant trajectory shapes (plots)
  - Principal components for stem-/e/:
  - Principal components for stem-/o/:
- 2.2. Regional variation
3. Analysis of suffix erosion
APPENDIX E: Comparison between high and metaphonically raised vowels in the East
4. Extra analyses (Revisions)

1. Preliminaries

library(data.table)
library(tidyverse)
library(magrittr)
library(lme4)
library(lmerTest)
library(emmeans)
library(fda)
library(gridExtra)

load("data.RData")

The loaded .RData contains all data needed for the analyses and plots:

ls()

## [1] "ci.df"     "D.pcafd.e" "D.pcafd.o" "D_MZhigh"  "e.df"      "met.df"   
## [7] "o.df"

‘met.df’ is the main dataframe including data for both stem vowels and both deleted and realised suffix vowels.

‘ci.df’ is the dataframe that includes only tokens with phonetically realised suffix vowels.

‘e.df’ and ‘o.df’ are dataframes including data for stem-/e/ and stem-/o/ respectively.

‘D_MZhigh’ is a dataframe of mid and high stem vowel tokens produced by the speakers from the East.

‘D.pcafd.e’ and ‘D.pcafd.o’ are functional data objects necessary for plotting Principal Components curves.

The FPCA-based analysis follows the procedure exemplified in the scripts by M. Gubian available at this GitHub repository.

2. Acoustic analysis of stem vowels

2.1. Formant trajectory shapes (plots)

The following code lines refer to Figs. 4 and 5 showing the first three Principal Components for stem-/e/ and stem-/o/ separately. Each panel isolates the effect of one PC, say PCk, by displaying several colour-coded curves, each one obtained by substituting a different value of the corresponding score s_k into equations (2a) and (2b) (see paper), setting all other scores to zero. The value s_k = 0 corresponds to the mean curve across the entire data for that vowel (thick black lines), and is therefore the same across panels of the same stem vowel and formant.
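The construction of each coloured curve can be sketched in base R, without the fda objects. This is a toy illustration under stated assumptions: mean_curve, pc1_curve and sigma1 are made-up stand-ins for the mean function, the PC1 function and the standard deviation of the s₁ scores. They are not values taken from D.pcafd.e.

```r
# Toy illustration only: all three quantities below are hypothetical.
tx         <- seq(0, 1, length.out = 35)  # normalised time grid
mean_curve <- 0.5 * sin(pi * tx)          # made-up mean trajectory
pc1_curve  <- rep(1, length(tx))          # made-up PC1: a uniform vertical shift
sigma1     <- 0.2                         # made-up SD of the s1 scores

# Same perturbation grid as in the plotting code (s_k / sigma_k).
perturbation <- seq(-1, 1, by = 0.25)

# One column per perturbation value: mean + s_1 * PC1, with s_1 = p * sigma1
# and all other scores set to zero.
perturbed <- sapply(perturbation, function(p) {
  mean_curve + p * sigma1 * pc1_curve
})

# The column for perturbation 0 is the mean curve itself.
isTRUE(all.equal(perturbed[, perturbation == 0], mean_curve))
```

The real code below does the same thing on basis coefficients rather than on sampled curves, and only evaluates the result on the time grid at the end.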

Principal components for stem-/e/:

tx <- seq(0, 1, length.out = 35) 

curves <- CJ(time = tx, 
             PC = 1:3,
             Formant = 1:2,
             perturbation = seq(-1, 1, by=.25))

e.df%>% setDT()

scores.sd.e <- e.df[, lapply(.SD, sd), .SDcols = str_c('s', 1:3)] %>% as.numeric

curves %>%
  .[, value := (D.pcafd.e$meanfd$coefs[, 1, Formant] + 
                  perturbation * scores.sd.e[PC] * 
                  D.pcafd.e$harmonics$coefs[, PC, Formant]) %>% 
      fd(D.pcafd.e$meanfd$basis) %>% 
      eval.fd(tx, .), 
    by = .(PC, Formant, perturbation)]

curves[, Formant := factor(Formant, levels = 2:1)] # make F2 appear on top
PC_labeller <- as_labeller(function(x) paste0('PC', x))
Formant_labeller <- as_labeller(function(x) paste0('F', x))
ggplot(curves) +
  aes(x = time, y = value, group = perturbation, color = perturbation) +
  geom_line() +
  scale_color_gradient2(low = "blue", mid = "grey", high = "orangered") +
  facet_grid(Formant ~ PC,
             scales = "free_y",
             labeller = labeller(PC = PC_labeller, Formant = Formant_labeller)) +
  labs(color = expression(s[k]/sigma[k])) +
  geom_line(data = curves[perturbation == 0], color = 'black', size = 1.5) +
  xlab("Normalised time") +
  ylab("Normalised frequency") +
  theme_light() +
  theme(strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black"), 
        text = element_text(size = 16), legend.position = "bottom")

Principal components for stem-/o/:

curves <- CJ(time = tx, 
             PC = 1:3,
             Formant = 1:2,
             perturbation = seq(-1, 1, by=.25) 
)

o.df %>% setDT

scores.sd.o <- o.df[, lapply(.SD, sd), .SDcols = str_c('s', 1:3)] %>% as.numeric

curves %>%
  .[, value := (D.pcafd.o$meanfd$coefs[, 1, Formant] + 
                  perturbation * scores.sd.o[PC] * 
                  D.pcafd.o$harmonics$coefs[, PC, Formant]) %>% 
      fd(D.pcafd.o$meanfd$basis) %>% 
      eval.fd(tx, .), 
    by = .(PC, Formant, perturbation)]

curves[, Formant := factor(Formant, levels = 2:1)] # make F2 appear on top
PC_labeller <- as_labeller(function(x) paste0('PC', x))
Formant_labeller <- as_labeller(function(x) paste0('F', x))
ggplot(curves) +
  aes(x = time, y = value, group = perturbation, color = perturbation) +
  geom_line() +
  scale_color_gradient2(low = "blue", mid = "grey", high = "orangered") +
  facet_grid(Formant ~ PC,
             scales = "free_y",
             labeller = labeller(PC = PC_labeller, Formant = Formant_labeller)) +
  labs(color = expression(s[k]/sigma[k])) +
  geom_line(data = curves[perturbation == 0], color = 'black', size = 1.5) +
  xlab("Normalised time") +
  ylab("Normalised frequency") +
  theme_light() +
  theme(strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black"), 
        text = element_text(size = 16), legend.position = "bottom")

For both stem vowels, PC1 is associated with simultaneous variations in phonetic height and frontness/backness, either between high front and low-mid front in the case of /e/, or between high back and low-mid back in the case of /o/. The phonetic interpretation of PC2, instead, might be related to a variation in lip rounding for /e/ and in phonetic backness for /o/. PC3 for both /e/ and /o/ encodes variations between phonetically closing and opening diphthongs.

2.2. Regional variation

Below you can find the code used to generate the violin plots shown in Figs. 5, 6, 8, 9. These show the distribution of PC-score values separately by region and suffix vowel.

PC-score 1 (s₁), stem-/e/

Higher s₁ values are associated with increasing vowel lowering, while lower s₁ values correspond to increasing vowel raising.

cols.curves = c("red", "darkgrey", "orange","darkgreen")

ggplot(e.df%>% mutate(Suffix_vowel = factor(Suffix_vowel, levels= c("a", "e", "i", "u")))) +
  aes(Suffix_vowel, s1, fill = Suffix_vowel) +
  geom_violin() +
  facet_grid(. ~ Region) +
  ylab(expression(s[1]))+
  theme_light()+
  theme(strip.text.x = element_text(color = "black"), 
        strip.text.y = element_text(color = "black"), text = element_text(size = 16), 
        legend.position = "top")+
  xlab("Suffix vowel")+
  scale_fill_manual(values = cols.curves, 
                    name = "Suffix vowel")+
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")
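The point-range drawn over each violin comes from mean_sdl, which wraps Hmisc::smean.sdl and, with its default mult = 2, marks the mean plus and minus two standard deviations. A minimal base-R sketch of that summary follows; the helper name mean_pm_sd and the score values are hypothetical.

```r
# Hypothetical helper mirroring the default behaviour of mean_sdl (mult = 2).
mean_pm_sd <- function(x, mult = 2) {
  m <- mean(x)
  s <- sd(x)
  data.frame(y = m, ymin = m - mult * s, ymax = m + mult * s)
}

scores <- c(-0.6, -0.2, 0.1, 0.3, 0.4)  # made-up s1 values for one violin
res <- mean_pm_sd(scores)
res
```

The point sits at y and the vertical line runs from ymin to ymax, so the range shows spread in the data, not a confidence interval for the mean.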

PC-score 1 (s₁), stem-/o/:

ggplot(o.df %>% filter (s1 < 2.9) %>% mutate(Suffix_vowel = factor(Suffix_vowel, levels= c("a", "e", "i", "u")))) +
  aes(Suffix_vowel, s1, fill = Suffix_vowel) +
  geom_violin() +
  facet_grid(. ~ Region) +
  ylab(expression(s[1]))+
  theme_light()+
  theme(strip.text.x = element_text(color = "black"), 
        strip.text.y = element_text(color = "black"), text = element_text(size = 16), 
        legend.position = "top")+
  xlab("Suffix vowel")+
  scale_fill_manual(values = cols.curves, 
                    name = "Suffix vowel")+
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")

PC-score 3 (s₃), stem-/e/:

Higher s₃ values suggest greater opening diphthongisation.

ggplot(e.df%>% mutate(Suffix_vowel = factor(Suffix_vowel, levels= c("a", "e", "i", "u")))) +
  aes(Suffix_vowel, s3, fill = Suffix_vowel) +
  geom_violin() +
  facet_grid(. ~ Region) +
  ylab(expression(s[3]))+
  theme_light()+
  theme(strip.text.x = element_text(color = "black"), 
        strip.text.y = element_text(color = "black"), text = element_text(size = 16), 
        legend.position = "top")+
  xlab("Suffix vowel")+
  scale_fill_manual(values = cols.curves, 
                    name = "Suffix vowel")+
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")

PC-score 3 (s₃), stem-/o/:

ggplot(o.df%>% mutate(Suffix_vowel = factor(Suffix_vowel, levels= c("a", "e", "i", "u")))) +
  aes(Suffix_vowel, s3, fill = Suffix_vowel) +
  geom_violin() +
  facet_grid(. ~ Region) +
  ylab(expression(s[3]))+
  theme_light()+
  theme(strip.text.x = element_text(color = "black"), 
        strip.text.y = element_text(color = "black"), text = element_text(size = 16), 
        legend.position = "top")+
  xlab("Suffix vowel")+
  scale_fill_manual(values = cols.curves, 
                    name = "Suffix vowel")+
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")

Statistics

The following LMER models analyse s₁, s₂ (not analysed in the paper), and s₃ in stem-/e/ data.

The results show a significant influence on s₁ of the suffix vowel, of region, and a significant interaction between these factors. The results of the mixed model with s₃ as the dependent variable show a significant influence of the suffix, a not quite significant influence of region, and a significant interaction between these factors.

m.e <- list()
m.e[[1]] <- lmer(s1 ~ Suffix_vowel * Region +
                   (1 + Region|Stem) + (1|speaker),
                 data = e.df,
                 control=lmerControl(check.conv.singular = .makeCC(action = "ignore", 
                                                                   tol = 1e-4)))

m.e[[2]] <- lmer(s2 ~ Suffix_vowel * Region +
                   (1 + Region|Stem) + (1|speaker),
                 data = e.df,
                 control=lmerControl(check.conv.singular = .makeCC(action = "ignore", 
                                                                   tol = 1e-4)))
m.e[[3]] <- lmer(s3 ~ Suffix_vowel * Region +
                   (1 + Region|Stem) + (1|speaker),
                 data = e.df,
                 control=lmerControl(check.conv.singular = .makeCC(action = "ignore", 
                                                                   tol = 1e-4)))
# F-statistics: 

anova(m.e[[1]])

## Type III Analysis of Variance Table with Satterthwaite's method
##                     Sum Sq Mean Sq NumDF   DenDF  F value    Pr(>F)    
## Suffix_vowel        71.085 23.6951     3 2493.58 122.1552 < 2.2e-16 ***
## Region               3.786  1.8928     2   49.07   9.7577 0.0002705 ***
## Suffix_vowel:Region 41.135  6.8559     6 1120.75  35.3441 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(m.e[[3]])

## Type III Analysis of Variance Table with Satterthwaite's method
##                      Sum Sq Mean Sq NumDF  DenDF F value    Pr(>F)    
## Suffix_vowel        0.87698 0.29233     3 2632.7 11.9000 9.708e-08 ***
## Region              0.15161 0.07581     2   48.8  3.0859   0.05471 .  
## Suffix_vowel:Region 2.16552 0.36092     6 2068.1 14.6922 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Post-hoc tests for stem-/e/: s₁, differences between regions

The post-hoc tests show significant differences between all pairs of regions for suffix-/i/ and for suffix-/u/, but no differences between the regions for suffix-/a/, and only one pairwise difference (MM vs West) for suffix-/e/.

m1.e <- m.e[[1]]

emmeans(m1.e, pairwise ~ Region | Suffix_vowel)$contrasts

## Suffix_vowel = e:
##  contrast    estimate     SE   df t.ratio p.value
##  MM - West     0.2330 0.0958 78.1   2.432  0.0451
##  MM - East     0.1615 0.1155 58.5   1.398  0.3487
##  West - East  -0.0715 0.1314 58.6  -0.545  0.8497
## 
## Suffix_vowel = a:
##  contrast    estimate     SE   df t.ratio p.value
##  MM - West     0.0835 0.1011 92.4   0.826  0.6878
##  MM - East    -0.1024 0.1190 65.2  -0.861  0.6669
##  West - East  -0.1859 0.1355 65.8  -1.372  0.3615
## 
## Suffix_vowel = u:
##  contrast    estimate     SE   df t.ratio p.value
##  MM - West     0.3388 0.0947 76.3   3.576  0.0017
##  MM - East     0.7975 0.1137 55.0   7.015  <.0001
##  West - East   0.4587 0.1303 56.5   3.519  0.0025
## 
## Suffix_vowel = i:
##  contrast    estimate     SE   df t.ratio p.value
##  MM - West     0.3379 0.0907 66.1   3.724  0.0012
##  MM - East     0.7545 0.1125 53.0   6.707  <.0001
##  West - East   0.4167 0.1277 52.4   3.263  0.0054
## 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: tukey method for comparing a family of 3 estimates
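The Tukey-adjusted p-values in these tables can be checked by hand from the t ratio and the degrees of freedom. The sketch below assumes that the adjustment for a family of three estimates is the upper tail of the studentized range distribution evaluated at |t|·√2; the helper name tukey_p is hypothetical.

```r
# Hypothetical helper: Tukey-adjusted p-value from a t ratio and its df,
# for a family of n_means estimates.
tukey_p <- function(t_ratio, df, n_means = 3) {
  ptukey(abs(t_ratio) * sqrt(2), nmeans = n_means, df = df,
         lower.tail = FALSE)
}

# MM - West for suffix-/e/, using the rounded values printed above.
p_adj <- tukey_p(2.432, 78.1)
p_raw <- 2 * pt(-abs(2.432), df = 78.1)  # unadjusted two-sided p-value

round(c(adjusted = p_adj, unadjusted = p_raw), 4)
```

Because the inputs are the rounded table values, the result should agree with the printed p-value only up to rounding. The adjusted value is larger than the unadjusted one, which is why this contrast falls only just under 0.05.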

Post-hoc tests for stem-/e/: s₃, differences between regions

The post-hoc tests show a significant difference between the West and the other two regions for suffix-/i/ and for suffix-/u/. There were no significant differences between any of the regions for suffixes-/e, a/.

m3.e <- m.e[[3]]

emmeans(m3.e, pairwise ~ Region | Suffix_vowel)$contrasts

## Suffix_vowel = e:
##  contrast     estimate     SE   df t.ratio p.value
##  MM - West   -0.064126 0.0630 68.2  -1.018  0.5678
##  MM - East    0.030996 0.0467 49.0   0.664  0.7855
##  West - East  0.095122 0.0607 68.4   1.568  0.2664
## 
## Suffix_vowel = a:
##  contrast     estimate     SE   df t.ratio p.value
##  MM - West    0.023828 0.0642 73.3   0.371  0.9269
##  MM - East    0.024626 0.0477 53.2   0.516  0.8640
##  West - East  0.000799 0.0619 73.6   0.013  0.9999
## 
## Suffix_vowel = u:
##  contrast     estimate     SE   df t.ratio p.value
##  MM - West   -0.241524 0.0627 67.2  -3.850  0.0008
##  MM - East    0.051250 0.0461 46.7   1.111  0.5122
##  West - East  0.292774 0.0604 67.0   4.850  <.0001
## 
## Suffix_vowel = i:
##  contrast     estimate     SE   df t.ratio p.value
##  MM - West   -0.172545 0.0619 63.7  -2.788  0.0189
##  MM - East    0.010344 0.0458 45.5   0.226  0.9723
##  West - East  0.182889 0.0596 63.8   3.068  0.0087
## 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: tukey method for comparing a family of 3 estimates

The following LMER models analyse s₁, s₂ (not analysed in the paper), and s₃ in stem-/o/ data.

The results show for both s₁ and s₃ a significant influence of the suffix vowel, of region, and a significant interaction between these factors.

m.o <- list()
m.o[[1]] <- lmer(s1 ~  Suffix_vowel * Region +
                   (1 + Region|Stem) + (1 |speaker),
                 data = o.df, 
                 control=lmerControl(check.conv.singular = .makeCC(action = "ignore", 
                                                                   tol = 1e-4)))

m.o[[2]] <- lmer(s2 ~ Suffix_vowel * Region +
                   (1 + Region|Stem) + (1|speaker),
                 data = o.df,  # stem-/o/ data; with e.df here, emm.o[[2]] below just repeats emm.e[[2]]
                 control=lmerControl(check.conv.singular = .makeCC(action = "ignore", 
                                                                   tol = 1e-4)))
m.o[[3]] <- lmer(s3 ~  Suffix_vowel * Region +
                   (1 + Region|Stem) + (1 |speaker),
                 data = o.df,
                 control=lmerControl(check.conv.singular = .makeCC(action = "ignore", 
                                                                   tol = 1e-4)))
# F-statistics: 

anova(m.o[[1]])

## Type III Analysis of Variance Table with Satterthwaite's method
##                      Sum Sq Mean Sq NumDF   DenDF  F value    Pr(>F)    
## Suffix_vowel        168.874  56.291     3 1995.62 253.6305 < 2.2e-16 ***
## Region                2.819   1.410     2   44.22   6.3509  0.003762 ** 
## Suffix_vowel:Region  50.241   8.374     6  889.03  37.7285 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(m.o[[3]])

## Type III Analysis of Variance Table with Satterthwaite's method
##                     Sum Sq Mean Sq NumDF   DenDF F value    Pr(>F)    
## Suffix_vowel        1.0308 0.34360     3 2504.73 17.9279 1.642e-11 ***
## Region              0.1937 0.09685     2   42.72  5.0533   0.01072 *  
## Suffix_vowel:Region 2.0332 0.33886     6  465.58 17.6807 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Post-hoc tests for stem-/o/: s₁, differences between regions

The results show a significant difference between MM and the East in the context of all four suffix vowels. There were significant differences between MM and the West in three suffix vowel contexts but not in /a/. There were differences between the West and the East in the context of suffix-/u/ and suffix-/a/ but not in the context of the other two suffix vowels.

m1.o <- m.o[[1]]

emmeans(m1.o, pairwise ~ Region | Suffix_vowel)$contrasts

## Suffix_vowel = e:
##  contrast    estimate     SE   df t.ratio p.value
##  MM - West     0.3531 0.1019 65.5   3.466  0.0027
##  MM - East     0.3053 0.1081 86.0   2.825  0.0160
##  West - East  -0.0478 0.1133 91.8  -0.422  0.9068
## 
## Suffix_vowel = a:
##  contrast    estimate     SE   df t.ratio p.value
##  MM - West     0.0386 0.0944 53.6   0.409  0.9120
##  MM - East    -0.2436 0.1000 65.2  -2.436  0.0457
##  West - East  -0.2822 0.1038 71.4  -2.720  0.0220
## 
## Suffix_vowel = u:
##  contrast    estimate     SE   df t.ratio p.value
##  MM - West     0.2574 0.0925 50.8   2.782  0.0203
##  MM - East     0.5651 0.0983 61.2   5.748  <.0001
##  West - East   0.3077 0.1018 67.1   3.023  0.0098
## 
## Suffix_vowel = i:
##  contrast    estimate     SE   df t.ratio p.value
##  MM - West     0.3552 0.0953 54.7   3.727  0.0013
##  MM - East     0.4742 0.1016 68.7   4.668  <.0001
##  West - East   0.1190 0.1049 72.9   1.134  0.4965
## 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: tukey method for comparing a family of 3 estimates

Post-hoc tests for stem-/o/: s₃, differences between regions

The results show significant differences between the West and the other two regions in the context of suffix-/i/ and suffix-/u/. In the context of suffix-/e/, only the West–East contrast was significant, and there were no significant differences for suffix-/a/. There were no significant differences between MM and the East in any context.

m3.o = m.o[[3]]

emmeans(m3.o, pairwise ~ Region | Suffix_vowel)$contrasts

## Suffix_vowel = e:
##  contrast    estimate     SE    df t.ratio p.value
##  MM - West   -0.06560 0.0323 102.9  -2.034  0.1093
##  MM - East    0.04753 0.0266  89.5   1.790  0.1787
##  West - East  0.11313 0.0390  82.1   2.901  0.0131
## 
## Suffix_vowel = a:
##  contrast    estimate     SE    df t.ratio p.value
##  MM - West    0.03459 0.0295  79.0   1.173  0.4728
##  MM - East    0.00791 0.0240  66.1   0.329  0.9420
##  West - East -0.02667 0.0359  60.3  -0.743  0.7393
## 
## Suffix_vowel = u:
##  contrast    estimate     SE    df t.ratio p.value
##  MM - West   -0.16418 0.0289  74.3  -5.687  <.0001
##  MM - East   -0.01035 0.0235  61.3  -0.441  0.8986
##  West - East  0.15382 0.0353  56.7   4.357  0.0002
## 
## Suffix_vowel = i:
##  contrast    estimate     SE    df t.ratio p.value
##  MM - West   -0.10665 0.0299  82.4  -3.569  0.0017
##  MM - East    0.04783 0.0245  69.7   1.949  0.1327
##  West - East  0.15448 0.0363  63.3   4.253  0.0002
## 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: tukey method for comparing a family of 3 estimates

Reconstructed formants from emmeans (Figs. 7 and 10)

The results presented above show that suffix vowels influenced the phonetic height of stem vowels. This effect is most clearly seen in the reconstructed formant plots for F1 in MM and the East in Figs. 7 and 10, in which the stem vowels – especially for the East – had a progressively higher F1 across suffix-vowel contexts, from the high suffix vowels /i, u/ through /e/ to /a/.

Stem-/e/

Preparatory data for /e/-reconstruction:

emm.e <- lapply(m.e, function(m_) {
  emmeans(m_, pairwise ~ Suffix_vowel | Region)$emmeans %>%
    as.data.table
})

METSuffVowel <- emm.e[[1]][, .(Suffix_vowel)] %>% unique

# join it to all tables in emm.e
lapply(emm.e, function(emm_) {emm_[METSuffVowel,
                                   on = "Suffix_vowel",
                                   Suffix_vowel := i.Suffix_vowel]})

## [[1]]
##     Suffix_vowel Region      emmean         SE       df    lower.CL    upper.CL
##  1:            e     MM  0.27107290 0.08777733 59.23619  0.09544552  0.44670028
##  2:            a     MM  0.39999056 0.09007236 65.21796  0.22011504  0.57986608
##  3:            u     MM  0.19614030 0.08725370 57.81207  0.02147096  0.37080964
##  4:            i     MM  0.12837604 0.08609052 55.07180 -0.04414816  0.30090024
##  5:            e   West  0.03803987 0.09483197 73.49284 -0.15093862  0.22701836
##  6:            a   West  0.31650274 0.09846572 83.64613  0.12068075  0.51232472
##  7:            u   West -0.14265457 0.09445784 73.09185 -0.33090480  0.04559566
##  8:            i   West -0.20949639 0.09122045 64.49793 -0.39170308 -0.02728969
##  9:            e   East  0.10957609 0.12322571 36.59540 -0.14019614  0.35934831
## 10:            a   East  0.50242133 0.12483968 38.50205  0.24980478  0.75503788
## 11:            u   East -0.60134577 0.12246148 35.67487 -0.84978778 -0.35290377
## 12:            i   East -0.62617307 0.12212993 35.30549 -0.87403333 -0.37831280
## 
## [[2]]
##     Suffix_vowel Region       emmean         SE       df     lower.CL
##  1:            e     MM -0.007564794 0.06461860 62.28374 -0.136723827
##  2:            a     MM -0.005318088 0.06588441 67.03811 -0.136822551
##  3:            u     MM -0.079284466 0.06434214 61.20168 -0.207935946
##  4:            i     MM  0.008804732 0.06369072 58.92767 -0.118643370
##  5:            e   West  0.065658119 0.05760299 79.20391 -0.048993175
##  6:            a   West  0.096894564 0.05997443 90.77125 -0.022241324
##  7:            u   West -0.041372364 0.05730247 78.53079 -0.155440666
##  8:            i   West  0.124161930 0.05520261 68.74408  0.014028414
##  9:            e   East -0.039033083 0.05121099 68.01462 -0.141222626
## 10:            a   East  0.107544881 0.05258402 74.74011  0.002786178
## 11:            u   East -0.130047894 0.05046928 64.26871 -0.230863755
## 12:            i   East -0.049532444 0.05019333 63.01593 -0.149835319
##        upper.CL
##  1:  0.12159424
##  2:  0.12618638
##  3:  0.04936701
##  4:  0.13625283
##  5:  0.18030941
##  6:  0.21603045
##  7:  0.07269594
##  8:  0.23429545
##  9:  0.06315646
## 10:  0.21230358
## 11: -0.02923203
## 12:  0.05077043
## 
## [[3]]
##     Suffix_vowel Region       emmean         SE       df    lower.CL
##  1:            e     MM -0.012748327 0.03888923 61.24702 -0.09050581
##  2:            a     MM -0.008875929 0.03955190 65.28234 -0.08786004
##  3:            u     MM -0.027134288 0.03873883 60.29298 -0.10461575
##  4:            i     MM -0.007483216 0.03840370 58.38951 -0.08434567
##  5:            e   West  0.051377269 0.05434522 61.95838 -0.05725874
##  6:            a   West -0.032703569 0.05525872 66.02222 -0.14303046
##  7:            u   West  0.214389721 0.05424038 61.48142  0.10594654
##  8:            i   West  0.165061880 0.05348292 58.20957  0.05801239
##  9:            e   East -0.043744542 0.03981215 63.23419 -0.12329701
## 10:            a   East -0.033502211 0.04041619 66.91832 -0.11417507
## 11:            u   East -0.078383925 0.03951051 61.32248 -0.15738166
## 12:            i   East -0.017827177 0.03938848 60.62178 -0.09659925
##         upper.CL
##  1: 0.0650091573
##  2: 0.0701081856
##  3: 0.0503471725
##  4: 0.0693792326
##  5: 0.1600132776
##  6: 0.0776233255
##  7: 0.3228328996
##  8: 0.2721113668
##  9: 0.0358079286
## 10: 0.0471706503
## 11: 0.0006138114
## 12: 0.0609449015

tx.e <- seq(0, 1, length.out = 35)
curves.e <- CJ(time = tx.e,
               Formant = 1:2,
               Vowel_ =  c("a", "e", "i", "u"),
               Region_ = c("MM", "East", "West")
)
curves.e[, value := (D.pcafd.e$meanfd$coefs[,1,Formant] +
                       sapply(c(1, 3), function(PC) {
                         (emm.e[[PC]] %>%
                            .[Suffix_vowel == Vowel_ & Region == Region_, emmean] %>%
                            as.numeric) *
                           D.pcafd.e$harmonics$coefs[,PC, Formant]}) %>% apply(1, sum)) %>%
           fd(., D.pcafd.e$meanfd$basis) %>%
           eval.fd(tx.e, .),
         by = .(Formant, Vowel_, Region_)]

curves.e[, Region_ := factor (Region_, levels = c("MM", "West", "East"))]
# change 1, 2 to F1, F2
curves.e[, Formant := paste0("F", Formant)]
# order formants F2, F1, so that F2 is on top in panels (optional)
curves.e[, Formant := factor(Formant, levels = c("F2", "F1"))]
# order vowels a, e, i, u so that they line up with
# the colours in cols.curves (optional)
curves.e[, Vowel_ := factor(Vowel_, levels = c("a", "e", "i", "u"))]

Plot of reconstructed stem-/e/ formants:

cols.curves = c("red", "darkgrey", "orange", "darkgreen")

ggplot(curves.e) +
  aes(x = time,  col = Vowel_, group = Vowel_) +
  geom_line(aes(y = value), size=1.3) +
  facet_grid(Formant ~ Region_) +
  scale_colour_manual(values = cols.curves) +
  theme_light()+
  theme(axis.text = element_text(size=14), axis.title.x = element_text(size=14), 
        strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black"),
        axis.title.y = element_text(size=14), text = element_text(size=16),  
        legend.title=element_blank(), legend.position = "top") +
  xlab("Normalised time") +
  ylab("Normalised frequency")

Stem-/o/

Preparatory data for /o/-reconstruction:

emm.o <- lapply(m.o, function(m_) {
  emmeans(m_, pairwise ~ Suffix_vowel | Region)$emmeans %>%
    as.data.table
})

METSuffVowel <- emm.o[[1]][, .(Suffix_vowel)] %>% unique

# join it to all tables in emm.o
lapply(emm.o, function(emm_) {emm_[METSuffVowel,
                                   on = "Suffix_vowel",
                                   Suffix_vowel := i.Suffix_vowel]})

## [[1]]
##     Suffix_vowel Region      emmean         SE       df    lower.CL    upper.CL
##  1:            e     MM  0.26234992 0.08053700 80.88364  0.10210306  0.42259678
##  2:            a     MM  0.40039930 0.07660715 68.10165  0.24753624  0.55326236
##  3:            u     MM -0.01732291 0.07546055 64.67353 -0.16804247  0.13339664
##  4:            i     MM  0.10319980 0.07718774 70.34470 -0.05073304  0.25713264
##  5:            e   West -0.09070773 0.08945623 95.28735 -0.26829390  0.08687844
##  6:            a   West  0.36177541 0.08284071 78.68539  0.19687485  0.52667597
##  7:            u   West -0.27474320 0.08129016 74.76994 -0.43668967 -0.11279673
##  8:            i   West -0.25202974 0.08333669 79.07637 -0.41790479 -0.08615470
##  9:            e   East -0.04292811 0.09860538 59.85718 -0.24017791  0.15432169
## 10:            a   East  0.64400541 0.09351622 48.68388  0.45604667  0.83196416
## 11:            u   East -0.58241246 0.09260990 46.76755 -0.76874399 -0.39608093
## 12:            i   East -0.37099514 0.09420928 49.95566 -0.56022421 -0.18176607
## 
## [[2]]
##     Suffix_vowel Region       emmean         SE       df     lower.CL
##  1:            e     MM -0.007564794 0.06461860 62.28374 -0.136723827
##  2:            a     MM -0.005318088 0.06588441 67.03811 -0.136822551
##  3:            u     MM -0.079284466 0.06434214 61.20168 -0.207935946
##  4:            i     MM  0.008804732 0.06369072 58.92767 -0.118643370
##  5:            e   West  0.065658119 0.05760299 79.20391 -0.048993175
##  6:            a   West  0.096894564 0.05997443 90.77125 -0.022241324
##  7:            u   West -0.041372364 0.05730247 78.53079 -0.155440666
##  8:            i   West  0.124161930 0.05520261 68.74408  0.014028414
##  9:            e   East -0.039033083 0.05121099 68.01462 -0.141222626
## 10:            a   East  0.107544881 0.05258402 74.74011  0.002786178
## 11:            u   East -0.130047894 0.05046928 64.26871 -0.230863755
## 12:            i   East -0.049532444 0.05019333 63.01593 -0.149835319
##        upper.CL
##  1:  0.12159424
##  2:  0.12618638
##  3:  0.04936701
##  4:  0.13625283
##  5:  0.18030941
##  6:  0.21603045
##  7:  0.07269594
##  8:  0.23429545
##  9:  0.06315646
## 10:  0.21230358
## 11: -0.02923203
## 12:  0.05077043
## 
## [[3]]
##     Suffix_vowel Region       emmean         SE       df     lower.CL
##  1:            e     MM  0.004680716 0.02911823 51.95201 -0.053750571
##  2:            a     MM -0.014871485 0.02825312 46.60736 -0.071722066
##  3:            u     MM -0.026498067 0.02800930 45.18957 -0.082905164
##  4:            i     MM  0.001042657 0.02837531 47.41242 -0.056027995
##  5:            e   West  0.070279037 0.03437757 66.58618  0.001653274
##  6:            a   West -0.049457514 0.03225185 52.91911 -0.114148842
##  7:            u   West  0.137677963 0.03181965 50.41551  0.073779366
##  8:            i   West  0.107690021 0.03252480 54.55323  0.042496872
##  9:            e   East -0.042847944 0.03410681 42.20084 -0.111668565
## 10:            a   East -0.022785194 0.03291413 36.77640 -0.089489237
## 11:            u   East -0.016146517 0.03268851 35.79080 -0.082455344
## 12:            i   East -0.046788874 0.03304179 37.30959 -0.113719159
##       upper.CL
##  1: 0.06311200
##  2: 0.04197909
##  3: 0.02990903
##  4: 0.05811331
##  5: 0.13890480
##  6: 0.01523381
##  7: 0.20157656
##  8: 0.17288317
##  9: 0.02597268
## 10: 0.04391885
## 11: 0.05016231
## 12: 0.02014141

tx.o <- seq(0, 1, length.out = 35)
curves.o <- CJ(time = tx.o,
               Formant = 1:2,
               Vowel_ =  c("a", "e", "i", "u"),
               Region_ = c("MM", "East", "West")
)
curves.o[, value := (D.pcafd.o$meanfd$coefs[,1,Formant] +
                       sapply(c(1, 3), function(PC) {
                         (emm.o[[PC]] %>%
                            .[Suffix_vowel == Vowel_ & Region == Region_, emmean] %>%
                            as.numeric) *
                           D.pcafd.o$harmonics$coefs[,PC, Formant]}) %>% apply(1, sum)) %>%
           fd(., D.pcafd.o$meanfd$basis) %>%
           eval.fd(tx.o, .),
         by = .(Formant, Vowel_, Region_)]

curves.o[, Region_ := factor (Region_, levels = c("MM", "West", "East"))]
# change 1, 2 to F1, F2
curves.o[, Formant := paste0("F", Formant)]
# order formants F2, F1, so that F2 is on top in panels (optional)
curves.o[, Formant := factor(Formant, levels = c("F2", "F1"))]
# order vowels a, e, i, u so that they line up with the colours in cols.curves (optional)
curves.o[, Vowel_ := factor(Vowel_, levels = c("a", "e", "i", "u"))]

Plot of reconstructed stem-/o/ formants:

cols.curves = c("red", "darkgrey", "orange", "darkgreen")

ggplot(curves.o) +
  aes(x = time,  col = Vowel_, group = Vowel_) +
  geom_line(aes(y = value), size=1.3) +
  facet_grid(Formant ~ Region_) +
  scale_colour_manual(values = cols.curves) +
  theme_light()+
  theme(axis.text = element_text(size=14), axis.title.x = element_text(size=14), 
        strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black"),
        axis.title.y = element_text(size=14), text = element_text(size=16),  
        legend.title=element_blank(), legend.position = "top") +
  xlab("Normalised time") +
  ylab("Normalised frequency")

3. Analysis of suffix erosion

3.1. Suffix deletion

Plot of proportions of deleted vs realised suffix vowels, separately by region, stem vowel, and suffix vowel (Fig. 12):

Stem_vowel.labs <- c("/e/","/o/")
names(Stem_vowel.labs) <- c("e","o")
cols = c("black", "lightblue")
ggplot(met.df%>% mutate(Suffix_vowel = factor(Suffix_vowel, levels= c("a", "e", "i", "u")))) + 
  aes(fill = Suffix, x = Region) +
  geom_bar(position="fill") + 
  facet_grid(Suffix_vowel ~ Stem_vowel, labeller=labeller(Stem_vowel=Stem_vowel.labs)) +
  theme(axis.text = element_text(size=12), axis.title.x = element_text(size=16), 
        axis.title.y = element_text(size=16),
        text = element_text(size=16), legend.position = "top") +
  ylab("Proportion")  + xlab ("Region") + scale_fill_manual(values = cols)

This shows that suffix deletion was most frequent in the East and least frequent in MM, with the West between the two.

Statistics

This is the GLMER model testing whether the differences observed in the plot above are significant. An extra column, suffix_del, codes suffix deletion as a logical variable (TRUE/FALSE): it is TRUE when ci is missing.

suffix.glmer = glmer(suffix_del ~ Region + Suffix_vowel + Stem_vowel + (1 | speaker) +  
                       (0 + Region | Stem) + (1|Stem) + Region:Suffix_vowel + 
                       Region:Stem_vowel, family = binomial, 
                     control=glmerControl(optimizer="bobyqa"),
                     data = met.df %>% mutate(suffix_del = is.na(ci)))
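The deletion indicator in the model call is simply is.na(ci). A toy base-R sketch of that coding, and of the per-region proportions that the bar plot displays, is given below. The data frame toy and its values are made up and stand in for met.df.

```r
# Made-up stand-in for met.df: ci is NA when the suffix vowel was deleted.
toy <- data.frame(
  Region = c("MM", "MM", "West", "West", "East", "East", "East"),
  ci     = c(0.41, NA, 0.35, 0.52, NA, NA, 0.28)
)

# Same coding as in the glmer() call: TRUE = deleted, FALSE = realised.
toy$suffix_del <- is.na(toy$ci)

# The mean of a logical vector is the proportion of TRUE values,
# i.e. the proportion of deleted suffix vowels per region.
del_prop <- tapply(toy$suffix_del, toy$Region, mean)
del_prop
```

In the binomial model, TRUE is treated as the event being modelled, so positive coefficients correspond to higher odds of deletion.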

# F-statistics: 

joint_tests(suffix.glmer)

##  model term          df1 df2 F.ratio p.value
##  Region                2 Inf   6.497  0.0015
##  Suffix_vowel          3 Inf   4.238  0.0053
##  Stem_vowel            1 Inf   0.893  0.3448
##  Region:Suffix_vowel   6 Inf   1.982  0.0645
##  Region:Stem_vowel     2 Inf   5.668  0.0035

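The GLMM analysis confirms that the degree of suffix deletion was significantly influenced by both region and suffix vowel. There was also a significant interaction between region and stem vowel, whereas the interaction between region and suffix vowel fell just short of significance.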

Post-hoc tests: differences between suffixes

The only region showing differences between suffix vowels is the West, where deletion is more frequent for /i, u/ than for /a/ suffix vowels.

emmeans(suffix.glmer, pairwise ~ Suffix_vowel | Region)$contrasts

## Region = MM:
##  contrast estimate    SE  df z.ratio p.value
##  e - a      0.5711 0.455 Inf   1.256  0.5912
##  e - u     -0.2260 0.398 Inf  -0.568  0.9415
##  e - i     -0.3247 0.361 Inf  -0.900  0.8048
##  a - u     -0.7971 0.376 Inf  -2.117  0.1476
##  a - i     -0.8958 0.409 Inf  -2.191  0.1257
##  u - i     -0.0987 0.319 Inf  -0.309  0.9898
## 
## Region = West:
##  contrast estimate    SE  df z.ratio p.value
##  e - a      0.8693 0.460 Inf   1.891  0.2318
##  e - u     -0.2858 0.401 Inf  -0.712  0.8923
##  e - i     -0.3539 0.358 Inf  -0.988  0.7564
##  a - u     -1.1552 0.385 Inf  -2.998  0.0144
##  a - i     -1.2232 0.407 Inf  -3.003  0.0142
##  u - i     -0.0680 0.315 Inf  -0.216  0.9964
## 
## Region = East:
##  contrast estimate    SE  df z.ratio p.value
##  e - a     -0.0318 0.221 Inf  -0.144  0.9989
##  e - u      0.0608 0.222 Inf   0.274  0.9928
##  e - i     -0.0870 0.201 Inf  -0.433  0.9728
##  a - u      0.0926 0.182 Inf   0.508  0.9573
##  a - i     -0.0552 0.214 Inf  -0.258  0.9940
##  u - i     -0.1478 0.185 Inf  -0.798  0.8555
## 
## Results are averaged over the levels of: Stem_vowel 
## Results are given on the log odds ratio (not the response) scale. 
## P value adjustment: tukey method for comparing a family of 4 estimates
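Because these contrasts are reported on the log odds ratio scale, it may help to back-transform the two significant West estimates into odds ratios. The sketch below simply exponentiates values copied from the output above; the object names are illustrative.

```r
# Log odds ratios copied from the West block of the output above
log_or <- c("a - u" = -1.1552, "a - i" = -1.2232)

# Exponentiate to obtain odds ratios
odds_ratio <- exp(log_or)
print(round(odds_ratio, 3))
```

Odds ratios below 1 indicate lower odds of deletion for suffix /a/ than for the high suffix vowel in the contrast. emmeans can also report contrasts on this scale when called with type = "response".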

Post-hoc tests: differences between regions

These show that MM–East contrasts were significant for all stem-suffix vowel combinations. Conversely, contrasts between either MM and the West or the West and the East were only sporadically significant.

emmeans(suffix.glmer, pairwise ~ Region | Stem_vowel * Suffix_vowel)$contrasts

## Stem_vowel = e, Suffix_vowel = e:
##  contrast     estimate    SE  df z.ratio p.value
##  MM - West   -0.851423 0.802 Inf  -1.062  0.5376
##  MM - East   -2.429193 0.742 Inf  -3.274  0.0030
##  West - East -1.577770 0.753 Inf  -2.096  0.0907
## 
## Stem_vowel = o, Suffix_vowel = e:
##  contrast     estimate    SE  df z.ratio p.value
##  MM - West   -1.854860 0.787 Inf  -2.356  0.0485
##  MM - East   -2.202240 0.747 Inf  -2.947  0.0090
##  West - East -0.347380 0.743 Inf  -0.467  0.8866
## 
## Stem_vowel = e, Suffix_vowel = a:
##  contrast     estimate    SE  df z.ratio p.value
##  MM - West   -0.553199 0.856 Inf  -0.646  0.7946
##  MM - East   -3.032131 0.768 Inf  -3.949  0.0002
##  West - East -2.478932 0.784 Inf  -3.161  0.0045
## 
## Stem_vowel = o, Suffix_vowel = a:
##  contrast     estimate    SE  df z.ratio p.value
##  MM - West   -1.556636 0.819 Inf  -1.900  0.1386
##  MM - East   -2.805178 0.751 Inf  -3.736  0.0005
##  West - East -1.248541 0.752 Inf  -1.660  0.2206
## 
## Stem_vowel = e, Suffix_vowel = u:
##  contrast     estimate    SE  df z.ratio p.value
##  MM - West   -0.911242 0.778 Inf  -1.171  0.4708
##  MM - East   -2.142405 0.722 Inf  -2.968  0.0084
##  West - East -1.231162 0.736 Inf  -1.672  0.2161
## 
## Stem_vowel = o, Suffix_vowel = u:
##  contrast     estimate    SE  df z.ratio p.value
##  MM - West   -1.914679 0.747 Inf  -2.562  0.0281
##  MM - East   -1.915451 0.709 Inf  -2.700  0.0190
##  West - East -0.000772 0.710 Inf  -0.001  1.0000
## 
## Stem_vowel = e, Suffix_vowel = i:
##  contrast     estimate    SE  df z.ratio p.value
##  MM - West   -0.880592 0.762 Inf  -1.156  0.4795
##  MM - East   -2.191537 0.711 Inf  -3.081  0.0058
##  West - East -1.310944 0.725 Inf  -1.807  0.1672
## 
## Stem_vowel = o, Suffix_vowel = i:
##  contrast     estimate    SE  df z.ratio p.value
##  MM - West   -1.884029 0.744 Inf  -2.531  0.0306
##  MM - East   -1.964583 0.713 Inf  -2.756  0.0161
##  West - East -0.080554 0.712 Inf  -0.113  0.9930
## 
## Results are given on the log odds ratio (not the response) scale. 
## P value adjustment: tukey method for comparing a family of 3 estimates

3.2. Suffix centralisation

Violin plots showing centralisation (c) index values (“ci” in the script) for suffix vowels, separately for the three regions, stem vowel, and suffix vowel type (Fig. 13). Higher values and values around 0 are indicative of greater centralisation.

Stem_vowel.labs <- c("/e/","/o/")
names(Stem_vowel.labs) <- c("e","o")

cols=c("brown1", "chartreuse3", "deepskyblue3")
legend_title <- "Region"
ggplot(ci.df %>% mutate(Suffix_vowel = factor(Suffix_vowel, levels= c("a", "e", "i", "u")))) +
  aes(y = ci, x = Region, fill=Region) + geom_violin(trim=FALSE)+
  facet_grid(Stem_vowel ~ Suffix_vowel, 
             labeller=labeller(Stem_vowel=Stem_vowel.labs))+
  ylab("c")+
  theme_light()+
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")+
  scale_fill_manual(legend_title, values = cols) +
  theme(strip.text.x = element_text(color = "black"), 
        strip.text.y = element_text(color = "black"), text = element_text(size = 16), 
        legend.position = "top")+
  xlab("Region")

Statistics

The mixed model showed that the centralisation index was significantly influenced by region and by suffix vowel. There was also a significant interaction between these two fixed factors, and a significant three-way interaction between region, stem vowel, and suffix vowel.

ci.lmer = lmer(ci ~ Region * Stem_vowel * Suffix_vowel + (1|Stem) + (1|speaker), data = ci.df)

# F-statistics:
anova(ci.lmer)

## Type III Analysis of Variance Table with Satterthwaite's method
##                                 Sum Sq Mean Sq NumDF  DenDF F value    Pr(>F)
## Region                         13.1195  6.5597     2   32.3 20.8807 1.519e-06
## Stem_vowel                      0.0141  0.0141     1   47.3  0.0450    0.8330
## Suffix_vowel                   26.4046  8.8015     3 3124.1 28.0167 < 2.2e-16
## Region:Stem_vowel               0.3193  0.1597     2 4618.2  0.5082    0.6016
## Region:Suffix_vowel            20.2697  3.3783     6 4590.5 10.7536 6.584e-12
## Stem_vowel:Suffix_vowel         1.9632  0.6544     3 3124.8  2.0831    0.1003
## Region:Stem_vowel:Suffix_vowel 10.2801  1.7134     6 4590.6  5.4539 1.240e-05
##                                   
## Region                         ***
## Stem_vowel                        
## Suffix_vowel                   ***
## Region:Stem_vowel                 
## Region:Suffix_vowel            ***
## Stem_vowel:Suffix_vowel           
## Region:Stem_vowel:Suffix_vowel ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Post-hoc tests: differences between regions

The results show that there was significantly greater centralisation in the suffix vowel for the East than MM for all stem-suffix vowel combinations. There was also greater suffix vowel centralisation for the West than MM for /i, a/-suffixes and for /e/-suffixes preceded by /o/-stems. The extent of suffix vowel centralisation was also greater for the East than the West for /i, u, a/-suffixes.

emm_options(pbkrtest.limit = nrow(ci.df),lmerTest.limit = nrow(ci.df))

emmeans(ci.lmer, pairwise ~ Region | Suffix_vowel * Stem_vowel)$contrasts

## Suffix_vowel = e, Stem_vowel = e:
##  contrast    estimate    SE   df t.ratio p.value
##  MM - West     -0.309 0.161 43.2  -1.925  0.1438
##  MM - East     -0.536 0.146 40.4  -3.676  0.0020
##  West - East   -0.226 0.154 45.0  -1.471  0.3143
## 
## Suffix_vowel = a, Stem_vowel = e:
##  contrast    estimate    SE   df t.ratio p.value
##  MM - West     -0.587 0.164 47.2  -3.575  0.0023
##  MM - East     -1.104 0.149 44.4  -7.402  <.0001
##  West - East   -0.517 0.158 50.0  -3.273  0.0054
## 
## Suffix_vowel = u, Stem_vowel = e:
##  contrast    estimate    SE   df t.ratio p.value
##  MM - West     -0.275 0.161 43.5  -1.710  0.2130
##  MM - East     -1.029 0.144 39.0  -7.123  <.0001
##  West - East   -0.753 0.154 45.0  -4.896  <.0001
## 
## Suffix_vowel = i, Stem_vowel = e:
##  contrast    estimate    SE   df t.ratio p.value
##  MM - West     -0.518 0.158 40.5  -3.277  0.0060
##  MM - East     -0.935 0.143 37.9  -6.518  <.0001
##  West - East   -0.416 0.151 42.0  -2.753  0.0232
## 
## Suffix_vowel = e, Stem_vowel = o:
##  contrast    estimate    SE   df t.ratio p.value
##  MM - West     -0.440 0.167 50.9  -2.630  0.0298
##  MM - East     -0.819 0.150 45.7  -5.452  <.0001
##  West - East   -0.379 0.162 54.7  -2.345  0.0580
## 
## Suffix_vowel = a, Stem_vowel = o:
##  contrast    estimate    SE   df t.ratio p.value
##  MM - West     -0.506 0.160 42.7  -3.159  0.0080
##  MM - East     -0.982 0.145 39.3  -6.784  <.0001
##  West - East   -0.475 0.153 44.2  -3.103  0.0092
## 
## Suffix_vowel = u, Stem_vowel = o:
##  contrast    estimate    SE   df t.ratio p.value
##  MM - West     -0.208 0.160 42.5  -1.301  0.4027
##  MM - East     -0.725 0.144 38.0  -5.049  <.0001
##  West - East   -0.517 0.153 43.6  -3.385  0.0042
## 
## Suffix_vowel = i, Stem_vowel = o:
##  contrast    estimate    SE   df t.ratio p.value
##  MM - West     -0.455 0.163 45.6  -2.796  0.0202
##  MM - East     -0.923 0.146 40.4  -6.332  <.0001
##  West - East   -0.468 0.156 47.1  -3.004  0.0116
## 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: tukey method for comparing a family of 3 estimates

Post-hoc tests: differences between suffixes

Across the three regions, the suffix vowel /e/ was more centralised than /a, i, u/-suffixes.

emmeans(ci.lmer, pairwise ~ Suffix_vowel)$contrasts

##  contrast estimate     SE   df t.ratio p.value
##  e - a     0.21991 0.0315 3328   6.983  <.0001
##  e - u     0.22733 0.0325 2573   6.987  <.0001
##  e - i     0.24833 0.0294 3764   8.460  <.0001
##  a - u     0.00742 0.0289 3470   0.257  0.9940
##  a - i     0.02842 0.0314 2720   0.904  0.8027
##  u - i     0.02100 0.0264 4026   0.794  0.8572
## 
## Results are averaged over the levels of: Region, Stem_vowel 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: tukey method for comparing a family of 4 estimates

3.3. Correlation between suffix centralisation and metaphony within regions

This analysis needs a separate dataframe. The following code lines explain step-by-step how this was created.

First, we rename “u” in Suffix_vowel as “i”, because there are too few tokens for distances between stems with suffix-/e/ and suffix-/u/ to be calculated. “i” therefore stands for a high vowel (either “i” or “u”).

h = as.character(ci.df$Suffix_vowel)
h[h == "u"] = "i"
ci.df$Suffix_vowel = factor(h)
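The recoding above goes through a character vector. As a hedged alternative sketch, the same merge of “u” into “i” can be done by renaming the factor level directly; the toy factor `suffix` below is invented for illustration.

```r
# Toy suffix vowel factor (invented for illustration)
suffix <- factor(c("a", "e", "i", "u", "u", "i"))

# Approach used in the script: recode via a character vector
h <- as.character(suffix)
h[h == "u"] <- "i"
via_char <- factor(h)

# Alternative: rename the level "u" to "i"; R merges the duplicated level
via_levels <- suffix
levels(via_levels)[levels(via_levels) == "u"] <- "i"

# Both approaches leave three levels: a, e, i
print(table(via_char))
print(table(via_levels))
```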

Then we calculate the mean s₁, mean s₃, mean F1n (“n” = normalised), and mean F2n for each speaker-Stem-Suffix vowel combination. These are \(\bar{x}_{s.w.k}\) and \(\bar{y}_{s.w.k}\) in equation (7) in the paper (p. 19).

word.df = 
  ci.df %>%
  group_by(speaker, Stem, Suffix_vowel, Stem_vowel, Region) %>%
  summarise(s1mean = mean(s1), 
            s3mean = mean(s3), 
            F1mean = mean(F1n), 
            F2mean = mean(F2n)) %>%
  ungroup()

word.df %<>% rename(Suffix_vowel2 = Suffix_vowel)

# add a unique identifier
word.df = data.frame(word.df, indx = seq_len(nrow(word.df)))

We can now create a new dataframe from “ci.df” with the columns “speaker”, “Stem”, “Suffix_vowel”, “Stem_vowel”, “Region”, “s1”, “s3”, “F1n”, “F2n”, and “ci”. We also keep the column “bundle” as an identifier to keep track of the unique segments.

orig.df =
  ci.df %>%
  dplyr::select(speaker, Stem, Suffix_vowel, 
         Stem_vowel, Region, 
         s1, s3, F1n, F2n, ci, bundle)

We then join the two dataframes. If, for example, the lexical stem ‘bon’ occurs before suffixed -i, -a, -e, then each observation of ‘bon’ will be repeated 3 times in the new “join.df”: once in the context of aggregated s₁ and s₃ in ‘boni’, once in the context of aggregated s₁ and s₃ in ‘bone’, and once in the context of aggregated s₁ and s₃ in ‘bona’. For this reason, “join.df” has many more observations than the original “ci.df”.

“Suffix_vowel” and “Suffix_vowel2” are \(j\) and \(k\) respectively in equation (7) in the paper. “s1”, “s3”, “s1mean”, “s3mean” are \(x\), \(y\), \(\bar{x}\), \(\bar{y}\) in equation (7). “F1n”, “F2n”, “F1mean”, “F2mean” are also \(x\), \(y\), \(\bar{x}\), \(\bar{y}\) in equation (7).

join.df =
  # left_join() takes `by`, not `group`; join on all columns shared by both dataframes
  left_join(orig.df, word.df, by=c("speaker", "Stem", "Stem_vowel", "Region"))
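The way the join multiplies rows can be seen on a tiny invented example. This is only a sketch: it uses base R's aggregate() and merge() rather than dplyr, a single hypothetical speaker (“s01”), and only the s1 score, but the row arithmetic is the same as described above.

```r
# Toy tokens (invented): one speaker, one stem, three suffix vowels
tokens <- data.frame(
  speaker = "s01",
  Stem = "bon",
  Suffix_vowel = c("i", "i", "a", "e"),
  s1 = c(0.9, 1.1, -0.2, 0.1),
  stringsAsFactors = FALSE
)

# Mean s1 per speaker-Stem-Suffix vowel combination
means <- aggregate(s1 ~ speaker + Stem + Suffix_vowel, data = tokens, FUN = mean)
names(means)[names(means) == "Suffix_vowel"] <- "Suffix_vowel2"
names(means)[names(means) == "s1"] <- "s1mean"

# Every token is paired with every aggregated suffix context of its stem
joined <- merge(tokens, means, by = c("speaker", "Stem"))
print(nrow(joined))  # 4 tokens x 3 aggregated contexts

# Pairs in which token and aggregate share the same suffix vowel are dropped later
other <- joined[as.character(joined$Suffix_vowel) !=
                  as.character(joined$Suffix_vowel2), ]
print(nrow(other))
```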

We now calculate Euclidean distances in the stem (“edist”) and Euclidean distances in the suffix formants (“fdist”). These distances are \(d_{s.w.j.k}\) in equation (7) in the paper.

# create function:
euc = 
  function(a, b) {
    sqrt(sum((a - b)^2))
  }

join.df %<>%
  rowwise() %>%
  mutate(edist = euc(c(s1, s3), c(s1mean, s3mean))) %>%
  mutate(fdist = euc(c(F1n, F2n), c(F1mean, F2mean))) %>%
  ungroup()
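As a small worked check of the distance calculation, the sketch below applies the same euc() function row by row to invented values and compares it with an equivalent vectorised formula. The toy data frame is hypothetical; only the column names follow join.df.

```r
# Same helper as above
euc <- function(a, b) sqrt(sum((a - b)^2))

# Toy token scores and aggregated means (invented for illustration)
toy <- data.frame(
  s1 = c(0, 1, 3), s3 = c(0, 1, 4),
  s1mean = c(3, 1, 0), s3mean = c(4, 1, 0)
)

# Row-by-row distances, mirroring the rowwise() + mutate() step
rowwise_d <- mapply(
  function(x1, y1, x2, y2) euc(c(x1, y1), c(x2, y2)),
  toy$s1, toy$s3, toy$s1mean, toy$s3mean
)

# Equivalent vectorised form for two dimensions
vector_d <- with(toy, sqrt((s1 - s1mean)^2 + (s3 - s3mean)^2))

print(rowwise_d)
print(all.equal(rowwise_d, vector_d))
```

The vectorised form avoids rowwise() and is usually faster on large dataframes, at the cost of being specific to two dimensions.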

We only want to retain those distances for which the two suffix vowels differ, i.e. we exclude distance calculations such as that of ‘boni’ to the mean of ‘boni’.

other.df =
  join.df %>% filter(Suffix_vowel != Suffix_vowel2)

The violin plots of log-transformed edist below show that there is progressively more information in the stem from MM to the West to the East (Fig. 14 in the paper).

Stem_vowel.labs <- c("/e/","/o/")
names(Stem_vowel.labs) <- c("e","o")

cols=c("brown1", "chartreuse3", "deepskyblue3")
legend_title <- "Region"

a=other.df %>%
  ggplot +
  aes(y = log(edist), x = Region, fill=Region) +
  #aes(y = edist, x = Suffix_vowel, fill=Suffix_vowel) +
  geom_violin() +
  ylim(-4.5, 2)+
  facet_grid(~Stem_vowel, labeller=labeller(Stem_vowel=Stem_vowel.labs)) +
  ylab(expression(d["stem"])) +
  xlab("") +
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")+
  scale_fill_manual(legend_title, values = cols) +
  theme_light()+
  theme(text = element_text(size = 16), legend.position = "top",
        strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black"))


b=other.df %>%
  ggplot +
  aes(y = log(fdist), x = Region, fill=Region) +
  #aes(y = fdist, x = Suffix_vowel, fill=Suffix_vowel) +
  geom_violin() +
  ylim(-4, 2)+
  facet_wrap(~Stem_vowel, labeller=labeller(Stem_vowel=Stem_vowel.labs)) +
  ylab(expression(d["suffix"])) +
  xlab("Region") +
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")+
  scale_fill_manual(legend_title, values = cols) +
  theme_light()+
  theme(text = element_text(size = 16), legend.position = "none", 
        strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black"))

grid.arrange(a, b, nrow=2)

We now need to reduce the number of levels in the Suffix_vowel-Suffix_vowel2 combinations. We do this by treating the distance from a to b and the distance from b to a as the same.

h = with(other.df, paste0(as.character(Suffix_vowel), as.character(Suffix_vowel2)))

Thus, for example, “ea” includes distances of ‘bone’ tokens to aggregated ‘bona’, as well as distances of ‘bona’ tokens to aggregated ‘bone’.

h[h=="ae"] = "ea"
h[h=="ai"] = "ia"
h[h=="ei"] = "ie"

# the above reduces everything to 3 levels
table(h)

## h
##   ea   ia   ie 
##  788 1311 1165

other.df = data.frame(other.df, H = factor(h))

# convert fdist and edist to logs to make it easier to read

other.df$edist = log(other.df$edist)
other.df$fdist = log(other.df$fdist)
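The pair-collapsing step above is written out level by level. A more general sketch, which is not the approach used in the script, orders each pair with pmax() and pmin() so that both directions receive the same label; the vectors `j` and `k` below are invented stand-ins for Suffix_vowel and Suffix_vowel2.

```r
# Toy ordered pairs (invented): token suffix j and comparison suffix k
j <- c("a", "e", "a", "i", "e", "i")
k <- c("e", "a", "i", "a", "i", "e")

# Put the alphabetically later vowel first, so "ae" and "ea" both become "ea"
H <- factor(paste0(pmax(j, k), pmin(j, k)))
print(table(H))
```

With the three suffix categories used here this yields the same labels “ea”, “ia”, and “ie” as the explicit recoding.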

Statistics

LMER model for stem-/e/:

cor.lmer.e = lmer(edist ~ fdist * Region * H 
                  + (fdist|Stem) + (fdist|speaker), 
                  data = other.df %>% filter(Stem_vowel == "e") )
# F-statistics

anova(cor.lmer.e)

## Type III Analysis of Variance Table with Satterthwaite's method
##                 Sum Sq Mean Sq NumDF   DenDF F value    Pr(>F)    
## fdist           0.0973  0.0973     1   24.43  0.1379  0.713568    
## Region         24.5677 12.2839     2   28.26 17.4146 1.179e-05 ***
## H               9.6190  4.8095     2   16.67  6.8183  0.006856 ** 
## fdist:Region    1.4645  0.7322     2   28.97  1.0381  0.366932    
## fdist:H         1.7764  0.8882     2   22.95  1.2592  0.302774    
## Region:H       26.1995  6.5499     4 1412.56  9.2856 2.088e-07 ***
## fdist:Region:H 11.6100  2.9025     4  800.62  4.1148  0.002627 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Post-hoc tests for stem-/e/:

emtrends(cor.lmer.e, pairwise ~ Region|H,  var = 'fdist')

## $emtrends
## H = ea:
##  Region fdist.trend     SE    df lower.CL upper.CL
##  MM         -0.0737 0.1369 167.2  -0.3440   0.1966
##  West       -0.1860 0.1411 144.7  -0.4649   0.0929
##  East        0.1373 0.0956  36.1  -0.0566   0.3311
## 
## H = ia:
##  Region fdist.trend     SE    df lower.CL upper.CL
##  MM          0.0913 0.1266  85.1  -0.1603   0.3430
##  West       -0.1230 0.1595 133.9  -0.4385   0.1925
##  East       -0.1989 0.1089  48.6  -0.4178   0.0200
## 
## H = ie:
##  Region fdist.trend     SE    df lower.CL upper.CL
##  MM          0.1839 0.0903  40.0   0.0014   0.3665
##  West        0.1143 0.1050  57.9  -0.0960   0.3245
##  East       -0.1112 0.0821  37.1  -0.2775   0.0551
## 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95 
## 
## $contrasts
## H = ea:
##  contrast    estimate    SE    df t.ratio p.value
##  MM - West     0.1123 0.189 256.1   0.593  0.8239
##  MM - East    -0.2109 0.157 177.7  -1.340  0.3750
##  West - East  -0.3232 0.157 171.1  -2.057  0.1021
## 
## H = ia:
##  contrast    estimate    SE    df t.ratio p.value
##  MM - West     0.2143 0.188 172.2   1.138  0.4920
##  MM - East     0.2903 0.157 163.3   1.852  0.1561
##  West - East   0.0760 0.183 175.2   0.416  0.9092
## 
## H = ie:
##  contrast    estimate    SE    df t.ratio p.value
##  MM - West     0.0697 0.126  60.5   0.552  0.8460
##  MM - East     0.2951 0.108  50.7   2.725  0.0235
##  West - East   0.2254 0.123  67.7   1.835  0.1659
## 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: tukey method for comparing a family of 3 estimates

The only slope whose 95% confidence interval excludes zero is that for H = ie in MM, which is positive. This suggests that, for MM, the greater the separation between suffix /i/ and suffix /e/, the greater the difference between stem-/e/ in these two contexts. The only significant contrast, also for H = ie, is between MM and the East: df = 50.7, t-ratio = 2.7, p = 0.02.
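One way to read the slope table is to check which 95% confidence intervals exclude zero. The sketch below does this for the stem-/e/ slopes, with the confidence limits copied by hand from the emtrends output above; the data frame `trends` is only an illustrative container.

```r
# Confidence limits copied from the stem-/e/ emtrends output
trends <- data.frame(
  H      = rep(c("ea", "ia", "ie"), each = 3),
  Region = rep(c("MM", "West", "East"), times = 3),
  lower  = c(-0.3440, -0.4649, -0.0566, -0.1603, -0.4385, -0.4178,
              0.0014, -0.0960, -0.2775),
  upper  = c( 0.1966,  0.0929,  0.3311,  0.3430,  0.1925,  0.0200,
              0.3665,  0.3245,  0.0551),
  stringsAsFactors = FALSE
)

# A slope differs from zero at the 5% level if its interval excludes zero
trends$excludes_zero <- trends$lower > 0 | trends$upper < 0
print(trends[trends$excludes_zero, c("H", "Region", "lower", "upper")])
```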

LMER model for stem-/o/:

cor.lmer.o = lmer(edist ~ fdist * Region * H 
                  + (fdist|Stem) + (fdist|speaker), 
                  data = other.df %>% filter(Stem_vowel == "o") )

#F-statistics

anova(cor.lmer.o)

## Type III Analysis of Variance Table with Satterthwaite's method
##                 Sum Sq Mean Sq NumDF   DenDF F value    Pr(>F)    
## fdist           2.1042  2.1042     1   31.82  3.5747   0.06780 .  
## Region          8.0332  4.0166     2   35.74  6.8234   0.00309 ** 
## H              16.7843  8.3921     2  252.89 14.2567 1.359e-06 ***
## fdist:Region    0.8464  0.4232     2   29.10  0.7189   0.49574    
## fdist:H         0.2427  0.1214     2   30.28  0.2062   0.81484    
## Region:H       10.5774  2.6443     4 1688.70  4.4923   0.00130 ** 
## fdist:Region:H  4.7315  1.1829     4 1368.88  2.0095   0.09084 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Post-hoc tests for stem-/o/:

emtrends(cor.lmer.o, pairwise ~ Region|H,  var = 'fdist')

## $emtrends
## H = ea:
##  Region fdist.trend     SE    df lower.CL upper.CL
##  MM          0.2303 0.1697 195.1  -0.1044    0.565
##  West       -0.1191 0.2111 140.1  -0.5364    0.298
##  East        0.2794 0.1601  75.7  -0.0394    0.598
## 
## H = ia:
##  Region fdist.trend     SE    df lower.CL upper.CL
##  MM          0.1727 0.1060  62.7  -0.0391    0.385
##  West        0.1571 0.1373  67.2  -0.1170    0.431
##  East       -0.0486 0.0908  42.8  -0.2317    0.135
## 
## H = ie:
##  Region fdist.trend     SE    df lower.CL upper.CL
##  MM          0.2411 0.1148  61.0   0.0115    0.471
##  West        0.1259 0.1492  74.8  -0.1713    0.423
##  East        0.0968 0.1140  53.6  -0.1318    0.325
## 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95 
## 
## $contrasts
## H = ea:
##  contrast    estimate    SE    df t.ratio p.value
##  MM - West     0.3494 0.252 301.1   1.387  0.3492
##  MM - East    -0.0491 0.204 294.0  -0.241  0.9686
##  West - East  -0.3985 0.232 280.3  -1.716  0.2010
## 
## H = ia:
##  contrast    estimate    SE    df t.ratio p.value
##  MM - West     0.0156 0.158  57.1   0.099  0.9946
##  MM - East     0.2213 0.119  46.2   1.853  0.1639
##  West - East   0.2056 0.148  50.8   1.393  0.3521
## 
## H = ie:
##  contrast    estimate    SE    df t.ratio p.value
##  MM - West     0.1151 0.167  67.6   0.691  0.7697
##  MM - East     0.1442 0.135  67.3   1.072  0.5349
##  West - East   0.0291 0.162  72.0   0.180  0.9823
## 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: tukey method for comparing a family of 3 estimates

Here the results are very similar to those for stem-/e/ (for MM, H = ie again shows a positive trend), but in this case there are no significant contrasts.

The following plot confirms the above graphically, i.e. for MM, the bigger the separation between suffix-/i/ and suffix-/e/, the bigger the difference between stem-/e/ (left) and between stem-/o/ (right) in these two contexts.

other.df %>% 
  filter(Region=="MM" & H == "ie") %>%
  ggplot() +
  aes(x = fdist, y = edist) + 
  geom_point(size=2) +
  geom_smooth(method='lm', formula= y~x)+
  facet_wrap(~Stem_vowel, labeller=labeller(Stem_vowel=Stem_vowel.labs)) + 
  ylab(expression(d["stem"])) +
  xlab(expression(d["suffix"])) +
  theme_light()+
  theme(axis.text = element_text(size=16), axis.title.x = element_text(size=18),
        axis.title.y = element_text(size=18), text = element_text(size=24),  legend.title=element_blank(),
        strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black"))

APPENDIX E: Comparison between high and metaphonically raised vowels in the East

For this appendix, we used Lobanov-normalised formant values (taken at the vowels’ temporal midpoint) of the East. The dataframe we are using is “D_MZhigh” (MZ=“Mittelzone”, i.e. the East).
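The normalised values are already stored in “D_MZhigh”, so no normalisation code appears in this script. For orientation only, the sketch below shows the standard Lobanov procedure (a z-score of each formant within each speaker) on invented values; the data frame `formants` and the speaker labels are hypothetical, and the exact settings used for the paper's data may differ.

```r
# Toy raw F1 values in Hz for two hypothetical speakers (invented)
formants <- data.frame(
  speaker = rep(c("s01", "s02"), each = 4),
  F1 = c(300, 400, 500, 600, 350, 450, 650, 750),
  stringsAsFactors = FALSE
)

# Lobanov normalisation: subtract the speaker mean, divide by the speaker SD
lobanov <- function(x) (x - mean(x)) / sd(x)
formants$F1n <- ave(formants$F1, formants$speaker, FUN = lobanov)

print(formants)
```

After normalisation each speaker's values have mean 0 and standard deviation 1, which is why higher or lower normalised values can be compared across speakers.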

To create the plots in Appendix E (Fig. 19), we separate /e/~/i/ from /o/~/u/ stem vowels into two distinct groups.

eMZ=D_MZhigh %>% filter (Stem_vowel %in% c( "/i/", "Raised /e/", "Non-raised /e/"))
eMZ$whichvowel<-"/e/~/i/"

oMZ=D_MZhigh %>% filter (Stem_vowel %in% c( "/u/", "Raised /o/", "Non-raised /o/"))
oMZ$whichvowel<-"/o/~/u/"

In these plots, we do not include three-syllable words, since the syllable separating stem and suffix vowel might cause raising even if the suffix is a mid or low vowel.

a2 = ggplot(eMZ %>% filter (!Word %in% c("donna", "pecora", "pettine", "prete", "topo", 
                                         "donne", "pecore", "pettini", "preti", "topi"))) +
  aes(y = F1n, x= Stem_vowel) + 
  geom_violin() +
  #facet_grid(~whichvowel)+
  theme_light()+ 
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")+
  theme(axis.text = element_text(size=12, color="black"), axis.title.x = element_text(size=14),
        axis.title.y = element_text(size=14), text = element_text(size=12)) +
  xlab("") +
  ylab("Normalised F1")

a1 = ggplot(eMZ %>% filter (!Word %in% c("donna", "pecora", "pettine", "prete", "topo",
                                         "donne", "pecore", "pettini", "preti", "topi")))+
  aes(y = F2n, x= Stem_vowel) + 
  geom_violin() +
  facet_grid(~whichvowel)+
  theme_light()+
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")+
  ylab("Normalised F2") +
  xlab("")+
  theme(axis.text = element_text(size=12, color="black"), axis.text.x = element_blank(), 
        strip.text.x = element_text(color = "black", size=12),
        axis.title.y = element_text(size=14), text = element_text(size=12),legend.position="none")

b2 = ggplot(oMZ %>% filter (!Word %in% c("donna", "pecora", "pettine", "prete", "topo",
                                         "donne", "pecore", "pettini", "preti", "topi"))) +
  aes(y = F1n, x= Stem_vowel) + 
  geom_violin() +
  #facet_grid(~whichvowel)+
  theme_light()+ 
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")+
  theme(axis.text = element_text(size=12, color="black"), axis.title.x = element_text(size=14),
        axis.title.y = element_text(size=14), text = element_text(size=12)) +
  xlab("") +
  ylab("")

b1 = ggplot(oMZ %>% filter (!Word %in% c("donna", "pecora", "pettine", "prete", "topo",
                                         "donne", "pecore", "pettini", "preti", "topi")))+
  aes(y = F2n, x= Stem_vowel) + 
  geom_violin() +
  facet_grid(~whichvowel)+
  theme_light()+
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")+
  ylab("") +
  xlab("")+
  theme(axis.text = element_text(size=12, color="black"), axis.text.x = element_blank(), 
        strip.text.x = element_text(color = "black", size=12),
        axis.title.y = element_text(size=14), text = element_text(size=12),legend.position="none")

one=grid.arrange(a1, a2, nrow=2)

two=grid.arrange(b1, b2, nrow=2)

#remove the objects that you do not need anymore

rm(a1,a2,b1,b2)

Below are the violin plots showing metaphonic and non-metaphonic mid vowels compared to lexical high vowels in the East (Fig. 19). Higher/lower Lobanov-normalised F1 values correspond to increasing vowel lowering/raising, while higher/lower normalised F2 values indicate increasing vowel fronting/retraction. These plots show that raised (metaphonic) /e, o/ have formant positions similar to, or even more extreme than (i.e. indicating an even more peripheral vowel), those of lexical /i, u/.

grid.arrange(one, two, nrow=1)

4. Extra analyses (Revisions)

4.1. Is there any trade-off relationship between stem vowel duration and suffix vowel duration?

The plots below compare vowel duration in stem and suffix vowels between regions, separately by suffix vowel type. If suffix vowel loss were compensated by stem vowel lengthening, then the Eastern region, with its high degree of reduction in suffix vowel quality and duration, should have greater stem vowel duration than regions like MM. As the plots below show, this is evidently not the case.

# we isolate the tokens for which the suffix vowel was phonetically realised.
realised=met.df %>% filter(Suffix == "Realised")

a=ggplot(realised%>% mutate(Suffix_vowel = factor(Suffix_vowel, levels= c("a", "e", "i", "u")))) +
   aes(y = StemDuration, x = Region) +
  geom_violin() + 
  ylim(0, 400)+
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")+
  facet_grid(~Suffix_vowel) +
  theme(axis.text = element_text(size=10), axis.title.x = element_text(size=10), 
        axis.title.y = element_text(size=12),
        text = element_text(size=12), legend.position = "top") +
  ylab("Stem vowel duration (ms)")  + xlab ("Region") 

b=ggplot(realised%>% mutate(Suffix_vowel = factor(Suffix_vowel, levels= c("a", "e", "i", "u")))) +
  aes(y = SuffDuration, x = Region) +
  geom_violin() + 
  ylim(0, 400)+
  stat_summary(fun.data=mean_sdl,  
               geom="pointrange")+
  facet_grid(~Suffix_vowel) +
  theme(axis.text = element_text(size=10), axis.title.x = element_text(size=10), 
        axis.title.y = element_text(size=12),
        text = element_text(size=12), legend.position = "top") +
  ylab("Suffix vowel duration (ms)")  + xlab ("Region") 

grid.arrange(a, b, nrow=2)

4.2. Are there more (or fewer) deleted suffix vowels after specific consonants?

The plot below shows the proportion of final vowel deletion according to the type of preceding consonant, separately by region and stem vowel. Some consonants were grouped for convenience into categories: “rN” = /r/ + nasal, either /n/ or /m/; “nC” = nasal + stop, “ll” = geminate lateral; “CC” = geminate stop (/pp, kk, tt/ etc); “C” = singleton stop like /p, t, k/; “Affr.” = affricates /ddʒ, tts, tʃ/.

Stem_vowel.labs <- c("/e/","/o/")
names(Stem_vowel.labs) <- c("e","o")

cols = c("black", "lightblue")
ggplot(met.df) + aes(fill = Suffix, x = Consonant) +
  geom_bar(position="fill") + 
  facet_grid(Region ~ Stem_vowel, labeller=labeller(Stem_vowel=Stem_vowel.labs)) +
  theme(axis.text = element_text(size=12), axis.title.x = element_text(size=16), 
        axis.title.y = element_text(size=16),
        text = element_text(size=16), legend.position = "top") +
  ylab("Proportion")  + xlab ("Consonant") + scale_fill_manual(values = cols)

The following plots instead group the consonants into specific classes.

The bar charts below show the proportion of final vowel deletion according to the sonority type of the preceding consonant cluster, separately by region. Affricates were also treated as clusters here. These plots show that, in our data, vowel deletion is slightly more frequent after clusters with falling sonority.

ggplot(met.df %>% filter(ClusterSonority %in% c("rising","falling"))) + aes(fill = Suffix, x = ClusterSonority) +
  geom_bar(position="fill") + 
  facet_grid(~Region) +
  theme(axis.text = element_text(size=12), axis.title.x = element_text(size=16), 
        axis.title.y = element_text(size=16),
        text = element_text(size=16), legend.position = "top") +
  ylab("Proportion")  + xlab ("Cluster sonority") + scale_fill_manual(values = cols)

The figure below shows slightly more vowel deletion after geminate stops (/pp, tt, kk/) than after singleton ones (/p, t, k, d/).

ggplot(met.df %>% filter(Stops %in% c("singleton","geminate"))) + aes(fill = Suffix, x = Stops) +
  geom_bar(position="fill") + 
  facet_grid(~Region) +
  theme(axis.text = element_text(size=12), axis.title.x = element_text(size=16), 
        axis.title.y = element_text(size=16),
        text = element_text(size=16), legend.position = "top") +
  ylab("Proportion")  + xlab ("Stop type") + scale_fill_manual(values = cols)

4.3. Analysis of differences between etymologically Latin long (Proto-Romance mid-high) and Latin short (Proto-Romance mid-low) vowels

We add in information about whether the stem derives historically from a long or short vowel:

stems.long.e = c("mes", "femmin", "mel",  "stell")    
stems.long.o = c("nipot",   "sol", "spos", "soritS")
stems.long = c(stems.long.e, stems.long.o)

met.df = met.df %>%
  mutate(Length =
           case_when(Stem %in%  stems.long ~ "L",
                     TRUE ~ "S"))

s₁

Stem-/e/

Two models are fitted: one that includes ‘Length’ as a fixed factor and one that does not. The two models are then compared with a likelihood-ratio test. The difference between them falls just short of significance: \(\chi^2\)[3] = 6.5088, p = 0.08931. This marginal difference arises because there is a Region * Length interaction.

# Model with Length
e.s1.lmer1 = met.df %>%
  filter(Stem_vowel == "e") %>%
  mutate(Stem = factor(Stem)) %>%
  lmer(s1 ~ Suffix_vowel * Region +  Length * Region +
         (Region|Stem) + 
         (Suffix_vowel|speaker),
       data = .,
       control=lmerControl(check.conv.singular = 
                             .makeCC(action = "ignore",
                                     tol = 1e-4))) 

# Model without Length
e.s1.lmer2 = 
  met.df %>%
  filter(Stem_vowel == "e") %>%
  mutate(Stem = factor(Stem)) %>%
  lmer(s1 ~ Suffix_vowel * Region +
         (Region|Stem) + 
         (Suffix_vowel|speaker),
       data = .,
       control=lmerControl(check.conv.singular = 
                             .makeCC(action = "ignore",
                                     tol = 1e-4))) 

# compare: Length has a marginal but non-significant effect
anova(e.s1.lmer1, e.s1.lmer2)

## Data: .
## Models:
## e.s1.lmer2: s1 ~ Suffix_vowel * Region + (Region | Stem) + (Suffix_vowel | speaker)
## e.s1.lmer1: s1 ~ Suffix_vowel * Region + Length * Region + (Region | Stem) + (Suffix_vowel | speaker)
##            npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)  
## e.s1.lmer2   29 3623.6 3795.2 -1782.8   3565.6                       
## e.s1.lmer1   32 3623.0 3812.5 -1779.5   3559.0 6.5088  3    0.08931 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
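
The degrees of freedom and the p-value of this likelihood-ratio test can be reproduced from the printed table. The sketch below uses base R only, with all numbers copied from the output above; because the printed deviances are rounded to one decimal, their difference only approximates the reported statistic.

```r
# Values copied from the anova() output above
npar.reduced   <- 29      # e.s1.lmer2, without Length
npar.full      <- 32      # e.s1.lmer1, with Length
dev.reduced    <- 3565.6
dev.full       <- 3559.0
chisq.reported <- 6.5088

# Degrees of freedom = number of extra parameters in the fuller model:
# 1 for the Length main effect + 2 for the Region:Length interaction
df.lrt <- npar.full - npar.reduced

# Difference of the rounded deviances: close to, but not exactly, 6.5088
chisq.approx <- dev.reduced - dev.full

# Upper-tail chi-square probability for the reported statistic
p.lrt <- pchisq(chisq.reported, df = df.lrt, lower.tail = FALSE)
print(c(df = df.lrt, chisq.approx = chisq.approx, p = round(p.lrt, 5)))
```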

anova(e.s1.lmer1)

## Type III Analysis of Variance Table with Satterthwaite's method
##                     Sum Sq Mean Sq NumDF  DenDF F value    Pr(>F)    
## Suffix_vowel        50.962 16.9872     3 87.182 89.2785 < 2.2e-16 ***
## Region               1.485  0.7425     2 43.636  3.9024   0.02759 *  
## Length               0.019  0.0189     1 25.923  0.0994   0.75511    
## Suffix_vowel:Region 25.776  4.2960     6 78.155 22.5783 2.934e-15 ***
## Region:Length        1.330  0.6648     2 26.776  3.4937   0.04486 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
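
As a worked check of the Region:Length row, the p-value follows from the F value and the two degrees of freedom. The sketch assumes only base R and the figures printed in the table above; the implied residual variance is an approximation recovered from rounded values, not a quantity reported by the authors.

```r
# Region:Length row from the Type III table above
mean.sq <- 0.6648
f.value <- 3.4937
num.df  <- 2
den.df  <- 26.776   # Satterthwaite denominator degrees of freedom

# Upper-tail probability of the F distribution
p.f <- pf(f.value, df1 = num.df, df2 = den.df, lower.tail = FALSE)

# In this table Mean Sq = F * residual variance, so the ratio recovers
# the residual variance (approximately, given the rounding)
sigma2.implied <- mean.sq / f.value
print(round(c(p = p.f, sigma2 = sigma2.implied), 4))
```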

anova(e.s1.lmer2)

## Type III Analysis of Variance Table with Satterthwaite's method
##                     Sum Sq Mean Sq NumDF  DenDF F value    Pr(>F)    
## Suffix_vowel        51.010 17.0033     3 87.663 89.4134 < 2.2e-16 ***
## Region               3.692  1.8462     2 49.726  9.7086 0.0002757 ***
## Suffix_vowel:Region 27.486  4.5810     6 77.864 24.0898 6.375e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

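The post-hoc tests for Region * Length show no significant difference between long and short vowels within any region. Between regions, MM has a higher s₁ than both the West (p = 0.0139) and the East (p = 0.0001) for the short vowels, but not for the long vowels.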

emmeans(e.s1.lmer1, pairwise ~ Length | Region)$contrasts

## Region = MM:
##  contrast estimate    SE   df t.ratio p.value
##  L - S      -0.140 0.199 26.8  -0.706  0.4864
## 
## Region = West:
##  contrast estimate    SE   df t.ratio p.value
##  L - S      -0.167 0.201 28.9  -0.833  0.4116
## 
## Region = East:
##  contrast estimate    SE   df t.ratio p.value
##  L - S       0.516 0.354 29.0   1.458  0.1555
## 
## Results are averaged over the levels of: Suffix_vowel 
## Degrees-of-freedom method: kenward-roger

emmeans(e.s1.lmer1, pairwise ~ Region | Length)$contrasts

## Length = L:
##  contrast    estimate     SE   df t.ratio p.value
##  MM - West      0.277 0.1491 47.9   1.859  0.1619
##  MM - East     -0.186 0.2463 31.3  -0.754  0.7337
##  West - East   -0.463 0.2896 32.6  -1.598  0.2610
## 
## Length = S:
##  contrast    estimate     SE   df t.ratio p.value
##  MM - West      0.250 0.0851 46.7   2.939  0.0139
##  MM - East      0.471 0.1040 45.7   4.525  0.0001
##  West - East    0.221 0.1206 42.5   1.829  0.1726
## 
## Results are averaged over the levels of: Suffix_vowel 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: tukey method for comparing a family of 3 estimates
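
To make the Tukey adjustment concrete, the MM - West contrast for short vowels can be recomputed by hand. This is a sketch under the assumption that the adjusted p-value is the upper tail of the studentized range distribution evaluated at sqrt(2) * |t| for a family of 3 means; inputs are the rounded values printed above, so the results only approximate the table.

```r
# MM - West contrast for Length = S, copied from the output above
estimate <- 0.250
se       <- 0.0851
df.kr    <- 46.7    # Kenward-Roger degrees of freedom
t.ratio  <- 2.939

# t ratio recomputed from the rounded estimate and SE (approximate)
t.approx <- estimate / se

# Tukey adjustment for a family of 3 estimates
p.tukey <- ptukey(sqrt(2) * abs(t.ratio), nmeans = 3, df = df.kr,
                  lower.tail = FALSE)

# Unadjusted two-sided p-value, for comparison
p.raw <- 2 * pt(abs(t.ratio), df = df.kr, lower.tail = FALSE)
print(round(c(t = t.approx, p.tukey = p.tukey, p.raw = p.raw), 4))
```

The adjusted value is larger than the unadjusted one, which is why a contrast can be clearly significant as a single t-test yet only moderately so within the family of three regional comparisons.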

Stem-/o/

For stem-/o/ too, two models are fitted, one that includes ‘Length’ as a fixed factor and one that does not, and the two are then compared. The difference between the models is again non-significant (\(\chi^2\)[3] = 2.2652, p = 0.5192).

# Model with Length
o.s1.lmer1 = met.df %>%
  filter(Stem_vowel == "o") %>%
  mutate(Stem = factor(Stem)) %>%
  lmer(s1 ~ Suffix_vowel * Region +  Length * Region +
         (1|Stem) + 
         (Suffix_vowel|speaker),
       data = .,
       control=lmerControl(check.conv.singular = 
                             .makeCC(action = "ignore",
                                     tol = 1e-4))) 

# Model without Length
o.s1.lmer2 = 
  met.df %>%
  filter(Stem_vowel == "o") %>%
  mutate(Stem = factor(Stem)) %>%
  lmer(s1 ~ Suffix_vowel * Region +
         (1|Stem) + 
         (Suffix_vowel|speaker),
       data = .,
       control=lmerControl(check.conv.singular = 
                             .makeCC(action = "ignore",
                                     tol = 1e-4))) 
# compare: Length has no effect
anova(o.s1.lmer1, o.s1.lmer2)

## Data: .
## Models:
## o.s1.lmer2: s1 ~ Suffix_vowel * Region + (1 | Stem) + (Suffix_vowel | speaker)
## o.s1.lmer1: s1 ~ Suffix_vowel * Region + Length * Region + (1 | Stem) + (Suffix_vowel | speaker)
##            npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## o.s1.lmer2   24 3683.0 3823.9 -1817.5   3635.0                     
## o.s1.lmer1   27 3686.8 3845.3 -1816.4   3632.8 2.2652  3     0.5192
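
The AIC column tells the same story as the likelihood-ratio test and can be rebuilt from the log-likelihoods and parameter counts. A minimal base R sketch, with all figures copied from the table above:

```r
# Values copied from the anova() output above
npar    <- c(o.s1.lmer2 = 24, o.s1.lmer1 = 27)
log.lik <- c(o.s1.lmer2 = -1817.5, o.s1.lmer1 = -1816.4)

# AIC = -2 * log-likelihood + 2 * number of parameters
aic <- -2 * log.lik + 2 * npar

# Positive difference: the model with Length has the higher (worse) AIC,
# i.e. the small gain in fit does not offset the 3 extra parameters
delta.aic <- aic[["o.s1.lmer1"]] - aic[["o.s1.lmer2"]]
print(aic)
print(delta.aic)
```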

anova(o.s1.lmer1)

## Type III Analysis of Variance Table with Satterthwaite's method
##                      Sum Sq Mean Sq NumDF   DenDF F value    Pr(>F)    
## Suffix_vowel        27.2114  9.0705     3   57.75 42.4804 1.253e-14 ***
## Region               2.9238  1.4619     2   42.67  6.8466  0.002637 ** 
## Length               0.3778  0.3778     1   26.27  1.7692  0.194918    
## Suffix_vowel:Region 10.2158  1.7026     6   48.16  7.9741 5.372e-06 ***
## Region:Length        0.0908  0.0454     2 2509.40  0.2125  0.808560    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(o.s1.lmer2)

## Type III Analysis of Variance Table with Satterthwaite's method
##                     Sum Sq Mean Sq NumDF  DenDF F value    Pr(>F)    
## Suffix_vowel        27.362  9.1205     3 57.289 42.7442 1.235e-14 ***
## Region               3.048  1.5240     2 32.656  7.1423  0.002673 ** 
## Suffix_vowel:Region 10.360  1.7266     6 44.311  8.0918 6.286e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

s₃

Stem-/e/

Two models are fitted, one that includes ‘Length’ as a fixed factor and one that does not, and the two are then compared. The results do not show any significant difference (\(\chi^2\)[3] = 1.5351, p = 0.6742).

# Model with Length
e.s3.lmer1 = met.df %>%
  filter(Stem_vowel == "e") %>%
  mutate(Stem = factor(Stem)) %>%
  lmer(s3 ~ Suffix_vowel * Region +  Length * Region +
         (Region|Stem) + 
         (Suffix_vowel|speaker),
       data = .,
       control=lmerControl(check.conv.singular = 
                             .makeCC(action = "ignore",
                                     tol = 1e-4))) 

# Model without Length
e.s3.lmer2 = 
  met.df %>%
  filter(Stem_vowel == "e") %>%
  mutate(Stem = factor(Stem)) %>%
  lmer(s3 ~ Suffix_vowel * Region +
         (Region|Stem) + 
         (Suffix_vowel|speaker),
       data = .,
       control=lmerControl(check.conv.singular = 
                             .makeCC(action = "ignore",
                                     tol = 1e-4))) 
# compare: Length has no effect
anova(e.s3.lmer1, e.s3.lmer2)

## Data: .
## Models:
## e.s3.lmer2: s3 ~ Suffix_vowel * Region + (Region | Stem) + (Suffix_vowel | speaker)
## e.s3.lmer1: s3 ~ Suffix_vowel * Region + Length * Region + (Region | Stem) + (Suffix_vowel | speaker)
##            npar     AIC     BIC logLik deviance  Chisq Df Pr(>Chisq)
## e.s3.lmer2   29 -2031.9 -1860.2 1045.0  -2089.9                     
## e.s3.lmer1   32 -2027.5 -1838.0 1045.7  -2091.4 1.5351  3     0.6742

anova(e.s3.lmer1)

## Type III Analysis of Variance Table with Satterthwaite's method
##                      Sum Sq  Mean Sq NumDF  DenDF F value    Pr(>F)    
## Suffix_vowel        0.49679 0.165598     3 66.242  6.9580 0.0003854 ***
## Region              0.11402 0.057012     2 53.447  2.3955 0.1008467    
## Length              0.00522 0.005219     1 27.626  0.2193 0.6432602    
## Suffix_vowel:Region 0.92288 0.153813     6 60.217  6.4628 2.676e-05 ***
## Region:Length       0.01772 0.008861     2 26.126  0.3723 0.6927437    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(e.s3.lmer2)

## Type III Analysis of Variance Table with Satterthwaite's method
##                      Sum Sq  Mean Sq NumDF  DenDF F value    Pr(>F)    
## Suffix_vowel        0.49021 0.163405     3 65.861  6.8654 0.0004289 ***
## Region              0.14693 0.073466     2 48.910  3.0867 0.0546474 .  
## Suffix_vowel:Region 0.92972 0.154954     6 59.504  6.5104  2.55e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The factor ‘Length’ has no effect.

Stem-/o/

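The same comparison was made for stem-/o/. Again, the models with and without ‘Length’ do not differ significantly (\(\chi^2\)[3] = 2.9965, p = 0.3922).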

# Model with Length
o.s3.lmer1 = met.df %>%
  filter(Stem_vowel == "o") %>%
  mutate(Stem = factor(Stem)) %>%
  lmer(s3 ~ Suffix_vowel * Region +  Length * Region +
         (Region|Stem) + 
         (1|speaker),
       data = .,
       control=lmerControl(check.conv.singular = 
                             .makeCC(action = "ignore",
                                     tol = 1e-4))) 

# Model without Length
o.s3.lmer2 = 
  met.df %>%
  filter(Stem_vowel == "o") %>%
  mutate(Stem = factor(Stem)) %>%
  lmer(s3 ~ Suffix_vowel * Region +
         (Region|Stem) + 
         (1|speaker),
       data = .,
       control=lmerControl(check.conv.singular = 
                             .makeCC(action = "ignore",
                                     tol = 1e-4))) 
# compare: Length has no effect
anova(o.s3.lmer1, o.s3.lmer2)

## Data: .
## Models:
## o.s3.lmer2: s3 ~ Suffix_vowel * Region + (Region | Stem) + (1 | speaker)
## o.s3.lmer1: s3 ~ Suffix_vowel * Region + Length * Region + (Region | Stem) + (1 | speaker)
##            npar     AIC     BIC logLik deviance  Chisq Df Pr(>Chisq)
## o.s3.lmer2   20 -2652.3 -2534.9 1346.2  -2692.3                     
## o.s3.lmer1   23 -2649.3 -2514.3 1347.7  -2695.3 2.9965  3     0.3922

anova(o.s3.lmer1)

## Type III Analysis of Variance Table with Satterthwaite's method
##                      Sum Sq Mean Sq NumDF   DenDF F value    Pr(>F)    
## Suffix_vowel        1.03647 0.34549     3 2512.76 18.0373 1.401e-11 ***
## Region              0.04517 0.02259     2   42.02  1.1792    0.3175    
## Length              0.00003 0.00003     1   25.95  0.0018    0.9667    
## Suffix_vowel:Region 2.01447 0.33574     6  598.03 17.5286 < 2.2e-16 ***
## Region:Length       0.05684 0.02842     2   24.66  1.4836    0.2464    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(o.s3.lmer2)

## Type III Analysis of Variance Table with Satterthwaite's method
##                     Sum Sq Mean Sq NumDF   DenDF F value    Pr(>F)    
## Suffix_vowel        1.0308 0.34360     3 2504.73 17.9279 1.642e-11 ***
## Region              0.1937 0.09685     2   42.72  5.0533   0.01072 *  
## Suffix_vowel:Region 2.0332 0.33886     6  465.58 17.6807 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Once again, the factor ‘Length’ has no effect.

Supplementary materials: The relationship between the coarticulatory source and effect in sound change: evidence from Italo-Romance metaphony in the Lausberg area.

Pia Greca, Michele Gubian, Jonathan Harrington
