--- title: "Supplementary materials: *The relationship between the coarticulatory source and effect in sound change: evidence from Italo-Romance metaphony in the Lausberg area*." author: "Pia Greca, Michele Gubian, Jonathan Harrington" output: bookdown::html_document2: number_sections: FALSE toc: true theme: flatly highlight: pygments editor_options: markdown: wrap: 72 --- You can access the material used for this study [here](https://data.ub.uni-muenchen.de/363/). In this online repository, we included all data and the script used to generate all analyses included both in the paper and for the revisions. This document presents these analyses in a more readable format. ```{r, echo=FALSE} setwd("//phonetik.uni-muenchen.de/vdata/ERC2/Italian/CueTradingPaper/Greca_etal_2023_data") ``` # 1. Preliminaries ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE, results='hide'} library(data.table) library(tidyverse) library(magrittr) library(lme4) library(lmerTest) library(emmeans) library(fda) library(gridExtra) load("data.RData") ``` The loaded .RData contains all data needed for the analyses and plots: ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} ls() ``` 'met.df' is the main dataframe including data for both stem vowels and both deleted and realised suffix vowels. 'ci.df' is the dataframe that includes only tokens with phonetically realised suffix vowels. 'e.df' and 'o.df' are dataframes including data for stem-/e/ and stem-/o/ respectively. 'D_MZhigh' is a dataframe of mid and high stem vowel tokens produced by the speakers from the East. 'D.pcafd.e' and 'D.pcafd.o' are functional data objects necessary for plotting Principal Components curves. The FPCA-based analysis follows the procedure exemplified in the scripts by M. Gubian available at [this GitHub repository](https://github.com/uasolo/FPCA-phonetics-workshop). # 2. Acoustic analysis of stem vowels ## 2.1. Formant trajectory shapes (plots) The following code lines refer to Figs. 4 and 5 showing the first three Principal Components for stem-/e/ and stem-/o/ separately. Each panel isolates the effect of one PC, say PC*k*, by displaying several colour-coded curves, each one obtained by substituting a different value of the corresponding score *s*~k~ into equations (2a) and (2b) (see paper), setting all other scores to zero. The value *s*~k~ = 0 corresponds to the mean curve across the entire data for that vowel (thick black lines), and is therefore the same across panels of the same stem vowel and formant. ### Principal components for stem-/e/: ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} tx <- seq(0, 1, length.out = 35) curves <- CJ(time = tx, PC = 1:3, Formant = 1:2, perturbation = seq(-1, 1, by=.25)) e.df%>% setDT() scores.sd.e <- e.df[, lapply(.SD, sd), .SDcols = str_c('s', 1:3)] %>% as.numeric curves %>% .[, value := (D.pcafd.e$meanfd$coefs[, 1, Formant] + perturbation * scores.sd.e[PC] * D.pcafd.e$harmonics$coefs[, PC, Formant]) %>% fd(D.pcafd.e$meanfd$basis) %>% eval.fd(tx, .), by = .(PC, Formant, perturbation)] curves[, Formant := factor(Formant, levels = 2:1)] # make F2 appear on top PC_labeller <- as_labeller(function(x) paste0('PC', x)) Formant_labeller <- as_labeller(function(x) paste0('F', x)) ggplot(curves) + aes(x = time, y = value, group = perturbation, color = perturbation) + geom_line() + scale_color_gradient2(low = "blue", mid = "grey", high = "orangered") + facet_grid(Formant ~ PC, scales = "free_y", labeller = labeller(PC = PC_labeller, Formant = Formant_labeller)) + labs(color = expression(s[k]/sigma[k])) + geom_line(data = curves[perturbation == 0], color = 'black', size = 1.5) + xlab("Normalised time") + ylab("Normalised frequency") + theme_light() + theme(strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black"), text = element_text(size = 16), legend.position = "bottom") ``` ### Principal components for stem-/o/: ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} curves <- CJ(time = tx, PC = 1:3, Formant = 1:2, perturbation = seq(-1, 1, by=.25) ) o.df %>% setDT scores.sd.o <- o.df[, lapply(.SD, sd), .SDcols = str_c('s', 1:3)] %>% as.numeric curves %>% .[, value := (D.pcafd.o$meanfd$coefs[, 1, Formant] + perturbation * scores.sd.o[PC] * D.pcafd.o$harmonics$coefs[, PC, Formant]) %>% fd(D.pcafd.o$meanfd$basis) %>% eval.fd(tx, .), by = .(PC, Formant, perturbation)] curves[, Formant := factor(Formant, levels = 2:1)] # make F2 appear on top PC_labeller <- as_labeller(function(x) paste0('PC', x)) Formant_labeller <- as_labeller(function(x) paste0('F', x)) ggplot(curves) + aes(x = time, y = value, group = perturbation, color = perturbation) + geom_line() + scale_color_gradient2(low = "blue", mid = "grey", high = "orangered") + facet_grid(Formant ~ PC, scales = "free_y", labeller = labeller(PC = PC_labeller, Formant = Formant_labeller)) + labs(color = expression(s[k]/sigma[k])) + geom_line(data = curves[perturbation == 0], color = 'black', size = 1.5) + xlab("Normalised time") + ylab("Normalised frequency") + theme_light() + theme(strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black"), text = element_text(size = 16), legend.position = "bottom") ``` For both stem vowels, PC1 is associated with simultaneous variations in phonetic height and frontness/backness either between high front and low-mid front in the case of /e/, or between high back and low-mid back in the case of /o/. The phonetic interpretation of PC2, instead, might be related to a variation in lip rounding for /e/ and in phonetic backness for /o/. PC3 for both /e/ and /o/ encode variations between phonetically closing and opening diphthongs. ## 2.2. Regional variation Below you can find the code used to generate the violin plots shown in Figs. 5, 6, 8, 9. These show the distribution of PC-score values separately by region and suffix vowel. ### PC-score 1 (*s*~1~), stem-/e/ Higher *s*~1~ values are associated with increasing vowel lowering, while lower *s*~1~ values correspond to increasing vowel raising. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} cols.curves = c("red", "darkgrey", "orange","darkgreen") ggplot(e.df%>% mutate(Suffix_vowel = factor(Suffix_vowel, levels= c("a", "e", "i", "u")))) + aes(Suffix_vowel, s1, fill = Suffix_vowel) + geom_violin() + facet_grid(. ~ Region) + ylab(expression(s[1]))+ theme_light()+ theme(strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black"), text = element_text(size = 16), legend.position = "top")+ xlab("Suffix vowel")+ scale_fill_manual(values = cols.curves, name = "Suffix vowel")+ stat_summary(fun.data=mean_sdl, geom="pointrange") ``` ### PC-score 1 (*s*~1~), stem-/o/: ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} ggplot(o.df %>% filter (s1 < 2.9) %>% mutate(Suffix_vowel = factor(Suffix_vowel, levels= c("a", "e", "i", "u")))) + aes(Suffix_vowel, s1, fill = Suffix_vowel) + geom_violin() + facet_grid(. ~ Region) + ylab(expression(s[1]))+ theme_light()+ theme(strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black"), text = element_text(size = 16), legend.position = "top")+ xlab("Suffix vowel")+ scale_fill_manual(values = cols.curves, name = "Suffix vowel")+ stat_summary(fun.data=mean_sdl, geom="pointrange") ``` ### PC-score 3 (*s*~3~), stem-/e/: Higher *s*~3~ values suggest greater opening diphthongisation. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} ggplot(e.df%>% mutate(Suffix_vowel = factor(Suffix_vowel, levels= c("a", "e", "i", "u")))) + aes(Suffix_vowel, s3, fill = Suffix_vowel) + geom_violin() + facet_grid(. ~ Region) + ylab(expression(s[3]))+ theme_light()+ theme(strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black"), text = element_text(size = 16), legend.position = "top")+ xlab("Suffix vowel")+ scale_fill_manual(values = cols.curves, name = "Suffix vowel")+ stat_summary(fun.data=mean_sdl, geom="pointrange") ``` ### PC-score 3 (*s*~3~), stem-/o/: ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} ggplot(o.df%>% mutate(Suffix_vowel = factor(Suffix_vowel, levels= c("a", "e", "i", "u")))) + aes(Suffix_vowel, s3, fill = Suffix_vowel) + geom_violin() + facet_grid(. ~ Region) + ylab(expression(s[3]))+ theme_light()+ theme(strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black"), text = element_text(size = 16), legend.position = "top")+ xlab("Suffix vowel")+ scale_fill_manual(values = cols.curves, name = "Suffix vowel")+ stat_summary(fun.data=mean_sdl, geom="pointrange") ``` ### Statistics The following LMER models analyse *s*~1~, *s*~2~ (not analysed in the paper), and *s*~3~ in stem-/e/ data. The results show a significant influence on *s*~1~ of the suffix vowel, of region, and a significant interaction between these factors. The results of the mixed model with *s*~3~ as the dependent variable show a significant influence of the suffix, a not quite significant influence of region, and a significant interaction between these factors. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE, cache=TRUE} m.e <- list() m.e[[1]] <- lmer(s1 ~ Suffix_vowel * Region + (1 + Region|Stem) + (1|speaker), data = e.df, control=lmerControl(check.conv.singular = .makeCC(action = "ignore", tol = 1e-4))) m.e[[2]] <- lmer(s2 ~ Suffix_vowel * Region + (1 + Region|Stem) + (1|speaker), data = e.df, control=lmerControl(check.conv.singular = .makeCC(action = "ignore", tol = 1e-4))) m.e[[3]] <- lmer(s3 ~ Suffix_vowel * Region + (1 + Region|Stem) + (1|speaker), data = e.df, control=lmerControl(check.conv.singular = .makeCC(action = "ignore", tol = 1e-4))) # F-statistics: anova(m.e[[1]]) anova(m.e[[3]]) ``` #### Post-hoc tests for stem-/e/: *s*~1~, differences between regions The post-hoc tests show significant differences between all pairs of regions for suffix-/i/and for suffix-/u/, but no differences between the regions for suffix-/a/, and only one pairwise difference (MM vs West) for suffix-/e/. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE, cache=TRUE} m1.e <- m.e[[1]] emmeans(m1.e, pairwise ~ Region | Suffix_vowel)$contrasts ``` #### Post-hoc tests for stem-/e/: *s*~3~, differences between regions The post-hoc tests show a significant difference between the West and the other two regions for suffix-/i/ and for suffix-/u/. There were no significant differences between any of the regions for suffixes-/e, a/. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE, cache=TRUE} m3.e <- m.e[[3]] emmeans(m3.e, pairwise ~ Region | Suffix_vowel)$contrasts ``` The following LMER models analyse *s*~1~, *s*~2~ (not analysed in the paper), and *s*~3~ in stem-/o/ data. The results show for both *s*~1~ and *s*~3~ a significant influence of the suffix vowel, of region, and a significant interaction between these factors. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE, cache=TRUE} m.o <- list() m.o[[1]] <- lmer(s1 ~ Suffix_vowel * Region + (1 + Region|Stem) + (1 |speaker), data = o.df, control=lmerControl(check.conv.singular = .makeCC(action = "ignore", tol = 1e-4))) m.o[[2]] <- lmer(s2 ~ Suffix_vowel * Region + (1 + Region|Stem) + (1|speaker), data = e.df, control=lmerControl(check.conv.singular = .makeCC(action = "ignore", tol = 1e-4))) m.o[[3]] <- lmer(s3 ~ Suffix_vowel * Region + (1 + Region|Stem) + (1 |speaker), data = o.df, control=lmerControl(check.conv.singular = .makeCC(action = "ignore", tol = 1e-4))) # F-statistics: anova(m.o[[1]]) anova(m.o[[3]]) ``` #### Post-hoc tests for stem-/o/: *s*~1~, differences between regions The results show a significant difference between MM and the East in the context of all four suffix vowels. There were significant differences between MM and the West in three suffix vowel contexts but not in /a/. There were differences between the West and the East in the context of suffix-/u/ and suffix-/a/ but not in the context of the other two suffix vowels. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE, cache=TRUE} m1.o <- m.o[[1]] emmeans(m1.o, pairwise ~ Region | Suffix_vowel)$contrasts ``` #### Post-hoc tests for stem-/o/: *s*~3~, differences between regions The results show significant differences between the West and the other two regions in the context of suffix-/i/ and suffix-/u/, but not for the other two suffix vowel contexts. There were no significant differences between MM and the West in any contexts. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE, cache=TRUE} m3.o = m.o[[3]] emmeans(m3.o, pairwise ~ Region | Suffix_vowel)$contrasts ``` ### Reconstructed formants from emmeans (Figs. 7 and 10) The results presented above show that suffix vowels influenced the phonetic height of stem vowels. This effect is most clearly seen in the reconstructed formant plots for F1 in MM and the East in Figs. 7 and 10, in which the stem vowels – especially for the East – had a progressively higher F1 in the context of suffix vowels. #### Stem-/e/ Preparatory data for /e/-reconstruction: ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE, cache=TRUE} emm.e <- lapply(m.e, function(m_) { emmeans(m_, pairwise ~ Suffix_vowel | Region)$emmeans %>% as.data.table }) METSuffVowel <- emm.e[[1]][, .(Suffix_vowel)] %>% unique # join it to all tables in emm.e lapply(emm.e, function(emm_) {emm_[METSuffVowel, on = "Suffix_vowel", Suffix_vowel := i.Suffix_vowel]}) tx.e <- seq(0, 1, length.out = 35) curves.e <- CJ(time = tx.e, Formant = 1:2, Vowel_ = c("a", "e", "i", "u"), Region_ = c("MM", "East", "West") ) curves.e[, value := (D.pcafd.e$meanfd$coefs[,1,Formant] + sapply(c(1, 3), function(PC) { (emm.e[[PC]] %>% .[Suffix_vowel == Vowel_ & Region == Region_, emmean] %>% as.numeric) * D.pcafd.e$harmonics$coefs[,PC, Formant]}) %>% apply(1, sum)) %>% fd(., D.pcafd.e$meanfd$basis) %>% eval.fd(tx.e, .), by = .(Formant, Vowel_, Region_)] curves.e[, Region_ := factor (Region_, levels = c("MM", "West", "East"))] # change 1, 2 to F1, F2 curves.e[, Formant := paste0("F", Formant)] # order formants F2, F1, so that F2 is on top in panels (optional) curves.e[, Formant := factor(Formant, levels = c("F2", "F1"))] # order vowels e i a u to make use of Paired palette, # i.e. darker = metaphony, base color = backness (optional) curves.e[, Vowel_ := factor(Vowel_, levels = c("a", "e", "i", "u"))] ``` Plot of reconstructed stem-/e/ formants: ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} cols.curves = c("red", "darkgrey", "orange", "darkgreen") ggplot(curves.e) + aes(x = time, col = Vowel_, group = Vowel_) + geom_line(aes(y = value), size=1.3) + facet_grid(Formant ~ Region_) + scale_colour_manual(values = cols.curves) + theme_light()+ theme(axis.text = element_text(size=14), axis.title.x = element_text(size=14), strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black"), axis.title.y = element_text(size=14), text = element_text(size=16), legend.title=element_blank(), legend.position = "top") + xlab("Normalised time") + ylab("Normalised frequency") ``` #### Stem-/e/ Preparatory data for /o/-reconstruction: ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE, cache=TRUE} emm.o <- lapply(m.o, function(m_) { emmeans(m_, pairwise ~ Suffix_vowel | Region)$emmeans %>% as.data.table }) METSuffVowel <- emm.o[[1]][, .(Suffix_vowel)] %>% unique # join it to all tables in emm.e lapply(emm.o, function(emm_) {emm_[METSuffVowel, on = "Suffix_vowel", Suffix_vowel := i.Suffix_vowel]}) tx.o <- seq(0, 1, length.out = 35) curves.o <- CJ(time = tx.o, Formant = 1:2, Vowel_ = c("a", "e", "i", "u"), Region_ = c("MM", "East", "West") ) curves.o[, value := (D.pcafd.o$meanfd$coefs[,1,Formant] + sapply(c(1, 3), function(PC) { (emm.o[[PC]] %>% .[Suffix_vowel == Vowel_ & Region == Region_, emmean] %>% as.numeric) * D.pcafd.o$harmonics$coefs[,PC, Formant]}) %>% apply(1, sum)) %>% fd(., D.pcafd.o$meanfd$basis) %>% eval.fd(tx.o, .), by = .(Formant, Vowel_, Region_)] curves.o[, Region_ := factor (Region_, levels = c("MM", "West", "East"))] # change 1, 2 to F1, F2 curves.o[, Formant := paste0("F", Formant)] # order formants F2, F1, so that F2 is on top in panels (optional) curves.o[, Formant := factor(Formant, levels = c("F2", "F1"))] # order vowels e i a u to make use of Paired palette, curves.o[, Vowel_ := factor(Vowel_, levels = c("a", "e", "i", "u"))] ``` Plot of reconstructed stem-/o/ formants: ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} cols.curves = c("red", "darkgrey", "orange", "darkgreen") ggplot(curves.o) + aes(x = time, col = Vowel_, group = Vowel_) + geom_line(aes(y = value), size=1.3) + facet_grid(Formant ~ Region_) + scale_colour_manual(values = cols.curves) + theme_light()+ theme(axis.text = element_text(size=14), axis.title.x = element_text(size=14), strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black"), axis.title.y = element_text(size=14), text = element_text(size=16), legend.title=element_blank(), legend.position = "top") + xlab("Normalised time") + ylab("Normalised frequency") ``` # 3. Analysis of suffix erosion ## 3.1. Suffix deletion Plot of proportions of deleted vs realised suffix vowels, separately by region, stem vowel, and suffix vowel (Fig. 12): ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} Stem_vowel.labs <- c("/e/","/o/") names(Stem_vowel.labs) <- c("e","o") cols = c("black", "lightblue") ggplot(met.df%>% mutate(Suffix_vowel = factor(Suffix_vowel, levels= c("a", "e", "i", "u")))) + aes(fill = Suffix, x = Region) + geom_bar(position="fill") + facet_grid(Suffix_vowel ~ Stem_vowel, labeller=labeller(Stem_vowel=Stem_vowel.labs)) + theme(axis.text = element_text(size=12), axis.title.x = element_text(size=16), axis.title.y = element_text(size=16), text = element_text(size=16), legend.position = "top") + ylab("Proportion") + xlab ("Region") + scale_fill_manual(values = cols) ``` This shows that the extent of suffix deletion was greater in the East, least for MM, and with the West between the two. ### Statistics This is the GLMER model testing for significance of differences observed in the plot above. An extra column, `suffix_del`, is defined carrying suffix deletion as a logical variable (two levels: True/False). ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE, cache=TRUE} suffix.glmer = glmer(suffix_del ~ Region + Suffix_vowel + Stem_vowel + (1 | speaker) + (0 + Region | Stem) + (1|Stem) + Region:Suffix_vowel + Region:Stem_vowel, family = binomial, control=glmerControl(optimizer="bobyqa"), data = met.df %>% mutate(suffix_del = is.na(ci))) # F-statistics: joint_tests(suffix.glmer) ``` The GLMM analysis confirms that the degree of suffix deletion was significantly influenced by both region and suffix vowel. #### Post-hoc tests: differences between suffixes The only region showing differences between suffix vowels is the West, which shows a greater deletion for /i, u/ than for /a/ suffix vowels. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE, cache=TRUE} emmeans(suffix.glmer, pairwise ~ Suffix_vowel | Region)$contrasts ``` #### Post-hoc tests: differences between regions These show that MM–East contrasts were significant for all stem-suffix vowel combinations. Conversely, contrasts between either MM and the West or the West and the East were only sporadically significant. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE, cache=TRUE} emmeans(suffix.glmer, pairwise ~ Region | Stem_vowel * Suffix_vowel)$contrasts ``` ## 3.2. Suffix centralisation Violin plots showing centralisation (*c*) index values ("ci" in the script) for suffix vowels, separately for the three regions, stem vowel, and suffix vowel type (Fig. 13). Higher values and values around 0 are indicative of greater centralisation. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} Stem_vowel.labs <- c("/e/","/o/") names(Stem_vowel.labs) <- c("e","o") cols=c("brown1", "chartreuse3", "deepskyblue3") legend_title <- "Region" ggplot(ci.df %>% mutate(Suffix_vowel = factor(Suffix_vowel, levels= c("a", "e", "i", "u")))) + aes(y = ci, x = Region, fill=Region) + geom_violin(trim=F)+ facet_grid(Stem_vowel ~ Suffix_vowel, labeller=labeller(Stem_vowel=Stem_vowel.labs))+ ylab("c")+ theme_light()+ stat_summary(fun.data=mean_sdl, geom="pointrange")+ scale_fill_manual(legend_title, values = cols) + theme(strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black"), text = element_text(size = 16), legend.position = "top")+ xlab("Region") ``` ### Statistics The results of the mixed model showed significant influences on the centralisation index of the region and of the suffix vowel. There was also a significant interaction between these two fixed factors, and between region, stem vowel, and suffix vowel. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE, cache=TRUE} ci.lmer = lmer(ci ~ Region * Stem_vowel * Suffix_vowel + (1|Stem) + (1|speaker), data = ci.df) # F-statistics: anova(ci.lmer) ``` #### Post-hoc tests: differences between regions The results show that there was significantly greater centralisation in the suffix vowel for the East than MM for all stem-suffix vowel combinations. There was also greater suffix vowel centralisation for the West than MM for /i, a/-suffixes and for /e/-suffixes preceded by /o/-stems. The extent of suffix vowel centralisation was also greater for the East than the West for /i, u, a/-suffixes. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE, cache=TRUE} emm_options(pbkrtest.limit = nrow(ci.df),lmerTest.limit = nrow(ci.df)) emmeans(ci.lmer, pairwise ~ Region | Suffix_vowel * Stem_vowel)$contrasts ``` #### Post-hoc tests: differences between between suffixes Across the three regions, the suffix vowel /e/ was more centralised than /a, i, u/-suffixes. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE,cache=TRUE} emmeans(ci.lmer, pairwise ~ Suffix_vowel)$contrasts ``` ## 3.3. Correlation between suffix centralisation and metaphony within regions This analysis needs a separate dataframe. The following code lines explain step-by-step how this was created. First, we rename "u" in Suffix_vowel as "i" - this is because there are too few tokens for distances between stems with suffix-/e/ and suffix-/u/ to be calculated. "i" stands therefore for a high vowel (either "i" or "u"). ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} h = as.character(ci.df$Suffix_vowel) h[h == "u"] = "i" ci.df$Suffix_vowel = factor(h) ``` Then we calculate the mean *s*~1~, mean *s*~3~, mean F1n ("n" = normalised), and mean F2n for each speaker-Stem-Suffix vowel combination. This is $\bar{x}$~*s.w.k*~ and $\bar{y}$~*s.w.k*~ in equation (7) in the paper. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} word.df = ci.df %>% group_by(speaker, Stem, Suffix_vowel, Stem_vowel, Region) %>% summarise(s1mean = mean(s1), s3mean = mean(s3), F1mean = mean(F1n), F2mean = mean(F2n)) %>% ungroup() word.df %<>% rename(Suffix_vowel2 = Suffix_vowel) # add a unique identifier word.df = data.frame(word.df, indx = rep(1:nrow(word.df))) ``` We can now create a new dataframe from "ci.df" with the columns "speaker", "Stem", "Suffix_vowel", "Stem_vowel", "s1", "s3", "F1n", "F2n". We also add the column "bundle" as an identifier to keep track of the unique segments. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} orig.df = ci.df %>% dplyr::select(speaker, Stem, Suffix_vowel, Stem_vowel, Region, s1, s3, F1n, F2n, ci, bundle) ``` We then join the two dataframes. If e.g. the lexical stem 'bon' occurs before suffixed -i, -a, -e, then each observation of 'bon' will be repeated 3 times in the new "join.df": once in the context of aggregated *s*~1~ and *s*~3~ in 'boni', once in the context of aggregated *s*~1~ and *s*~3~ 'bone', once in the context of aggregated *s*~1~ and *s*~3~ 'bona'. For this reason, "join.df" has many more observations than the original "ci.df". "Suffix_vowel" and "Suffix_vowel2" are $j$ and $k$ respectively in equation (7) in the paper. "s1", "s3", "s1mean", "s3mean" are $x$, $y$, $\bar{x}$, $\bar{y}$ in equation (7). "F1n", "F2n", "F1mean", "F2mean" are also $x$, $y$, $\bar{x}$, $\bar{y}$ in equation (7). ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} join.df = left_join(orig.df, word.df, by=c("speaker", "Stem", "Stem_vowel","Region")) ``` We now calculate Euclidean distances in the stem ("edist") and Euclidean distances in the suffix formants ("fdist"). These distances are $d$~*s.w.j.k*~ in equation (7) in the paper. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} # create function: euc = function(a, b) { sqrt(sum((a - b)^2)) } join.df %<>% rowwise() %>% mutate(edist = euc(c(s1, s3), c(s1mean, s3mean))) %>% mutate(fdist = euc(c(F1n, F2n), c(F1mean, F2mean))) %>% ungroup() ``` We only want to retain those distances when suffix vowels are different, i.e. exclude e.g. distance calculations of e.g. 'boni' to the mean of 'boni'. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} other.df = join.df %>% filter(Suffix_vowel != Suffix_vowel2) ``` The violin plots of log. edist. below show that there is progressively more information in the stem from MM to the West to the East (Fig. 14 in the paper). ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} Stem_vowel.labs <- c("/e/","/o/") names(Stem_vowel.labs) <- c("e","o") cols=c("brown1", "chartreuse3", "deepskyblue3") legend_title <- "Region" a=other.df %>% ggplot + aes(y = log(edist), x = Region, fill=Region) + #aes(y = edist, x = Suffix_vowel, fill=Suffix_vowel) + geom_violin() + ylim(-4.5, 2)+ facet_grid(~Stem_vowel, labeller=labeller(Stem_vowel=Stem_vowel.labs)) + ylab(expression(d["stem"])) + xlab("") + stat_summary(fun.data=mean_sdl, geom="pointrange")+ scale_fill_manual(legend_title, values = cols) + theme_light()+ theme(text = element_text(size = 16), legend.position = "top", strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black")) b=other.df %>% ggplot + aes(y = log(fdist), x = Region, fill=Region) + #aes(y = fdist, x = Suffix_vowel, fill=Suffix_vowel) + geom_violin() + ylim(-4, 2)+ facet_wrap(~Stem_vowel, labeller=labeller(Stem_vowel=Stem_vowel.labs)) + ylab(expression(d["suffix"])) + xlab("Region") + stat_summary(fun.data=mean_sdl, geom="pointrange")+ scale_fill_manual(legend_title, values = cols) + theme_light()+ theme(text = element_text(size = 16), legend.position = "none", strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black")) grid.arrange(a, b, nrow=2) ``` We now need to reduce the number of levels in the Suffix_vowel-Suffix_vowel2 combinations. We do this by treating a distance *a* to *b* and *b* to *a* as the same. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} h = with(other.df, paste0(as.character(Suffix_vowel), as.character(Suffix_vowel2))) ``` Thus for example, "ea" includes distances of 'bone' tokens to aggregated 'bona', as well as distances of 'bona' tokens to aggregated 'bone' ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} h[h=="ae"] = "ea" h[h=="ai"] = "ia" h[h=="ei"] = "ie" # the above reduces everything to 3 levels table(h) other.df = data.frame(other.df, H = factor(h)) # convert fdist and edist to logs to make it easier to read other.df$edist = log(other.df$edist) other.df$fdist = log(other.df$fdist) ``` ### Statistics #### LMER model for stem-/e/: ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE, cache=TRUE} cor.lmer.e = lmer(edist ~ fdist * Region * H + (fdist|Stem) + (fdist|speaker), data = other.df %>% filter(Stem_vowel == "e") ) # F-statistics anova(cor.lmer.e) ``` #### Post-hoc tests for stem-/e/: ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE, cache=TRUE} emtrends(cor.lmer.e, pairwise ~ Region|H, var = 'fdist') ``` The only significant slope is H = ie for MM, showing a positive trend. This suggests that, for MM, the bigger the separation between suffix /i, e/, the bigger the difference between stem-e in these two contexts. Also, the only significant contrast is between MM and the East, also for H = /i,e/: *df* = 50.7, *t*-ratio = 2.7, *p* = 0.02. #### LMER model for stem-/o/: ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE, cache=TRUE} cor.lmer.o = lmer(edist ~ fdist * Region * H + (fdist|Stem) + (fdist|speaker), data = other.df %>% filter(Stem_vowel == "o") ) #F-statistics anova(cor.lmer.o) ``` #### Post-hoc tests for stem-/o/: ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE, cache=TRUE} emtrends(cor.lmer.o, pairwise ~ Region|H, var = 'fdist') ``` Here the results are very similar as for stem-/e/ (H = ie shows for MM a positive trend), while there are in this case no significant contrasts. The following plot confirms the above graphically, i.e. for MM, the bigger the separation between suffix-/i/ and suffix-/e/,the bigger the difference between stem-/e/ (left) and between stem-/o/ (right) in these two contexts. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} other.df %>% filter(Region=="MM" & H == "ie") %>% ggplot() + aes(x = fdist, y = edist) + geom_point(size=2) + geom_smooth(method='lm', formula= y~x)+ facet_wrap(~Stem_vowel, labeller=labeller(Stem_vowel=Stem_vowel.labs)) + ylab(expression(d["stem"])) + xlab(expression(d["suffix"])) + theme_light()+ theme(axis.text = element_text(size=16), axis.title.x = element_text(size=18), axis.title.y = element_text(size=18), text = element_text(size=24), legend.title=element_blank(), strip.text.x = element_text(color = "black"), strip.text.y = element_text(color = "black")) ``` # APPENDIX E: Absence of relationship between stem vowel duration and suffix vowel duration The plots below compare vowel duration in stem and suffix vowels between regions and separately by suffix vowel type. If suffix vowel loss were compensated by stem vowel lengthening, then the Eastern region with its high degree of reduction in suffix vowel quality and duration should have greater stem vowel duration than regions like MM: but as the plot below shows, this is evidently not the case (Fig. 19). ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} # we isolate the tokens for which the suffix vowel was phonetically realised. realised=met.df %>% filter(Suffix == "Realised") cols=c("brown1", "chartreuse3", "deepskyblue3") a=ggplot(realised %>% mutate(Suffix_vowel = factor(Suffix_vowel, levels= c("a", "e", "i", "u")))) + aes(y = StemDuration, x = Region, fill=Region) + geom_violin() + theme_light()+ ylim(0, 400)+ stat_summary(fun.data=mean_sdl, geom="pointrange")+ scale_fill_manual(values = cols) + facet_grid(~Suffix_vowel) + theme(axis.text = element_text(size=12, color="black"), axis.title.x = element_text(size=14), legend.position = "top", strip.text.x = element_text(color = "black", size=12), axis.title.y = element_text(size=14), text = element_text(size=16)) + ylab("Stem vowel duration (ms)") + xlab ("") b=ggplot(realised %>% mutate(Suffix_vowel = factor(Suffix_vowel, levels= c("a", "e", "i", "u")))) + aes(y = SuffDuration, x = Region, fill=Region) + geom_violin() + theme_light()+ ylim(0, 400)+ stat_summary(fun.data=mean_sdl, geom="pointrange")+ scale_fill_manual(values = cols) + facet_grid(~Suffix_vowel) + theme(axis.text = element_text(size=12, color="black"), axis.title.x = element_text(size=14), strip.text.x = element_text(color = "black", size=12), legend.position = "none", axis.title.y = element_text(size=14), text = element_text(size=16)) + ylab("Suffix vowel duration (ms)") + xlab ("Region") grid.arrange(a, b, nrow=2) ``` # APPENDIX F: Comparison between high and metaphonically raised vowels in the East For this appendix, we used Lobanov-normalised formant values (taken at the vowels' temporal midpoint) of the East. The dataframe we are using is "D_MZhigh" (MZ="Mittelzone", i.e. the East). To create the plots in Appendix F (Fig. 20), we separate /e/~/i/ from /o/~/u/ stem vowels into two distinct groups. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} eMZ=D_MZhigh %>% filter (Stem_vowel %in% c( "/i/", "Raised /e/", "Non-raised /e/")) eMZ$whichvowel<-"/e/~/i/" oMZ=D_MZhigh %>% filter (Stem_vowel %in% c( "/u/", "Raised /o/", "Non-raised /o/")) oMZ$whichvowel<-"/o/~/u/" ``` In these plots, we do not include three-syllable words, since the syllable separating stem and suffix vowel might cause raising even if the suffix is a mid or low vowel. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE, results=FALSE, fig.show='hide'} a2 = ggplot(eMZ %>% filter (!Word %in% c("donna", "pecora", "pettine", "prete", "topo", "donne", "pecore", "pettini", "preti", "topi"))) + aes(y = F1n, x= Stem_vowel) + geom_violin() + #facet_grid(~whichvowel)+ theme_light()+ stat_summary(fun.data=mean_sdl, geom="pointrange")+ theme(axis.text = element_text(size=12, color="black"), axis.title.x = element_text(size=14), axis.title.y = element_text(size=14), text = element_text(size=12)) + xlab("") + ylab("Normalised F1") a1 = ggplot(eMZ %>% filter (!Word %in% c("donna", "pecora", "pettine", "prete", "topo", "donne", "pecore", "pettini", "preti", "topi")))+ aes(y = F2n, x= Stem_vowel) + geom_violin() + facet_grid(~whichvowel)+ theme_light()+ stat_summary(fun.data=mean_sdl, geom="pointrange")+ ylab("Normalised F2") + xlab("")+ theme(axis.text = element_text(size=12, color="black"), axis.text.x = element_blank(), strip.text.x = element_text(color = "black", size=12), axis.title.y = element_text(size=14), text = element_text(size=12),legend.position="none") b2 = ggplot(oMZ %>% filter (!Word %in% c("donna", "pecora", "pettine", "prete", "topo", "donne", "pecore", "pettini", "preti", "topi"))) + aes(y = F1n, x= Stem_vowel) + geom_violin() + #facet_grid(~whichvowel)+ theme_light()+ stat_summary(fun.data=mean_sdl, geom="pointrange")+ theme(axis.text = element_text(size=12, color="black"), axis.title.x = element_text(size=14), axis.title.y = element_text(size=14), text = element_text(size=12)) + xlab("") + ylab("") b1 = ggplot(oMZ %>% filter (!Word %in% c("donna", "pecora", "pettine", "prete", "topo", "donne", "pecore", "pettini", "preti", "topi")))+ aes(y = F2n, x= Stem_vowel) + geom_violin() + facet_grid(~whichvowel)+ theme_light()+ stat_summary(fun.data=mean_sdl, geom="pointrange")+ ylab("") + xlab("")+ theme(axis.text = element_text(size=12, color="black"), axis.text.x = element_blank(), strip.text.x = element_text(color = "black", size=12), axis.title.y = element_text(size=14), text = element_text(size=12),legend.position="none") one=grid.arrange(a1, a2, nrow=2) two=grid.arrange(b1, b2, nrow=2) #remove the objects that you do not need anymore rm(a1,a2,b1,b2) ``` Below the violin plos showing metaphonic and non-metaphonic mid vowels compared to lexical high vowels in the East (Fig. 19). Lobanov-normalised higher/lower F1 values correspond to increasing vowel lowering/raising, while normalised higher/lower F2 values indicate increasing vowel fronting/retraction. These plots show that raised (metaphonic) /e, o/ has formant positions similar or even more extreme (i.e. indicating an even more peripheral vowel) than those in lexical /i, u/. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} grid.arrange(one, two, nrow=1) ``` # 4. Extra analyses (Revisions) ## 4.1. Are there more (or less) deleted suffix vowels after specific consonants? The plot below shows the proportion of final vowel deletion according to the type of preceding consonant, separately by region and stem vowel. Some consonants were grouped for convenience into categories: "rN" = /r/ + nasal, either /n/ or /m/; "nC" = nasal + stop, "ll" = geminate lateral; "CC" = geminate stop (/pp, kk, tt/ etc); "C" = singleton stop like /p, t, k/; "Affr." = affricates /ddʒ, tts, tʃ/. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} Stem_vowel.labs <- c("/e/","/o/") names(Stem_vowel.labs) <- c("e","o") cols = c("black", "lightblue") ggplot(met.df) + aes(fill = Suffix, x = Consonant) + geom_bar(position="fill") + facet_grid(Region ~ Stem_vowel, labeller=labeller(Stem_vowel=Stem_vowel.labs)) + theme(axis.text = element_text(size=12), axis.title.x = element_text(size=16), axis.title.y = element_text(size=16), text = element_text(size=16), legend.position = "top") + ylab("Proportion") + xlab ("Region") + scale_fill_manual(values = cols) ``` The following plots group instead specific consonant classes. The bar charts below show the proportion of final vowel deletion according to the sonority type of preceding consonant clusters, separately by region. Also affricates were considered here as clusters. These plots show that, in our data, vowel deletion after clusters with a falling sonority is slightly greater. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} ggplot(met.df %>% filter(ClusterSonority %in% c("rising","falling"))) + aes(fill = Suffix, x = ClusterSonority) + geom_bar(position="fill") + facet_grid(~Region) + theme(axis.text = element_text(size=12), axis.title.x = element_text(size=16), axis.title.y = element_text(size=16), text = element_text(size=16), legend.position = "top") + ylab("Proportion") + xlab ("Region") + scale_fill_manual(values = cols) ``` The figure below shows slightly more vowel deletion after geminate stops (/pp, tt, kk/) than after singleton ones (/p, t, k, d/). ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} ggplot(met.df %>% filter(Stops %in% c("singleton","geminate"))) + aes(fill = Suffix, x = Stops) + geom_bar(position="fill") + facet_grid(~Region) + theme(axis.text = element_text(size=12), axis.title.x = element_text(size=16), axis.title.y = element_text(size=16), text = element_text(size=16), legend.position = "top") + ylab("Proportion") + xlab ("Region") + scale_fill_manual(values = cols) ``` ## 4.2. Analysis of differences between etymologically Latin long (Proto-Romance mid-high) and Latin short (Proto-Romance mid-low) vowels We add in information about whether the stem derives historically from a long or short vowel: ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE} stems.long.e = c("mes", "femmin", "mel", "stell") stems.long.o = c("nipot", "sol", "spos", "soritS") stems.long = c(stems.long.e, stems.long.o) met.df = met.df %>% mutate(Length = case_when(Stem %in% stems.long ~ "L", TRUE ~ "S")) ``` ### *s*~1~ #### Stem-/e/ Two models are run, one that includes 'Length' as a fixed factor, and the other does not. A comparison is then made whether these models differ significantly. There is a not-quite significant difference between the models: $\chi^2$[3] = 6.5088, *p* = 0.08931. The not quite significant difference happens because there is a `Region * Length` interaction. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE, cache=TRUE} # Model with Length e.s1.lmer1 = met.df %>% filter(Stem_vowel == "e") %>% mutate(Stem = factor(Stem)) %>% lmer(s1 ~ Suffix_vowel * Region + Length * Region + (Region|Stem) + (Suffix_vowel|speaker), data = ., control=lmerControl(check.conv.singular = .makeCC(action = "ignore", tol = 1e-4))) # Model without Length e.s1.lmer2 = met.df %>% filter(Stem_vowel == "e") %>% mutate(Stem = factor(Stem)) %>% lmer(s1 ~ Suffix_vowel * Region + (Region|Stem) + (Suffix_vowel|speaker), data = ., control=lmerControl(check.conv.singular = .makeCC(action = "ignore", tol = 1e-4))) # compare: Length has a marginal but non-significant effect anova(e.s1.lmer1, e.s1.lmer2) anova(e.s1.lmer1) anova(e.s1.lmer2) ``` The post-hoc tests for Region * Length show that MM has higher *s*~1~ than the West and higher *s*~1~ than the East on the short, but not the long vowels. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE, cache=TRUE} emmeans(e.s1.lmer1, pairwise ~ Length | Region)$contrasts emmeans(e.s1.lmer1, pairwise ~ Region | Length)$contrasts ``` #### Stem-/o/ Also for stem-/o/, two models are run, one that includes 'Length' as a fixed factor, and the other does not. A comparison is then made whether these models differ significantly. The difference between the models is again non-significant. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE, cache=TRUE} # Model with length o.s1.lmer1 = met.df %>% filter(Stem_vowel == "o") %>% mutate(Stem = factor(Stem)) %>% lmer(s1 ~ Suffix_vowel * Region + Length * Region + (1|Stem) + (Suffix_vowel|speaker), data = ., control=lmerControl(check.conv.singular = .makeCC(action = "ignore", tol = 1e-4))) # Model without Length o.s1.lmer2 = met.df %>% filter(Stem_vowel == "o") %>% mutate(Stem = factor(Stem)) %>% lmer(s1 ~ Suffix_vowel * Region + (1|Stem) + (Suffix_vowel|speaker), data = ., control=lmerControl(check.conv.singular = .makeCC(action = "ignore", tol = 1e-4))) # compare: Length has no effect anova(o.s1.lmer1, o.s1.lmer2) anova(o.s1.lmer1) anova(o.s1.lmer2) ``` ### *s*~3~ #### Stem-/e/ Two models are run, one that includes 'Length' as a fixed factor, and the other does not. A comparison is then made whether these models differ significantly. The results do not how any significant difference ($\chi^2$[3] = 1.5351, p = 0.6742). ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE, cache=TRUE} # Model with Length e.s3.lmer1 = met.df %>% filter(Stem_vowel == "e") %>% mutate(Stem = factor(Stem)) %>% lmer(s3 ~ Suffix_vowel * Region + Length * Region + (Region|Stem) + (Suffix_vowel|speaker), data = ., control=lmerControl(check.conv.singular = .makeCC(action = "ignore", tol = 1e-4))) # Model without Length e.s3.lmer2 = met.df %>% filter(Stem_vowel == "e") %>% mutate(Stem = factor(Stem)) %>% lmer(s3 ~ Suffix_vowel * Region + (Region|Stem) + (Suffix_vowel|speaker), data = ., control=lmerControl(check.conv.singular = .makeCC(action = "ignore", tol = 1e-4))) # compare: anova(e.s3.lmer1, e.s3.lmer2) anova(e.s3.lmer1) anova(e.s3.lmer2) ``` The factor 'Length' has no effect. #### Stem-/o/ A similar comparison was made for /o/ stems. ```{r, echo=TRUE, warning = FALSE, error=FALSE, message=FALSE, cache=TRUE} # Model with Length o.s3.lmer1 = met.df %>% filter(Stem_vowel == "o") %>% mutate(Stem = factor(Stem)) %>% lmer(s3 ~ Suffix_vowel * Region + Length * Region + (Region|Stem) + (1|speaker), data = ., control=lmerControl(check.conv.singular = .makeCC(action = "ignore", tol = 1e-4))) # Model without Length o.s3.lmer2 = met.df %>% filter(Stem_vowel == "o") %>% mutate(Stem = factor(Stem)) %>% lmer(s3 ~ Suffix_vowel * Region + (Region|Stem) + (1|speaker), data = ., control=lmerControl(check.conv.singular = .makeCC(action = "ignore", tol = 1e-4))) # compare: anova(o.s3.lmer1, o.s3.lmer2) anova(o.s3.lmer1) anova(o.s3.lmer2) ``` The factor 'Length' has once again no effect.