# Purpose of this repository

This repository contains the data and code that were used to generate the results presented in the paper "Same-gender citations do not indicate a substantial gender homophily bias".

## Data

The file gender_homophily_data_fo.tar.gz is a tarball that contains all data (CSV) files for the Faculty Opinions data. The file gender_homophily_data_wos.tar.gz contains all data (CSV) files for the Web of Science (WoS) data. The tarballs have to be unpacked to access the CSV files. This can be done on the command line with the following commands:

`tar -xzf <tarball_folder>/gender_homophily_data_fo.tar.gz -C <csv_folder>`

`tar -xzf <tarball_folder>/gender_homophily_data_wos.tar.gz -C <csv_folder>`

Make sure to replace `<tarball_folder>` with the path to the folder containing the tarballs and `<csv_folder>` with the path to the folder where you want the CSV files to be unpacked.

The following list contains all files in the tarballs together with the columns of each file:

* focals.csv (data on the level of focal papers from the Faculty Opinions data)
  * focal_gender: gender of the focal authors (male -> M, female -> F, mixed -> FM)
  * share_male_citing: share of male-authored citing papers
  * FOCAL_PY: publication year
  * FOCAL_TEAMSIZE: number of authors
  * avg_eval: average rating provided in the Faculty Opinions data
  * focal_id: id for each focal paper
* focals_sections.csv (keywords provided in the Faculty Opinions data for each focal paper)
  * focal_id: id for each focal paper
  * section_id: id for the Faculty Opinions keyword assigned to a paper
* focal_pairs_fosims.csv (pairs of focal papers from the Faculty Opinions data)
  * FOCAL_ID: id for the female-authored paper
  * FOCAL_ID1: id for the male-authored paper
  * sim: number of shared Faculty Opinions keywords
  * citing_m_diff_perc: difference in share of male-authored citing papers
  * citing_f_diff_perc: difference in share of female-authored citing papers
  * avg_eval_diff: difference in average quality ratings
  * age_diff: age difference (difference in publication years)
  * teamsize_diff: difference in team size
* focal_pairs_fosims_sc.csv (pairs of focal papers from the Faculty Opinions data; including self-citations)
  * same columns as in focal_pairs_fosims.csv, except citing_f_diff_perc, avg_eval_diff, age_diff, teamsize_diff
* focal_pairs_citedrefsims.csv (pairs of focal papers from the Faculty Opinions data)
  * sim: number of shared cited references
  * other columns identical to focal_pairs_fosims_sc.csv
* focal_pairs_keywordsims.csv (pairs of focal papers from the Faculty Opinions data)
  * sim: number of shared WoS keywords
  * other columns identical to focal_pairs_fosims_sc.csv
* focal_pairs_subjsims.csv (pairs of focal papers from the Faculty Opinions data)
  * sim: number of shared WoS subject categories
  * other columns identical to focal_pairs_fosims_sc.csv
* focal_pairs_fosims_cited.csv (pairs of focal papers from the Faculty Opinions data)
  * citing_m_diff_perc: difference in share of male-authored cited references
  * other columns identical to focal_pairs_fosims_sc.csv
* focal_pairs_fosims_cgender.csv (pairs of focal papers from the Faculty Opinions data; based on more restrictive gender assignments)
  * same columns as in focal_pairs_fosims_sc.csv
* focal_pairs_abstrsims.csv (pairs of focal papers from the Faculty Opinions data)
  * paper_f: id for the female-authored paper
  * paper_m: id for the male-authored paper
  * sim: cosine similarity between the tf-idf vectors of the papers' titles and abstracts
  * citing_m_diff_perc: difference in the share of male-authored citing papers
  * extreme_shares: flag indicating whether both papers of a pair have 100% female- or male-authored citing papers
* wos_focal_pairs_abstrsims.csv (pairs of focal papers from WoS)
  * same columns as in focal_pairs_abstrsims.csv

## R scripts

All R scripts contain a call to `setwd()` in the first few lines. The argument of this function should be the path to the folder containing the CSV files.
This folder should also contain a folder named `res`, where the results of the analyses are saved.

### analyses_focals.R

This script produces the figure showing the estimates of the regression analyses on the level of focal papers. The script uses the files `focals.csv` and `focals_sections.csv`.

### analyses_pairs.R

This script produces figures showing the results of all analyses of paper pairs. In the first few lines, you can specify which results the script should produce. Several vectors hold the parameter values for which results can be produced, and the script iterates over all possible combinations of these values. For example, in the vector `fo_data_opts`, the value `T` produces the results based on the Faculty Opinions data and the value `F` produces the results based on the WoS data. Changing the corresponding line from `fo_data_opts <- c(T, F)` to `fo_data_opts <- c(T)` restricts the script to the Faculty Opinions data (instead of both the Faculty Opinions and the WoS data). This particular adjustment may be helpful because it saves a lot of runtime (the file containing the WoS data is much larger than the other files).

### studies_results.R

This script produces the figure showing the overview of results of previous studies on gender homophily in citations.
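## Example setup

The setup steps from the Data and R scripts sections (unpacking both tarballs and creating the `res` folder) could be combined roughly as follows. This is only a sketch: the function name `prepare_data` and its two arguments are hypothetical and not part of the repository, and it assumes that the CSV files sit at the top level of each archive.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): unpack both data
# tarballs into one folder and create the `res` subfolder that the
# R scripts save their results to.
# Usage: prepare_data TARBALL_FOLDER CSV_FOLDER
prepare_data() {
    tarball_dir=$1
    csv_dir=$2

    # Create the target folder and its `res` subfolder in one step.
    mkdir -p "$csv_dir/res" || return 1

    for name in gender_homophily_data_fo gender_homophily_data_wos; do
        archive="$tarball_dir/$name.tar.gz"
        if [ ! -f "$archive" ]; then
            echo "missing archive: $archive" >&2
            return 1
        fi
        # -x extract, -z decompress gzip, -f archive file, -C target folder
        tar -xzf "$archive" -C "$csv_dir" || return 1
    done
}
```

After sourcing the function, a call such as `prepare_data ./downloads ./data` (both paths are placeholders) would leave the CSV files and an empty `res` folder in `./data`, which is then the folder to pass to `setwd()` in the R scripts.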