Summary database

Here is a summary of our four tables and the information we wrangle from them:

Unique names in Paper table

Coauthor table

Author table

Training table

People:

Department to Category mapping

Validity assessment of our department to college mapping:


Balancing annotations

To study group size across fields of science, we would want as many faculty members with research groups as without in each domain. At the moment, we have oversampled faculty with research groups because we were interested in characterizing group size.
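A quick tally per domain can show how far the annotations are from balance. The following is only a sketch: the record fields `domain` and `has_research_group` are hypothetical names chosen for illustration and may not match the actual annotation columns.

```python
from collections import Counter


def group_balance(annotations):
    """Count annotated faculty with and without a research group, per domain.

    `annotations` is a list of dicts with the hypothetical keys
    'domain' (str) and 'has_research_group' (bool).
    """
    counts = Counter((a["domain"], a["has_research_group"]) for a in annotations)
    domains = sorted({domain for domain, _ in counts})
    # Counter returns 0 for combinations that never occur.
    return {
        d: {"with_group": counts[(d, True)], "without_group": counts[(d, False)]}
        for d in domains
    }


# Made-up example records, not real annotations.
example_annotations = [
    {"domain": "Physics", "has_research_group": True},
    {"domain": "Physics", "has_research_group": True},
    {"domain": "Physics", "has_research_group": False},
    {"domain": "History", "has_research_group": True},
]
balance = group_balance(example_annotations)
```

A domain where `with_group` is much larger than `without_group` would be a candidate for further annotation of faculty without groups.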


Data quality check

Author age

We check author age. An age above 50 years most often points to a glitch in the minimum paper year that needs to be fixed. Right now we fix these cases manually.
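The check itself can be sketched as a filter that lists the authors to review by hand. This assumes author age is computed as a reference year minus the minimum paper year; the field names `name` and `min_paper_year` and the `reference_year` argument are hypothetical, and only the threshold of 50 comes from the text.

```python
MAX_PLAUSIBLE_AGE = 50  # threshold used in the manual check


def author_age(min_paper_year, reference_year):
    """Assumed definition: years elapsed since the author's earliest paper."""
    return reference_year - min_paper_year


def flag_suspect_authors(authors, reference_year, max_age=MAX_PLAUSIBLE_AGE):
    """Return the authors whose computed age exceeds max_age."""
    return [
        a for a in authors
        if author_age(a["min_paper_year"], reference_year) > max_age
    ]


# Made-up example records, not real authors.
example_authors = [
    {"name": "author_a", "min_paper_year": 1995},
    {"name": "author_b", "min_paper_year": 1950},  # likely a bad min paper year
]
suspects = flag_suspect_authors(example_authors, reference_year=2023)
```

Only the flagged records then need a manual look at their earliest paper.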

Coauthor age

The same check applies to coauthors. Since coauthors are numerous, we do nothing to fix them right now. If a coauthor age is above 50, we set it to NA in the model so that age_diff between author and coauthor is also NA. This might not be the best solution, because it means we artificially impose a cutoff of 50 years on the age difference.
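The masking rule and its side effect can be illustrated with a small sketch. Here `None` stands in for NA, and the sign convention of `age_diff` (coauthor minus author) is an assumption made for the example; the function names are hypothetical.

```python
MAX_PLAUSIBLE_AGE = 50


def mask_age(age, max_age=MAX_PLAUSIBLE_AGE):
    """Replace implausible ages with None (standing in for NA)."""
    if age is None or age > max_age:
        return None
    return age


def age_diff(author_age, coauthor_age):
    """Age difference, propagating None if either age is missing.

    The sign convention (coauthor minus author) is assumed for illustration.
    """
    if author_age is None or coauthor_age is None:
        return None
    return coauthor_age - author_age


# A coauthor with a computed age of 62 is masked, so the difference is missing.
masked_diff = age_diff(10, mask_age(62))
# A coauthor with a plausible age keeps a usable difference.
kept_diff = age_diff(10, mask_age(35))
```

Because every retained age is at most 50, no retained difference can exceed 50 in absolute value when ages are non-negative, which is the artificial cutoff noted above.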


Proportion of each college by author age