Core idea: “Small number statistics” are often blamed (often in a hand-wavy, armchair fashion) for a lack of representation of underrepresented groups on panels, awards, and any other facet of life where a small selection is made. The premise is that zero representation is a common outcome for small selections. While this is true, it is only one part of the truth.
Humans have horrible statistical intuition, and this includes those who have learned enough statistics to know about “small number statistics.” In short, to comprehend the expected outcomes for small selections, one must actually look at them.
A beautiful, easy-to-use site can illustrate these outcomes interactively. We also post a simple MATLAB script below that you can use to get more detailed results. While we focused our article on APS Fellows, this kind of reflective analysis is also useful for panels, invited speaker lists, and anywhere else this kind of selection occurs.
As a null model, imagine randomly selecting n = 100 people from a large population containing 15% women; then repeat this sampling many times to generate good statistics. For such large values of n, the mean number of women in the sample coincides with the most likely value: 15. The distribution is nearly symmetric and the chance of zero is small. Select n = 5, however, and zero becomes the most likely value. Thus small-number statistics are often faulted for a lack of representation of women. Stopping the analysis here, though, ignores the actual numbers. Though zero is the most likely value for n = 5, the median value is one. Zero accounts for only 44% of outcomes, whereas the likelihood of one is 39% and the likelihood of two or more is about 16%. Small-number statistics thus create asymmetric and long-tailed distributions. Now consider changing the selection slightly: n = 7 and 20% women in the population. Zero is still a common outcome and one is the most likely value, but now selecting two or more women (“overrepresentation”) is twice as likely as not representing them at all.
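The figures in this paragraph can be checked directly against the binomial distribution. The following is a minimal Python sketch, not the MATLAB script mentioned in this post; the function names `binomial_pmf` and `summarize` are ours, and it assumes the population is large enough that each selection is an independent draw with a fixed probability of choosing a woman.

```python
from math import comb


def binomial_pmf(n, p):
    """Probability of selecting k women (k = 0..n) in n independent draws."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def summarize(n, p):
    """Return the mode, median, and a few key probabilities of the null model."""
    pmf = binomial_pmf(n, p)
    mode = max(range(n + 1), key=lambda k: pmf[k])
    # Median: smallest k whose cumulative probability reaches 0.5.
    cumulative = 0.0
    median = n
    for k, prob in enumerate(pmf):
        cumulative += prob
        if cumulative >= 0.5:
            median = k
            break
    return {
        "mode": mode,
        "median": median,
        "p_zero": pmf[0],
        "p_one": pmf[1],
        "p_two_or_more": 1 - pmf[0] - pmf[1],
    }


# The three selections discussed above: (slots, fraction of women).
for n, p in [(100, 0.15), (5, 0.15), (7, 0.20)]:
    print(n, p, summarize(n, p))
```

For n = 5 the sketch reports a mode of zero and a median of one; for n = 7 and 20% women, the probability of two or more is about twice the probability of zero.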
Numerical Simulation: We used the null model (illustrated in the MATLAB code) to generate the probabilities of selecting N women for n fellowship slots. For unit populations and percentages of women, we used unit demographic data from APS. We also obtained the fellowship allocation for each unit in each year from APS. The source data is posted below.
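For readers without MATLAB, the null model can also be sketched by brute-force sampling in Python. This is an illustration under our own assumptions, not a port of the posted script: the function name `simulate_null_model`, the trial count, and the fixed seed are hypothetical choices, and each slot is treated as an independent draw from a large population.

```python
import random


def simulate_null_model(n_slots, frac_women, trials=100_000, seed=0):
    """Estimate P(N women selected) for N = 0..n_slots by repeated random sampling."""
    rng = random.Random(seed)  # fixed seed only so that runs are repeatable
    counts = [0] * (n_slots + 1)
    for _ in range(trials):
        # Each slot is filled by a woman with probability frac_women.
        women = sum(rng.random() < frac_women for _ in range(n_slots))
        counts[women] += 1
    return [c / trials for c in counts]


# Example: 5 slots, 15% women in the eligible population.
probs = simulate_null_model(5, 0.15)
for k, prob in enumerate(probs):
    print(f"{k} women: {prob:.3f}")
```

With enough trials the estimated probabilities converge on the binomial distribution, so the sampling is a cross-check rather than a necessity.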
Meta-analysis (scoring): We compared the median value given by the null model (W) to the actual number of women selected (N). If N > W, we test for significant overrepresentation by calculating the fraction of outcomes in the null model below N. If this is >0.55, we deem this “overperformance.” If N < W, we test for significant underrepresentation by calculating the fraction of outcomes less than or equal to N. If this is <0.45, we deem this “underperformance.” All other results are deemed “neutral” (reasonably corresponding to expectation).
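The scoring rule can be written out as a short Python sketch. The function names and the use of the exact binomial distribution in place of simulated outcomes are our assumptions; the 0.55 and 0.45 thresholds and the comparison against the median W are taken from the description above.

```python
from math import comb


def binomial_pmf(n, p):
    """Probability of selecting k women (k = 0..n) under the null model."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def null_median(pmf):
    """Smallest k whose cumulative probability reaches 0.5 (the value W)."""
    cumulative = 0.0
    for k, prob in enumerate(pmf):
        cumulative += prob
        if cumulative >= 0.5:
            return k
    return len(pmf) - 1


def score(n_women, n_slots, frac_women, over=0.55, under=0.45):
    """Classify an actual selection of n_women (N) against the null model."""
    pmf = binomial_pmf(n_slots, frac_women)
    w = null_median(pmf)
    if n_women > w:
        # Fraction of null-model outcomes strictly below N.
        if sum(pmf[:n_women]) > over:
            return "overperformance"
    elif n_women < w:
        # Fraction of null-model outcomes less than or equal to N.
        if sum(pmf[: n_women + 1]) < under:
            return "underperformance"
    return "neutral"


# Example: 5 slots, 15% women eligible, for each possible N.
for n_women in range(6):
    print(n_women, score(n_women, 5, 0.15))
```

Under this sketch, a unit with 5 slots and 15% women eligible scores as neutral when it selects one woman, because one is the median of the null model.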
Calculating %Women Eligible: The gender breakdown of student members can skew unit gender statistics. Including students typically overestimates the percentage of women eligible to be fellows, though in some cases the reverse is true. APS does not have the gender breakdown of students prior to January 2018. Therefore, in order to eliminate students from the potential fellow pool, we had to use the 2018 student gender breakdowns for all 3 years. (The actual student and regular member numbers came from the specific years; only the ratio of women to men among students is taken from 2018.) We do not see any reason why this ratio should fluctuate wildly, and the correction for the student effect vastly outweighs any contribution from such a fluctuation. We also eliminated those who are already Fellows from the eligible population, though this had a very minor effect on the outcome.
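One way to express this correction is sketched below in Python. The exact arithmetic is our assumption about how the pieces combine, the function and parameter names are hypothetical, and the numbers in the example are invented for illustration only; they are not APS data.

```python
def eligible_fraction_women(
    total_members,
    total_women,
    students,
    student_frac_women_2018,
    fellows=0,
    fellows_women=0,
):
    """Fraction of women among members eligible for fellowship.

    Students are removed using that year's student count and the 2018
    fraction of women among students; existing Fellows are removed too.
    """
    women_students = students * student_frac_women_2018
    eligible_total = total_members - students - fellows
    eligible_women = total_women - women_students - fellows_women
    if eligible_total <= 0:
        raise ValueError("no eligible members remain after corrections")
    return eligible_women / eligible_total


# Invented example unit: 2000 members (360 women), 600 of them students,
# 25% women among students in 2018, 100 existing Fellows (5 women).
with_students = 360 / 2000
corrected = eligible_fraction_women(2000, 360, 600, 0.25, fellows=100, fellows_women=5)
print(f"including students: {with_students:.1%}, eligible pool: {corrected:.1%}")
```

In this invented case the student correction lowers the percentage of women, which matches the typical direction described above, while removing existing Fellows shifts it only slightly.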
Logistic Regression: We also performed a logistic regression analysis on 5 years' worth of data and obtained similar results. We do not report this analysis here for two reasons. 1) The analysis was performed on data that included the student members, so we do not hold it in full faith. 2) The analysis does not accept inputs for selections of zero women fellows, which is a common outcome.
Code: This is a simple MATLAB script that generates probabilities for a null model. The code should be well-commented, but please email firstname.lastname@example.org with any questions. Yes, this script reproduces the binomial distribution by brute force.
Data and Results Spreadsheet:
- Data and Results (read-only)
- Zipped archive contains: 1) Fellowship Allocations, 2) Unit Demographics
Action: If you are interested in naming someone for fellowship, please click here.
For any other comments or questions, please feel free to email me directly: email@example.com
Comments are enabled on this post and will be approved when they contribute to a fair and inclusive dialogue.