230A final

Land quality distribution across the world, as used by the author

This is my final project for STAT230A: Linear Models at UC Berkeley in Spring 2022, done jointly with Isaac Schmidt.

The goal was to replicate the findings of Michalopoulos’s paper, “The Origins of Ethnolinguistic Diversity.” The project took on an additional dimension because we could not obtain some of the data used in the paper, namely the WLMS dataset. This prompted us to recreate a portion of the analysis using the GREG dataset instead.

The main finding of the paper, which was also borne out in our re-analysis using the alternative dataset, is that variation in elevation and variation in land quality (see the map above) are the two most decisive factors driving a region’s ethnolinguistic diversity.

Andrej Leban
Ph.D. Student