230A final
This is my final project for STAT230A: Linear Models at UC Berkeley in Spring 2022, done jointly with Isaac Schmidt.
The goal was to replicate the findings of Michalopoulos's paper, The Origins of Ethnolinguistic Diversity. The project took on an additional dimension because we couldn’t obtain some of the data used in the paper, namely the WLMS (World Language Mapping System) dataset. This prompted us to recreate a portion of the paper using the GREG (Geo-Referencing of Ethnic Groups) dataset instead.
The main finding of the paper, which was also borne out in our re-analysis using the alternative dataset, is that variation in elevation and variation in land quality (see the map above) are the two most decisive factors driving a region’s ethnolinguistic diversity.