Data Processing Steps

Identifying the Attribute

Before gathering data, we first decided on the target variables for each dataset:

Column Name | Description
name | Full name
gender_guess | Gender guessed using gender_guesser
gender | Manually corrected gender
placement | Placement of the candidate
placement_type | Type of placement
academic | Academic placement (binary variable)
private_company | Private company placement (binary variable)
government | Government placement (binary variable)
year | Year of completion
university | University name
department | Department
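
The three binary variables can be derived directly from placement_type. The following is a minimal sketch in plain Python, assuming each record is a dict and that placement_type takes exactly the three values listed above. The function name add_placement_indicators is hypothetical and not part of the original pipeline.

```python
# Assumption: these are the only values placement_type can take.
PLACEMENT_TYPES = ("academic", "private_company", "government")


def add_placement_indicators(record):
    """Return a copy of record with one 0/1 column per placement type."""
    out = dict(record)
    for ptype in PLACEMENT_TYPES:
        # 1 if this row's placement_type matches the column, otherwise 0.
        out[ptype] = int(record.get("placement_type") == ptype)
    return out


row = {
    "name": "Jiafeng Chen",
    "placement": "Stanford University",
    "placement_type": "academic",
}
encoded = add_placement_indicators(row)
```

Keeping placement_type alongside the three indicators makes the encoding easy to check by eye, since exactly one indicator should be 1 per row.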

Data Availability

This table shows which of the relevant data we were able to retrieve.

Data Cleaning

We first cleaned the data by removing duplicates and applying manual gender corrections. Here is a preview of the cleaned data:

Year | Name | Placement | University | Department | Placement Type | Gender
2024 | Jenna Anders | University of Virginia Batten | Harvard University | Economics | academic | female
2024 | Martin Aragoneses | INSTEAD | Harvard University | Economics | private_company | male
2024 | Michael Blank | Stanford University, Graduate School of Business | Harvard University | Economics | academic | male
2024 | Phoebe Cai | Link Logistics Real Estate | Harvard University | Economics | private_company | female
2024 | Romaine Campbell | Cornell Brooks Policy School | Harvard University | Economics | academic | female
2024 | Jiafeng Chen | Stanford University | Harvard University | Economics | academic | male
2024 | Antonio Coran | Bank of Italy | Harvard University | Economics | government | male
2024 | Veronica DeFalco | Imperial College of London | Harvard University | Economics | academic | female
2024 | Pedro Degiovanni | Charles River Associates | Harvard University | Economics | private_company | male
2024 | Michael Droste | University of Southern California | Harvard University | Economics | academic | male
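
The cleaning step described in this section (removing duplicates, then applying manual gender corrections) could look roughly like the sketch below. It is an illustration under stated assumptions, not the code actually used: the function clean_records, the choice of name, year, and university as the duplicate key, and the sample names and values are all hypothetical.

```python
def clean_records(records, gender_corrections):
    """Drop duplicate rows and fill the final gender column.

    records: list of dicts with name, year, university, gender_guess.
    gender_corrections: dict mapping a name to its manually verified gender.
    """
    seen = set()
    cleaned = []
    for rec in records:
        # Assumption: a candidate is identified by name, year, and university.
        key = (rec["name"].strip().lower(), rec["year"], rec["university"])
        if key in seen:
            continue  # skip later copies of the same candidate
        seen.add(key)
        row = dict(rec)
        # A manual correction overrides the automatic guess when one exists.
        row["gender"] = gender_corrections.get(rec["name"], rec["gender_guess"])
        cleaned.append(row)
    return cleaned


# Hypothetical sample rows, not taken from the real dataset.
raw = [
    {"name": "Alex Example", "year": 2024,
     "university": "Example University", "gender_guess": "unknown"},
    {"name": "Alex Example", "year": 2024,
     "university": "Example University", "gender_guess": "unknown"},
    {"name": "Maria Sample", "year": 2024,
     "university": "Example University", "gender_guess": "female"},
]
corrections = {"Alex Example": "male"}
cleaned = clean_records(raw, corrections)
```

Keeping gender_guess next to the corrected gender column makes it possible to audit afterwards how often the automatic guess had to be overridden.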

Tabular Data

Here is a table showing the placements for each university and year, including the gender composition.
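
A summary of this kind can be produced by grouping the cleaned rows by university and year and counting placement types and genders within each group. The sketch below is one possible way to do it in plain Python. The function names summarize and female_share are hypothetical, and the sample rows are the first three rows of the preview shown in the Data Cleaning section.

```python
from collections import Counter, defaultdict


def summarize(records):
    """Count placements per (university, year), split by type and gender."""
    summary = defaultdict(Counter)
    for rec in records:
        group = summary[(rec["university"], rec["year"])]
        group["total"] += 1
        group[rec["placement_type"]] += 1  # academic / private_company / government
        group[rec["gender"]] += 1          # female / male
    return summary


def female_share(counts):
    """Fraction of candidates in a group recorded as female."""
    return counts["female"] / counts["total"] if counts["total"] else 0.0


sample = [
    {"university": "Harvard University", "year": 2024,
     "placement_type": "academic", "gender": "female"},
    {"university": "Harvard University", "year": 2024,
     "placement_type": "private_company", "gender": "male"},
    {"university": "Harvard University", "year": 2024,
     "placement_type": "academic", "gender": "male"},
]
table = summarize(sample)
harvard_2024 = table[("Harvard University", 2024)]
```

Each entry of table holds the counts for one university and year, so harvard_2024["academic"] gives the number of academic placements in that group and female_share(harvard_2024) gives its gender composition as a fraction.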