Mining personal names in the ‘Big Data’ to map Diasporas

Elian Carsenat
NamSor Applied Onomastics

Diasporas: ‘who are they, where are they and what are they doing?’ is the perennial question every country asks about its diaspora. Onomastics (the study of proper names) can combine machine learning with domain expertise from a wide range of disciplines (sociolinguistics, anthropology, geography, history) to build accurate models for classifying personal names by gender, country or region of origin, ethnicity, or position in a caste or clan system. This opens new possibilities for analysing open data and social networks and for mapping Diasporas accurately. We will present an API for classifying personal names and several concrete applications in migration research and Diaspora engagement:

- mapping Diaspora communities: the geodemographics of the city of Boston;
- an ITIF analysis of the contribution of migrants to innovation in the United States;
- engaging a country’s Diaspora for economic development (Invest Lithuania; Tunisia Diaspora Banking; a USAID project).

(presentation followed by a live demo)


Mini-Workshop: Diasporas Lab Day