Project Proposal by Martin Stacey


Predicting popularity of baby names

Software

SAS, R, Python, or other data analytics tools

Covers

Data analytics

Skills Required

Data analytics, interest in parenthood, interest in social trends

Challenge

Conceptual ★★★ Technical ★★★★ Programming ★★★

Brief Description

For a prospective parent, choosing a name for a new child is one of the big challenges. One criterion to consider is how common, rare, or fashionable a name is. At least some countries publish information about the popularity of baby names. When Martin Stacey was last an expectant father, he had hours of fun with his favourite baby names website: the United States Social Security Administration's Popular Baby Names website. But parents are just as interested in the future popularity of names as in their past popularity. Is the name you're thinking of about to become excessively trendy?

The US Social Security Administration publishes its data set of US given names, covering births from 1880 onwards, along with a range of other interesting data sets (see https://www.ssa.gov/open/data/). You can therefore apply data analytics techniques to try to answer this question.
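As a starting point, the sketch below shows one way the yearly name counts might be read and normalised in Python. It assumes the data has been downloaded as one text file per year, with one "name,sex,count" record per line and no header row; check that layout against the files you actually obtain. The sample text, the function names parse_year and name_shares, and the choice to work with shares rather than raw counts are all illustrative assumptions, not part of the proposal.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in the ASSUMED layout of one yearly file:
# one "name,sex,count" record per line, no header row.
SAMPLE_2000 = "Emily,F,300\nHannah,F,200\nJacob,M,400\nEmily,M,100\n"


def parse_year(text, year):
    """Parse one year's records into (year, name, sex, count) tuples."""
    reader = csv.reader(io.StringIO(text))
    return [(year, name, sex, int(count)) for name, sex, count in reader]


def name_shares(records):
    """Map (year, sex, name) to that name's share of the listed births.

    The denominator is the total count of all names present in the
    records for the same year and sex.
    """
    totals = defaultdict(int)
    for year, _name, sex, count in records:
        totals[(year, sex)] += count
    return {
        (year, sex, name): count / totals[(year, sex)]
        for year, name, sex, count in records
    }


records = parse_year(SAMPLE_2000, 2000)
shares = name_shares(records)
print(shares[(2000, "F", "Emily")])
```

Working with shares rather than raw counts is one possible design choice: the number of births changes from year to year, so a share makes popularity in different years easier to compare.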

The challenge of this project is to develop methods for predicting the future popularity of names in the United States, and to test how well they work. Can we predict the popularity of a name in the USA? This can be tested by running a prediction procedure on the data up to a particular year and seeing how well its predictions match what really happened. Can names be clustered into groups whose popularity rises and falls in the same way? (If so, do these names have cultural or demographic features in common?) Can patterns be identified in how names rise and fall? What does the future hold?
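The testing idea described above (fit on data up to a cutoff year, then compare predictions with what really happened) can be sketched as follows. This is a minimal illustration, not a proposed solution: the two forecasting rules (carry the last value forward, and extrapolate a straight line through the last few years) are simple baselines chosen here as assumptions, the function names are hypothetical, and the series is synthetic data invented purely to show the mechanics. A "share" here means the fraction of a year's births given a particular name.

```python
def persistence_forecast(history, horizon):
    """Baseline: assume the last observed share simply continues."""
    return [history[-1]] * horizon


def linear_forecast(history, horizon, window=5):
    """Baseline: least-squares line through the last `window` points."""
    ys = history[-window:]
    n = len(ys)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = y_mean - slope * x_mean
    # A share cannot be negative, so clip the extrapolation at zero.
    return [max(0.0, intercept + slope * (n - 1 + h))
            for h in range(1, horizon + 1)]


def mean_absolute_error(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def backtest(series, cutoff_year, horizon, forecast):
    """Fit on years <= cutoff_year, score on the next `horizon` years.

    `series` maps year -> share for a single name.
    """
    train = [series[y] for y in sorted(series) if y <= cutoff_year]
    actual = [series[y]
              for y in range(cutoff_year + 1, cutoff_year + 1 + horizon)]
    return mean_absolute_error(forecast(train, horizon), actual)


# SYNTHETIC series (not real data): a share rising steadily each year.
series = {2000 + i: 0.001 * (i + 1) for i in range(10)}

linear_error = backtest(series, 2006, 3, linear_forecast)
persistence_error = backtest(series, 2006, 3, persistence_forecast)
print(linear_error, persistence_error)
```

On a real name, the same backtest function could be run for many cutoff years and many names, so that a candidate method is judged by its typical error rather than by one lucky or unlucky case.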

