I doubt there’s a data analyst/scientist, Kaggler or dweller of GitHub out there who isn’t familiar with the near-mythical ‘Titanic dataset’ or, for that matter, its lesser-known Estonian cousin (the ‘Estonia disaster dataset’).

Being self-employed, especially in a year that has been such a challenge, I wanted to do something that would help me stand out from the crowd. I decided that an online portfolio was the way to go, but working in finance meant I couldn’t use any of my own work because banks really don’t like you sharing their data. Thus began the search for publicly available, easy-to-access data that was suitable for the kinds of projects I’d worked on before (clustering, classification, time series, visualisation, etc.). Inevitably, I ended up on Kaggle and GitHub and found a few different datasets, but for this project the Titanic dataset seemed to tick all the boxes. …



Analyst, scientist and runner notprofessorgreen.com

