Titanic - Machine Learning from Disaster

Challenge:

Build a predictive model that answers the question "What sorts of people were more likely to survive?" using passenger data (e.g., name, age, gender, socio-economic class).

Dataset Description:

The data has been split into two groups:

Training set (train.csv)

This .csv file contains passenger data (i.e., PassengerId, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked) along with the outcome (i.e., Survived).

Test set (test.csv)

This .csv file contains the same passenger data (i.e., PassengerId, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked) without the outcome.
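
As a rough illustration of the split described above, the two files could be loaded with pandas as in the sketch below. The helper names load_titanic and outcome_columns are hypothetical and are not taken from the project code; the file names are the ones listed above.

```python
import pandas as pd


def load_titanic(train_source="train.csv", test_source="test.csv"):
    """Read both Kaggle files; each argument may be a path or a file-like object."""
    train = pd.read_csv(train_source)
    test = pd.read_csv(test_source)
    return train, test


def outcome_columns(train, test):
    """Return the columns that appear only in the training set."""
    return sorted(set(train.columns) - set(test.columns))


# Example usage (assumes both files are in the working directory):
# train, test = load_titanic()
# print(outcome_columns(train, test))
```

With the files described above, the only column missing from the test set should be Survived, which is the value the model has to predict.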

My Approach:

Programming Language: Python

Built two predictive models

With RandomForestClassifier: Implemented a machine learning model for the Titanic disaster that predicts whether each passenger survived, using RandomForestClassifier
Kaggle Score: 0.77511
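
A minimal sketch of what a RandomForestClassifier baseline might look like is given below. The feature list, the hyperparameter values, and the helper names fit_random_forest and predict_survival are all assumptions made for illustration; the write-up does not state which ones the project used, and this sketch is not claimed to reproduce the score above.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Assumed feature set for illustration; the project's actual features are not listed.
FEATURES = ["Pclass", "Sex", "SibSp", "Parch"]


def fit_random_forest(train, features=FEATURES):
    """Fit a random forest on one-hot encoded features and return it with its column order."""
    X = pd.get_dummies(train[features])
    y = train["Survived"]
    # Hyperparameter values are placeholders, not tuned settings from the project.
    model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=1)
    model.fit(X, y)
    return model, list(X.columns)


def predict_survival(model, columns, test, features=FEATURES):
    """Build a submission-style frame with one 0/1 prediction per passenger."""
    # Align the test columns with the training columns in case a category is missing.
    X_test = pd.get_dummies(test[features]).reindex(columns=columns, fill_value=0)
    return pd.DataFrame(
        {"PassengerId": test["PassengerId"], "Survived": model.predict(X_test)}
    )
```

The returned frame has the two columns Kaggle expects in a submission file, so it could be written out with to_csv(..., index=False).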

With XGBoost classifier: Implemented a machine learning model for the Titanic disaster that predicts whether each passenger survived, using an XGBoost classifier with tuned hyperparameters
Kaggle Score: 0.77990

Contribution:
1. Visualized the relationship between each feature and the target class (Survived)
2. Identified three additional useful features beyond those used in the RandomForestClassifier model
3. Converted the string values of the 'Sex' and 'Embarked' features to numeric values using replace()
4. Tuned the XGBoost hyperparameters (n_estimators, learning_rate, max_depth, colsample_bytree) using grid search.
  Credit: towardsdatascience
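
Steps 3 and 4 could be sketched roughly as follows. The integer codes, the choice to fill missing Embarked values with "S", the candidate grid values, and the helper names encode_categoricals and tune_xgboost are assumptions for illustration only; the grid actually searched in the project is not given here.

```python
import pandas as pd
from sklearn.model_selection import GridSearchCV

# Assumed integer codes; any consistent mapping would work.
SEX_MAP = {"male": 0, "female": 1}
EMBARKED_MAP = {"S": 0, "C": 1, "Q": 2}

# Placeholder candidate values, not the grid behind the reported score.
PARAM_GRID = {
    "n_estimators": [100, 300],
    "learning_rate": [0.05, 0.1],
    "max_depth": [3, 5],
    "colsample_bytree": [0.8, 1.0],
}


def encode_categoricals(df):
    """Return a copy with 'Sex' and 'Embarked' converted to integers via replace()."""
    out = df.copy()
    # Casting to object first keeps replace() behaving the same across pandas versions.
    out["Sex"] = out["Sex"].astype(object).replace(SEX_MAP).astype(int)
    # Assumption: missing ports are filled with "S" before encoding.
    embarked = out["Embarked"].astype(object).fillna("S")
    out["Embarked"] = embarked.replace(EMBARKED_MAP).astype(int)
    return out


def tune_xgboost(X, y, param_grid=PARAM_GRID, cv=5):
    """Run a cross-validated grid search over the XGBoost hyperparameters."""
    # Imported here so the encoding helper stays usable without xgboost installed.
    from xgboost import XGBClassifier

    search = GridSearchCV(
        XGBClassifier(random_state=1), param_grid, scoring="accuracy", cv=cv
    )
    search.fit(X, y)
    return search.best_estimator_, search.best_params_
```

A grid of this size has 16 combinations, so with 5-fold cross-validation the search fits 80 models plus one final refit; widening the grid increases that cost multiplicatively.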

GitHub Code:

Click here