Titanic - Machine Learning from Disaster
Challenge:
Build a predictive model that answers the question: "What sorts of people were more likely to survive?"
using passenger data (i.e. name, age, gender, socio-economic class, etc.).
Dataset Description:
The data has been split into two groups:
- training set (train.csv)
- This .csv file contains passenger data (i.e. PassengerId, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked) along with the outcome (i.e. Survived)
- test set (test.csv)
- This .csv file contains the same passenger data (i.e. PassengerId, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked) without the outcome
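As a small illustration of the split described above, the sketch below loads both sets with pandas and checks which column exists only in the training data. The inline CSV text is a made-up stand-in with a reduced set of columns so the snippet runs without the real files; with the Kaggle files you would presumably pass "train.csv" and "test.csv" to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Tiny inline stand-ins for train.csv and test.csv (illustrative values,
# reduced columns). This is an assumption made so the sketch is runnable.
train_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Sex,Age\n"
    "1,0,3,male,22\n"
    "2,1,1,female,38\n"
)
test_csv = io.StringIO(
    "PassengerId,Pclass,Sex,Age\n"
    "892,3,male,34.5\n"
)

# With the real files: pd.read_csv("train.csv") and pd.read_csv("test.csv").
train = pd.read_csv(train_csv)
test = pd.read_csv(test_csv)

# The outcome column is the only one present in train but missing from test.
only_in_train = set(train.columns) - set(test.columns)
print(only_in_train)
```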
My Approach:
Programming Language: Python
Built two predictive models:
- With RandomForestClassifier:
Implemented a machine learning model that predicts passenger survival in the Titanic disaster using RandomForestClassifier
Kaggle Score: 0.77511
- With XGBoost classifier (XGBClassifier):
Implemented a machine learning model that predicts passenger survival in the Titanic disaster using the XGBoost classifier with tuned hyperparameters
Kaggle Score: 0.77990
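A minimal sketch of what the random forest baseline might look like is shown below. It is not the project's actual code: the feature list (`Pclass`, `SibSp`, `Parch`, `Fare`), the `n_estimators` value, and the tiny hand-made data frames are all assumptions chosen so the example runs on its own. The output frame follows the two-column layout (PassengerId, Survived) that Kaggle expects for a Titanic submission.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hand-made stand-ins for train.csv / test.csv (illustrative values only).
train = pd.DataFrame({
    "PassengerId": list(range(1, 9)),
    "Pclass": [3, 1, 3, 1, 3, 2, 1, 3],
    "SibSp": [1, 1, 0, 1, 0, 0, 0, 3],
    "Parch": [0, 0, 0, 0, 0, 0, 0, 1],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 13.00, 51.86, 21.08],
    "Survived": [0, 1, 1, 1, 0, 1, 0, 0],
})
test = pd.DataFrame({
    "PassengerId": [892, 893, 894],
    "Pclass": [3, 3, 2],
    "SibSp": [0, 1, 0],
    "Parch": [0, 0, 0],
    "Fare": [7.83, 7.00, 9.69],
})

# Assumed feature subset; the README does not list the features actually used.
features = ["Pclass", "SibSp", "Parch", "Fare"]

# Fit the forest on the labelled rows, then predict the unlabelled ones.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(train[features], train["Survived"])
predictions = model.predict(test[features])

# One row per test passenger, ready to be written with submission.to_csv(...).
submission = pd.DataFrame({
    "PassengerId": test["PassengerId"],
    "Survived": predictions,
})
print(submission)
```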
Contribution:
- Visualized the relationship between each feature and the target class
- Identified 3 additional useful features beyond those used in the RandomForestClassifier model
- Converted the string values of the features 'Sex' and 'Embarked' to numeric values using replace()
- Tuned the XGBoost hyperparameters (n_estimators, learning_rate, max_depth, colsample_bytree) using grid search
Credit: Towards Data Science
GitHub Code:
Click here