XGBoost Classifier

1 minute read

Using XGBoost to predict staff promotion.

XGBoost is one of the most popular machine learning algorithms these days, regardless of the type of prediction task at hand: regression or classification.

XGBoost is well known for providing better solutions than many other machine learning algorithms. In fact, since its inception, it has become a “state-of-the-art” algorithm for structured (tabular) data.

But what makes XGBoost so popular?

  • Speed and performance: Originally written in C++, it is comparatively faster than many other ensemble classifiers.

  • Core algorithm is parallelizable: Because the core XGBoost algorithm is parallelizable, it can harness the power of multi-core computers. It can also run on GPUs and across networks of machines, making it feasible to train on very large datasets.

  • Consistently strong performance: It has shown better performance than many other algorithms on a variety of machine learning benchmark datasets.

  • Wide variety of tuning parameters: XGBoost exposes parameters for cross-validation, regularization, user-defined objective functions, missing-value handling, and tree construction, along with a scikit-learn compatible API (see the sketch after this list).
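To make the scikit-learn compatible API concrete, here is a minimal sketch of fitting an `XGBClassifier` with a few common tuning parameters. The synthetic data and the parameter values are illustrative assumptions, not recommendations for the promotion dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic data purely for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# A few of the many tunable parameters; values are illustrative, not tuned
model = XGBClassifier(
    n_estimators=200,    # number of boosting rounds
    max_depth=4,         # maximum tree depth
    learning_rate=0.1,   # shrinkage applied to each round
    reg_lambda=1.0,      # L2 regularization on leaf weights
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # mean accuracy via the scikit-learn API
```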

# Using XGBoost in Python

First of all, just like with any other dataset, you import the dataset and store it in a variable called “Main_Data”. To import it we use the Pandas Python package, and we import the other libraries as we did for the EDA (Exploratory Data Analysis); the loading step itself is sketched after the imports below.

  1. Importing libraries (adding Pandas, since it is used for loading the data):

```python
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from imblearn.over_sampling import SMOTE
from xgboost import XGBClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.metrics import mean_squared_error, f1_score, precision_score, recall_score
```
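
  2. Loading the dataset: a minimal sketch of this step might look like the following, reusing the imports from step 1. The file name “staff_promotion.csv”, the target column “Promoted_or_Not”, and the assumption that the features are already numeric are placeholders for illustration, not the actual dataset.

```python
# Load the dataset into Main_Data (file name is an assumed placeholder)
Main_Data = pd.read_csv("staff_promotion.csv")

# Assumed target column; features are assumed numeric for this sketch
X = Main_Data.drop(columns=["Promoted_or_Not"])
y = Main_Data["Promoted_or_Not"]

# Hold out a test set, then oversample the minority class with SMOTE
# (on the training portion only, to avoid leaking into the test set)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
X_train, y_train = SMOTE(random_state=42).fit_resample(X_train, y_train)

# Standardize features and fit the classifier
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = XGBClassifier()
model.fit(X_train, y_train)
pred = model.predict(X_test)

print(accuracy_score(y_test, pred))
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))
```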