XGBoost Classifier
Using XGBoost to predict staff promotion.
XGBoost is one of the most popular machine learning algorithms today, regardless of the type of prediction task at hand: regression or classification.
XGBoost is well known for providing better solutions than many other machine learning algorithms. In fact, since its inception it has become the "state-of-the-art" algorithm for dealing with structured data.
But what makes XGBoost so popular?
- Speed and performance: originally written in C++, it is comparatively faster than other ensemble classifiers.
- Core algorithm is parallelizable: because the core XGBoost algorithm is parallelizable, it can harness the power of multi-core computers. It can also be parallelized across GPUs and across networks of machines, making it feasible to train on very large datasets.
- Consistently outperforms other algorithms: it has shown better performance on a variety of machine learning benchmark datasets.
- Wide variety of tuning parameters: XGBoost exposes parameters for cross-validation, regularization, user-defined objective functions, missing-value handling, tree construction, a scikit-learn compatible API, etc. (see the sketch after this list).
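As a minimal sketch of that tuning surface, here is how some of those knobs appear on the scikit-learn style `XGBClassifier`; the values below are illustrative placeholders, not settings from this project:

```python
from xgboost import XGBClassifier

# Illustrative values only; tune these for your own dataset.
model = XGBClassifier(
    n_estimators=200,      # number of boosted trees
    max_depth=4,           # tree depth (a tree parameter)
    learning_rate=0.1,     # shrinkage applied to each tree's contribution
    reg_lambda=1.0,        # L2 regularization on leaf weights
    reg_alpha=0.0,         # L1 regularization on leaf weights
    n_jobs=-1,             # use all CPU cores (parallel tree construction)
    missing=float("nan"),  # value XGBoost treats as missing
)
```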
# Using XGBoost in Python

First of all, just like with any other dataset, you are going to import the dataset and store it in a variable called "Main_Data". To import it we use the Pandas Python package, and we import the other libraries as we did for the EDA (Exploratory Data Analysis).
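A quick sketch of that loading step; the file name below is a placeholder, not the actual dataset path:

```python
import pandas as pd

# Hypothetical file name; substitute the actual staff promotion dataset.
Main_Data = pd.read_csv("staff_promotion.csv")
Main_Data.head()  # inspect the first few rows
```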
- Importing libraries:
```python
from sklearn.model_selection import train_test_split, cross_val_score
from imblearn.over_sampling import SMOTE
from xgboost import XGBClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.metrics import mean_squared_error, f1_score, precision_score, recall_score
```
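To show how these imports fit together, here is a hedged end-to-end sketch. The synthetic `X, y` below stand in for the features and promotion target you would derive from `Main_Data`; everything else follows the usual split-scale-resample-fit-evaluate pattern:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, f1_score, classification_report
from imblearn.over_sampling import SMOTE
from xgboost import XGBClassifier

# Placeholder data; in the article, X and y come from Main_Data.
X, y = make_classification(n_samples=1000, n_features=12,
                           weights=[0.9, 0.1], random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Scale features, fitting the scaler only on the training split.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Oversample the minority (promoted) class on the training data only.
X_train, y_train = SMOTE(random_state=42).fit_resample(X_train, y_train)

model = XGBClassifier(n_estimators=200, max_depth=4,
                      learning_rate=0.1, n_jobs=-1)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("F1 score:", f1_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

Note that SMOTE is applied only to the training split; resampling before the split would leak synthetic copies of test rows into training.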