Term deposits are a major source of income for a bank. A term deposit is an investment held at a financial institution, in which money is invested at an agreed rate of interest for a fixed period.
The bank has various outreach channels for selling term deposits to its customers, such as email marketing, advertisements, telephone marketing, and digital marketing. Identifying prospective customers for a product, however, is a major problem in the banking sector.
Marketing products to customers requires a large investment, since large marketing firms are hired to carry out product and service marketing at scale. It is therefore crucial to identify beforehand the customers who are most likely to convert, so that they can be targeted specifically.
This benefits both the bank and its customers: marketing costs fall, and only interested customers receive the marketing communication.
Problem Outline
The bank needs an approach for identifying the most promising prospects for the term deposit so that they can be targeted specifically, since blanket targeting would be ineffective and costly.
Solution Approach
- Gather customer level data that has all available details on existing customers:
  - Demographics Data
  - Transactional Data
  - Loan Details
  - Communication and engagement details
  - Banking Data
- Flag the customers that are already term deposit subscribers
- Using the above-mentioned data train an XG-Boost Model to predict whether a customer would subscribe to a term deposit or not
- Validate and tune the model to give out more accurate predictions
- Predict the term deposit flag for new customers
- Send the list of customers that are more likely to subscribe to the marketing agency
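The last two steps above can be sketched in a few lines of plain Python. This is only an illustration: the `ScoredCustomer` type, the `shortlist` helper, the 0.5 threshold, and the customer IDs and scores are all hypothetical and do not come from the dataset or from any fitted model.

```python
from typing import List, NamedTuple, Optional


class ScoredCustomer(NamedTuple):
    customer_id: str
    subscribe_probability: float  # model output, between 0 and 1


def shortlist(scored: List[ScoredCustomer],
              threshold: float = 0.5,
              max_size: Optional[int] = None) -> List[ScoredCustomer]:
    """Keep customers at or above the threshold, most likely subscribers first."""
    selected = [c for c in scored if c.subscribe_probability >= threshold]
    selected.sort(key=lambda c: c.subscribe_probability, reverse=True)
    return selected if max_size is None else selected[:max_size]


# Made-up scores, purely for illustration.
scored = [
    ScoredCustomer("C001", 0.12),
    ScoredCustomer("C002", 0.81),
    ScoredCustomer("C003", 0.55),
    ScoredCustomer("C004", 0.93),
]

targets = shortlist(scored, threshold=0.5, max_size=2)
print([c.customer_id for c in targets])
```

Capping the list with `max_size` is one way to fit the shortlist to a fixed marketing budget; the threshold itself would normally be chosen from validation results rather than fixed in advance.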
Implementation
We will use the following dataset to illustrate how this problem can be solved:
Banking Dataset - Marketing Targets | Kaggle
Feature Description:
- age: age
- job: type of job
- marital: marital status
- education
- default: has credit in default? (binary: "yes","no")
- balance: average yearly balance, in euros (numeric)
- housing: has a housing loan? (binary: "yes","no")
- loan: has a personal loan? (binary: "yes","no")
- contact: contact communication type
- day: last contact day of the month
- month: last contact month of the year
- duration: last contact duration, in seconds
- campaign: number of contacts performed during this campaign and for this client
- pdays: number of days that passed by after the client was last contacted from a previous campaign
- previous: number of contacts performed before this campaign and for this client
- poutcome: outcome of the previous marketing campaign
- y: has the client subscribed to a term deposit? (output variable, binary: "yes","no")
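Several of the features above are categorical or "yes"/"no" strings, so they need to be converted to numbers before a model can be fitted. A minimal sketch of one common way to do this with pandas is shown below. The tiny DataFrame is hand-made for illustration and uses only a few of the columns; its values are not taken from the real dataset.

```python
import pandas as pd

# Hand-made sample using a subset of the columns described above
# (illustrative values only, not real records).
df = pd.DataFrame({
    "age": [34, 51, 27, 45],
    "job": ["technician", "management", "student", "services"],
    "marital": ["married", "divorced", "single", "married"],
    "balance": [1200, -50, 300, 4100],
    "housing": ["yes", "no", "no", "yes"],
    "y": ["no", "yes", "no", "no"],
})

# Map the binary yes/no columns to 1/0.
for col in ["housing", "y"]:
    df[col] = df[col].map({"yes": 1, "no": 0})

# One-hot encode the remaining categorical columns.
X = pd.get_dummies(df.drop(columns="y"), columns=["job", "marital"])
y = df["y"]

print(X.columns.tolist())
print(y.tolist())
```

One-hot encoding is just one option; other encodings may work equally well for tree-based models.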
EDA:
Let us first have a look at how the target variable is distributed:
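A minimal sketch of such a check, assuming the data is in a pandas DataFrame with the target in column `y`. The helper name `target_distribution` and the ten-row sample are made up for illustration; the real file would be loaded in place of `sample`.

```python
import pandas as pd


def target_distribution(df: pd.DataFrame, target: str = "y") -> pd.DataFrame:
    """Return the count and share of each class in the target column."""
    counts = df[target].value_counts()
    shares = df[target].value_counts(normalize=True).round(3)
    return pd.DataFrame({"count": counts, "share": shares})


# Made-up sample standing in for the real dataset.
sample = pd.DataFrame({"y": ["no"] * 8 + ["yes"] * 2})

dist = target_distribution(sample)
print(dist)
```

If one class turns out to dominate, that is worth keeping in mind later when reading accuracy figures.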
Let us also look at which age groups are more likely to subscribe to a term deposit:
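One way to do this is to bucket `age` and compute the share of "yes" in each bucket. The sketch below assumes pandas; the bin edges, the labels, the helper name, and the eight sample rows are illustrative choices, not values from the dataset.

```python
import pandas as pd


def subscription_rate_by_age(df: pd.DataFrame, bins, labels) -> pd.Series:
    """Share of subscribers (y == "yes") within each age bucket."""
    # right=False makes each bucket include its lower edge: [18, 30), [30, 40), ...
    age_group = pd.cut(df["age"], bins=bins, labels=labels, right=False)
    subscribed = df["y"].eq("yes")
    return subscribed.groupby(age_group, observed=False).mean()


# Made-up sample standing in for the real dataset.
sample = pd.DataFrame({
    "age": [22, 25, 34, 38, 45, 47, 63, 70],
    "y": ["yes", "no", "no", "no", "no", "yes", "yes", "yes"],
})

rates = subscription_rate_by_age(
    sample,
    bins=[18, 30, 40, 60, 100],
    labels=["18-29", "30-39", "40-59", "60+"],
)
print(rates)
```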
Model Fitting with Hyperparameter tuning:
Splitting the data into training and testing:
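A minimal sketch of the split using scikit-learn's `train_test_split`. The 80/20 ratio, the random seed, and the synthetic data (about 10% positives) are assumptions made for illustration. Passing `stratify=y` keeps the share of subscribers roughly equal in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded features and the 0/1 target.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.1).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,     # assumed hold-out share
    stratify=y,        # preserve the class ratio in both splits
    random_state=42,
)

print(X_train.shape, X_test.shape)
print(y_train.mean(), y_test.mean())
```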
Defining the space across which the hyperparameters would be tuned:
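A search space might look like the sketch below. The parameter names are real XG-Boost hyperparameters, but the candidate values are illustrative guesses, and the tuning tool shown here (scikit-learn's `ParameterSampler`, which draws random configurations) is an assumption rather than necessarily the one used for the results in this article.

```python
from sklearn.model_selection import ParameterSampler

# Hypothetical search space: names are XGBoost hyperparameters,
# candidate values are illustrative only.
search_space = {
    "max_depth": [3, 4, 5, 6, 8],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "n_estimators": [100, 200, 400],
    "subsample": [0.6, 0.8, 1.0],
    "colsample_bytree": [0.6, 0.8, 1.0],
    "min_child_weight": [1, 3, 5],
    "gamma": [0, 0.1, 1],
}

# Draw a handful of random configurations from the space.
candidates = list(ParameterSampler(search_space, n_iter=5, random_state=0))
for params in candidates:
    print(params)
```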
Defining function to iteratively fit models across the hyperparameter space:
Printing out the best parameters:
Output:
Final Model Evaluation:
We finally fit the model with the best parameters, as mentioned below:
Output:
Conclusion
We have now trained an XG-Boost model with 90% accuracy that can predict whether a customer would be inclined to subscribe to a term deposit or not.
The model can now be used to identify potential term deposit subscribers for targeted marketing activities, resulting in reduced marketing costs.
Should XG-Boost be used all the time for classification problems related to the banking domain?
According to the no-free-lunch theorem, “there exists no single best optimization algorithm”. Every problem should therefore be approached with a fresh mindset: try out the candidate algorithms first, then select the best-performing one and tune it further to obtain the best results.
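A quick way to put this into practice is to score a few candidate models with the same cross-validation setup before committing to one. The sketch below is a minimal example on synthetic data. The three candidates, the ROC AUC metric, and the 3-fold setup are illustrative choices, and scikit-learn's `GradientBoostingClassifier` stands in for a boosted-tree model so that the example needs only one library.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data standing in for a real banking problem.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

# Same folds and metric for every candidate, so the scores are comparable.
scores = {
    name: cross_val_score(model, X, y, cv=3, scoring="roc_auc").mean()
    for name, model in candidates.items()
}

best_name = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("best:", best_name)
```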
When not to use XG-Boost?
It has been observed that the XG-Boost algorithm tends to underperform in the following areas and is best avoided for problems of these kinds:
- Computer vision
- Natural language processing
- Tasks involving extrapolation
What problems related to the banking industry can be resolved using the XG-Boost algorithm?
Since its introduction, the XG-Boost algorithm has been credited not only with winning numerous modeling competitions but also with being the driving force behind several cutting-edge industrial applications. It can be leveraged to solve the following problems in the banking industry:
- Potential Customer Prediction
  - Predicting which potential customers to target with specific services via marketing
- Credit Default Prediction
  - Predicting which customers would default on their upcoming debt repayment
- Customer Churn Prediction
  - Predicting which customers are more likely to churn out of a particular service, so that they can be flagged and targeted with retention schemes
- Fraud and Anomaly Detection
  - Detecting frauds and anomalies in financial transactions by using XG-Boost as a classifier