Analyzing Airbnb New User Bookings

Andrea Cabello

3 min read · Apr 8, 2021

What will a new Airbnb user’s first booking destination be?

I. Overview

For my capstone project at Flatiron School, I analyzed data that Airbnb published as a challenge on Kaggle (kaggle.com).

All the users in the data set are from the USA.
The challenge provides several data sets.
I used only the train_data set and performed my own train_test_split on it.
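To make the split concrete, here is a minimal sketch using scikit-learn’s train_test_split on a tiny made-up frame. The toy rows, the 40% test size, the stratification, and the random seed are all assumptions for illustration; the post does not state the settings that were actually used.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the Kaggle train set (the real file has far more rows and columns).
toy = pd.DataFrame({
    "age": [25, 31, 47, 52, 38, 29, 61, 33, 44, 27],
    "country_destination": ["NDF"] * 6 + ["US"] * 4,
})

X = toy.drop(columns="country_destination")
y = toy["country_destination"]

# test_size=0.4 and stratify=y are assumptions, not the author's confirmed settings.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=42
)

print(len(X_train), len(X_test))
```

Stratifying on the target keeps the share of each destination roughly the same in both parts, which matters when one class (here ‘NDF’) dominates.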

This project consists of two parts.

Part I: Binary Classification Model

Will a new Airbnb user end up booking a destination? True or False. For this, we will create a new feature called ‘effective_booking’.

Part II: Multi-Class Classification

What will a new Airbnb user’s first booking destination be? There are 12 possible outcomes for the destination country: ‘US’, ‘FR’, ‘CA’, ‘GB’, ‘ES’, ‘IT’, ‘PT’, ‘NL’, ‘DE’, ‘AU’, ‘NDF’ (no destination found), and ‘other’. Please note that ‘NDF’ is different from ‘other’: ‘other’ means there was a booking, but to a country not included in the list, while ‘NDF’ means there wasn’t a booking at all.

II. Business Problem

Predict whether a new Airbnb user will effectively book a destination or not.
Predict which country a new Airbnb user’s first booking destination will be.

We created a True/False feature, “effective_booking”, and built a binary classification model to predict whether a customer will end up booking.
What is happening? What determines whether a customer ends up booking, at a granular or overall level?
Only 42% of users ended up booking.
We then built a second classifier to predict where the users who do book are going.
The train data contains 128,070 observations (users).
No destination found (NDF): 74,878 users (58%).
Actual bookings: 53,192 users (42%).
‘US’ represents domestic travel, which accounts for 70% of all bookings in our data set.
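The percentages above follow directly from the two counts. A quick check, using only the figures reported in this section (the variable names are mine):

```python
# Counts reported for the train data in this section.
total_users = 128070
ndf_users = 74878
booked_users = 53192

# The two groups should account for every user.
assert ndf_users + booked_users == total_users

ndf_share = ndf_users / total_users
booked_share = booked_users / total_users

print(f"NDF:    {ndf_share:.1%}")     # NDF:    58.5%
print(f"Booked: {booked_share:.1%}")  # Booked: 41.5%
```

This imbalance is worth keeping in mind later: a model that always predicts “no booking” would already be right a little more than 58% of the time.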

III. Feature Engineering

Age Feature: we used the pandas pd.cut() function to create age bins and assigned each user’s age to the corresponding bin.

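import pandas as pd

# Bin ages into five-year intervals: (14, 19], (19, 24], ..., (94, 99].
train_data['age_bins'] = pd.cut(x=train_data['age'],
                                bins=[14, 19, 24, 29, 34, 39, 44, 49, 54, 59,
                                      64, 69, 74, 79, 84, 89, 94, 99])

# Convert the intervals to strings so they can be relabeled; missing ages become 'nan'.
train_data['age_bins'] = train_data.age_bins.astype(str)

# Map each interval string to a readable label; every bin above 74 is grouped as '75+'.
age_mapper = {'nan': 'unknown',
              '(14.0, 19.0]': '15-19',
              '(19.0, 24.0]': '20-24',
              '(24.0, 29.0]': '25-29',
              '(29.0, 34.0]': '30-34',
              '(34.0, 39.0]': '35-39',
              '(39.0, 44.0]': '40-44',
              '(44.0, 49.0]': '45-49',
              '(49.0, 54.0]': '50-54',
              '(54.0, 59.0]': '55-59',
              '(59.0, 64.0]': '60-64',
              '(64.0, 69.0]': '65-69',
              '(69.0, 74.0]': '70-74',
              '(74.0, 79.0]': '75+',
              '(79.0, 84.0]': '75+',
              '(84.0, 89.0]': '75+',
              '(89.0, 94.0]': '75+',
              '(94.0, 99.0]': '75+'}

# Assign the result back rather than calling replace(..., inplace=True) on a column selection.
train_data['age_bins'] = train_data['age_bins'].replace(age_mapper)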
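Mapping interval strings by hand depends on exactly how pandas prints each interval. As an alternative sketch (not the code used in the project), pd.cut() can assign the readable labels directly through its labels argument. The sample ages below are made up, and the last bin is widened to (74, 99] so that it plays the role of ‘75+’.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; in the project this would be train_data['age'].
sample = pd.DataFrame({"age": [17, 23, 31, 62, 72, 88, np.nan]})

# 14 edges define 13 right-closed bins; the final bin (74, 99] stands in for '75+'.
edges = [14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 69, 74, 99]
labels = ["15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49",
          "50-54", "55-59", "60-64", "65-69", "70-74", "75+"]

binned = pd.cut(sample["age"], bins=edges, labels=labels)

# Missing or out-of-range ages come back as NaN; label them 'unknown'.
sample["age_bins"] = (
    binned.cat.add_categories("unknown").fillna("unknown").astype(str)
)

print(sample)
```

The result is the same set of labels as the mapper approach, without relying on the string form of each interval.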

Effective_booking feature

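# Every destination value except 'NDF' counts as an actual booking.
countries_list = train_data['country_destination'].unique().tolist()
countries_list.remove('NDF')

# True when the user booked any destination, False when no destination was found.
train_data['effective_booking'] = train_data['country_destination'].isin(countries_list)
train_data.effective_booking.value_counts()

Output:
False    124543
True      88908

(These counts total 213,451 users, more than the 128,070 training observations quoted above, so they appear to cover the full data set before the split.)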
IV. EDA

V. Model Results

Binary Classification Model: Random Forest Classifier

Training Accuracy for Random Forest: 64.72%
Test Accuracy for Random Forest: 64.89%
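For readers who want to reproduce this kind of number, here is a minimal sketch of how training and test accuracy can be computed for a random forest with scikit-learn, alongside a majority-class baseline. The data is synthetic (generated to be about 58% negative, like the target here), and the hyperparameters, split size, and seeds are assumptions; the post does not list the settings or features that were actually used.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered features, with roughly 58% negatives.
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.58, 0.42], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)

# Illustrative hyperparameters, not the ones used in the project.
forest = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)
forest.fit(X_train, y_train)

train_acc = accuracy_score(y_train, forest.predict(X_train))
test_acc = accuracy_score(y_test, forest.predict(X_test))

# Baseline: always predict the most frequent class seen in training.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))

print(f"Training accuracy: {train_acc:.2%}")
print(f"Test accuracy:     {test_acc:.2%}")
print(f"Baseline accuracy: {baseline_acc:.2%}")
```

On the real data, the class balance reported earlier implies that always predicting “no booking” scores a little over 58%, so a test accuracy of 64.89% is roughly six points above that baseline.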

Multi-Class Classification: XGBoost Classifier

Training Accuracy: 87.56%
Validation Accuracy: 87.59%

VI. Conclusions and Future Work

Because the data set describes new users, the value ‘unknown’ appeared often in several categories. Given that, the accuracy we achieved in predicting destinations is surprisingly good.
Our binary classification model could be improved, but it is still helpful for understanding which traits are common among first-time users who end up booking a destination versus those who don’t.
With our XGBoost classifier, we can predict which destination a new user will choose with about 87.6% validation accuracy. This is very valuable information for marketing purposes.
High accuracy despite a high number of unknown values.
Predicting user behavior allows us to implement target marketing strategies.
Building a model like this helps us define our market segmentation strategy.