
Recommend Expedia hotels with Amazon Personalize: the magic of Hierarchical RNNs


Note: Jupyter notebook available here on Github. Other great resources are these AWS blog posts, here and here, and the official documentation here.

Context

Amazon Personalize is a machine learning AWS service enabling developers to easily build recommendation engines. It was introduced at re:Invent 2018, promising to be powered by Amazon's multi-decade experience in delivering product recommendations at scale. It immediately caught my attention and, almost exactly one year after launch, I decided to give it a go. As with all the other high-level ML services AWS offers, Personalize comes with a concise and effective API, almost completely hiding the deep technical side from the end user. The AutoML functionality represents the most extreme level of abstraction, where the customer entirely relies on Amazon to figure out what to do with the data at hand. On the other side of the spectrum, Personalize offers more advanced users a fully customizable stack, in which all the ML knobs are visible and tunable. Given that I am not a recommendation engines expert but still like to dive deeper into the technical details, I really appreciated both sides of the coin.

The Expedia dataset

As usual, the first thing I needed was a dataset. I went for the one which came with the Expedia Hotel Recommendations competition on Kaggle. In this challenge, ML practitioners were asked to predict, for every user interaction on the website, five hotel groups visitors would end up booking. Expedia internally splits the catalog of accommodations based on common features of the various locations, hence drastically reducing the granularity of the problem from hotel-based to cluster-based recommendations, with a total of 100 clusters to choose from. The dataset goes as granular as search level, i.e. each row is a different hotel search performed by a different user looking for a specific destination with determined check-in and check-out dates. Each search might or might not end up in a booking (`is_booking`: 1 if a booking, 0 if a click), and comes with a bunch of interesting attributes. Those can be grouped into three main categories:

  1. User-level attributes: these are features describing the user. Examples are:
    • site_name: ID of the Expedia point of sale (i.e. Expedia.com, Expedia.co.uk, Expedia.co.jp, …). This is technically an attribute of the search event. Still, it potentially defines the location of the user.
    • posa_continent: ID of continent associated with site_name
    • user_location_country: The ID of the country the customer is located in
  2. Hotel-cluster-level attributes: these are features describing the hotel group. Examples are srch_destination_type_id (type of destination), hotel_continent and hotel_country. The idea for both user-level and hotel-level attributes is that they do not depend on the search event, and can therefore be treated as metadata to enrich a recommendation algorithm.
  3. Search-level attributes: these are features describing the specific transaction the user is performing on the website. Examples are:
    • date_time: timestamp of the search
    • is_package: 1 if the click/booking was generated as a part of a package (i.e. combined with a flight), 0 otherwise
    • srch_ci, srch_co: check-in and check-out date

A complete description of all the fields can be found here, on Kaggle.

Let’s see how we can feed this dataset to Personalize. In theory, what AWS needs is just user-item interactions, i.e. user, hotel and timestamp of the search event. This is easy to get given the structure of Expedia’s dataset, as each row contains exactly what we are looking for (date_time, user_id and hotel_cluster). We just select the three requested columns and we are done. Wait, how does Personalize know the outcome of the search, i.e. whether it ended up in a booking or not? If we want it to learn the difference between an empty click and a reservation, we have to explicitly add EVENT_TYPE (a string describing the type of event) and EVENT_VALUE (a float assigning a value to the event, e.g. 1-5 stars in case of a product review). Those help Personalize tell the different user-item interactions apart. Both columns are actually optional though because, at the end of the day, we could just focus on the searches which led to a booking. These are the interesting ones, as they don’t highlight a mere intent but an actual action by the customer. In a movie or book recommender system, it would be equivalent to focusing only on highly rated items. This is what I did. I randomly sampled 3% of the dataset (~1.1M transactions) and kept only the rows matching the condition is_booking=1 (8% of the total, 90k rows).
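For reference, here is a minimal pandas sketch of that sampling and filtering step (file names and the random seed are illustrative, not necessarily what the notebook uses):

import pandas as pd

# raw Kaggle file; only the columns we actually need
df = pd.read_csv("train.csv", usecols=["date_time", "user_id", "hotel_cluster", "is_booking"])

# randomly sample 3% of the searches, then keep only actual bookings
sample = df.sample(frac=0.03, random_state=42)
bookings = sample[sample["is_booking"] == 1].drop(columns="is_booking")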

Once again, these 90k events over 3 columns would be sufficient to build a recommender engine. I wanted to spice up the recipe though and, given that Personalize supports processing user and item metadata, I extracted from the dataset every static field I could, for both users and hotels. This operation consists of pulling all distinct USER_IDs and HOTEL_CLUSTERs and grabbing the static attributes of each one of those. Especially for HOTEL_CLUSTERs, this is quite tricky, as a single hotel group might be described by different metadata depending on the specific listing the user is visiting. To address this problem, a handy feature of Personalize is that it allows an attribute to hold as many values as needed by concatenating them via the pipe symbol (“|”). As per the “Formatting your input data” doc page, in case you have a movie categorized as horror and comedy at the same time, you could encode the film genre in the following way:

ITEM_ID,GENRE
item_123,horror|comedy

This is really useful as, in our case, once again for hotels, we are not dealing with unique locations but with clusters. This means that metadata fields could be a collection of descriptors. For example, a user might book a stay in a hotel belonging to cluster 7, located in Morocco. Another user could instead reserve another hotel, within the same cluster, situated in Italy. Given that item_id is associated not with the hotel but with the cluster, we would be facing the dilemma of which country to assign to cluster 7. Thanks to how Personalize handles these situations, we can just concatenate the two and come up with country="Morocco|Italy". This is how the first 5 hotel groups look when we pivot the original dataset onto the cluster dimension (head of the items.csv file).

ITEMS metadata (items.csv)
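As an illustration, here is one way such pipe-concatenated metadata could be built with pandas (reduced to HOTEL_COUNTRY for brevity; a sketch, not necessarily the notebook's exact code):

import pandas as pd

df = pd.read_csv("train.csv", usecols=["hotel_cluster", "hotel_country"])

def pipe_join(values):
    # deduplicate and concatenate, e.g. {63, 204} -> "63|204"
    return "|".join(str(v) for v in sorted(set(values.dropna())))

items = df.groupby("hotel_cluster")["hotel_country"].apply(pipe_join).reset_index()
items.columns = ["ITEM_ID", "HOTEL_COUNTRY"]
items.to_csv("items.csv", index=False)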

As for USERs (users.csv) and USER-ITEM interactions (inter.csv), here is what their respective files look like:

USERS metadata (users.csv)
USER-ITEM interactions (inter.csv)

Amazon Personalize in action

Now, let’s go through the process of building a recommender system with Amazon Personalize. All the steps are available here in my Jupyter notebook, and they can be reproduced either via the Python SDK (boto3), as in the notebook, or via the console.

Create CSV files with relevant data and save to S3

Generate 3 CSV files: one for USERS metadata (users.csv), one for ITEMS (hotels) metadata (items.csv) and one for USER-ITEM interactions (inter.csv). In this process, it is important to keep in mind which schemas and data structures Personalize accepts, as they define the column names and types of the files (see Create and register schemas below). For instance, the interactions dataframe must contain at least 3 columns with the following names and types: USER_ID (string), ITEM_ID (string) and TIMESTAMP (long). The way to turn an actual timestamp object into a long (in Python) is the following: `int(time.mktime(TIMESTAMP.timetuple()))`. It is important to cast to int at the very end: if we don’t, we end up with a trailing decimal zero, e.g. 1371038294.0 instead of 1371038294, and Personalize throws an error. As for files with potentially long attributes (such as HOTEL_COUNTRY in items.csv), keep in mind that a field cannot exceed 1000 characters, as otherwise Personalize complains. Once created, upload the files to an S3 bucket.
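Putting that together, the conversion could look like this (a sketch, assuming the column names from the Expedia file):

import time
import pandas as pd

inter = pd.read_csv("inter.csv", parse_dates=["date_time"])

# epoch seconds, cast to int to avoid the trailing ".0" Personalize rejects
inter["TIMESTAMP"] = inter["date_time"].apply(lambda ts: int(time.mktime(ts.timetuple())))
inter = inter.rename(columns={"user_id": "USER_ID", "hotel_cluster": "ITEM_ID"})
inter[["USER_ID", "ITEM_ID", "TIMESTAMP"]].to_csv("inter.csv", index=False)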

Create and register schemas to allow Personalize to properly read CSV files from S3

Here is how to do it with boto3.

import json
import boto3

personalize = boto3.client('personalize')

# Avro schema for the USER-ITEM interactions dataset
schema_inter = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"}
    ],
    "version": "1.0"
}
create_schema_inter = personalize.create_schema(name = "interact-schema", schema = json.dumps(schema_inter))

# Avro schema for the USERS metadata dataset
schema_users = {
    "type": "record",
    "name": "Users",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "SITE_NAME", "type": "string", "categorical": True},
        {"name": "POSA_CONTINENT", "type": "string", "categorical": True},
        {"name": "USER_LOCATION_COUNTRY", "type": "string", "categorical":True},
        {"name": "USER_LOCATION_REGION", "type": "string", "categorical": True}
    ],
    "version": "1.0"
}
create_schema_users = personalize.create_schema(name = "user-schema", schema = json.dumps(schema_users))

# Avro schema for the ITEMS metadata dataset
schema_items = {
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "ITEM_ID", "type": "string"},
        {"name": "SRCH_DESTINATION_TYPE_ID", "type": "string", "categorical": True},
        {"name": "HOTEL_CONTINENT", "type": "string", "categorical": True},
        {"name": "HOTEL_COUNTRY", "type": "string", "categorical":True}
    ],
    "version": "1.0"
}
create_schema_items = personalize.create_schema(name = "item-schema-nomarket", schema = json.dumps(schema_items))

Create a dataset group

Create a dataset group to host all the datasets relevant to a specific model (`create_dataset_group_response = personalize.create_dataset_group(name = "expedia")`).
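The group is created asynchronously, so before moving on it is worth grabbing its ARN and polling its status; a minimal sketch:

import time

dataset_group_arn = create_dataset_group_response["datasetGroupArn"]
while True:
    status = personalize.describe_dataset_group(
        datasetGroupArn=dataset_group_arn)["datasetGroup"]["status"]
    if status in ("ACTIVE", "CREATE FAILED"):
        break
    time.sleep(20)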

Create datasets

Create datasets within the `expedia` dataset group. Each dataset is defined by the dataset group it belongs to, its schema and its type (Interactions, Users or Items). Like this (example for `inter_ds`, the USER-ITEM interactions dataset):

inter_ds = personalize.create_dataset(name='interactions-ds',
                                      schemaArn=create_schema_inter['schemaArn'],
                                      datasetGroupArn=dataset_group_arn,
                                      datasetType='Interactions')
The 3 datasets under the `expedia` dataset group in the AWS console

Edit bucket’s policy to allow Personalize to access objects on S3

This is how to do it with boto3, and below is how it shows up in the AWS S3 console.

s3 = boto3.client("s3")
bucket = "pochetti-personalize"  # the bucket hosting the CSV files
policy = {
    "Version": "2012-10-17",
    "Id": "PersonalizeS3BucketAccessPolicy",
    "Statement": [
        {
            "Sid": "PersonalizeS3BucketAccessPolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": "personalize.amazonaws.com"
            },
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::{}".format(bucket),
                "arn:aws:s3:::{}/*".format(bucket)
            ]
        }
    ]
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
The bucket policy allowing Personalize to access objects on S3

Create an IAM role to give Personalize permissions to access S3

This is needed on top of the above-mentioned S3 bucket policy. The one I attach here is a more general role I had created to allow both SageMaker (SM) and Personalize access to S3, as I was running some experiments from within a SM notebook.

`expedia` IAM role: permissions
`expedia` IAM role: services (trust relationships) to which the above permissions are granted
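If you prefer a Personalize-only role created programmatically, a boto3 sketch along these lines should do (the role name is illustrative and the attached policy intentionally broad; scope it down in a real setup):

import json
import boto3

iam = boto3.client("iam")

# trust policy letting the Personalize service assume the role
trust = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "personalize.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}
role = iam.create_role(RoleName="expedia", AssumeRolePolicyDocument=json.dumps(trust))
iam.attach_role_policy(RoleName="expedia",
                       PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess")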

Create import jobs to load data into Personalize

The next step consists of loading the data into Personalize. This is achieved via an import job, which reads a CSV file from S3 into a dataset. Here is how to import the USER-ITEM interactions with boto3 (as an example), and further below how the job shows up in the console:

inter_ij = personalize.create_dataset_import_job(
    jobName = 'interactions-ij',
    datasetArn = inter_ds['datasetArn'],
    dataSource = {'dataLocation': 's3://pochetti-personalize/inter.csv'},
    roleArn = 'arn:aws:iam::257446244580:role/expedia')
USER-ITEM interactions import job
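Import jobs run asynchronously and can take several minutes; a minimal polling sketch:

import time

while True:
    status = personalize.describe_dataset_import_job(
        datasetImportJobArn=inter_ij["datasetImportJobArn"]
    )["datasetImportJob"]["status"]
    if status in ("ACTIVE", "CREATE FAILED"):
        break
    time.sleep(60)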

Time to model: what options do we have?

Different modeling strategies (algorithms) are what Personalize calls recipes. This is how to list the available ones using boto3, and here is the AWS doc page with a high-level description of what they do (in the last section we will delve deeper into what HRNNs, Hierarchical Recurrent Neural Networks, are all about).

recipes = personalize.list_recipes()
for r in recipes['recipes']:
    print(f"Recipe Name: {r['name']}; Recipe Arn: {r['recipeArn']}")

Recipe Name: aws-hrnn; Recipe Arn: arn:aws:personalize:::recipe/aws-hrnn
Recipe Name: aws-hrnn-coldstart; Recipe Arn: arn:aws:personalize:::recipe/aws-hrnn-coldstart
Recipe Name: aws-hrnn-metadata; Recipe Arn: arn:aws:personalize:::recipe/aws-hrnn-metadata
Recipe Name: aws-personalized-ranking; Recipe Arn: arn:aws:personalize:::recipe/aws-personalized-ranking
Recipe Name: aws-popularity-count; Recipe Arn: arn:aws:personalize:::recipe/aws-popularity-count
Recipe Name: aws-sims; Recipe Arn: arn:aws:personalize:::recipe/aws-sims

Those algorithms require the user to have some understanding of the science behind the task at hand, especially because there are potentially lots of hyper-parameters to tune. If, like me, you are not a recommendation engine wizard, you might want to consider the AutoML functionality Personalize offers. This consists of letting AWS take the stage, look at the data, choose the best among the 3 available HRNN recipes (aws-hrnn, aws-hrnn-coldstart, and aws-hrnn-metadata; the other algorithms are omitted) and perform hyper-parameter optimization. Let’s go for it.

AutoML recommender

As it should be, this is quite easy. It boils down to defining what Personalize calls a solution (setting `performAutoML = True`) and then creating a version, i.e. training the model. Below is the Python code to achieve that, and a couple of screenshots from the AWS console showing the results.

# define a solution
auto_recommender = personalize.create_solution(
    name = "expedia-recommender",
    datasetGroupArn = dataset_group_arn,
    performAutoML = True
)
solution_arn = auto_recommender['solutionArn']
# create a version
auto_recommender_model = personalize.create_solution_version(solutionArn = solution_arn)

As the Solution config shows in the picture from the console below, Personalize optimized precision@25 (a section on metrics follows) to find the best recipe, which in this case is `aws-hrnn`.

The AutoML Personalize functionality identified `aws-hrnn` as the best performing model, by optimizing for precision@25

As requested, Personalize also trained the selected recipe by creating a solution version. Actually, I am not sure why, but apparently two versions were generated. I might have run the notebook cell twice by mistake.

2 identical versions were created for the same `expedia-recommender` solution

Anyway, opening them up, they turned out to be identical, both pointing to the same best recipe with the same KPIs. Here is how version `827f86d6` looks. Side note: I am not sure why the console shows 19.964 training hours. It definitely took less than 3/4 of a day to run the training job!

One of the 2 identical versions of the same AutoML solution. The panel at the bottom shows the performance metrics associated with the version.

The same KPIs found in the console can also be pulled via the Python SDK. Please note that we are not really expecting our solution to be good at all. At the end of the day, it is based on a random 3% sample of the data, and it cannot compete with the models submitted during the actual Expedia Kaggle competition.

response = personalize.get_solution_metrics(solutionVersionArn="arn:aws:personalize:eu-west-1:257446244580:solution/expedia-recommender/827f86d6")
response['metrics']

{'coverage': 0.7228,
 'mean_reciprocal_rank_at_25': 0.1281,
 'normalized_discounted_cumulative_gain_at_10': 0.1828,
 'normalized_discounted_cumulative_gain_at_25': 0.2457,
 'normalized_discounted_cumulative_gain_at_5': 0.1397,
 'precision_at_10': 0.0315,
 'precision_at_25': 0.0228,
 'precision_at_5': 0.0376}

As you might have noticed, in the previous AutoML solution, Personalize discarded the USER and ITEM metadata we had supplied as two distinct datasets in the `expedia` dataset group. Therefore, for the record, I created another solution forcing AWS to use the `aws-hrnn-metadata` recipe, as sketched below. Results were worse than the simpler, metadata-free solution. I did not perform any HPO though, so the comparison is not completely fair.
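For completeness, pinning a specific recipe instead of relying on AutoML boils down to passing its ARN, the one for `aws-hrnn-metadata` being listed earlier (the solution name is illustrative):

meta_solution = personalize.create_solution(
    name="expedia-recommender-metadata",
    datasetGroupArn=dataset_group_arn,
    recipeArn="arn:aws:personalize:::recipe/aws-hrnn-metadata")
meta_version = personalize.create_solution_version(
    solutionArn=meta_solution["solutionArn"])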

Getting hotel recommendations for a user: create and query a campaign

Once the model is trained, the last step we are left with is to actually invoke it on a user and get hotel recommendations. This is achieved by

  1. creating what in the Personalize’s jargon is a campaign
  2. and then querying it to pull hotels!

Here is the Python magic to get point 1 done

create_campaign_response = personalize.create_campaign(name = "expedia-campaign",
                                                       solutionVersionArn = "arn:aws:personalize:eu-west-1:257446244580:solution/expedia-recommender/827f86d6",
                                                       minProvisionedTPS = 1)
campaign_arn = create_campaign_response['campaignArn']
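Note that a campaign takes a few minutes to deploy, and querying it before it is ACTIVE fails; a small wait loop helps (a sketch):

import time

while True:
    status = personalize.describe_campaign(
        campaignArn=campaign_arn)["campaign"]["status"]
    if status in ("ACTIVE", "CREATE FAILED"):
        break
    time.sleep(30)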

and point 2 (for a random user_id `4539`)

personalize_runtime = boto3.client('personalize-runtime')
get_recommendations_response = personalize_runtime.get_recommendations(
    campaignArn = campaign_arn,
    userId = "4539")
recommended_hotels = get_recommendations_response['itemList']
for hotel in recommended_hotels:
    print(hotel)

{'itemId': '82'}
{'itemId': '36'}
{'itemId': '62'}
{'itemId': '81'}
{'itemId': '59'}
{'itemId': '91'}
{'itemId': '30'}
{'itemId': '85'}
{'itemId': '5'}
...truncated

Recommendation engines’ performance metrics

In the previous sections, I did not put any emphasis on the metrics used to measure the performance of recommendation engines. Personalize generates a short report for every version we train, and it is important to understand the numbers we are looking at. Here is the AWS doc page to get started, and below are more detailed explanations of those KPIs (the Excel file with the tables below is available on Github).

Coverage: 72.28% for our winning recipe. What does it mean? This metric represents the number of distinct items the recommender engine produces across all possible user queries, divided by the total number of distinct items it was trained on. So, say our recipe was developed on 100 hotels. If we get recommendations for all possible users and the number of distinct hotels among those is 80, then coverage is 80%.
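In code, the idea boils down to something like this (a toy illustration, not the Personalize internals):

def coverage(recommended_items, trained_items):
    # distinct items ever recommended divided by distinct items trained on
    return len(set(recommended_items)) / len(set(trained_items))

print(coverage(range(80), range(100)))  # 0.8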

Mean Reciprocal Rank @ k (MRR@k, with k=25 in our case): 0.1281 for the winning recipe. This metric is used to score ordered lists of items against queries, where the result of the query is considered the correct answer; in case of multiple results, all answers other than the first are ignored. The recommender engine comes up with a list of potentially interesting items, ordered by relevance up to element k, and we look at the position (rank) at which the correct answer appears in the list. We then calculate the inverse of the rank. We do that for all queries and compute the average of the inverse (reciprocal) ranks. Here is a dummy example with k=5 for a hotel recommender.

Testing a dummy personalization engine on 4 users. Each user has their preferred hotel and, for each of them, the engine generates recommendations. The resulting Mean Reciprocal Rank @5 is 0.4875.
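A toy implementation makes the computation explicit (an illustration, not the Personalize code):

def mrr_at_k(ranked_lists, preferred_items, k=25):
    # average, over users, of 1/rank of the first relevant recommendation
    rr = []
    for recs, preferred in zip(ranked_lists, preferred_items):
        rank = next((i + 1 for i, item in enumerate(recs[:k]) if item == preferred), None)
        rr.append(1.0 / rank if rank else 0.0)
    return sum(rr) / len(rr)

# preferred hotel ranked 2nd, 1st and missing respectively -> (0.5 + 1 + 0) / 3
print(mrr_at_k([[5, 7], [9, 3], [1, 2]], [7, 9, 8], k=5))  # 0.5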

Normalized Discounted Cumulative Gain @ k (nDCG@k, with k=5, 10, 25 in our case): this metric generalizes over MRR@k, as it does not stop at the first relevant item in the ranked list of recommendations. It actually parses the list up to element k, identifies all the relevant proposed items and assigns each a score equal to `1/log2(rank+1)`. It then sums those up to calculate DCG@k (1). The logic keeps going and repeats the previous steps after re-shuffling the order of the recommendations to put the most relevant at the top, i.e. as an ideal engine would do. The result of this step is the ideal DCG@k (2). Dividing (1) by (2) gives the normalized DCG@k (nDCG@k). Average nDCG@k over all users and you are good to go. Following, an example with a dummy recommender.

Calculating nDCG@5 for a dummy recommender operating on one user
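And a toy nDCG@k for a single user, mirroring the steps above (again, just an illustration):

import math

def ndcg_at_k(recs, relevant, k=5):
    # (1) DCG of the actual list: score each relevant item by 1/log2(rank+1)
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, item in enumerate(recs[:k], start=1) if item in relevant)
    # (2) ideal DCG: all relevant items pushed to the top of the list
    n_rel = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, n_rel + 1))
    return dcg / idcg if idcg else 0.0

# relevant hotels {7, 9} found at ranks 1 and 4
print(ndcg_at_k([7, 2, 3, 9, 5], {7, 9}))  # ~0.88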

Precision @ k (k=5, 10, 25 in our case): this one is easier to digest compared to the previous two. Precision answers the question: how many relevant items are there among the recommended ones? If the user loves 3 hotels and the engine spits out 10, but only 2 of the 3 are contained in the list, then precision@10 is 2/10=20%. What about precision@5? We focus on the top 5 hotels out of the 10 recommended. Say 2 of the accommodations preferred by the user are contained within those 5: then precision@5 is 2/5=40%. Average over all users and you are done.
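The corresponding toy snippet, reproducing the worked example:

def precision_at_k(recs, relevant, k=10):
    # fraction of the top-k recommendations the user actually likes
    return len(set(recs[:k]) & set(relevant)) / k

recs = [82, 36, 62, 81, 59, 91, 30, 85, 5, 11]
loved = {36, 59, 44}
print(precision_at_k(recs, loved, k=10))  # 0.2
print(precision_at_k(recs, loved, k=5))   # 0.4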

What is an HRNN (Hierarchical RNN)?

As we saw above, Personalize’s AutoML functionality chooses the best recipe across three Hierarchical RNN (HRNN) approaches. But, what is an HRNN, actually?

The AWS doc page refers to the “Personalizing session-based recommendations with hierarchical recurrent neural networks” paper and mentions:

The Amazon Personalize […] (HRNN) recipe models changes in user behavior to provide recommendations during a session. A session is a set of user interactions within a given timeframe with a goal of finding a specific item to fill a need, for example.

https://docs.aws.amazon.com/personalize/latest/dg/native-recipe-hrnn.html

I went through the paper and here is what Quadrana et al. propose.

The problem

Most currently available USER-ITEM interaction datasets come with a time component, i.e. the timestamp of the event. Adding this dimension to the problem is useful, as it allows capturing the fact that user preferences tend to evolve. Specifically, a common practice in the personalization community is to split these interactions over defined time intervals, a.k.a. sessions. For instance, a customer starts navigating Expedia.com looking for accommodation. He keeps scrolling and clicking until he goes idle, and comes back the day after. Whether triggered by the same intent or not, the two sets of interactions are considered as belonging to two distinct sessions. An obvious way of handling this type of sequential data is to employ RNN-based models. Indeed, those have recently shown quite successful results over more conventional recommender algorithms (matrix factorization, collaborative filtering, etc.). The limitation of session-based RNN approaches, though, is that they act upon the session the user is currently engaging with, without leveraging any context from previous ones. The reason, illustrated in this 2015 paper by Hidasi et al., is that most customers navigate websites without being logged in, making it hard to build a real history of user-related consecutive sessions. Therefore, both at training and inference time, only individual timeframes are modeled.

What Quadrana et al. claim is that in recent years we have witnessed a significant increase in both users being constantly logged in to specific platforms (streaming services) and web tracking technologies (cookies or other identifiers). These changes address the main obstacle faced by single-session-based models, as longer user histories are now easier to build.

The solution

Having said that, what the authors propose is to add an RNN layer on top of the intra-session approach, to keep track of inter-session evolution over time. This basically consists (see the graphics below) of modeling a user-level representation, i.e. a vector of learnable weights `c`, which

  1. gets updated at the end of each session, and embeds `s`, the output of the session-level neural network.
  2. is used to initialize the RNN of the upcoming session (this is what authors call HRNN Init)
  3. (optional on top of #2) can be added or concatenated to the hidden states of the RNN of the upcoming session (this is what authors call HRNN All).

`c` is trained over time and manages to nicely propagate the natural evolution of users’ intents, embedding this information into the more local and less time-aware intra-session network.
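To make the mechanics concrete, below is a heavily simplified PyTorch sketch of the HRNN Init variant, based on my reading of the paper (class and parameter names are mine, and this is certainly not what Personalize runs under the hood):

import torch
import torch.nn as nn

class HRNNInit(nn.Module):
    def __init__(self, n_items, emb_dim=64, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(n_items, emb_dim)
        self.session_rnn = nn.GRU(emb_dim, hidden, batch_first=True)
        self.user_rnn = nn.GRUCell(hidden, hidden)  # evolves `c` across sessions
        self.out = nn.Linear(hidden, n_items)

    def forward(self, sessions):
        # sessions: list of (batch, session_len) item-id tensors, in chronological order
        batch = sessions[0].size(0)
        c = torch.zeros(batch, self.user_rnn.hidden_size)  # user-level representation
        logits = []
        for s in sessions:
            h0 = c.unsqueeze(0)                       # HRNN Init: `c` seeds the session RNN
            out, h_n = self.session_rnn(self.emb(s), h0)
            logits.append(self.out(out))              # next-item scores at every step
            c = self.user_rnn(h_n.squeeze(0), c)      # update `c` with the session summary
        return logits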

That’s it. This is the principle behind a Hierarchical Recurrent Neural Network. Quite powerful.

Graphical representation of the HRNN as per https://arxiv.org/pdf/1706.04148.pdf
