How bias in Recommender Systems affects e-commerce, society and eventually your profits
Almost 70% of internet users in the EU have bought or ordered goods or services for private use [source: Eurostat].
In the US, e-commerce sales were estimated to be approximately 870 billion dollars [source: Digital Commerce 360].
Undoubtedly, the COVID-19 pandemic has fuelled a significant increase in internet usage, both in e-commerce sales and in content-provider systems. However, the key factor behind this increase is the utilization of Recommender Systems (RecSys) by an exponentially growing number of services.
Recommender (or recommendation) systems* provide recommendations to users based on behavior data that users supply either explicitly (for example, ratings) or implicitly (for example, views and purchases). Ideally, RecSys sustain user engagement with content and increase user loyalty and satisfaction. But what are the major contributing factors behind these benefits?
RecSys can provide personalized content. A recent study in the US showed that 71% of customers expect companies to deliver personalized interactions and 76% of them get frustrated when this does not happen.
RecSys can surface a large number of items similar to those recently purchased, viewed or rated by a user; as a result, users spend more time interacting with the products.
What is more, these systems can have a serious impact on sales promotions (e.g. what types of products users can select and consume), as sellers can persuade customers to buy specific items. All the aforementioned benefits of RecSys can lead to increased consumption and thus, higher profits for businesses.
Bias in RecSys
Despite the above-mentioned benefits of RecSys, there are a number of performance and ethical issues that need to be addressed and investigated. While performance issues such as cold-start and data sparsity are well known, ethical issues received little attention until recently. In this article, we focus on how the ethical issues related to recommender systems affect not merely us (as consumers, citizens and users) but can also significantly impact e-commerce platforms.
One of the most important types of bias that arise in RecSys is popularity bias: popular items (the “head”) are recommended frequently, while less popular, niche products (the “long tail”) are recommended rarely or not at all. This split is often summarized as an 80/20 ratio, after the Pareto principle: for many outcomes, roughly 80% of consequences come from 20% of causes.
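To make the head versus long-tail split concrete, the sketch below measures what share of all ratings falls on the most-rated 20% of items. The function name `head_share`, the 20% cut-off and the toy log are illustrative assumptions, not part of the study.

```python
from collections import Counter

def head_share(rated_items, head_fraction=0.2):
    """Share of all ratings that fall on the most-rated `head_fraction` of items.

    `rated_items` holds one item id per rating event (hypothetical input format).
    """
    if not rated_items:
        raise ValueError("rated_items must not be empty")
    # Rating counts per item, most-rated first.
    counts = sorted(Counter(rated_items).values(), reverse=True)
    # Size of the "head": at least one item.
    head_size = max(1, round(len(counts) * head_fraction))
    return sum(counts[:head_size]) / sum(counts)

# Toy log: item "a" is a clear head item, the other four form the long tail.
toy_log = ["a"] * 8 + ["b", "c", "d", "e"]
share = head_share(toy_log)
print(f"Top 20% of items receive {share:.0%} of the ratings")
```

A value close to 0.8 would match the classic 80/20 pattern; the closer the value is to the head fraction itself (0.2 here), the more evenly ratings are spread across the catalog.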
The effects of popularity bias can also have a significant (negative) business impact especially on e-commerce systems. Apart from popularity bias, there are some other aspects of RecSys quality that need to be addressed, which are presented in Image 4.
Study & Findings
To investigate the business impact of popularity bias and of the other aspects of RecSys quality (Diversity, Coverage and Novelty), and to gain a better understanding of their sources, our team conducted an extensive study using a real dataset provided by a major electronics retailer. The dataset contained 8,263 ratings, 3,078 products and 276 users.
The first step of the process was to build 11 different RecSys models utilizing 11 different algorithms, from classical to the most recent, state-of-the-art approaches (see Image 5 for more details). Each of these systems produced a list of 10 items for every user. Then, we had to evaluate the results produced by the respective RecSys. For this purpose, 12 different metrics were selected.
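As an illustration of the output format described above (one ranked list per user), here is a minimal sketch of a most-popular top-n recommender. It is a hypothetical baseline written for this example; it is not claimed to be one of the 11 algorithms in the study, and the `(user, item)` input format is an assumption.

```python
from collections import Counter

def most_popular_top_n(ratings, n=10):
    """Recommend to each user the n most-rated items they have not rated yet.

    `ratings` is an iterable of (user, item) pairs (hypothetical format).
    """
    ratings = list(ratings)
    popularity = Counter(item for _user, item in ratings)
    # Items ordered from most to least rated.
    ranked = [item for item, _count in popularity.most_common()]
    # Items each user has already rated, so they can be excluded.
    seen = {}
    for user, item in ratings:
        seen.setdefault(user, set()).add(item)
    return {
        user: [item for item in ranked if item not in rated][:n]
        for user, rated in seen.items()
    }

ratings = [("u1", "a"), ("u2", "a"), ("u2", "b"), ("u3", "c"), ("u3", "a")]
top = most_popular_top_n(ratings, n=2)
print(top)
```

The resulting dictionary, mapping each user to a ranked list of item ids, is the shape that list-based evaluation metrics typically consume.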
The main findings of our study are as follows:
🔎 From Average Recommendation Popularity (ARP) (Image 6), it is apparent that most of the algorithms recommended the most popular items. More specifically, although the average number of ratings per item in the dataset is 2.68, the scores of all the algorithms except ItemKNN, NGCF and Random are much higher than this value.
🔎 The results of this study indicate that RecSys developers should be aware of the bias-accuracy trade-off and should avoid using algorithms that amplify bias, as shown in Image 7. It has to be mentioned that the poor results in accuracy are caused by:
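For readers unfamiliar with ARP, the sketch below assumes its common formulation: for each user, average the number of training ratings of the items in their list, then average those values over users. The function name and toy data are illustrative assumptions; the exact implementation used in the study is not shown in this article.

```python
from collections import Counter

def average_recommendation_popularity(train_ratings, recommendations):
    """ARP: mean over users of the mean rating count of their recommended items.

    `train_ratings` is an iterable of (user, item) pairs and `recommendations`
    maps each user to a list of item ids (both hypothetical formats).
    """
    popularity = Counter(item for _user, item in train_ratings)
    per_user = [
        sum(popularity[item] for item in items) / len(items)
        for items in recommendations.values()
        if items  # skip users with an empty list
    ]
    return sum(per_user) / len(per_user)

train = [("u1", "a"), ("u2", "a"), ("u3", "a"), ("u1", "b"), ("u2", "c")]
recs = {"u1": ["a", "c"], "u2": ["a", "b"], "u3": ["a", "b"]}

arp = average_recommendation_popularity(train, recs)
mean_ratings_per_item = len(train) / len({item for _user, item in train})
print(arp, mean_ratings_per_item)
```

In this toy case the ARP is higher than the mean number of ratings per item, which is the same kind of gap the finding above describes: the lists lean towards items that already have many ratings.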
very high percentage of dataset sparsity (almost 99%)
uneven distribution of ratings
relatively small number of ratings
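The sparsity and the average number of ratings per item can be checked directly from the dataset size reported above. The short calculation below assumes that each rating corresponds to a distinct user-item pair.

```python
# Dataset size as reported in the study description.
n_ratings, n_items, n_users = 8263, 3078, 276

# Average number of ratings per item.
mean_ratings_per_item = n_ratings / n_items

# Sparsity: share of user-item pairs that have no rating.
sparsity = 1 - n_ratings / (n_users * n_items)

print(f"ratings per item: {mean_ratings_per_item:.2f}")
print(f"sparsity: {sparsity:.2%}")
```

Under that assumption, the calculation gives about 2.68 ratings per item and a sparsity of roughly 99%, in line with the figures quoted in the findings.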
🔎 Algorithms may not cover all the items given as input, as can be clearly seen in Image 8. Consequently, a significant number of items will remain unseen by the majority of users, affecting the number of sales and decreasing user satisfaction. Moreover, it was found that diversity is not always connected with popularity bias: apart from our baseline algorithm, ItemKNN and NGCF, none of the algorithms enhance diversity in the recommended items.
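Coverage can be illustrated with a minimal sketch that assumes the usual definition of catalog coverage: the fraction of catalog items that appear in at least one user's recommendation list. The function name and toy data are assumptions made for this example.

```python
def catalog_coverage(recommendations, catalog):
    """Fraction of catalog items that appear in at least one recommendation list."""
    catalog = set(catalog)
    recommended = set()
    for items in recommendations.values():
        recommended.update(items)
    # Only count recommended items that actually belong to the catalog.
    return len(recommended & catalog) / len(catalog)

catalog = ["a", "b", "c", "d", "e"]
recs = {"u1": ["a", "b"], "u2": ["a", "c"], "u3": ["a", "b"]}

coverage = catalog_coverage(recs, catalog)
print(f"coverage: {coverage:.0%}")
```

Here items "d" and "e" are never shown to anyone, so no user can discover them through recommendations, which is exactly the business risk described above.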
In terms of accuracy, users should be strongly encouraged to rate products. RecSys need vast amounts of data to produce meaningful results, but in most cases a number of customers have not offered any explicit feedback to the system. Collecting user data through cookies might be one of the most effective ways to overcome this problem. However, this adds overhead for managing risks related to privacy and GDPR compliance.
As regards popularity bias and other aspects of RecSys quality, the proposed solutions depend on whether the e-commerce system owner has access to the model or not. If they have access to the model, then a bias mitigation technique (pre-processing, in-processing, post-processing or a combination of these) can be used. Otherwise, less effective workarounds can be applied, such as encouraging users to rate recently purchased products.
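As one example of what a post-processing step could look like, the sketch below re-ranks a model's scores by subtracting a penalty proportional to each item's normalized popularity. This is a simple illustrative heuristic, not the technique used or recommended in the study; the function name, the `weight` parameter and the toy scores are assumptions.

```python
def rerank_with_popularity_penalty(scored_items, popularity, weight=0.5, top_n=10):
    """Re-rank items by model score minus a popularity penalty.

    `scored_items` maps item -> model score and `popularity` maps
    item -> rating count (both hypothetical formats). `weight` controls
    how strongly popular items are pushed down; 0 keeps the original order.
    """
    max_pop = max(popularity.values(), default=0) or 1
    adjusted = {
        item: score - weight * popularity.get(item, 0) / max_pop
        for item, score in scored_items.items()
    }
    return sorted(adjusted, key=adjusted.get, reverse=True)[:top_n]

scores = {"a": 0.90, "b": 0.85, "c": 0.80}
popularity = {"a": 100, "b": 10, "c": 1}

original = rerank_with_popularity_penalty(scores, popularity, weight=0.0, top_n=2)
reranked = rerank_with_popularity_penalty(scores, popularity, weight=0.5, top_n=2)
print(original, reranked)
```

With the penalty switched on, the very popular item "a" drops out of the top two in favour of the less popular "b" and "c". In practice the weight would need tuning, since pushing it too high trades away accuracy.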
Last but not least, bias can affect item providers (either sellers or manufacturers): if a product belongs to the long-tail category or is new, it may never be recommended to users.
In conclusion, e-commerce businesses need to control popularity bias, diversity and novelty, because these strongly affect both user satisfaction and profits.
* Note: in this article we refer to collaborative filtering systems and more specifically top-n recommenders that produce a list of top-n recommendations, as an output, for each user.