Case Study

Real-time Ad Bidding Using Artificial Intelligence

We worked with a leading ad exchange company in the UK to develop a state-of-the-art real-time bidder using AI. The system was built with reinforcement learning and outperformed the traditional bidding models used in the industry, achieving a performance gain of 10.5%.

  • Achieved a 10.5% performance gain against a state-of-the-art method on a real-world dataset
  • Solved challenges related to model convergence in a reinforcement learning (RL) setting
  • Developed cutting-edge algorithms for model training and evaluation


Bid decisions in real-time bidding (RTB) systems are usually treated as a static optimisation problem. Static optimisation methods either treat the value of each impression independently or set a bid price for each segment of ad volume. These methods are rule-based and cannot adapt to changing environments. Research on optimal bidding strategies has focused largely on statistical solutions, which make the strong assumption that the market data is stationary, i.e. that its probability distribution does not change over time in response to the current bidder’s behaviour. When the market does change, this assumption leads to sub-optimal performance, which means fewer conversions and lower revenue for advertisers.

In ad auctions, campaign bidders interact not only with the auction environment but also with each other. A change in one bidder’s strategy affects the strategies of the other bidders, and vice versa. In addition, existing computational bidding methods are mainly concerned with micro-level optimisation of a single party’s benefit (a specific advertiser or merchant). Given the competition in an RTB auction, optimising one party’s benefit may ignore or hurt the benefits of other parties. From the ad system’s viewpoint, micro-level optimisation may not fully utilise the dynamics of the ad ecosystem to achieve better social optimality.

To deal with these problems, a leading ad exchange company in the UK approached us to explore the potential of building an AI system for real-time ad bidding. We started collaborating with them, and the key requirements of the project were as follows:

  • Develop a high-performing model that can outperform traditional statistical models
  • Process large volumes of data in real time

The Solution

Aegasis Labs turned this need into reality by designing and developing a proof-of-concept (PoC) real-time ad bidding system using AI. Our team kicked off the engagement with exploratory data analysis of the available data and a review of the available research on building an AI system for this problem.

In the design phase, the models were constructed to devise an optimal bidding strategy, so that the campaign budget can be allocated dynamically across all the available impressions on the basis of both immediate and future rewards. For this purpose, we surveyed a range of recent reinforcement learning algorithms, both model-based and model-free (such as Q-learning).

Reinforcement learning is the training of machine learning models to make a sequence of decisions. The agent learns to achieve a goal in an uncertain, potentially complex environment. It uses trial and error to arrive at a solution: for each action it performs, it receives either a reward or a penalty, and its goal is to maximise the total reward. The designer sets the reward policy (that is, the rules of the game) but gives the model no hints or suggestions on how to solve the task. It is up to the model to work out how to maximise the reward, starting from completely random trials and ending with sophisticated tactics.
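To make the reward-driven update concrete, here is a minimal sketch of one tabular Q-learning step, the model-free method named earlier. The state and action names, the learning rate `alpha` and the discount `gamma` are illustrative assumptions and are not taken from the project.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    q maps (state, action) pairs to estimated long-term reward.
    alpha (learning rate) and gamma (discount) are assumed values.
    """
    # Best value the agent believes it can obtain from the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Target = immediate reward plus discounted future reward.
    td_target = reward + gamma * best_next
    # Move the current estimate part of the way towards the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action example: all estimates start at zero.
actions = ["low_bid", "high_bid"]
q = defaultdict(float)
first = q_learning_update(q, "s0", "high_bid", 1.0, "s1", actions)
second = q_learning_update(q, "s1", "low_bid", 0.0, "s0", actions)
```

After the first call the estimate for ("s0", "high_bid") rises because a reward was received. The second call shows how that value propagates backwards: "s1" becomes more attractive simply because it leads to "s0".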

After evaluating the research, we formulated the bid decision process as a reinforcement learning problem in which the state space is represented by the auction information and the campaign’s real-time parameters, and an action is the bid price to set. By modelling the state transition via auction competition, we built a Markov Decision Process (MDP) framework for learning the optimal bidding policy to optimise advertising performance in the dynamic real-time bidding environment.
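One way to picture such an MDP is a small dynamic-programming sketch over the state (remaining auctions, remaining budget). This is an illustration under stated assumptions and not the project’s implementation: it assumes a second-price auction, integer prices, a known market-price distribution `market_price_pmf` that sums to 1, and a constant expected click value `avg_ctr`. All names are hypothetical.

```python
def solve_value_function(num_auctions, budget, market_price_pmf, avg_ctr):
    """Return (value, best_bid) tables indexed as [t][b].

    value[t][b] is the expected number of clicks obtainable with t auctions
    left and b units of budget left, when bidding optimally.
    market_price_pmf[d] is the assumed probability that the market price is d.
    """
    max_price = len(market_price_pmf) - 1
    value = [[0.0] * (budget + 1) for _ in range(num_auctions + 1)]
    best_bid = [[0] * (budget + 1) for _ in range(num_auctions + 1)]
    for t in range(1, num_auctions + 1):
        for b in range(budget + 1):
            win_value = 0.0  # expected value over outcomes where we win
            win_prob = 0.0   # probability that the market price is <= bid
            best = -1.0
            # A bid can never exceed the remaining budget.
            for a in range(min(max_price, b) + 1):
                # Winning at market price a: gain a click value now, pay a,
                # then continue with one fewer auction and less budget.
                win_value += market_price_pmf[a] * (avg_ctr + value[t - 1][b - a])
                win_prob += market_price_pmf[a]
                # Losing: budget is unchanged, one fewer auction remains.
                candidate = win_value + (1.0 - win_prob) * value[t - 1][b]
                if candidate > best:
                    best = candidate
                    best_bid[t][b] = a
            value[t][b] = best
    return value, best_bid


# Toy example: market price is 1 or 2 with equal probability, each win is
# worth one click, and the agent has 2 auctions and a budget of 1.
value, best_bid = solve_value_function(2, 1, [0.0, 0.5, 0.5], 1.0)
```

In the toy example the agent can afford only one win at price 1, so the value with two auctions left (0.75) is higher than with one (0.5): a lost first auction still leaves a second chance. This is the trade-off between immediate and future rewards described above.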

Technical Bit

Mathematically, we consider bidding in display advertising as an episodic process. Each episode comprises T auctions, which are sent sequentially to the bidding agent. Each auction is represented by a high-dimensional feature vector x, which is indexed via one-hot binary encoding. The fields of x consist of the campaign’s ad information and the contextual information of the auctioned impression (e.g. user cookie ID, location, time, publisher domain and URL).
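As a rough illustration of one-hot indexing, the sketch below assigns one position in x to every (field, value) pair and represents an auction by the positions that are set to 1. The field names and values are hypothetical examples, not the client’s schema.

```python
def build_index(records):
    """Assign a position in the feature vector to each (field, value) pair."""
    index = {}
    for record in records:
        for field, field_value in record.items():
            key = (field, field_value)
            if key not in index:
                index[key] = len(index)
    return index


def one_hot_indices(record, index):
    """Return the sorted positions that are 1 for this record.

    Pairs that were never seen when building the index are skipped.
    """
    return sorted(index[(f, v)] for f, v in record.items() if (f, v) in index)


def to_dense(active, size):
    """Expand the sparse positions into a full 0/1 vector of the given size."""
    active_set = set(active)
    return [1 if i in active_set else 0 for i in range(size)]


# Hypothetical auction records.
records = [
    {"city": "London", "hour": "09", "domain": "news.example"},
    {"city": "Leeds", "hour": "09", "domain": "sport.example"},
]
index = build_index(records)
active = one_hot_indices(records[1], index)
x = to_dense(active, len(index))
```

With real traffic the vector has a very large number of positions and only a handful are non-zero per auction, which is why the sparse list of active positions is usually the practical representation.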

At the beginning, the agent is initialised with a budget B, and the advertising target is to acquire as many clicks as possible during the following T auctions. The agent considers three main pieces of information: (i) the number of remaining auctions t, (ii) the unspent budget b and (iii) the feature vector x. During the episode, the auctions are sent to the agent sequentially, and at each time step the agent decides the bid price for the current auction on the basis of this information. When the agent wins an auction, it can later observe the user response and the market price. If it loses, it receives no feedback from the auction. We take the predicted click-through rate (CTR) as the expected reward, to model the utility of an action.
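The episode described above can be sketched as a simple loop. This is a simplified illustration: it assumes a second-price auction in which the winner pays the market price, and `run_episode`, `bid_fn` and the auction tuples are hypothetical names and data.

```python
def run_episode(auctions, budget, bid_fn):
    """Replay one episode and return (clicks, remaining_budget).

    auctions: list of (predicted_ctr, market_price, clicked) tuples.
    bid_fn(t, b, predicted_ctr): returns the bid price given the remaining
    auctions t, the remaining budget b and the predicted CTR.
    """
    t = len(auctions)  # remaining number of auctions
    b = budget         # remaining budget
    clicks = 0
    for predicted_ctr, market_price, clicked in auctions:
        # The agent can never bid more than it has left.
        bid = min(bid_fn(t, b, predicted_ctr), b)
        if bid >= market_price:
            # Win: pay the market price and observe the user response.
            b -= market_price
            clicks += clicked
        # Lose: nothing is observed and the budget is unchanged.
        t -= 1
    return clicks, b


# Hypothetical data and a naive bidder that scales the bid with predicted CTR.
auctions = [(0.02, 30, 0), (0.10, 50, 1), (0.01, 20, 0), (0.08, 40, 1)]
clicks, remaining = run_episode(auctions, 100, lambda t, b, p: round(1000 * p))
```

A learned policy would replace the lambda with a function that actually uses t and b, for example by bidding more cautiously when many auctions remain and the budget is tight.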


We compared our AI solution against the linear bidding strategy, which is the most widely used model in the industry. We also tested our AI solution on an open-source public dataset of multi-device display advertising data, which contains 441.7M impressions and 416.9K clicks over a period of 8 days.
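For readers unfamiliar with the baseline, linear bidding is commonly written as a bid proportional to the predicted CTR, scaled by a tuned base bid and the average CTR. The sketch below uses that common form; the exact variant and parameter values used in this comparison are not stated here, so the names and numbers are assumptions.

```python
def linear_bid(predicted_ctr, base_bid, avg_ctr):
    """Bid in proportion to how the impression's CTR compares to the average.

    base_bid is a campaign-level parameter, typically tuned on historical
    data so that the budget lasts for the whole campaign (assumed here).
    """
    if avg_ctr <= 0:
        raise ValueError("avg_ctr must be positive")
    return base_bid * predicted_ctr / avg_ctr


# Hypothetical values: an impression twice as likely to be clicked as the
# average gets roughly twice the base bid.
bid_high = linear_bid(0.002, 100.0, 0.001)
bid_avg = linear_bid(0.001, 100.0, 0.001)
```

Note that this rule looks only at the current impression. It does not take the remaining budget or the remaining number of auctions into account, which is the gap the reinforcement learning formulation aims to close.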

Our AI solution achieved performance gains of 10.5% and 9.6% against the industry-standard method on the client’s dataset and the public dataset, respectively.

The main goal of the bidding agent is to optimise the campaign’s KPI (e.g., clicks, conversions, revenue, etc.) given the campaign budget. We consider the number of acquired clicks as the KPI, which is set as the primary evaluation measure in our solution. 

Interested in finding out how we can help make your products/services faster and better with AI?

Schedule a call with one of our team members today to find out more.
