A team of researchers at Alibaba Group has published a paper describing Bid2X, a foundation model for auto-bidding in online advertising that treats the problem of predicting bid outcomes as a single unified function, rather than building separate models for each bidding scenario. The model was deployed on Taobao and raised gross merchandise volume by 4.65% and return on investment by 2.44% in a two-month online A/B test against the platform's existing model-based reinforcement learning system.

The paper was first posted on arXiv on October 27, 2025, under the identifier arXiv:2510.23410, and was revised on March 14, 2026. It was accepted at the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, held August 3 to 7, 2025, in Toronto. The authors are Jiahao Ji, Tianyu Wang, Yeshu Li, Yusen Huo, Zhilin Zhang, Chuan Yu, Jian Xu, and Bo Zheng, all affiliated with Alibaba Group.

The problem with scenario-specific models

Auto-bidding systems sit at the core of modern programmatic advertising platforms. For any given ad campaign, the system must estimate - in near real time - what a particular bid will cost, how much revenue it will generate, and how many impressions it will win. The standard approach has been to build a separate model for each bidding strategy or scenario: budget-constrained bidding, target return on ad spend, keyword-based campaigns, crowd-targeting, and so on. Each model learns the relationship between bid inputs and their outcomes only within the data it was trained on.

According to the paper, this creates two persistent weaknesses. First, a model trained on one scenario cannot transfer what it learns to another, so a new bidding product requires a new model trained from scratch. Second, the models used in practice - linear programming, PID control, and reinforcement learning - each capture only a partial view of the bidding environment. Linear programming estimates the maximum value of impressions obtainable for a given budget, but assumes the historical environment is still accurate. PID methods adjust bids based on real-time feedback but focus on a single control variable. Reinforcement learning can handle sequential bidding decisions but typically treats environment modeling as implicit rather than explicit.

Bid2X is designed to replace all of these with a single model trained on data from every scenario at once. The authors define it as "a large deep learning model trained on broad bidding environment data such that it can be applied across a wide range of bidding scenarios." The central observation is that certain principles hold across all scenarios regardless of the specific bidding strategy in use: more cost-effective impressions produce better ad performance, the relationship between bid and cost follows a law of diminishing marginal returns, and bidding patterns repeat with both daily and intraday periodicity.

Architecture: three technical challenges

The paper identifies three design problems that any such foundation model must solve.

Heterogeneous bidding data is the first. Data collected from different scenarios combines time series (bids, costs, and rewards recorded at each time slot), discrete categorical variables (advertiser category, product type), and continuous numeric constraints (budget size, delivery duration). These types have different statistical properties, and until now there has been no unified method for encoding them together into a common representation.

Complex and dynamic data dependency is the second. The bidding environment is a multi-agent competitive game. The relationship between a given bid and its resulting cost or reward changes throughout the day - the same bid placed in the evening may win more impressions than the identical bid placed during working hours, because shopping behavior varies. Existing models learn dependencies between variables but largely ignore how those dependencies shift over time.

Zero-inflated data distribution is the third. Placing a bid does not guarantee winning an impression, so many bid records show zero cost, zero reward, and zero impression count. The data is therefore dominated by zeros and does not follow a normal distribution. Standard regression objectives such as mean squared error implicitly assume roughly Gaussian errors and perform poorly when the real distribution is heavily zero-inflated.

How Bid2X is built

The model architecture has three main components.

The unified data embedding module converts heterogeneous data into a common series format. Historical bidding trajectories - sequences of records from the previous day - are encoded variable by variable, with separate embeddings for cost, reward, impression count, bid, and tick time. Discrete constraints such as advertiser category are looked up in a learnable embedding table. Continuous numeric constraints such as budget are passed through linear transformations. All of these are summed into a single context embedding and added to the variable series embeddings through a broadcasting addition. Today's incomplete trajectory is handled differently: all values at a given time slot are stacked into a single vector and encoded as a token, with positional encoding added to preserve time order.
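The embedding of historical trajectories described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the dimensions, the random weights standing in for learned parameters, and the names `embed_history`, `w_var`, `cat_table`, and `w_budget` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

T, D = 24, 8                  # hypothetical: 24 time slots, embedding size 8
VARIABLES = ["cost", "reward", "count", "bid", "tick_time"]

# Random stand-ins for learned parameters (assumptions, not from the paper).
w_var = {v: rng.normal(size=(T, D)) for v in VARIABLES}  # one projection per variable series
cat_table = rng.normal(size=(10, D))                     # embedding table for 10 advertiser categories
w_budget = rng.normal(size=(1, D))                       # linear map for the numeric budget constraint


def embed_history(series, category_id, budget):
    """Return an array of shape (num_variables, D): one token per variable."""
    # Each variable's whole series (length T) is projected to a single D-dim token.
    var_tokens = np.stack([series[v] @ w_var[v] for v in VARIABLES])
    # Discrete constraint: table lookup. Continuous constraint: linear transform.
    context = cat_table[category_id] + np.array([budget]) @ w_budget
    # Broadcasting addition: the same context vector is added to every variable token.
    return var_tokens + context


series = {v: rng.random(T) for v in VARIABLES}
tokens = embed_history(series, category_id=3, budget=0.5)
print(tokens.shape)  # (5, 8)
```

Because the context vector is shared, changing only the budget shifts every variable token by the same amount, which is the effect of the broadcasting addition.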

The bidding transformer applies two different attention mechanisms to the two types of encoded data. A variable attention encoder processes the historical embeddings, treating each variable (bid, cost, reward, count) as a token and learning pairwise correlations between them. This encoder uses N1 stacked variable-based attention blocks. A temporal attention decoder processes today's embeddings, treating each time slot as a token and attending only to past time slots via a causal mask that prevents future information from influencing current predictions. This decoder uses N2 stacked blocks. The outputs of the encoder and decoder are fused by a variable-aware fusion module, which applies a sigmoid gate to select which parts of the temporal representation are relevant for forecasting each target variable. The final configuration uses 2 layers for both encoder and decoder, with a hidden dimension of 1024.
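The two attention patterns and the gated fusion can be illustrated with a stripped-down NumPy sketch. It assumes a single head, a single layer, and no learned query, key, or value projections. The exact form of the fusion (`var_repr + gate * temporal_repr`) is a guess at one plausible sigmoid gate, not the paper's formula, and all function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, causal=False):
    """Single-head scaled dot-product self-attention without learned projections."""
    n, d = tokens.shape
    scores = tokens @ tokens.T / np.sqrt(d)
    if causal:
        # Mask positions j > i so a time slot attends only to itself and the past.
        future = np.triu(np.ones((n, n), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    weights = softmax(scores)
    return weights @ tokens, weights


def gated_fusion(var_repr, temporal_repr, w_gate):
    """Sigmoid gate choosing which parts of the temporal summary each variable uses."""
    gate = 1.0 / (1.0 + np.exp(-(var_repr @ w_gate)))  # values in (0, 1)
    return var_repr + gate * temporal_repr             # temporal_repr broadcasts over variables


rng = np.random.default_rng(0)
d = 8
var_tokens = rng.normal(size=(4, d))   # bid, cost, reward, count as tokens
time_tokens = rng.normal(size=(6, d))  # six elapsed time slots of today

var_out, _ = self_attention(var_tokens)                      # variable attention: unmasked
time_out, time_w = self_attention(time_tokens, causal=True)  # temporal attention: causal
fused = gated_fusion(var_out, time_out[-1], rng.normal(size=(d, d)))
print(fused.shape)  # (4, 8)
```

The causal mask makes the attention weight matrix lower triangular, so the first time slot can only attend to itself.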

The zero-inflated projection module addresses the data distribution problem. Rather than predicting bid outcomes directly, the model first estimates the probability that any given outcome is non-zero, using a binary classifier. It then multiplies that probability by a separate magnitude prediction, combining the two into a final predicted value. The training objective is a joint combination of cross-entropy loss for the classification component and mean squared error for the regression component. The paper includes a formal proof - based on KKT optimality conditions - that this objective converges to the true zero-inflated distribution under mild assumptions.
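A minimal sketch of this two-part prediction and its joint loss, for a single scalar outcome, might look as follows. The equal weighting of the two loss terms and the choice to apply the squared error only to non-zero targets are assumptions made for illustration; the article does not give those details, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def zero_inflated_predict(logit, magnitude):
    """Expected outcome = P(outcome is non-zero) * predicted magnitude."""
    return sigmoid(logit) * magnitude


def zero_inflated_loss(logit, magnitude, target, eps=1e-12):
    """Cross-entropy on the zero/non-zero label plus squared error on the value."""
    p = sigmoid(logit)
    is_nonzero = 1.0 if target > 0 else 0.0
    bce = -(is_nonzero * math.log(p + eps)
            + (1.0 - is_nonzero) * math.log(1.0 - p + eps))
    # Assumption: the regression term is applied only to non-zero targets,
    # so the magnitude head is not pulled toward zero by lost auctions.
    mse = is_nonzero * (magnitude - target) ** 2
    return bce + mse


print(zero_inflated_predict(0.0, 10.0))  # 5.0
```

With a logit of 0 the classifier is undecided (probability 0.5), so a magnitude of 10 yields a final prediction of 5.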

A secondary self-supervised task, cumulative prediction, adds a global view: the model also predicts the total accumulated cost, reward, and count from the current time slot to the end of the campaign. This provides a broader signal that complements the next-step prediction.
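The training targets for such a cumulative task can be built with a reverse running sum. The sketch below assumes the total includes the current slot; the article does not say whether it does, and the function name is hypothetical.

```python
def cumulative_targets(per_slot_values):
    """For each slot t, return the sum of values from slot t to the final slot."""
    totals = []
    running = 0.0
    for value in reversed(per_slot_values):
        running += value
        totals.append(running)
    totals.reverse()
    return totals


costs = [0.0, 3.0, 0.0, 2.0, 5.0]
print(cumulative_targets(costs))  # [10.0, 10.0, 7.0, 7.0, 5.0]
```

Even when the next-step cost is zero, as in the first and third slots here, the cumulative target is non-zero, which is one way to see why it gives a denser training signal.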

Experiments across eight datasets

The offline evaluation used eight datasets drawn from the advertising platform of Taobao. Together, they contain 3,070,868 bidding trajectories and 132,734,218 bidding records. The datasets cover multiple scenario types: budget-constrained bidding (47.9 million records, 1.1 million trajectories), target-ROAS bidding (7.6 million records), small-budget advertisers, medium-budget advertisers, large-budget advertisers (136,520 records), and three delivery-duration segments covering campaigns running 6 to 12 hours, 12 to 18 hours, and 18 to 24 hours.

The baselines span four categories. Landscape forecasting methods include DLF and ADM, which model the probability distribution of market prices for each auction. The heuristic method is a linear programming approach based on historical bidding records. Variable-based approaches include MMP (a min-max predictor), MLP, and iTransformer, which takes each variable series as a token. Temporal-based approaches include the non-stationary transformer and Informer. A foundation-model version of Informer, trained on all datasets jointly rather than on each separately, was also evaluated to isolate the effect of the joint training paradigm from the effect of the architecture.

According to the paper, Bid2X outperforms all nine baselines across every target variable - cost, reward, and impression count - on all eight datasets. The improvement is consistent in both mean absolute error and root mean squared error. The Informer foundation model variant performs worse than some dedicated baselines on several datasets, which the authors interpret as evidence that temporal modeling alone is insufficient when bidding data also has strong cross-variable dependencies.

An ablation study isolates the contribution of each component. Removing variable attention (w/o va) increases cost MAE on the budget-constrained bidding dataset from 122.50 to 150.28, a rise of roughly 23%. Removing temporal attention (w/o ta) instead raises it to 159.00, a rise of roughly 30%. Removing the zero-inflated projection (w/o zip) increases reward MAE substantially, from 5.94 to 8.93 on the same dataset. Removing cumulative prediction (w/o cfp) produces smaller but consistent degradation across all variables. The effect of removing components is amplified on the large-budget dataset, which the authors attribute to that dataset having more complex inter-variable relationships.

Scalability and physical laws

The paper tests whether Bid2X shows scaling behavior. Performance improves predictably as dataset size D increases, following a power-law relationship: model loss L scales as L = (D / 3.81 x 10^38)^(-0.069). Model size N follows a similar relationship: L = (N / 3.87 x 10^75)^(-0.031). These relationships span more than four orders of magnitude in both dimensions, which the authors describe as evidence of foundation model behavior. Larger models also converge faster during training, reaching equivalent loss levels with fewer processed tokens than smaller models.
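To make the exponents concrete, the short sketch below plugs the reported constants into the two power laws and computes what doubling the data or the model size does to the loss. The function names and the example sizes are assumptions; only the constants come from the text above.

```python
def loss_from_data(d, d_c=3.81e38, alpha=0.069):
    """L = (D / D_c) ** (-alpha), with the constants reported for dataset size."""
    return (d / d_c) ** (-alpha)


def loss_from_params(n, n_c=3.87e75, alpha=0.031):
    """L = (N / N_c) ** (-alpha), with the constants reported for model size."""
    return (n / n_c) ** (-alpha)


# Ratio of losses after doubling; the starting sizes (1e8, 1e7) are arbitrary
# because a power law gives the same ratio at every scale.
ratio_d = loss_from_data(2e8) / loss_from_data(1e8)
ratio_n = loss_from_params(2e7) / loss_from_params(1e7)

print(f"doubling data cuts loss by {100 * (1 - ratio_d):.1f}%")    # about 4.7%
print(f"doubling params cuts loss by {100 * (1 - ratio_n):.1f}%")  # about 2.1%
```

Under these fits, data scale has the larger exponent, so doubling the data helps roughly twice as much as doubling the parameter count.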

The model is also evaluated against two physical laws that should hold in any bidding system. Monotonicity - the requirement that tick cost increases monotonically as the bid increases - holds for 82 to 85 percent of samples across all cost levels in Bid2X, compared to 22 to 26 percent for the Informer foundation model. Predictability - the requirement that prediction error decreases as more historical bidding data is available for the current day - is confirmed in the results: as the relative position of the current token in the bidding trajectory increases from 1/9 to 9/9, cost MAE drops from roughly 200 to near 50.
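One way to measure a monotonicity rate of this kind is to sweep the bid over a grid for each sample and check that the predicted cost never decreases. The sketch below is an assumption about how such a check could be run, not the paper's evaluation protocol; `toy_predictor` is a hypothetical stand-in for a trained model.

```python
def monotonicity_rate(predict_cost, samples, bid_grid):
    """Fraction of samples whose predicted cost never decreases as the bid rises."""
    passed = 0
    for sample in samples:
        costs = [predict_cost(sample, bid) for bid in bid_grid]
        if all(later >= earlier for earlier, later in zip(costs, costs[1:])):
            passed += 1
    return passed / len(samples)


def toy_predictor(sample, bid):
    # Hypothetical model: a saturating cost curve, with a dip injected at
    # bid == 2.0 for "noisy" samples to simulate a monotonicity violation.
    base = sample["scale"] * bid / (1.0 + bid)
    return base - (2.0 if sample["noisy"] and bid == 2.0 else 0.0)


samples = [
    {"scale": 10.0, "noisy": False},
    {"scale": 5.0, "noisy": False},
    {"scale": 8.0, "noisy": True},
    {"scale": 3.0, "noisy": False},
]
rate = monotonicity_rate(toy_predictor, samples, bid_grid=[0.5, 1.0, 2.0, 4.0])
print(rate)  # 0.75
```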

Zero-shot and few-shot generalization

The generalization test uses a crowd-type bidding dataset not included in training. In zero-shot evaluation - direct inference with no additional training - Bid2X achieves a cost MAE that falls below the few-shot (1% training data) performance of every competing baseline, including the Informer foundation model. With 1% of training data, Bid2X reaches a cost MAE of 45.63, against 54.34 for the Informer foundation model and 59.66 for MLP. At 5% training data, Bid2X's cost MAE falls to 43.50. The paper attributes this generalization to the transfer of learned variable relationships across bidding scenarios.

Production results on Taobao

The production deployment covered approximately one million ad campaigns. The model was adapted to use its cost prediction to find the bid that would bring total spending as close to the remaining budget as possible at each time step. The comparison system was model-based reinforcement learning, described in the paper as one of the top-performing bidding agents currently operating on the platform.
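The bid selection rule described here can be sketched as a search over candidate bids. Both the grid search and the toy cost curve are assumptions for illustration: the article does not say how the search is performed, and `predicted_total_cost` is a hypothetical stand-in for the model's cost prediction.

```python
def choose_bid(predict_cost, candidate_bids, remaining_budget):
    """Pick the candidate bid whose predicted spend is closest to the remaining budget."""
    return min(candidate_bids, key=lambda bid: abs(predict_cost(bid) - remaining_budget))


def predicted_total_cost(bid):
    # Hypothetical cost curve with diminishing marginal returns.
    return 100.0 * bid / (1.0 + bid)


best = choose_bid(predicted_total_cost, [0.5, 1.0, 1.5, 2.0, 3.0], remaining_budget=60.0)
print(best)  # 1.5
```

In this toy setup a bid of 1.5 is predicted to spend exactly 60, so it is selected; lower bids would leave budget unspent and higher bids would overshoot.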

According to the paper's Table 4, which covers two months of online A/B testing, Bid2X produced improvements across all measured metrics relative to model-based RL: page views increased by 7.57%, budget consumption by 2.16%, the number of impressions won by 3.89%, gross merchandise volume by 4.65%, and return on investment by 2.44%. Inference latency increased by 0.05 seconds, which the authors characterize as slight given the scale and complexity of the model.

Context for programmatic advertising

The shift from scenario-specific models to a single foundation model trained jointly on diverse bidding data is consistent with the broader direction of AI investment in the programmatic ecosystem. NVIDIA and its advertising infrastructure partners demonstrated at Cannes Lions 2026 that GPU-accelerated AI models can be made fast enough to operate inside live auction windows measured in milliseconds, a prerequisite for any foundation model approach to auto-bidding in production. Google's ongoing push to consolidate bidding controls away from manual inputs and toward machine-learned automation mirrors the architectural logic in Bid2X: fewer, broader models that generalize across contexts rather than narrow models tuned to individual campaign types.

The IAB Tech Lab's Agentic RTB Framework, released in November 2025, addressed a parallel challenge - standardizing how AI agents can be deployed inside programmatic pipelines - and reflects the same pressure toward AI infrastructure that can operate across heterogeneous auction environments. The Bid2X work approaches this from the model side: one foundation model, trained jointly on data from all scenarios, that can be adapted to any bidding context without retraining from scratch.

The paper does not address how Bid2X would perform on advertising environments outside Alibaba's own platforms, where data from multiple advertisers competing in open auctions is less directly available for training.

Timeline

  • August 3-7, 2025 - 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining held in Toronto; the paper was accepted at this conference
  • October 27, 2025 - Bid2X paper first posted on arXiv (arXiv:2510.23410 v1) by Jiahao Ji and co-authors at Alibaba Group
  • March 14, 2026 - Revised version (v2) posted on arXiv
  • Two months of online A/B testing - Bid2X deployed on Taobao against model-based RL baseline, covering approximately one million ad campaigns; results: +4.65% GMV, +2.44% ROI, +7.57% page views, +3.89% impressions won, +0.05 seconds latency

Summary

Who: Researchers at Alibaba Group - Jiahao Ji, Tianyu Wang, Yeshu Li, Yusen Huo, Zhilin Zhang, Chuan Yu, Jian Xu, and Bo Zheng.

What: A paper introducing Bid2X, a foundation model for bidding environment modeling in online advertising. The model estimates bid outcomes - cost, reward, and impression count - across all advertising scenarios using a single unified architecture, combining a variable attention encoder, a temporal attention decoder, a variable-aware fusion module, and a zero-inflated projection module.

When: First posted on arXiv on October 27, 2025; revised March 14, 2026; accepted for the ACM SIGKDD 2025 conference (August 3-7, 2025, Toronto). The production A/B test ran for two months on Taobao.

Where: Developed at Alibaba Group and deployed on Taobao, one of the world's largest e-commerce platforms. Offline evaluation used eight datasets from the Taobao advertising platform containing about 3.1 million bidding trajectories and 132.7 million bidding records.

Why: Existing auto-bidding models are built and trained for specific bidding scenarios and cannot generalize across different campaign types. Bid2X introduces a single model trained jointly on diverse bidding data, enabling it to handle new scenarios - including zero-shot, with no additional training - while outperforming nine baselines across all eight datasets, and raising gross merchandise volume by 4.65% and return on investment by 2.44% in live production on Taobao.