Meta today published a detailed technical account of how it has brought the AV1 video codec to the majority of mobile devices used in real-time video calls on Messenger and WhatsApp, the result of a multi-year engineering effort spanning codec selection, machine learning-based device eligibility, adaptive switching, and error-resilience mechanisms designed to maintain call quality on constrained networks.
The post, published June 22, 2026 on Meta's Engineering at Meta blog by a team of ten engineers - Yu-Chen (Eric) Sun, Jie Dong, Kewei Huang, Dave Jack, Joachim Reiersen, Phil Scherbel, Karthik Sekuru, Vertika Singh, Thileepan Subramaniam, and Wei Zhou - lays out the technical and operational steps required to move AV1 from high-end-only support in 2023 to majority mobile coverage today.
[Figure: side-by-side comparison, H.264/AVC (left) versus AV1 (right)]
Why AV1 matters for calls
AV1, first standardised by the Alliance for Open Media in 2018, offers a clear bandwidth advantage over older codecs. According to Meta, offline testing showed at least a 20% bitrate reduction with AV1 compared with H.264/AVC under product settings on low-end and mid-range devices. Devices capable of handling higher encoding complexity see even larger gains.
That reduction is consequential in practice. For real-time communication products, video bitrates in real-world networks - especially in emerging markets - typically range from 10 kbps to 400 kbps. Maintaining good video quality below 100 kbps is difficult. Side-by-side tests in the Messenger app, using two Android phones each limited to 100 kbps, showed H.264/AVC producing noticeably blurry video against a much clearer AV1 stream at the same bitrate.
Beyond raw compression, AV1 includes two coding tools absent from H.264 at the main profile level: palette mode and intra-block copy. Palette mode addresses screen-sharing content, where pixel values typically cluster around a limited set of colors. Instead of encoding quantised transform-domain coefficients, the encoder signals the color clusters directly, reducing the data needed to represent text-heavy or UI-heavy frames. Intra-block copy extends that efficiency by allowing block prediction within the same frame, exploiting repetitive patterns common in screen content. Both tools matter as video calls increasingly involve screen sharing alongside live camera feeds.
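The palette idea can be illustrated with a toy sketch. This is not AV1's actual bitstream syntax, which adds palette size limits, prediction and entropy coding; the function names are hypothetical and exist only to show how a block with few distinct colors reduces to a small color table plus per-pixel indices.

```python
def palette_encode(block):
    """Represent a block of pixel values as (palette, indices).

    Toy illustration only: AV1's real palette mode has size limits,
    entropy coding and prediction that are not modelled here.
    """
    palette = sorted(set(block))               # distinct colors in the block
    index_of = {color: i for i, color in enumerate(palette)}
    indices = [index_of[p] for p in block]     # one small index per pixel
    return palette, indices


def palette_decode(palette, indices):
    """Rebuild the original pixel values from the palette and index map."""
    return [palette[i] for i in indices]


# A text-like 4x4 block with only two colors (white background, black glyph).
block = [255, 255, 0, 0,
         255, 0, 0, 255,
         255, 0, 0, 255,
         255, 255, 0, 0]
palette, indices = palette_encode(block)
```

For this block the encoder needs to signal only two colors and a one-bit index per pixel, rather than transform coefficients for sharp black-on-white edges.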
The challenges that had to be solved first
Real-time communication (RTC) imposes constraints that do not apply to on-demand streaming. End-to-end video latency must stay below 300 milliseconds. Above that threshold, speakers perceive delay and conversations degrade. Multi-pass encoding techniques that improve quality in offline contexts introduce too much delay for RTC. Extensive decoder buffering also increases latency. Any sudden bitrate spike causes freezes on the receiving end.
Power consumption presented a separate obstacle. According to Meta, an offline experiment integrating an open-source AV1 encoder and measuring power on a Pixel 8 device found a 14% increase in power usage compared with H.264/AVC. That figure, by itself, makes straightforward AV1 adoption untenable for a product deployed on billions of mobile devices. Memory usage also increased with AV1 encoding, contributing to app crash regressions that further complicated mobile adoption.
Binary size was a third constraint. AV1 support, using libAOM as a reference, adds 1.7 MB to an application - 600 kB compressed. According to Meta, a 600 kB increase could consume an entire year's binary size budget for a large organisation. Binary size affects update success rates, application startup time, and downstream metrics including crash rates. A larger binary leaves more users on older app versions and delays call setup.
The low-complexity encoder and decoder selection
Meta addressed power and device coverage by adopting an internal low-complexity encoder implementation of AV1. The approach rests on a key observation: a newer codec does not necessarily require a higher-complexity encoder. Because AV1 supports a larger set of coding tools, a well-designed encoder has more opportunities to find better quality-complexity trade-offs. These trade-offs are referred to as presets.
The internal encoder offers multiple presets spanning from high to low complexity. An ultra-low-complexity preset was specifically developed, delivering encoding complexity comparable to H.264/AVC. With that preset available, Meta built a mechanism that adjusts the encoder preset based on device capabilities, enabling AV1 to reach a much broader range of devices than would otherwise be possible.
For decoding, Meta evaluated several open-source decoders and selected dav1d after A/B testing. The experiments showed superior power efficiency and reliability with dav1d, and also demonstrated an increase in call duration - a direct measure of power savings - compared with alternatives.
Binary size reduction required multiple parallel strategies. A dynamic-download framework was tried first, delivering AV1 as a separate component, but download failures from poor network conditions degraded the user experience enough to rule that approach out. Direct binary size optimisations followed. The quantisation matrix (QM) tool accounts for approximately 10% of the encoder library size; optimisation halved it, and removing QM entirely frees 60 kB of binary space. Meta also contributed size reduction improvements directly to the dav1d project. At the application level, codec libraries can be shared across features - such as video message transcoding - and platform codec support leveraged to avoid bundling additional libraries.
Machine learning-based device eligibility
Identifying which Android devices could run AV1 without degrading call quality proved harder than expected. The Android device landscape is large and fragmented. Simpler approaches based on memory, release year, and Android OS version all failed to produce reliably accurate eligibility lists.
Meta's response was an ML-based device eligibility framework using large-scale real-world statistical data rather than lab measurements. The model collects low-level performance statistical metrics through Meta's logging pipeline to assess each device's AV1 capability and outputs an rtc_score quantifying overall AV1 performance. That score informs decisions about whether a device should use AV1 and which call settings to apply.
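How a score of this kind might feed a call-setup decision can be sketched as follows. Only the existence of an rtc_score comes from Meta's post; the 0.0-1.0 score range, the thresholds, and the setting names are assumptions made purely for illustration.

```python
def av1_settings_for(rtc_score, enable_threshold=0.5, high_tier_threshold=0.8):
    """Map a device score to illustrative call settings.

    Hypothetical: the score range (assumed 0.0-1.0), both thresholds and
    the returned setting names are not taken from Meta's post.
    """
    if rtc_score < enable_threshold:
        return {"codec": "H264", "preset": None}        # not eligible for AV1
    if rtc_score < high_tier_threshold:
        return {"codec": "AV1", "preset": "ultra_low"}  # eligible, lower tier
    return {"codec": "AV1", "preset": "medium"}         # eligible, higher tier
```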
The system was iterated through 2025. The first milestone, Model V1.1, rolled out in August 2025, broadening AV1 traffic across an increasing set of devices. That additional traffic created a dedicated AV1-only dataset that grew larger and more representative over time. Model V2 then introduced a two-tier approach distinguishing higher-end from lower-end devices, reflecting the reality that entry-level and flagship phones have very different AV1 encoding capabilities. According to Meta, each iteration substantially increased AV1 enablement across the device landscape, with improvement expected to continue as traffic grows and more data accumulates.
Adaptive codec switching at runtime
Device eligibility identifies capable devices at call setup, but it cannot account for runtime conditions. During A/B tests, Meta observed significant audio/video synchronisation problems caused by devices that could not encode or decode video in real time. A 2023 smartphone with an octa-core processor, for instance, failed to handle encoding at 320x180 at 15 frames per second. The issue appeared more prevalent with AV1 than H.264, and Meta suspects these devices throttle CPU frequency during calls, reducing their effective capability below what device specifications imply.
Three mechanisms address this at runtime. The first is adaptive encoder preset adjustment: a monitoring mechanism continuously tracks encoding latency and selects the appropriate preset. If encoding latency rises toward real-time limits, the encoder complexity drops. If the device sustains more headroom, complexity increases to improve quality.
The second is local device encoding latency-aware codec switching. If reducing the encoder preset is insufficient, the device switches from AV1 to H.264/AVC entirely. Both codecs are negotiated at call setup, and the client continuously monitors conditions. Preset adjustment and codec selection are decided jointly to avoid oscillation between modes.
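A minimal sketch of a joint preset-and-codec decision is shown below. It is not Meta's implementation: the preset names, the thresholds and the use of a smoothed encoding latency are assumptions. The gap between the two thresholds acts as hysteresis, which is one common way to limit oscillation.

```python
# Hypothetical preset ladder, ordered from highest to lowest complexity.
PRESETS = ["high", "medium", "low", "ultra_low"]


class EncoderState:
    def __init__(self, codec="AV1", preset_index=0):
        self.codec = codec
        self.preset_index = preset_index   # index into PRESETS


def update(state, avg_encode_ms, frame_interval_ms,
           high_water=0.8, low_water=0.4):
    """One joint decision step for preset and codec.

    avg_encode_ms is a smoothed per-frame encoding latency. The two
    thresholds are illustrative fractions of the frame interval.
    """
    if state.codec != "AV1":
        return state                       # switching back to AV1 is not modelled
    load = avg_encode_ms / frame_interval_ms
    if load > high_water:
        if state.preset_index < len(PRESETS) - 1:
            state.preset_index += 1        # drop encoder complexity first
        else:
            state.codec = "H264"           # lowest preset is still too slow
    elif load < low_water and state.preset_index > 0:
        state.preset_index -= 1            # headroom: raise complexity
    return state
```

Because the codec switch is reached only after the preset ladder is exhausted, the two decisions never pull in opposite directions within a single step.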
The third mechanism addresses the peer device: peer device decoding latency-aware codec switching. Each device continuously feeds back its video decoding latency during the call. If the sender detects that the peer cannot decode AV1 in real time, it switches back to H.264/AVC. This is especially relevant when a high-end device calls a low-end device - the sender may encode AV1 easily while the receiver cannot decode it in real time.
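On the sender side, the peer feedback could be evaluated along the following lines. The class name, window size and the "half of recent frames are slow" rule are hypothetical; the post states only that decoding latency is fed back and used to trigger a switch.

```python
from collections import deque


class PeerDecodeMonitor:
    """Tracks decode latency reports from the peer (hypothetical helper)."""

    def __init__(self, frame_interval_ms, window=30, slow_fraction=0.5):
        self.frame_interval_ms = frame_interval_ms
        self.reports = deque(maxlen=window)    # keeps the most recent reports only
        self.slow_fraction = slow_fraction

    def add_report(self, decode_ms):
        self.reports.append(decode_ms)

    def should_fall_back(self):
        """True when the window is full and too many frames took longer
        than one frame interval to decode (slower than real time)."""
        if len(self.reports) < self.reports.maxlen:
            return False
        slow = sum(1 for ms in self.reports if ms > self.frame_interval_ms)
        return slow / len(self.reports) >= self.slow_fraction
```

Requiring a full window before deciding avoids falling back to H.264/AVC on a single slow frame.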
Battery level also factors in. When battery is low, the system switches to H.264/AVC independently of encoding or decoding performance, extending call duration.
Asymmetric codec design
These mechanisms together enabled a further optimisation: an asymmetric codec design. Some mid-range Android devices cannot perform real-time AV1 encoding but can decode AV1 in real time. Under this design, those devices continue encoding and sending H.264/AVC while receiving AV1 from high-end peers. The result is meaningfully expanded AV1 coverage across the Android device landscape without requiring symmetric capability.
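The asymmetric arrangement amounts to choosing a codec per direction rather than per call. A small sketch, assuming each device advertises separate encode and decode capability flags (the key names are hypothetical):

```python
def pick_codec(sender_can_encode_av1, receiver_can_decode_av1):
    """Choose the codec for one direction of the call."""
    if sender_can_encode_av1 and receiver_can_decode_av1:
        return "AV1"
    return "H264"


def negotiate(a, b):
    """Return the codec for each direction given two capability dicts.

    Each dict has boolean keys 'encode_av1' and 'decode_av1'
    (hypothetical names used only for this sketch).
    """
    return {
        "a_to_b": pick_codec(a["encode_av1"], b["decode_av1"]),
        "b_to_a": pick_codec(b["encode_av1"], a["decode_av1"]),
    }


high_end = {"encode_av1": True, "decode_av1": True}
mid_range = {"encode_av1": False, "decode_av1": True}
result = negotiate(high_end, mid_range)
```

Here the high-end device sends AV1 while the mid-range device sends H.264/AVC, matching the scenario described above.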
Rate control and VBV delay
With AV1 enabled across a broad device population, improving call quality within AV1 sessions became the next focus. The first major area was rate control.
In RTC, maintaining a constant bitrate (CBR) is important because instantaneous bitrate overshoot can cause congestion and freeze the video on the receiving end. Meta uses Video Buffering Verifier (VBV) delay as the key metric for evaluating CBR accuracy. The VBV is a leaky-bucket-based measurement simulating how an encoded stream would be buffered and played back at the decoder.
The mechanism is concrete. Assume a 100 kbps network allocation and a target encoding rate of 100 kbps. If the encoder produces a 20 kbit frame while 5 kbits from the previous frame remain in the buffer, transmitting the new frame takes at minimum (20 kbits plus 5 kbits) divided by 100 kbps, equalling 250 milliseconds. If the target VBV delay is below 200 milliseconds, that overshoot leads directly to higher latency, network congestion, or video freeze.
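The arithmetic can be reproduced with a small leaky-bucket simulation. This is a simplified model written for illustration, not Meta's VBV implementation; the 50 ms frame interval and the 10 kbit first frame are chosen only so that 5 kbits remain in the buffer when the 20 kbit frame arrives.

```python
def vbv_delays_ms(frame_sizes_kbit, rate_kbps, frame_interval_ms):
    """Return the VBV delay (ms) seen by each frame in a leaky bucket.

    The buffer drains at rate_kbps between frames. Each frame's delay is
    the time needed to send the existing backlog plus the frame itself.
    """
    backlog_kbit = 0.0
    delays = []
    for size in frame_sizes_kbit:
        backlog_kbit += size
        delays.append(backlog_kbit * 1000.0 / rate_kbps)
        drained = rate_kbps * frame_interval_ms / 1000.0
        backlog_kbit = max(0.0, backlog_kbit - drained)
    return delays


# 100 kbps link, one frame every 50 ms: a 10 kbit frame leaves 5 kbits
# buffered, so the following 20 kbit frame sees (20 + 5) / 100 s of delay.
delays = vbv_delays_ms([10, 20], rate_kbps=100, frame_interval_ms=50)
```

The second entry in `delays` is 250 ms, which exceeds a 200 ms target and would count as overshoot.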
Several rate-control improvements followed from this metric. The encoder tracks VBV buffer status during encoding and uses it to guide bitrate allocation, reducing the bitrate of subsequent frames when overshoot occurs. Key frame bitrate is strictly controlled rather than boosted; boosting is common in offline encoders but counterproductive in RTC, where bitrate spikes cause freezes. When the target bitrate drops sharply, the encoder must bring VBV delay back under control through that transition.
A more subtle problem also surfaced: undershoot. Early conservative rate allocation to avoid overshoot increased the tendency to undershoot, which misled bandwidth estimation, slowed bitrate ramp-up, and degraded video quality indirectly. The algorithm was revised to address undershoot alongside overshoot, improving accuracy in both directions.
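One generic way to correct in both directions is a proportional adjustment around a target buffer level. The sketch below is not the algorithm Meta describes, whose details are not given in the post; the gain, the target backlog and the floor are illustrative assumptions.

```python
def next_frame_budget_kbit(rate_kbps, fps, backlog_kbit,
                           target_backlog_kbit, gain=0.5):
    """Per-frame bit budget with a proportional buffer correction.

    A backlog above target (overshoot) shrinks the next budget; a backlog
    below target (undershoot) grows it. gain and the floor are illustrative.
    """
    nominal = rate_kbps / fps                       # even share per frame
    correction = gain * (target_backlog_kbit - backlog_kbit)
    return max(0.1 * nominal, nominal + correction)
```

At 100 kbps and 20 fps the nominal budget is 5 kbits per frame; with a 5 kbit target, a 9 kbit backlog lowers the next budget to 3 kbits, while a 1 kbit backlog raises it to 7 kbits.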
AV1 also supports Reference Picture Resampling (RPR), which allows resolution changes without generating a key frame. Switching resolutions in H.264 typically requires a new key frame, causing a sudden bitrate spike and temporary freezing. RPR reduces that spike significantly.
Error resilience: temporal layers and long-term reference frames
Packet loss presents a distinct category of problem. When a packet is lost in RTC, the receiver sends a Negative Acknowledgement (NACK) and waits a round trip for retransmission. If that fails, the inter-frame dependency chain breaks and the video freezes. The receiver then requests a keyframe - which costs another round trip and is roughly 10 times larger than a typical P-frame - creating potential congestion and further packet loss, a difficult cycle to escape.
Two features in AV1 address this: temporal layers (TL) and Long-Term Reference (LTR) frames.
Temporal layers organise frames into a time-based hierarchy. The base layer, temporal layer 0, provides a lower frame rate independently. Enhancement layers add intermediate frames to reach higher frame rates when conditions allow. The critical property is that the base layer maintains continuity without depending on enhancement-layer frames. If enhancement-layer packets are lost or arrive late, decoding continues from the base layer without stalling.
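The hierarchy can be made concrete with a small sketch of layer assignment in a dyadic structure, a common arrangement in which each added layer doubles the frame rate. The post does not specify the exact structure Meta uses, so the dyadic pattern here is an assumption.

```python
def temporal_layer(frame_index, num_layers=2):
    """Layer id for a dyadic temporal hierarchy (0 is the base layer)."""
    period = 1 << (num_layers - 1)      # distance between base-layer frames
    pos = frame_index % period
    if pos == 0:
        return 0
    layer = num_layers - 1
    while pos % 2 == 0:                 # even positions belong to lower layers
        pos //= 2
        layer -= 1
    return layer


# Three layers over eight frames, and the frames that survive if only
# the base layer is delivered.
pattern = [temporal_layer(i, num_layers=3) for i in range(8)]
base_only = [i for i in range(8) if temporal_layer(i, num_layers=3) == 0]
```

With three layers the pattern repeats every four frames, so a receiver that gets only base-layer frames still decodes a continuous stream at a quarter of the full frame rate.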
Meta applies this by prioritising FEC protection for base-layer data and treating enhancement-layer retransmissions more conservatively. When round-trip time (RTT) is low, retransmitting a missing enhancement packet is worthwhile. When RTT is high, skipping that retransmission avoids delay without breaking the decode flow. Temporal layers are not always active: because a TL structure is less compression-efficient than a tightly dependent prediction chain, Meta enables TL adaptively, activating when loss rises and deactivating once conditions stabilise.
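Both policies can be sketched in a few lines. The thresholds are illustrative, and the rule that base-layer packets are always retransmitted is an assumption of this sketch rather than something stated in the post.

```python
def should_retransmit(layer, rtt_ms, max_enhancement_rtt_ms=100):
    """Decide whether to retransmit a lost packet.

    Assumption: base-layer (layer 0) packets are always retransmitted.
    Enhancement-layer packets are retransmitted only when RTT is low.
    """
    if layer == 0:
        return True
    return rtt_ms <= max_enhancement_rtt_ms


def update_tl_enabled(enabled, loss_rate, enable_above=0.05, disable_below=0.01):
    """Turn temporal layers on when loss rises and off once it settles.

    Separate on/off thresholds keep the mode from flapping when the loss
    rate hovers near a single cut-off.
    """
    if not enabled and loss_rate > enable_above:
        return True
    if enabled and loss_rate < disable_below:
        return False
    return enabled
```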
Long-Term Reference frames provide a different recovery path. The encoder stores selected reference frames in a bounded buffer of size 4 and emits LTR-predicted (LTRP) frames on request. When the decoding chain breaks due to frame loss, an incoming LTRP frame - predicted from a previously decoded LTR frame - instantly resynchronises sender and receiver without requiring a full keyframe.
Implementation requires close coordination between the encoder and the network layer. An explicit LTR indicator is carried in a proprietary RTP header extension, and the frame ID is exposed through LTR bitstream syntax. ACK feedback is sent via a separate proprietary RTP header extension containing the corresponding frame ID, so the sender knows precisely which LTR was received. This differs from H.264, where the network layer can parse the slice header directly to recognise an LTR frame.
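The sender-side bookkeeping implied here, a bounded set of LTR frames plus per-frame acknowledgements, might look like the sketch below. The buffer size of 4 comes from the post; the class, its method names and the evict-oldest policy are assumptions.

```python
from collections import OrderedDict


class LtrBuffer:
    """Bounded store of long-term reference frames keyed by frame ID."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.frames = OrderedDict()            # frame_id -> acked flag

    def store(self, frame_id):
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)    # evict the oldest LTR (assumed policy)
        self.frames[frame_id] = False

    def ack(self, frame_id):
        """Record that the receiver confirmed this LTR frame."""
        if frame_id in self.frames:
            self.frames[frame_id] = True

    def newest_acked(self):
        """Most recent LTR the receiver confirmed, or None."""
        for frame_id in reversed(self.frames):
            if self.frames[frame_id]:
                return frame_id
        return None
```

An LTRP frame is only useful if it is predicted from a reference the receiver actually holds, which is why the sketch selects the newest acknowledged LTR rather than the newest stored one.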
The network layer requests LTRP frames in two ways. The reactive path is triggered when the receiver experiences a freeze and sends an RPSI (Reference Picture Selection Indication). The proactive path is triggered when the sender detects elevated packet loss through a feedback channel and begins sending LTRP frames periodically. The proactive path is somewhat redundant but significantly improves reliability and reduces freeze events. According to Meta, the encoder already emits periodic, slightly higher-quality frames to improve overall call quality; marking those frames as LTR keeps reference quality high as references age.
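The two request paths reduce to a simple decision, sketched here under assumed values. The loss threshold and the proactive period are illustrative and are not taken from the post.

```python
def should_send_ltrp(rpsi_received, loss_rate, frames_since_last_ltrp,
                     loss_threshold=0.05, proactive_period=30):
    """Decide whether the next frame should be LTR-predicted.

    Reactive: the receiver reported a freeze via RPSI.
    Proactive: loss is elevated and the periodic interval has elapsed.
    """
    if rpsi_received:
        return True
    return (loss_rate >= loss_threshold
            and frames_since_last_ltrp >= proactive_period)
```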
Group calls and hardware AV1
According to Meta, extending AV1 to group calls remains in progress. Unlike one-to-one calls, group call participants must decode multiple simultaneous video streams, making expanded AV1 coverage considerably harder to achieve. Software AV1 implementations support steady expansion for individual streams, but higher quality and improved features in the group call context will likely require hardware AV1 support on end-user devices.
Meta stated directly: "We encourage SoC vendors to invest in HW AV1 across all device tiers to meet the AV1 requirements to deliver an improved viewer experience, device battery savings and enhanced network operator infrastructure efficiency."
Context for the advertising and marketing community
The technical work described today connects to several trends relevant to advertisers and marketing professionals. Meta's advertising platforms operate across Messenger and WhatsApp, both of which are growing advertising surfaces. As PPC Land has reported, WhatsApp has exceeded 3 billion monthly active users and Messenger has more than a billion - platforms where video quality directly affects the environment in which ads are seen.
This AV1 RTC work also sits within Meta's broader video codec strategy. As PPC Land reported in November 2025, Meta's engineering team implemented Dolby Vision HDR support for Instagram video in AV1, requiring collaboration with FFmpeg developers to carry Dolby Vision metadata within AV1 bitstreams. The dav1d decoder selected for RTC was among the components improved in that work. Both tracks - HDR for on-demand video and AV1 for real-time calls - reflect Meta's sustained commitment to AV1 as its primary codec across the product portfolio.
The IAB Tech Lab published CTV ad format guidelines in December 2025 recommending AV1, H.265/HEVC, and VP9 as preferred formats over H.264 for video ad delivery. Meta's large-scale AV1 deployment on mobile RTC adds evidence that AV1 performs in bandwidth-constrained environments, not just premium connected television - a context relevant to advertisers planning video creative for social and messaging environments. Bloomberg's decision to join the Alliance for Open Media in December 2024 reflected a growing institutional support base for the AV1 standard; Meta's engineering disclosure today demonstrates what operating that standard at billions-of-devices scale actually requires.
Timeline
- 2018: AV1 first standardised by the Alliance for Open Media (AOMedia).
- 2023: Meta introduces AV1 for real-time video calls on high-end mobile devices in Messenger and WhatsApp.
- December 2024: Bloomberg joins the Alliance for Open Media, as reported by PPC Land.
- April 2025: Apple adds AV1 hardware support via Safari 18.4 WebRTC improvements, as covered by PPC Land.
- June 2025: All AV1 encodings on Instagram derived from iPhone-produced HDR begin including compressed Dolby Vision metadata.
- August 2025: Meta's ML device eligibility Model V1.1 rolls out, broadening AV1 traffic across a wider set of Android devices.
- November 2025: Meta announces Dolby Vision HDR for Instagram iOS using AV1, requiring dav1d decoder work, as covered by PPC Land.
- December 2025: IAB Tech Lab publishes CTV ad format guidelines recommending AV1 for video ad delivery, as covered by PPC Land.
- 2025 - ongoing: Model V2 introduced with two-tier device classification; AV1 coverage expanded substantially across the Android device landscape.
- June 22, 2026: Meta publishes detailed technical post describing AV1 at majority mobile coverage on Messenger and WhatsApp RTC, covering the low-complexity encoder, ML eligibility, adaptive switching, VBV rate control, temporal layers, and Long-Term Reference frames.
Summary
Who: Meta's engineering team, credited to ten engineers including Yu-Chen (Eric) Sun and Jie Dong, working on real-time communication infrastructure for Messenger and WhatsApp.
What: A multi-year effort to enable the AV1 video codec on the majority of mobile devices used for real-time video calls on Messenger and WhatsApp, combining a low-complexity encoder, machine learning-based device eligibility, adaptive codec and preset switching, VBV-based rate control, temporal layers, and Long-Term Reference frames.
When: The engineering post was published today, June 22, 2026. AV1 was first introduced for high-end RTC devices in 2023; ML device eligibility Model V1.1 rolled out in August 2025; majority mobile coverage is the cumulative outcome of that progression.
Where: Deployed across Meta's Messenger and WhatsApp applications on Android and iOS, with the engineering work published on Meta's Engineering at Meta blog.
Why: AV1 delivers at least 20% bitrate reduction versus H.264/AVC at equivalent quality on low-end and mid-range devices, improving video call quality for users on bandwidth-constrained networks - particularly in emerging markets where real-world video bitrates for RTC products typically range from 10 kbps to 400 kbps.