Introduction

Many industry observers believe AI is having its “iPhone moment” with the rapid rise of large language models (LLMs) and generative AI applications such as OpenAI’s ChatGPT. A big part of what sets generative AI applications apart is the sheer number of parameters they manage: the LLMs behind them can involve billions, or even trillions, of parameters. As a result, generative AI workloads require large clusters of very high-end servers equipped with thousands of GPUs, TPUs, and other accelerated processors. Moreover, the massive volume of traffic between these servers requires a data center-scale fabric built on non-standard network infrastructure, with support for technologies like Remote Direct Memory Access (RDMA). RDMA reduces performance overhead by enabling data to be copied directly from one server’s memory to another’s, completely bypassing the operating system’s network stack.

The cost to build and maintain the specialized server and network infrastructure to support generative AI is enormous, and unique skillsets are required, making it impractical for all but the most well-funded AI vendors. The cost of GPUs alone is significant, with Nvidia A100 chips at $10,000 apiece. However, a recent development is making AI accessible to businesses of all sizes, helping AI’s iPhone moment become a reality: the advent of public cloud-based AI services such as Amazon Bedrock, Microsoft Azure AI, and Google AI. These offerings are not without their costs, but they eliminate the huge expense of building and maintaining a separate infrastructure for hosting generative AI platforms. Enterprise developers can use prebuilt configurations and models to test and deploy AI applications on a pay-as-you-go basis, with access to a virtually unlimited pool of resources to scale up or down as needed.

For example, Amazon Bedrock is a fully managed service for AWS users that makes foundation models (FMs) from leading AI companies available through a single API. FMs are very large machine learning models pre-trained on vast amounts of data. According to Amazon, the flexibility of these FMs makes them applicable to a wide range of use cases, powering everything from search to content creation to drug discovery.
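
Bedrock’s single-API approach means a developer can invoke different FMs through the same runtime client. The sketch below is a minimal illustration using the AWS SDK for Python (boto3); it assumes AWS credentials and Bedrock model access are already configured, and the region, model ID, and request body shown (Amazon’s Titan text format) are examples only, since request schemas vary by model provider.

```python
import json
import boto3  # assumes AWS credentials and Bedrock model access are configured

# Region and model ID are illustrative; the request schema varies by provider.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.invoke_model(
    modelId="amazon.titan-text-express-v1",
    contentType="application/json",
    accept="application/json",
    body=json.dumps({"inputText": "Summarize the benefits of managed foundation models."}),
)

# The response body is a stream; parse it as JSON and print the raw result.
print(json.loads(response["body"].read()))
```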

AI’s Performance Challenges Go Beyond Massive Data Volumes

Even with new cloud services like Amazon Bedrock, the performance challenges go beyond the massive data volumes that characterize AI workloads. AI applications, and the virtualized environments they are typically deployed in, generate enormous amounts of packet delay variation (PDV), more commonly referred to as jitter.
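
To make the term concrete, jitter is typically quantified as the variation in packet transit time from one packet to the next. The short Python sketch below uses the running interarrival-jitter estimator defined in RFC 3550 (the RTP specification); the timestamps are synthetic values chosen purely for illustration.

```python
def interarrival_jitter(timestamps_sent, timestamps_received):
    """Estimate interarrival jitter with the RFC 3550 running formula:
    J += (|D| - J) / 16, where D is the change in one-way transit time
    between consecutive packets."""
    jitter = 0.0
    prev_transit = None
    for sent, received in zip(timestamps_sent, timestamps_received):
        transit = received - sent
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0
        prev_transit = transit
    return jitter

# Example: a flow whose one-way delay swings between 20 ms and 45 ms
sent = [i * 0.020 for i in range(10)]
recv = [s + (0.020 if i % 2 == 0 else 0.045) for i, s in enumerate(sent)]
print(f"estimated jitter: {interarrival_jitter(sent, recv) * 1000:.2f} ms")
```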

AI application models adapt in real time to improve responses based on new data and interactions as they occur, leading to unpredictable changes in packet transmission rates. Moreover, many AI applications are composed of containerized microservices distributed across multiple servers at cloud and edge locations. While it’s often desirable to move some data and processing to the edge to improve response times and reduce bandwidth usage, the number of network hops between centralized cloud and edge environments also goes up. In addition, unpredictable bursts of traffic can result from the frequent synchronization of data models and configurations required between edge and cloud components to maintain consistency and reliability.

Jitter that results from AI application behavior is compounded in virtualized environments by competition for resources between cloud-hosted applications, which drives random VM scheduling conflicts, hypervisor packet delays, and hops between virtual and physical subnets. This virtualization jitter is further amplified by fading and RF interference from last-mile mobile and Wi-Fi networks that cloud vendors have no control over.

Another factor contributing to jitter is that AI applications frequently make use of 5G networks to take advantage of the high data volumes and low latency they support. 5G’s smaller cells, higher frequencies, and mmWave technology have poorer propagation characteristics than LTE, causing signals to fade in and out. Moreover, 5G signals often require a clear line-of-sight path between transmitter and receiver. Any obstacle can cause signals to be reflected, refracted, or diffracted, resulting in multiple signal paths with different lengths and different transmission times, leading to variation in packet delivery times. 5G networks can use various technologies to address these sources of jitter, such as beamforming, which directs the signal more precisely toward the receiver, and MIMO (Multiple Input Multiple Output), which uses multiple antennas at both the transmitter and receiver ends to improve signal quality and reduce the effects of multipath interference. However, these technologies only mitigate the impact of jitter; they don’t eliminate it.

Moreover, 5G’s small cell architecture has much heavier infrastructure requirements than LTE, which has driven many network providers to the cloud to reduce costs. However, the shift to cloud-native 5G networks adds virtualization jitter on top of the jitter 5G already generates due to its architecture.

Jitter’s Serious Knock-On Effect

Jitter has a far more serious knock-on effect on cloud network throughput and hosted application performance than the added latency from the random delays outlined above. This knock-on effect can render AI applications, especially those requiring real-time or near-real-time responsiveness, virtually unusable, and even dangerous if critical systems are involved.

TCP, the network protocol widely used by public cloud services such as AWS and MS Azure, consistently treats jitter as a sign of congestion. To guarantee packet delivery and prevent data loss, TCP responds to jitter by retransmitting packets and throttling traffic, even when plenty of bandwidth is available. Even modest amounts of jitter can cause throughput to collapse and applications to stall.
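
To see why, consider how a classic loss-based congestion control algorithm such as TCP Reno reacts to three duplicate ACKs, something jitter-induced reordering can produce even when no packet has actually been dropped. The simplified sketch below (fast-recovery details omitted) shows the congestion window being cut in half on every such event, which is how a handful of reordering episodes can collapse throughput on an uncongested link.

```python
def on_triple_duplicate_ack(cwnd, ssthresh, mss):
    """Reno-style reaction to three duplicate ACKs: retransmit the missing
    segment and halve the congestion window, whether the duplicates came
    from real congestion loss or from jitter-induced reordering."""
    ssthresh = max(cwnd // 2, 2 * mss)
    cwnd = ssthresh          # fast-recovery details omitted for brevity
    return cwnd, ssthresh

mss = 1460                   # bytes per segment
cwnd, ssthresh = 80 * mss, 64 * mss
for event in range(3):       # three reordering episodes in a row
    cwnd, ssthresh = on_triple_duplicate_ack(cwnd, ssthresh, mss)
    print(f"after event {event + 1}: cwnd = {cwnd // mss} segments")
```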

TCP’s response to jitter has become a leading and increasingly common cause of poor network performance. For cloud users, it wastes more than network bandwidth. Cloud vendors typically charge egress fees based on the amount of data transferred out of their network, and those fees apply whether the data is original traffic or packet retransmissions triggered by TCP’s reaction to jitter.

TCP’s response to jitter is triggered in the network transport layer by its congestion control algorithms (CCAs). However, the solutions network administrators typically use to address performance problems caused by jitter, like increasing bandwidth and using jitter buffers, have no impact on TCP’s CCAs, and in some cases can make their response to jitter worse. Increasing bandwidth is just a temporary fix; as network traffic grows to match the added bandwidth, the incidence of jitter-induced throughput collapse increases in tandem.

Jitter buffers, commonly used to mitigate jitter’s effect on network and application performance, can sometimes exacerbate the issue. Jitter buffers work by reordering and realigning packets for consistent timing before delivering them to an application. However, packet reordering and realignment introduces additional, often random delays, which can worsen jitter and negatively impact performance for real-time applications like live video streaming.
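
The sketch below illustrates the mechanism in miniature: out-of-order packets are held in a buffer and released on a fixed playout schedule, so every early packet is delayed by the buffer, and the playout offset itself becomes added latency. The arrival times and the 20 ms cadence are synthetic values chosen for illustration.

```python
def playout_schedule(packets, buffer_delay, cadence=0.020):
    """Minimal de-jitter buffer sketch: hold packets, reorder them by
    sequence number, and release them on a fixed cadence offset by
    buffer_delay. The hold time is latency added by the buffer itself."""
    first_arrival = min(arrival for _, arrival in packets)
    held = []
    for seq, arrival in sorted(packets):              # reorder by sequence
        target = first_arrival + buffer_delay + seq * cadence
        release = max(arrival, target)                # late packets slip further
        held.append((seq, release - arrival))
    return held

# Packets 0-4 arrive out of order with variable delay (times in seconds)
packets = [(0, 0.000), (2, 0.025), (1, 0.052), (4, 0.081), (3, 0.090)]
for seq, hold in playout_schedule(packets, buffer_delay=0.040):
    print(f"packet {seq}: held {hold * 1000:.0f} ms before delivery")
```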

QoS techniques can offer some benefit by prioritizing packets and controlling the rate of data transmission for selected applications and users. But QoS involves performance tradeoffs, and it does nothing to alter TCP’s behavior in response to jitter. In some cases, implementing QoS adds jitter, because packet prioritization can create variable delays for lower-priority application traffic.

TCP optimization solutions that do focus on the CCAs rely on techniques such as increasing the size of the congestion window, using selective ACKs, and adjusting timeouts. However, improvements are limited, generally in the range of 10-15%, because these solutions, like all the others, don’t address the fundamental problem: TCP’s CCAs have no way to determine whether jitter is due to congestion or to other factors like application behavior, virtualization, or wireless network issues.
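
For context, the sketch below shows the kind of per-socket adjustments these tools typically automate on Linux: enlarging buffers so the window isn’t capped early, and selecting a kernel CCA for the connection. The availability of socket.TCP_CONGESTION (Python 3.6+ on Linux) and of any particular algorithm in the host kernel are assumptions here, and none of these knobs gives the CCA a way to tell jitter from congestion.

```python
import socket

# Minimal sketch of per-socket TCP tuning on Linux; assumes kernel support.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Enlarge send/receive buffers so the congestion window isn't capped early.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4 * 1024 * 1024)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4 * 1024 * 1024)

# Optionally select a kernel congestion control algorithm for this socket;
# "cubic" is the usual Linux default, named here purely as an example.
if hasattr(socket, "TCP_CONGESTION"):
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"cubic")
    name = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
    print("congestion control in use:", name.split(b"\x00", 1)[0].decode())
```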

RDMA technologies avoid the problem because they bypass the TCP stack. However, RDMA requires non-standard network infrastructure as noted previously, and only works well at LAN scale within a single data center. To extend RDMA’s capabilities beyond the LAN, iWARP (Internet Wide Area RDMA Protocol) can be used. However, iWARP encapsulates RDMA operations within standard TCP packets. Although this removes the requirement for specialized network infrastructure, it makes iWARP traffic subject to TCP’s response to jitter.

This is not a trivial problem to overcome. MIT researchers recently cited TCP’s CCAs as having a significant and growing impact on network performance because of their response to jitter, but were unable to offer a practical solution.1 The problem can only be resolved by modifying or replacing TCP’s CCAs to remove the bottleneck they create. But to be acceptable and scale in a production environment, a viable solution can’t require any changes to the TCP stack itself, or to any client or server applications that rely on it. It must also co-exist with ADCs, SD-WANs, VPNs, and other network infrastructure already in place.

There Is a Proven and Cost-Effective Solution

Badu Networks’ patented WarpEngine™ carrier-grade optimization technology, with its single-ended transparent proxy architecture, meets the key requirements outlined above for eliminating jitter-induced throughput collapse. WarpEngine determines in real time whether jitter is due to network congestion, and prevents throughput from collapsing and applications from stalling when it’s not. It builds on this capability with other performance-enhancing features, like improved flow control and QoS features such as packet prioritization, that benefit not only TCP, but also GTP, UDP, and other network traffic. As a result, WarpEngine delivers massive performance gains for some of the world’s largest mobile network operators, cloud service providers, government agencies, and businesses of all sizes.

WarpEngine can be deployed at core locations as well as the network edge, either as a hardware appliance or as software installed on servers provided by the customer. It can be installed in a carrier’s core network, or in front of hundreds or thousands of servers in a corporate or cloud data center. WarpEngine can also be deployed at cell tower base stations, or with access points supporting public or private Wi-Fi networks of any scale. Enterprise customers can implement WarpEngine on-prem with their Wi-Fi access points, or at the edge of their networks between the router and the firewall, for dramatic WAN, broadband, and FWA throughput improvements of 2-10X or more.2

WarpVM™, the VM form factor for WarpEngine, is designed specifically for cloud and edge environments. WarpVM is a VM-based transparent proxy that installs in minutes in AWS, MS Azure, VMware, or KVM environments. No modifications to client or server applications or network stacks are required. WarpVM has also been certified by Nutanix™ for use with their AHV hypervisor.3 AHV enables virtualization for Nutanix’s multicloud platform, and supports their recently announced GPT-in-a-Box™ AI solution.

Functioning as a virtual router VNF with WarpEngine optimization built in, WarpVM boosts cloud network throughput and hosted VM or container-based application performance by margins similar to those cited above, especially in high-traffic, jitter-prone network environments like AI.2 WarpVM achieves these results with existing infrastructure, at a fraction of the cost of budget-busting cloud network and server upgrades.

Comparing the Cost of WarpVM to the Cost of Cloud Network and Server Upgrades

If you’re considering any of the managed AI services from vendors such as AWS, or weighing upgrades to improve performance for cloud-native 5G or for cloud or edge hosted applications already deployed, you should understand how the cost of implementing WarpVM compares to the cost of cloud server and network upgrades that achieve the same result.

As an example, assume an AWS customer currently paying for a Direct Connect port with 10G capacity wants to improve cloud network throughput by WarpVM’s average of 3X. To do this, they would typically pay for two additional AWS Direct Connect 10G ports and balance load across them at roughly 30% utilization each to allow headroom for peak traffic. This approach is required in many cases to provide excess bandwidth to accommodate TCP’s reaction to jitter. It means the customer is leaving network resources underutilized and wasting money most of the time. The customer would also pay for additional AWS servers to support each of the new Direct Connect ports.

To improve throughput by 3X using WarpVM in this scenario, the customer would only pay for one instance of WarpVM, at a cost that’s nearly 80% less than the standard approach described above. The total savings are actually much greater, because the underlying problem of jitter-induced throughput collapse has been resolved: no additional AWS servers are needed for load balancing to allow for peak traffic, and no egress fees will be incurred due to unnecessary packet retransmissions. Moreover, network bandwidth that was previously wasted has been recaptured, making it less likely that another round of upgrades will be required in the near future.
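
The comparison reduces to simple arithmetic, sketched below. All of the unit costs are placeholders to be filled in from current AWS pricing and a WarpVM quote, and the assumption of one additional server per new Direct Connect port is made purely for illustration; none of the figures come from published price lists.

```python
# Back-of-the-envelope comparison for the 3X throughput scenario above.
# All unit costs are placeholders: substitute current AWS Direct Connect,
# server, and WarpVM pricing. One extra server per new port is an assumption.
port_cost = 0.0        # monthly cost of one 10G Direct Connect port
server_cost = 0.0      # monthly cost of one server terminating a port
warpvm_cost = 0.0      # monthly cost of one WarpVM instance

upgrade_total = 2 * port_cost + 2 * server_cost   # two added ports plus servers
warpvm_total = 1 * warpvm_cost                    # one WarpVM on the existing port

if upgrade_total > 0:
    savings = 1 - warpvm_total / upgrade_total
    print(f"monthly savings with WarpVM vs. the upgrade path: {savings:.0%}")
```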

Likewise, cloud vendors embedding WarpVM in their services can avoid the cost of provisioning additional infrastructure to deliver the equivalent of a major upgrade. They could choose to pass some or all of the savings along to their customers, gaining an immediate edge in terms of price and performance. WarpVM would also enable cloud vendors to onboard more customers and generate greater revenue with their existing infrastructure footprint.

Conclusion

As AI continues to drive innovation and transformation across industries, jitter-related performance challenges will only grow. They must be effectively addressed to prevent expensive network bandwidth and server capacity from being wasted. Failing to do so will put the cost of AI out of reach for too many organizations, and bring its iPhone moment to an end. 

Badu Networks’ WarpVM offers a proven, cost-effective solution for overcoming these challenges, for AI and any other cloud use case. By tackling TCP’s reaction to jitter head-on at the transport layer, and by incorporating other performance-enhancing features that benefit TCP, GTP, UDP, and other network traffic, WarpVM ensures that your AI, cloud-native 5G, and other cloud and edge applications operate at their full potential for the lowest possible cost.

To learn more about WarpVM and request a free trial with your AI, cloud-native 5G, or other cloud applications to see how much you can save, click the button below. 

Notes

1. Starvation in End-to-End Congestion Control, August 2022: https://people.csail.mit.edu/venkatar/cc-starvation.pdf

2. Badu Networks Performance Case Studies: https://www.badunetworks.com/wp-content/uploads/2022/11/Performance-Case-Studies.pdf

3. https://www.nutanix.com/partners/technology-alliances/badu-networks