In a traditional cloud computing implementation, data generated at the edge of the network is sent to centralized cloud servers for processing. These servers can be located anywhere in the world, often in data centers far from the data source. This model works well for applications that require substantial processing power and can tolerate the latency involved in transmitting data back and forth over long distances. 

However, the centralized model doesn’t work as well for real-time or near-real-time applications such as IoT, content delivery networks (CDNs), streaming services, AR, VR, autonomous vehicles, and the cloud-native 5G networks that often support them. That’s where edge computing comes in. By moving data, processing, and storage closer to users and devices at the edge of the network, edge computing greatly reduces latency, bandwidth usage, and application response times. Even though edge clouds are typically integrated with centralized cloud environments, data moves between them only periodically, and generally only subsets or summarized versions of the data are sent from the edge to the cloud.

In theory, edge deployments should also help to minimize the performance impact of packet delay variation (PDV), more commonly referred to as jitter, as fewer hops are needed between different points on the network. However, theory and reality don’t always line up.  In some ways, jitter can be even more prevalent at the edge than in a centralized cloud. There are three reasons for this: (1) the types of applications edge computing typically supports; (2) the nature of the wireless and cloud-native 5G networks these applications rely on; and (3) the application architectures employed. 
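
To make “packet delay variation” concrete, jitter is typically measured as a running, smoothed average of how much the spacing between successive packets varies in transit. The short Python sketch below illustrates the interarrival jitter estimator defined in RFC 3550 (the RTP specification); the timestamp samples are hypothetical and the code is purely illustrative.

```python
# Minimal sketch of the RFC 3550 interarrival jitter estimator.
# Transit time = arrival timestamp - send timestamp; jitter is a running,
# smoothed average of how much transit time changes between successive packets.

def update_jitter(jitter: float, prev_transit: float, transit: float) -> float:
    """One update step: J(i) = J(i-1) + (|D(i-1,i)| - J(i-1)) / 16."""
    d = abs(transit - prev_transit)
    return jitter + (d - jitter) / 16.0

# Hypothetical (send_time, arrival_time) pairs in milliseconds.
samples = [(0, 20), (10, 32), (20, 55), (30, 61), (40, 95)]

jitter = 0.0
prev_transit = samples[0][1] - samples[0][0]
for send, arrive in samples[1:]:
    transit = arrive - send
    jitter = update_jitter(jitter, prev_transit, transit)
    prev_transit = transit

print(f"estimated jitter: {jitter:.2f} ms")  # grows as transit times become more erratic
```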

The real-time and near-real-time applications typically deployed at the edge, such as IoT and streaming, are jitter generators. They tend to transmit data in unpredictable bursts with variable payload sizes, resulting in irregular transmission and processing times. In the case of IoT, these effects are multiplied as devices move around and as more devices are added to a network.

Jitter caused by random delays in application behavior is compounded by the RF interference and signal degradation that frequently affect the last-mile wireless networks these applications rely on. The cloud-native 5G networks that increasingly support real-time applications at the edge compound this jitter further because of characteristics inherent in 5G technology, such as:

  • Higher frequencies and mmWave technology, which propagate more poorly and are more susceptible to interference and signal degradation than LTE, leading to increased jitter.

  • Denser networks that create opportunities for devices to switch base stations more frequently, resulting in jitter.

  • The requirement for a clear line-of-sight path between the transmitter and receiver. Any obstacle can cause the signal to be reflected, refracted, or diffracted, resulting in multiple signal paths with different lengths. These varying path lengths can cause packets to arrive at different times, creating jitter.

In addition, application architectures based on containerization and microservices have been widely adopted for both centralized and edge cloud deployments, and cloud-native 5G networks make use of them. Containerized applications can load much faster and avoid the VM conflicts and hypervisor packet delays of VM-based applications. However, containers still compete for virtual and physical resources in the cloud or at the edge, and some jitter results from that competition. Moreover, it’s common to run containers inside VMs to get the best of both worlds: the isolation and security benefits of VMs, and the efficiency and portability of containers. In this type of deployment, the hypervisor manages the VMs, and within each VM an orchestration system such as Kubernetes manages the containers. Thus, VM conflicts and hypervisor packet delays can still be factors in generating jitter.

Furthermore, executing these applications can involve complex interactions among multiple microservices, each running in its own container and potentially spread across multiple VMs and physical locations. This increases the number of network hops, and thus the potential points at which random delays (i.e., jitter) can crop up. Network performance suffers along with application performance, since the virtualized network functions (VNFs) and containerized network functions (CNFs) that comprise a 5G network can be affected as well.

Jitter has a far more serious knock-on effect on performance beyond the random delays that cause it. Widely used network protocols such as TCP interpret jitter as a sign of congestion, and respond by retransmitting packets and slowing traffic to prevent data loss, even when the network isn’t saturated and plenty of bandwidth is available. Even modest amounts of jitter can cause throughput to collapse and applications to stall, or, in the case of VNFs, disrupt the network services they provide in a cloud-native 5G network. And not only TCP traffic is affected. For operational efficiency, applications using TCP generally share the same network infrastructure and compete for bandwidth and other resources with applications using UDP and other protocols. To compensate for TCP’s reaction to jitter, more bandwidth than would otherwise be needed is often allocated to TCP-based applications, especially under peak load. This means bandwidth that could be available for applications using UDP and other protocols is wasted, and the performance of all applications sharing the network suffers.
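
To illustrate the mechanism, the toy Python simulation below pairs the RFC 6298 retransmission-timeout calculation (RTO = SRTT + 4 × RTTVAR) with a Reno-style congestion window, and treats any round-trip time that exceeds the current RTO as a timeout. It is a deliberately simplified model, not a real TCP stack, and the RTT traces are hypothetical, but it shows how delay spikes alone, with no actual loss or congestion, can repeatedly collapse the congestion window.

```python
import random

# Toy illustration (not a real TCP stack): RFC 6298-style RTO estimation with a
# Reno-style congestion window. Delay spikes (jitter) that exceed the adapted
# RTO are treated as timeouts, so the window repeatedly collapses to one segment
# even though no packet was dropped and the link is not congested.

ALPHA, BETA, K = 1 / 8, 1 / 4, 4  # RFC 6298 smoothing constants

def average_cwnd(rtt_samples_ms, cwnd=10.0):
    """Return the average congestion window (in segments) over an RTT trace."""
    srtt, rttvar = rtt_samples_ms[0], rtt_samples_ms[0] / 2
    rto = srtt + K * rttvar
    history = []
    for rtt in rtt_samples_ms[1:]:
        if rtt > rto:        # spurious timeout: jitter mistaken for congestion
            cwnd = 1.0       # Reno drops back to one segment and re-probes
        else:                # normal ACK clocking: additive increase
            cwnd += 1.0 / cwnd
        rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - rtt)  # RFC 6298 updates
        srtt = (1 - ALPHA) * srtt + ALPHA * rtt
        rto = srtt + K * rttvar
        history.append(cwnd)
    return sum(history) / len(history)

random.seed(7)
stable = [random.uniform(48, 52) for _ in range(1000)]       # ~50 ms RTT, little jitter
jittery = [rtt + (300 if i % 25 == 0 else 0)                  # same path plus periodic delay spikes
           for i, rtt in enumerate(stable)]

print("average cwnd, stable path :", round(average_cwnd(stable), 1))
print("average cwnd, jittery path:", round(average_cwnd(jittery), 1))
```

In this toy run the stable trace keeps its window growing, while the trace with periodic delay spikes is repeatedly knocked back to one segment, which is the throughput collapse described above.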

Most Network Performance Solutions Fall Short or Make the Problem Worse

TCP’s reaction to jitter is triggered by its congestion control algorithms (CCAs), which operate at the transport layer (layer 4 of the OSI stack). The solutions network administrators generally rely on to address poor cloud and edge network and application performance either don’t operate at the transport layer, or, if they do, have little or no impact on TCP’s CCAs. As a result, these solutions – upgrades, Quality of Service (QoS), jitter buffers and TCP optimization – fail to address the root cause of jitter-induced throughput collapse, and sometimes make it worse:

  • Network bandwidth upgrades, in addition to being costly and disruptive, are a physical-layer (layer 1) approach that offers only temporary relief. Traffic eventually increases to fill the additional capacity, and the incidence of jitter-induced throughput collapse rises in tandem because the root cause was never addressed.

  • QoS techniques such as packet prioritization, traffic shaping and bandwidth reservation operate at the network layer (layer 3) and the transport layer (layer 4), primarily because they rely on the IP addresses and port numbers managed at those layers to prioritize traffic and avoid congestion. However, they do nothing to change the behavior of TCP’s CCAs, which also operate at the transport layer. As a result, QoS has limited effect on jitter-induced throughput collapse. In some cases implementing QoS actually adds jitter, because techniques such as packet prioritization can create variable delays for lower-priority traffic.

  • When network administrators identify jitter as a factor in lagging performance, they often turn to jitter buffers to resolve it. However, jitter buffers do nothing to prevent throughput collapse, and can even make it worse. TCP’s reaction to jitter occurs at the transport layer, whereas jitter buffers are an application-layer solution that reorders packets and realigns packet timing before packets are passed to an application (a minimal playout-buffer sketch follows this list). The added delays created by packet reordering and realignment can ruin performance for real-time applications, and become yet another source of jitter contributing to throughput collapse.

  • TCP optimization solutions do focus on the transport layer and the CCAs. They try to relieve the bottleneck created by TCP’s CCAs by managing the size of TCP’s congestion window to let more traffic through a connection, using selective ACKs to tell the sender which packets need to be retransmitted, adjusting idle timeouts, and tweaking a few other parameters (a typical endpoint-tuning sketch also follows this list). While these techniques can offer some modest improvement, generally in the range of ten to fifteen percent, they don’t eliminate jitter-induced throughput collapse, the resulting waste of bandwidth, or its impact on UDP and other traffic sharing a network.
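
Regarding the jitter-buffer point above, the minimal Python sketch below shows what an application-layer playout buffer typically does: hold packets for a fixed delay and release them to the application in sequence order. The packet fields and the 60 ms playout delay are hypothetical, and note that the smoothing comes entirely from added delay; nothing here changes how TCP’s CCAs react to the underlying jitter.

```python
import heapq
import time

# Minimal sketch of an application-layer jitter (playout) buffer: packets are
# held for a fixed playout delay and released in sequence order. Timing is
# smoothed for the application, but only by adding delay; TCP's congestion
# control still reacts to the underlying jitter exactly as before.

PLAYOUT_DELAY = 0.060  # 60 ms of added latency (hypothetical figure)

class JitterBuffer:
    def __init__(self):
        self._heap = []  # entries are (sequence_number, arrival_time, payload)

    def push(self, seq: int, payload: bytes) -> None:
        heapq.heappush(self._heap, (seq, time.monotonic(), payload))

    def pop_ready(self):
        """Release packets in sequence order once they have aged past the playout delay."""
        ready, now = [], time.monotonic()
        while self._heap and now - self._heap[0][1] >= PLAYOUT_DELAY:
            seq, _, payload = heapq.heappop(self._heap)
            ready.append((seq, payload))
        return ready

# Usage sketch: packets arrive out of order; the buffer reorders them and hands
# them to the application only after the playout delay has elapsed.
buf = JitterBuffer()
for seq in (1, 3, 2, 5, 4):
    buf.push(seq, f"packet-{seq}".encode())
time.sleep(PLAYOUT_DELAY + 0.01)
print([seq for seq, _ in buf.pop_ready()])  # [1, 2, 3, 4, 5]
```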
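
Similarly, much of the TCP optimization described in the last bullet amounts to endpoint tuning along the lines of the sketch below, which enlarges socket buffers, disables Nagle’s algorithm, and, on Linux, selects a different kernel congestion control module for a socket. The endpoint address, buffer sizes, and choice of CCA are hypothetical; tuning of this kind adjusts the CCA’s inputs but leaves its reaction to jitter in place.

```python
import socket

# Sketch of typical endpoint-side TCP tuning: enlarge socket buffers, disable
# Nagle's algorithm, and (Linux only) ask the kernel to use an alternative
# congestion control module for this socket. These knobs influence how the CCA
# behaves; they do not stop it from treating jitter as congestion.

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4 * 1024 * 1024)  # 4 MB send buffer (hypothetical size)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4 * 1024 * 1024)  # 4 MB receive buffer (hypothetical size)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)             # send small writes immediately

if hasattr(socket, "TCP_CONGESTION"):  # Linux-only socket option
    try:
        # Select a different kernel CCA for this socket (the module must be available).
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"bbr")
    except OSError:
        pass  # fall back to the system default CCA

sock.connect(("example.com", 443))  # hypothetical endpoint
```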

Clearly, jitter-induced throughput collapse is not an easy problem to overcome. MIT researchers recently cited TCP’s CCAs as having a significant and growing impact on network performance because of their response to jitter, but offered no practical solution.1

Jitter-induced throughput collapse can only be resolved by modifying or replacing TCP’s CCAs to remove the bottleneck they create, regardless of the network or application environment. However, to be acceptable and to scale in a production environment, a viable solution can’t require changes to the TCP stack itself or to any client or server applications. It must also co-exist with ADCs, SDNs, VPNs, VNFs, CNFs, and other network infrastructure already in place.

There’s Only One Proven and Cost-Effective Solution

Only Badu Networks’ WarpEngine™ carrier-grade optimization technology, with its single-ended proxy architecture, meets the key requirements outlined above for eliminating jitter-induced throughput collapse. WarpEngine determines in real time whether jitter is due to network congestion, and prevents throughput from collapsing and applications from stalling when it’s not, recapturing bandwidth that would otherwise be wasted. WarpEngine builds on this with other performance-enhancing features that benefit not only TCP, but also GTP, UDP, and other traffic. These capabilities enable WarpEngine to deliver massive throughput improvements ranging from 2-10X or more for some of the world’s largest mobile network operators, cloud service providers, government agencies, and businesses of all sizes.2 It achieves these results with existing network infrastructure, at a fraction of the cost of upgrades.

WarpEngine can be deployed as a hardware appliance in a service provider’s core network, in front of thousands of servers in a corporate or cloud data center, at cell tower base stations, or with access points supporting public or private Wi-Fi networks of any scale. Customers can also deploy it at branch locations at the edges of their network for dramatic WAN, broadband and satellite network throughput improvements.

WarpVM™, the VM form factor of WarpEngine, is designed specifically for virtualized environments. WarpVM is deployed as a VNF that acts as a virtual router with WarpEngine’s capabilities built in, optimizing all traffic entering and leaving a cloud or edge environment, such as a VPC supporting a 5G core network. WarpVM can boost cloud and edge network throughput, as well as VM- and container-hosted application performance, to achieve results similar to those cited above.

WarpVM’s transparent proxy architecture enables it to be deployed in minutes in AWS, Azure, VMware, or KVM cloud or edge environments. WarpVM has also been certified by Nutanix for use with its multi-cloud platform.3 No modifications to network stacks or to client or server applications are required. All that’s needed are a few DNS changes at the client site, or simple routing changes in the cloud.

To learn more about WarpVM and request a free trial, click the button below.

Notes

1. Starvation in End-to-End Congestion Control, August 2022: https://people.csail.mit.edu/venkatar/cc-starvation.pdf

2. Badu Networks Performance Case Studies: https://www.badunetworks.com/wp-content/uploads/2022/11/Performance-Case-Studies.pdf

3. https://www.nutanix.com/partners/technology-alliances/badu-networks