Putting Virtual Networking into the Fast Lane

image credit: dpdk.org, linux foundation
Here's a write-up on some of the dirty details I've learned over the last year or so while building an NFV (network function virtualization) platform, with the goal of virtualizing edge devices at scale. If you've ever wondered how to get usable, scalable performance out of virtualized network drivers and appliances in production, hopefully you'll find this useful.

While deploying specialized, purpose-built network hardware still makes sense for many organizations that require a certain level of scale and performance (read: layer 2-3), I'd like to explore the possibilities that stem from the proliferation of x86-based cloud platforms, namely virtual network appliances and their potential to replace the increasingly unappealing practice of racking and stacking unwieldy network appliances in your edge cabinets.

What must system owners consider before moving services such as firewalling, VPN and load balancing to virtual appliances? Do SR-IOV virtual NIC I/O, the DPDK packet I/O libraries, AES-NI instructions and QuickAssist crypto acceleration give virtual or bare-metal x86 solutions enough performance punch to make a compelling argument? After all, it's hardly a secret that many vendors now rely on these same technologies in their purpose-built appliances, which are, in essence, branded hypervisors.

Here I'm going to touch on some high-performance virtual networking challenges and how to solve them using these technologies. The terminology used here is geared primarily toward OpenStack KVM hosts (where I happen to have experience), but the concepts should be general enough to carry over to whatever cloud or hypervisor you choose.

Virtual Network Drivers

Historically, virtual networking was a non-starter, mostly because of abysmal performance, and in large part because of issues that were solved long ago in the network hardware world (remember Cisco IP Input processing?). Let's review a few of the more popular network drivers that you'll likely find powering your VM networking today.


The vanilla virtio-net driver is quite the sandwich

The virtio network (virtio-net) driver is likely the most common VM network driver found in the hypervisor compute world today. However, its performance is poor when handling large network forwarding workloads, so let's start here by analyzing the issues that need to be solved.

The big, obvious problem with virtio-net is that each packet that comes into (or goes out of) the VM translates to two separate CPU interrupts:
  1. An interrupt for moving a packet from the Physical NIC to the vSwitch
  2. An interrupt for moving a packet from the vSwitch to the VM
This means that each time there's a CPU interrupt, the VM has to stop whatever it was doing and go through a context switch: it must move from user space to kernel space, grab the packet from memory and return with it to user space, and it does this for every packet! Performance-wise, this simply becomes untenable as traffic rates increase.
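
To put a rough number on that, here's a quick back-of-the-envelope sketch in Python. It assumes the worst case described above (two interrupts for every packet, with no interrupt coalescing) and a 10GbE link running at line rate. The function names are mine and purely illustrative.

```python
# Rough interrupt load for a link at line rate, assuming (as described
# above) two interrupts per packet and no interrupt coalescing.

# Per-frame overhead on the wire: preamble (7) + start-of-frame
# delimiter (1) + inter-frame gap (12).
ETHERNET_OVERHEAD_BYTES = 20


def packets_per_second(link_bps: float, frame_bytes: int) -> float:
    """Maximum frames per second for a given link speed and frame size."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def interrupts_per_second(link_bps: float, frame_bytes: int,
                          interrupts_per_packet: int = 2) -> float:
    """Interrupt rate if every packet costs a fixed number of interrupts."""
    return packets_per_second(link_bps, frame_bytes) * interrupts_per_packet


for frame in (64, 512, 1518):
    pps = packets_per_second(10e9, frame)
    irqs = interrupts_per_second(10e9, frame)
    print(f"{frame:>5} B frames: {pps / 1e6:6.2f} Mpps, {irqs / 1e6:6.2f} M interrupts/s")
```

At the smallest frame size this works out to roughly 14.88 million packets per second, so tens of millions of interrupts per second in the worst case. Real systems soften this with coalescing, but it shows why per-packet interrupts don't scale.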


Besides the problem of constant context switching and VM exit/re-entry, QEMU 'tap' drivers sit in the data-path. This causes additional memory copy operations and therefore more latency.

None of this is really efficient for a VM which needs to focus on just forwarding IP packets, so let's explore how to improve.


vhost-net allows a VM to move QEMU aside from the data-plane, improving queue memory copy operations

The first evolutionary step in VM network forwarding performance was introducing 'vhost-net' for the previously discussed virtio-net driver.

vhost-net helps the VM by moving the virtio data path out of the user-space QEMU process and into the host kernel, getting QEMU out of the way. By reducing copy operations, this lowers latency and CPU utilization and ultimately improves forwarding performance between the VM and the vSwitch.

However, we still have the problem of context switching and kernel drivers, so we need to do better!


DPDK goes a step further by removing the kernel drivers entirely from the data-path

The Data Plane Development Kit, developed initially by Intel but now an open-source project hosted by the Linux Foundation, is the missing link in the network performance software equation. DPDK is a user-space packet forwarding library that works with most modern CPUs and NICs. Most importantly, it finally removes the need to context-switch for VM packet forwarding.

Two primary benefits provided by DPDK:
  1. Bypasses kernel drivers, SIGNIFICANTLY improving packet forwarding capabilities
  2. Introduces a "poll mode driver" that eliminates CPU interrupts
Developers must integrate these libraries into their application and need at least one vCPU dedicated just to packet polling. That core will always run at high utilization, even when idle, because the PMD (poll mode driver) thread is constantly polling. In exchange, and most importantly, the load on the appliance/application cores goes down.
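
To make the polling idea concrete, here's a toy sketch in Python. It is not the DPDK API (DPDK is a C library); the names `rx_burst`, `poll_loop` and `BURST_SIZE` are hypothetical, and a `deque` stands in for a NIC receive ring. It only illustrates the pattern: a dedicated loop repeatedly asks the ring for a burst of packets instead of waiting for an interrupt.

```python
from collections import deque

BURST_SIZE = 32  # hypothetical burst size, chosen for illustration


def rx_burst(ring: deque, max_packets: int = BURST_SIZE) -> list:
    """Take up to max_packets from the ring without ever blocking."""
    burst = []
    while ring and len(burst) < max_packets:
        burst.append(ring.popleft())
    return burst


def poll_loop(ring: deque, handle, max_polls: int) -> tuple:
    """Busy-poll the ring max_polls times.

    Returns (packets handled, polls that found the ring empty).
    """
    handled = 0
    empty_polls = 0
    for _ in range(max_polls):
        burst = rx_burst(ring)
        if not burst:
            # Nothing to do, but the core keeps spinning anyway. This is
            # why a polling core shows high utilization even when idle.
            empty_polls += 1
            continue
        for packet in burst:
            handle(packet)
        handled += len(burst)
    return handled, empty_polls


ring = deque(f"pkt-{i}" for i in range(100))
seen = []
handled, empty_polls = poll_loop(ring, seen.append, max_polls=10)
print(f"handled={handled} empty_polls={empty_polls}")
```

Note that the empty polls still consume CPU time. That is the price paid for never taking an interrupt.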

The important trade-off here is that developers need to incorporate the DPDK libraries into their applications to make use of it, but once they do, they can enjoy this "fast path" for high-speed packet processing (10GbE+). It's also worth noting that any security-group functionality you might have been relying on in the soft switch (OVS, Linux Bridge) will need to move elsewhere: into the application, upstream, the VM, etc.

Other projects, such as Cisco's VPP (open-sourced under FD.io), build on DPDK.


SR-IOV uses a special kernel driver and capable physical network card to present a PCI device to the VM in a scalable "pass-through" manner known as a "VF", at which point the VM then gets to enjoy full packet forwarding capability of the physical NIC.

Similar to DPDK, because we're bypassing the kernel network driver (actually, in SR-IOV's case, doing pass-through via a different driver), context switching and interrupts are no longer a worry, with the bonus that our applications don't need specific DPDK PMD library support. Also like DPDK, any security enforcement that might have been occurring in the vSwitch is now bypassed and should be implemented elsewhere.

AWS EC2 enhanced networking instances also support the use of SR-IOV once you load the ixgbevf driver into the Linux OS.

One downside of SR-IOV is the lack of portability: a VM with an attached VF is unlikely to support live migration between hosts.

Compute Optimizations

Besides network forwarding optimizations, we can also benefit tremendously from some relatively easy CPU and memory tuning.


Using non-uniform memory access (NUMA) memory design eliminates shared-bus (SMP) memory contention by introducing ‘pools’ and ‘nodes’ for fast VM memory access.

image credit: Cisco

There is, however, a slower ‘system interconnect’ between NUMA nodes, which can be a problem on multi-socket/multi-node systems. For example, if a VM has vCPUs allocated across multiple nodes, the slower interconnect becomes the bottleneck. We can address this issue with better use of CPU pinning.
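
Here's a small sketch of how you might check for that cross-node situation. The topology below is made up for illustration (logical CPUs 0-15 on node 0, 16-31 on node 1); real hosts often enumerate CPUs differently, so treat the mapping and the function name as assumptions.

```python
# Hypothetical host topology: logical CPU id -> NUMA node.
# Assumes CPUs 0-15 sit on node 0 and CPUs 16-31 on node 1.
CPU_TO_NODE = {cpu: (0 if cpu < 16 else 1) for cpu in range(32)}


def numa_nodes_spanned(vm_cpus, cpu_to_node=CPU_TO_NODE) -> set:
    """Return the set of NUMA nodes that a VM's host CPUs live on."""
    return {cpu_to_node[cpu] for cpu in vm_cpus}


for name, cpus in {"vm-a": [0, 1, 2, 3], "vm-b": [14, 15, 16, 17]}.items():
    nodes = numa_nodes_spanned(cpus)
    status = "OK" if len(nodes) == 1 else "crosses the interconnect"
    print(f"{name}: nodes {sorted(nodes)} -> {status}")
```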

CPU Pinning

image credit: Cisco

A VM can be assigned one or more vCPUs that are then ‘pinned’ (i.e. dedicated, not shared) to that VM alone, taking advantage of NUMA node placement. This keeps the VM's threads from being moved between cores and avoids the processing restarts that come with that movement.
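
As a rough illustration of what a NUMA-aware pinning decision looks like, here's a minimal sketch. It is not how any particular scheduler implements it; `pin_vcpus` and the free-CPU pools are hypothetical. The idea is simply to take all of a VM's vCPUs from a single node and remove them from the free pool so that no other VM shares them.

```python
def pin_vcpus(vcpu_count: int, free_cpus_by_node: dict) -> tuple:
    """Pick vcpu_count dedicated CPUs from a single NUMA node.

    free_cpus_by_node maps node id -> set of free host CPU ids. The
    chosen CPUs are removed from the pool so they stay dedicated.
    Raises ValueError if no single node has enough free CPUs.
    """
    for node in sorted(free_cpus_by_node):
        free = free_cpus_by_node[node]
        if len(free) >= vcpu_count:
            chosen = sorted(free)[:vcpu_count]
            free.difference_update(chosen)
            return node, chosen
    raise ValueError(f"no single NUMA node has {vcpu_count} free CPUs")


# Hypothetical host: two nodes with 16 logical CPUs each.
free_pool = {0: set(range(0, 16)), 1: set(range(16, 32))}
print(pin_vcpus(12, free_pool))  # fits on node 0
print(pin_vcpus(8, free_pool))   # node 0 only has 4 left, so node 1 is used
```

Refusing to split a VM across nodes is a deliberate simplification here. A real placement policy might allow it for VMs larger than one node.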

What’s a vCPU exactly? Let's presume we have a server with dual Intel E5-2680 CPUs. That means two sockets with eight cores per socket and, with hyper-threading enabled, 2 x 8 x 2 = 32 vCPUs.
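
The same arithmetic as a tiny Python helper, in case you want to plug in your own hardware. The per-node figure assumes one NUMA node per socket, which is my assumption rather than something guaranteed for every platform.

```python
def logical_cpus(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Total logical CPUs (schedulable vCPUs) on a host."""
    return sockets * cores_per_socket * threads_per_core


total = logical_cpus(sockets=2, cores_per_socket=8, threads_per_core=2)
# Assumes one NUMA node per socket.
per_node = total // 2
print(f"total vCPUs: {total}, per NUMA node: {per_node}")
```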

In OpenStack, CPU pinning can be accomplished rather simply by setting extra specs on the flavor (or the equivalent properties on the image).
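
As a sketch of what that looks like, the Python snippet below just assembles an `openstack flavor set` command line from a dictionary of extra specs. `hw:cpu_policy` and `hw:numa_nodes` are Nova flavor extra specs; the flavor name `nfv.edge` is hypothetical, and the helper only builds the string rather than running anything.

```python
import shlex


def flavor_set_command(flavor: str, extra_specs: dict) -> str:
    """Build an 'openstack flavor set' command for the given extra specs."""
    args = ["openstack", "flavor", "set", flavor]
    for key, value in extra_specs.items():
        args += ["--property", f"{key}={value}"]
    return shlex.join(args)


pinning_specs = {
    "hw:cpu_policy": "dedicated",  # pin each vCPU to its own host CPU
    "hw:numa_nodes": 1,            # keep the guest on a single NUMA node
}

# 'nfv.edge' is a made-up flavor name used only for this example.
command = flavor_set_command("nfv.edge", pinning_specs)
print(command)
```

Check the extra specs against the documentation for your OpenStack release before relying on them, since supported values vary between versions.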

Huge Pages

Lastly, allocating "huge pages" on a system improves memory lookups by increasing the page size in Linux from the default 4 KB to 2 MB or 1 GB. This reduces the number of entries in the page table and can make address lookups far more efficient.

image credit: Cisco
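
To see how much the page count shrinks, here's a quick calculation in Python. The 8 GiB guest size is just an example I picked.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def pages_needed(memory_bytes: int, page_bytes: int) -> int:
    """Number of pages needed to map memory_bytes (rounded up)."""
    return -(-memory_bytes // page_bytes)


guest_memory = 8 * GIB  # example guest size, chosen for illustration
for label, size in (("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB), ("1 GiB", 1 * GIB)):
    print(f"{label} pages: {pages_needed(guest_memory, size):>9,}")
```

For this example guest that is about two million 4 KiB pages, versus 4,096 pages at 2 MiB or just 8 at 1 GiB.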
