Abstract. A NAT gateway is an important network system in today's IPv4 networks, translating private IPv4 addresses to public addresses. However, traditional NAT systems based on Linux Netfilter cannot achieve the high network throughput that modern environments such as data centers require. To address this challenge, we improve the network performance of the NAT system in three ways. First, we leverage DPDK to enable polling and zero-copy delivery, reducing the cost of interrupts and packet copies. Second, we enable multiple CPU cores to process packets in parallel and use a lock-free hash table to minimize contention between CPU cores. Third, we use hash search instead of sequential search when looking up the NAT rule table. Evaluation shows that our Quick NAT system achieves high scalability and line-rate throughput on a commodity server, an improvement of more than 860% over Linux Netfilter.
1 Introduction
The scale of the Internet is ever increasing. Today there are more than 1 billion hosts connected to the Internet. The total number of Internet users exceeds 3.5 billion, which is nearly half of the world's population, according to ITU [1], The World Bank [2] and Statista [3]. Since the IPv4 address pool was exhausted in 2011 [4], the move to IPv6 seems inevitable. However, the transition from IPv4 to IPv6 requires updating not only the Internet infrastructure but also a large number of applications, which faces many obstacles in practice. As a result, IPv4 networks and IPv4 users still dominate today's Internet. In IPv4 networks, the key technology for dealing with the address shortage is NAT (Network Address Translation). With NAT, many IPv4 devices or users with private IPv4 addresses share the same public IPv4 address, which reduces the total number of public IPv4 addresses required. The NAT gateway translates a private IPv4 address to a public IPv4 address and vice versa. Since the NAT gateway needs to rewrite every packet, its throughput and delay significantly affect the end-to-end performance of the flows crossing it, particularly at sites with high-volume traffic such as large campuses, companies, and data centers. Although prior work has focused on improving NAT performance, in this paper our interest lies in solving the problem systematically with emerging technologies such as multi-core platforms and user-space network processing. This kind of technology is important not only for NAT gateways but also for similar systems such as NAT-PT, which will matter in the future when IPv4 and IPv6 networks co-exist in the Internet [5].
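The address sharing described above can be illustrated with a minimal sketch. It assumes port-based translation (NAPT), which the text does not specify. The class name `NaptGateway`, the starting port 20000, and the example addresses are hypothetical and are not taken from Quick NAT.

```python
from dataclasses import dataclass, field


@dataclass
class NaptGateway:
    """Toy NAPT gateway: many private hosts share one public address."""
    public_ip: str
    next_port: int = 20000  # hypothetical start of the public port pool
    # (private_ip, private_port) -> public_port
    out_map: dict = field(default_factory=dict)
    # public_port -> (private_ip, private_port)
    in_map: dict = field(default_factory=dict)

    def translate_outbound(self, src_ip, src_port):
        """Rewrite a private source endpoint to the shared public endpoint."""
        key = (src_ip, src_port)
        if key not in self.out_map:
            # First packet of a flow: allocate a public port and
            # record the mapping in both directions.
            self.out_map[key] = self.next_port
            self.in_map[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.out_map[key]

    def translate_inbound(self, dst_port):
        """Map a reply's public destination port back to the private endpoint."""
        return self.in_map.get(dst_port)  # None if no mapping exists


gw = NaptGateway(public_ip="203.0.113.1")
a = gw.translate_outbound("192.168.0.10", 5000)
b = gw.translate_outbound("192.168.0.11", 5000)
```

Here two private hosts that use the same source port end up behind one public address with distinct public ports, and `translate_inbound` recovers the private endpoint for return traffic.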
In the past, most NAT systems were deployed on the Linux platform using the Netfilter framework [6]. Although this may work in small-scale networks, its performance faces significant challenges under high traffic volume. Specifically, for small packets, the throughput of a NAT system on a commodity server can hardly exceed 1 Gbps, which leaves a large gap between system performance and the capability of hardware with 10G/100G NICs and multiple CPU cores. In this work we try to improve the performance of the NAT system with the following approaches. First, we leverage the Data Plane Development Kit (DPDK) to build the NAT system in user space instead of in the Linux kernel. This lets us poll the NIC and read packets directly into user space, eliminating the high overhead of packet copies and interrupts. To keep NAT processing zero-copy, we also need to manipulate packets through pointers. Second, to exploit the multi-core capability of modern commodity servers, we enable Receive-Side Scaling (RSS) so that multiple cores process packets in parallel, while minimizing the cost of sharing between CPU cores. Third, we find that the algorithms used in today's NAT systems can also be improved. In particular, we use hash-based search instead of sequential search when looking up the NAT rule table, which also improves performance considerably.
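The second and third approaches can be sketched as follows. This is an illustration under stated assumptions, not the Quick NAT implementation. The rule format (private address to public address), the function names, and the per-core table layout are hypothetical. `pick_core` uses CRC32 as a stand-in for RSS, whereas real NICs typically compute a Toeplitz hash in hardware. Giving each core its own table is only one way to avoid contention and is not necessarily the lock-free hash table the paper uses.

```python
import zlib

# Hypothetical NAT rules: (private_ip, public_ip) pairs.
RULES = [(f"10.0.{i // 256}.{i % 256}", f"198.51.100.{i % 256}")
         for i in range(1024)]


def lookup_sequential(rules, private_ip):
    """Scan rules one by one: cost grows linearly with the table size."""
    for priv, pub in rules:
        if priv == private_ip:
            return pub
    return None


# Build the hash index once; each lookup is then O(1) on average.
RULE_INDEX = dict(RULES)


def lookup_hashed(index, private_ip):
    """Hash-based lookup of the public address for a private address."""
    return index.get(private_ip)


def pick_core(flow, num_cores):
    """Stand-in for RSS: hash the flow tuple to choose a core."""
    key = "|".join(map(str, flow)).encode()
    return zlib.crc32(key) % num_cores


# Assumption: one connection table per core, so a core never
# touches a table owned by another core.
NUM_CORES = 4
per_core_tables = [dict() for _ in range(NUM_CORES)]


def record_flow(flow, public_ip):
    """Store the flow's translation in the table of the core that owns it."""
    core = pick_core(flow, NUM_CORES)
    per_core_tables[core][flow] = public_ip
    return core
```

Both lookup functions return the same answer; the difference is that the hashed version does not depend on the position of the rule in the table. Because `pick_core` is deterministic, every packet of a given flow is handled by the same core.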
Based on the improvements above, we implement a NAT system called Quick NAT. Our experiments show that Quick NAT achieves a line-rate throughput of 10 Gbps for 64-byte packets, an improvement of more than 860% over Linux Netfilter. It can achieve even higher throughput on servers with more CPU cores and higher-speed network interface cards (NICs).
The rest of the paper is organized as follows. Section 2 provides an overview of background and related work. Section 3 describes the system architecture of Quick NAT and elaborates on the methods we use to build it. Section 4 presents the experimental results, and Section 5 concludes the paper.
2 Background and Related Work
2.1 Background
On commodity servers, the NAT function is commonly provided by Netfilter in the Linux kernel. Netfilter is a packet-filtering framework composed of a set of hooks, with which NAT registers as built-in chains to modify packet headers. The figure below shows the NAT chains in Netfilter: