Linux kernel driver for Elastic Fabric Adapter (EFA)
====================================================

Overview
========

Elastic Fabric Adapter (EFA) is a network device that provides reliable
userspace communication and kernel bypass capabilities, targeting more
consistent latency and higher throughput than traditional TCP-based
communication. EFA was first implemented in AWS EC2 instances and is
optimized for cloud-scale network infrastructure.

EFA brings the scalability, flexibility, and elasticity of the cloud to
tightly coupled applications, such as HPC and machine learning training,
that benefit from lower and more consistent latency and higher throughput.

Applications use rdma-core (https://github.com/linux-rdma/rdma-core) as the
userspace library for accessing EFA.

EFA supports datagram send/receive operations and, on some devices, RDMA
read/write operations.

EFA supports unreliable datagrams (UD) as well as a new Scalable (unordered)
Reliable Datagram protocol (SRD). SRD provides reliable datagram delivery and
more complete error handling than is typically found in other Reliable
Datagram (RD) implementations, but, unlike RD, it does not support ordering
or segmentation.

EFA requires the kernel to be built with ib_core and InfiniBand user verbs
(ib_uverbs). User verbs are supported through a dedicated userspace EFA
provider in rdma-core, and kernel verbs are supported through the standard
ib_verbs interface combined with additional EFA extensions that must be
included separately.

Driver distribution
===================

In addition to this repository, the EFA driver is available in the upstream
Linux kernel and in the kernel trees of various Linux distributions, e.g.
Amazon Linux, Ubuntu and RHEL.

As a general approach, features and bug fixes that are relevant and suitable
for the mainline kernel are applied to both repositories at around the same
time.
In addition to receiving bug fixes from stable kernel trees, some Linux
distributions may backport the EFA driver from newer kernel trees into their
kernels. Amazon Linux kernels are expected to pick up an up-to-date EFA
driver shortly after it is released in this repository.

Driver compilation
==================

For the list of supported kernels and distributions, please refer to:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html#efa-amis

Prerequisites:
The kernel must be built with CONFIG_INFINIBAND_USER_ACCESS enabled in
Kconfig.

    sudo yum update
    sudo yum install gcc
    sudo yum install kernel-devel-$(uname -r)

Compilation:
Run:

    mkdir build
    cd build
    cmake ..
    make

efa.ko is created inside the src/ folder.

EFA supports PCIe peer-to-peer memory access between devices. The currently
supported peer-to-peer architectures are:
* GPUDirect RDMA (GDR)
* NeuronLink RDMA

Peer-to-peer support can be disabled by running cmake with the following
parameter:

    cmake -DENABLE_P2P=0 ..

For more information regarding GPUDirect RDMA, visit:
https://docs.nvidia.com/cuda/gpudirect-rdma/index.html

Kernel verbs support can be disabled by running cmake with the following
parameter:

    cmake -DENABLE_KVERBS=0 ..

To build EFA RPMs, run `make` in the rpm/ folder. Your environment must be
set up to build RPMs. The EFA RPM installs the EFA kernel driver source, sets
up DKMS to rebuild the driver whenever the kernel is updated, and updates the
configuration files to load EFA and its dependencies at boot time.

Driver installation
===================

Loading driver
--------------

    modprobe ib_core
    modprobe ib_uverbs
    insmod efa.ko

To load the driver automatically at OS boot, add "efa" to
/etc/modules-load.d/efa.conf (e.g. with "sudo vi"), copy efa.ko to
/lib/modules/$(uname -r)/, and run:

    sudo depmod

If a previous driver was loaded from the initramfs, the initramfs must be
regenerated as well (e.g. with dracut).

Restart the OS (sudo reboot and reconnect).

Supported PCI vendor ID/device IDs
==================================

1d0f:efa0 - EFA used in EC2 virtualized and bare-metal instances.
1d0f:efa1 - EFA used in EC2 virtualized and bare-metal instances.
1d0f:efa2 - EFA used in EC2 virtualized and bare-metal instances.

EFA Source Code Directory Structure (under src/)
================================================

efa_main.c, efa.h - Main Linux kernel driver.
efa_verbs.c - Control verbs implementations.
efa_data_verbs.c - Data path verbs implementations.
efa_com.[ch], efa_com_cmd.[ch] - Management communication layer. This layer
    is responsible for handling all management (admin) communication between
    the device and the driver.
efa_common_defs.h - Common definitions for the efa_com layer.
efa_admin_defs.h, efa_admin_cmds_defs.h - Definition of the EFA management
    interface.
efa_regs_defs.h - Definition of EFA PCI memory-mapped (MMIO) registers.
efa_io_defs.h - Definition of EFA datapath types.
efa_sysfs.[ch] - Sysfs files.
efa_verbs.h - EFA extension to ib_verbs.h intended for use by ULPs
    (upper-layer protocols).
efa-abi.h - Kernel driver <-> userspace provider ABI.
efa_p2p.[ch] - Peer-to-peer memory layer implementation.
efa_gdr.c - GPUDirect RDMA implementation.
nv-p2p.h - NVIDIA GDR API.
efa_neuron.c - Neuron implementation.
neuron_p2p.h - Neuron P2P API.

Management Interface
====================

The EFA management interface is exposed by means of:
- PCIe Configuration Space
- Device Registers
- Admin Queue (AQ) and Admin Completion Queue (ACQ)
- Asynchronous Event Notification Queue (AENQ)

The AQ is used for submitting management commands, and the results/responses
are reported asynchronously through the ACQ.

EFA introduces a small set of management commands. Most management operations
are expressed as generic get/set feature commands.
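The asynchronous AQ/ACQ exchange and the generic get/set feature framing can be illustrated with a small, self-contained C model. This is a sketch under stated assumptions, not the driver's efa_com code: the opcodes, feature IDs, entry layouts and every toy_* name are hypothetical, the device side is simulated in software, and the phase bit used to tell new completions from stale ones is an assumed mechanism chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: opcodes, feature IDs, entry layouts and the phase-bit
 * scheme are hypothetical, not the definitions in efa_admin_defs.h or
 * efa_admin_cmds_defs.h. The "device" is simulated in software. */
enum toy_opcode { TOY_GET_FEATURE = 1, TOY_SET_FEATURE = 2 };
enum { TOY_FEATURE_COUNT = 4, TOY_Q_DEPTH = 8 };

struct toy_aq_entry {            /* AQ (submission) entry */
	uint16_t command_id;         /* echoed back in the completion */
	uint8_t opcode;
	uint8_t feature_id;
	uint32_t value;              /* SET_FEATURE argument */
};

struct toy_acq_entry {           /* ACQ (completion) entry */
	uint16_t command_id;
	uint8_t phase;               /* producer flips it on every ring wrap */
	uint8_t status;              /* 0 = success */
	uint32_t value;              /* GET_FEATURE result */
};

struct toy_admin {
	struct toy_aq_entry aq[TOY_Q_DEPTH];
	struct toy_acq_entry acq[TOY_Q_DEPTH];
	uint16_t aq_tail, acq_head;          /* driver-side indices */
	uint16_t dev_aq_head, dev_acq_tail;  /* device-side indices */
	uint8_t drv_phase, dev_phase;
	uint32_t features[TOY_FEATURE_COUNT]; /* device-side feature state */
};

static void toy_admin_init(struct toy_admin *a)
{
	memset(a, 0, sizeof(*a));    /* every ACQ phase bit starts at 0 */
	a->drv_phase = 1;            /* so the first valid lap uses phase 1 */
	a->dev_phase = 1;
}

/* Driver: post a command and return its ID without waiting for it. */
static uint16_t toy_aq_submit(struct toy_admin *a, uint8_t opcode,
			      uint8_t feature_id, uint32_t value)
{
	struct toy_aq_entry *e = &a->aq[a->aq_tail % TOY_Q_DEPTH];

	/* Commands in flight must never exceed the ACQ depth. */
	assert((uint16_t)(a->aq_tail - a->acq_head) < TOY_Q_DEPTH);
	e->command_id = a->aq_tail;
	e->opcode = opcode;
	e->feature_id = feature_id;
	e->value = value;
	return a->aq_tail++;
}

/* Simulated device: execute pending commands, post completions to the ACQ. */
static void toy_device_process(struct toy_admin *a)
{
	while (a->dev_aq_head != a->aq_tail) {
		const struct toy_aq_entry *cmd = &a->aq[a->dev_aq_head % TOY_Q_DEPTH];
		struct toy_acq_entry *cqe = &a->acq[a->dev_acq_tail % TOY_Q_DEPTH];
		bool valid = cmd->feature_id < TOY_FEATURE_COUNT;

		cqe->command_id = cmd->command_id;
		cqe->status = 0;
		cqe->value = 0;
		if (valid && cmd->opcode == TOY_SET_FEATURE)
			a->features[cmd->feature_id] = cmd->value;
		else if (valid && cmd->opcode == TOY_GET_FEATURE)
			cqe->value = a->features[cmd->feature_id];
		else
			cqe->status = 1;     /* unknown opcode or feature */
		cqe->phase = a->dev_phase; /* written last: entry is now valid */
		a->dev_aq_head++;
		if (++a->dev_acq_tail % TOY_Q_DEPTH == 0)
			a->dev_phase ^= 1;
	}
}

/* Driver: reap one completion if the device has produced a new one. */
static bool toy_acq_poll(struct toy_admin *a, struct toy_acq_entry *out)
{
	const struct toy_acq_entry *cqe = &a->acq[a->acq_head % TOY_Q_DEPTH];

	if (cqe->phase != a->drv_phase)
		return false;            /* stale entry from the previous lap */
	*out = *cqe;
	if (++a->acq_head % TOY_Q_DEPTH == 0)
		a->drv_phase ^= 1;
	return true;
}
```

The driver never blocks on a particular slot: it posts commands, lets the device run, and later reaps completions, matching each one to its request by command_id. Because the producer flips the phase value on every wrap of the ring, the consumer can distinguish a fresh completion from a stale entry left over from the previous lap without anyone having to clear the ring.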
The following admin queue commands are supported:
- Create/Modify/Query/Destroy Queue Pair
- Create/Destroy Completion Queue
- Create/Destroy Memory Region
- Create/Destroy Address Handle
- Allocate/Deallocate Protection Domain
- Create/Destroy Event Queue
- Get Statistics
- Get feature
- Set feature
- Query device

Refer to efa_admin_cmds_defs.h for the list of supported get/set feature
properties.

The Asynchronous Event Notification Queue (AENQ) is a unidirectional queue
used by the EFA device to send the driver events that cannot be reported
through the ACQ. AENQ events are subdivided into groups, and each group may
have multiple syndromes. The events are:

    Group           Syndrome
    Keep-Alive      - X -

The ACQ and AENQ share the same MSI-X vector.

Interrupt Modes
===============

The management interrupt is registered when the Linux kernel probes the
adapter and unregistered when the adapter is removed. The management
interrupt is named:

    efa-mgmnt@pci:<PCI domain:bus:slot.function>

Data Path Interface
===================

I/O operations are based on Queue Pairs (QPs), each consisting of a Send
Queue (SQ) and a Receive Queue (RQ). Each queue has a Completion Queue (CQ)
associated with it.

The QPs and CQs are implemented as rings of Work Queue Elements (WQEs) and
Completion Queue Elements (CQEs), respectively, in contiguous physical
memory.

EFA supports Low Latency Queue (LLQ) mode for SQs. In this mode, the
userspace provider writes the WQEs directly into EFA device memory, while the
packet data resides in host memory. The device memory is exposed through a
dedicated PCI BAR, which is mapped with write-combining enabled.

RQs reside in host memory, and the EFA device fetches RX WQEs directly from
there.

The userspace provider notifies the EFA device of new WQEs by writing to a
dedicated PCI BAR, referred to as the doorbell BAR, which can be mapped into
the userspace provider.
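The SQ side of the data path, WQEs written into a ring followed by a single doorbell write, can be sketched as a single-threaded C model. It is illustrative only and rests on assumptions: the toy_* names and the WQE layout are hypothetical, the doorbell is assumed to carry the new producer counter, plain memory stands in for both the LLQ BAR and the doorbell BAR, the device is simulated in software, and RQs and CQs are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: the toy_* names, WQE layout and doorbell semantics are
 * hypothetical (real data-path layouts live in efa_io_defs.h). Ordinary
 * memory stands in for the LLQ BAR and the doorbell BAR. */
enum { TOY_SQ_DEPTH = 4 };       /* power of two, so masking works */

struct toy_tx_wqe {
	uint16_t req_id;             /* would be echoed in the CQE */
	uint16_t length;             /* payload length in bytes */
	const void *buf;             /* payload stays in host memory */
};

struct toy_sq {
	struct toy_tx_wqe dev_ring[TOY_SQ_DEPTH]; /* "device memory" (LLQ) */
	uint32_t pc;                 /* producer counter (software) */
	uint32_t cc;                 /* consumer counter (completions) */
	_Atomic uint32_t doorbell;   /* "doorbell BAR" register */
};

static void toy_sq_init(struct toy_sq *sq)
{
	memset(sq->dev_ring, 0, sizeof(sq->dev_ring));
	sq->pc = 0;
	sq->cc = 0;
	atomic_init(&sq->doorbell, 0);
}

/* Write one WQE into the ring; returns 0, or -1 if the ring is full. */
static int toy_sq_post(struct toy_sq *sq, uint16_t req_id,
		       const void *buf, uint16_t len)
{
	struct toy_tx_wqe wqe = { .req_id = req_id, .length = len, .buf = buf };

	if (sq->pc - sq->cc == TOY_SQ_DEPTH)
		return -1;
	/* Build the WQE locally and copy it in one go, the way a provider
	 * would fill a write-combined mapping. */
	memcpy(&sq->dev_ring[sq->pc & (TOY_SQ_DEPTH - 1)], &wqe, sizeof(wqe));
	sq->pc++;
	return 0;
}

/* Publish every WQE posted so far with a single doorbell write. */
static void toy_sq_ring_doorbell(struct toy_sq *sq)
{
	/* Release ordering keeps the WQE writes ahead of the doorbell in
	 * this model; real MMIO needs the platform's I/O write barriers. */
	atomic_store_explicit(&sq->doorbell, sq->pc, memory_order_release);
}

/* Simulated device: consume WQEs up to the last doorbell value only.
 * Returns the number of WQEs sent and stores their total length. */
static uint32_t toy_device_tx(struct toy_sq *sq, uint32_t *dev_head,
			      uint32_t *bytes)
{
	uint32_t db = atomic_load_explicit(&sq->doorbell, memory_order_acquire);
	uint32_t sent = db - *dev_head;

	*bytes = 0;
	for (; *dev_head != db; (*dev_head)++)
		*bytes += sq->dev_ring[*dev_head & (TOY_SQ_DEPTH - 1)].length;
	return sent;
}

/* Retire n completed WQEs so their ring slots can be reused. */
static void toy_sq_reap(struct toy_sq *sq, uint32_t n)
{
	assert(n <= sq->pc - sq->cc);
	sq->cc += n;
}
```

Posting several WQEs and then ringing the doorbell once amortizes the cost of the doorbell write over the whole batch, and since the simulated device never reads past the last published producer value, a WQE that is still being written cannot be consumed early.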