Semantics of Remote Direct Memory Access

Abstract

Remote direct memory access (RDMA) is a modern technology enabling networked machines to exchange information without involving the operating system of either side, and thus significantly speeding up data transfer in computer clusters. While RDMA is extensively used in practice and studied in various research papers, a formal underlying model specifying the allowed behaviours of concurrent RDMA programs running in modern multi-core architectures is still missing. This paper aims to close this gap and provide semantic foundations of RDMA on x86-TSO machines. We propose three equivalent formal models, two operational models in different levels of abstraction and one declarative model, and prove that the three characterisations are equivalent. To gain confidence in the proposed semantics, the more concrete operational model has been reviewed by NVIDIA experts, a major vendor of RDMA systems, and we have empirically validated the declarative formalisation on various subtle litmus tests by extensive testing. We believe that this work is a necessary initial step for formally addressing RDMA-based systems by proposing language-level models, verifying their mapping to hardware, and developing reasoning techniques for concurrent RDMA programs.

Paper

Semantics of Remote Direct Memory Access: Operational and Declarative Models of RDMA on TSO Architectures, OOPSLA 2024
Guillaume Ambal, Brijesh Dongol, Haggai Eran, Vasileios Klimis, Ori Lahav and Azalea Raad
[Paper] [Extended Version]

Artifact

Guillaume Ambal, Brijesh Dongol, Haggai Eran, Vasileios Klimis, Ori Lahav and Azalea Raad
[RDMA Litmus Tests] [Coq Development]

People

Guillaume Ambal (Imperial College London)
Brijesh Dongol (University of Surrey)
Haggai Eran (NVIDIA)
Vasileios Klimis (Queen Mary University of London)
Ori Lahav (Tel Aviv University)
Azalea Raad (Imperial College London)

Related Projects

Sufficient Conditions for Robustness of RDMA Programs