Last month, the SNIA Networking Storage Forum hosted a live webcast, “An Introduction to the OPI (Open Programmable Infrastructure) Project,” featuring several experts who lead the Open Programmable Infrastructure (OPI) project. The project was created to address a new class of cloud and datacenter infrastructure component. This new infrastructure element, often referred to as a Data Processing Unit (DPU), an Infrastructure Processing Unit (IPU), or, as a general term, an xPU, takes the form of a server-hosted PCIe add-in card or on-board chip(s) containing one or more ASICs or FPGAs, usually anchored around a single powerful SoC device.
Our OPI experts introduced the OPI Project and then explained its lifecycle provisioning, APIs, use cases, proof of concept, and developer platform. If you missed the live presentation, you can watch it on demand and download a PDF of the slides at the SNIA Educational Library. The attendees at the live session asked several interesting questions. Here are our presenters’ answers.
Q. Are there any plans for OPI to use GraphQL for API definitions since GraphQL has a good development environment, better security, and a well-defined, typed, schema approach?
A. GraphQL is a good choice for frontend/backend services, with the many benefits stated in the question, and those benefits are particularly compelling for data fetching. For communication between OPI microservices, however, we still see gRPC as the better choice. gRPC has a strong ecosystem in cloud and Kubernetes (K8s) systems, with fast execution, strong typing, and polyglot endpoints. Given OPI’s strongly containerized approach and the ease of building schemas with Protocol Buffers, we see gRPC as the best choice for most OPI APIs. We do keep alternatives like GraphQL in mind for specific cases.
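To make the Protocol Buffers point a little more concrete, here is a minimal sketch that hand-encodes one message following the public protobuf wire format. In practice the protoc compiler generates this code from a .proto schema; the Volume message and its field numbers are purely hypothetical and are not an OPI API. The sketch only shows why a typed schema lets both ends agree on a compact binary encoding instead of self-describing text.

```python
# Protocol Buffers wire-format constants (from the protobuf encoding spec).
WIRE_VARINT = 0  # int32, int64, uint64, bool, enum
WIRE_LEN = 2     # string, bytes, embedded messages


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A field key is the varint of (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_volume(size_gb: int, name: str) -> bytes:
    """Hand-encode a HYPOTHETICAL schema (not an OPI API):

        message Volume { uint64 size_gb = 1; string name = 2; }
    """
    name_bytes = name.encode("utf-8")
    return (
        encode_tag(1, WIRE_VARINT) + encode_varint(size_gb)
        + encode_tag(2, WIRE_LEN) + encode_varint(len(name_bytes)) + name_bytes
    )


if __name__ == "__main__":
    import json

    wire = encode_volume(150, "vol0")
    as_json = json.dumps({"size_gb": 150, "name": "vol0"}).encode()
    print(wire.hex(), len(wire), "bytes vs", len(as_json), "bytes of JSON")
```

Running it prints `0896011204766f6c30 9 bytes vs 32 bytes of JSON`. Field names never go on the wire because both sides share the schema, which is part of what makes gRPC with Protocol Buffers attractive for high-volume service-to-service calls.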
Q. Will OPI add APIs for less common use cases like hypervisor offload, application verification, video streaming, storage virtualization, time synchronization, etc.?
A. OPI will continue to add APIs for various use cases, including less common ones. The initial focus of the APIs is the major areas of networking, storage, and security; from there, we will expand to address other cases. Today’s API discussions are already expanding to treat virtualization (containers, virtual machines, etc.) as a key area to address.
Q. Do you communicate with CXL™ Consortium too?
A. While we have not formally communicated with the Compute Express Link (CXL) Consortium, there have been a few conversations with CXL-interested parties. We will need to engage in discussions with the CXL Consortium as we have with SNIA, DASH, and others.
Q. Can you elaborate on the purpose of APIs for AI/ML?
A. DPU solutions contain accelerators and capabilities that AI/ML solutions can leverage, and we will need to consider which APIs to expose to take advantage of them. OPI believes there is a set of data movement and co-processor APIs that would support incorporating DPUs into AI/ML solutions. In keeping with its core mission, OPI is not going to attempt to redefine the existing core AI/ML APIs, although we may also look at how to incorporate those APIs into DPUs directly.
Q. Have you considered creating a TEE (Trusted Execution Environment) oriented API?
A. This has been considered and is a possibility in the future. There are two sides to this:
1) OPI itself using a TEE on the DPU. This may be interesting, although we’d need a compelling use case.
2) Enabling OPI users to utilize the TEE via a vendor-neutral interface. This will likely be interesting, but potentially challenging for DPUs as OPI is considering them. We are currently focused on enabling applications running in containers on DPUs, and securing containers with a TEE is still a research area in the industry. For example, the Confidential Containers project is at the “sandbox” maturity level: https://www.cncf.io/projects/confidential-containers/
Q. Will OPI support integration with OCP Caliptra project for ensuring silicon level hardware authentication during boot? Reference: https://siliconangle.com/2022/10/18/open-compute-project-announces-caliptra-new-standard-hardware-root-trust/
A. OPI hasn’t looked at Caliptra yet. As Caliptra matures, OPI will follow the wider industry ecosystem’s direction in this area. We currently follow DMTF SPDM (https://www.dmtf.org/standards/spdm) for attestation, plus IEEE 802.1AR Secure Device Identity and RFC 8572 (https://www.rfc-editor.org/rfc/pdfrfc/rfc8572.txt.pdf) for secure zero touch provisioning and onboarding.
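One step of attestation is easy to illustrate: a verifier compares the firmware measurements a device reports against known-good reference values. The sketch below covers only that appraisal step, under stated assumptions: the reported digests are assumed to come from an SPDM measurement response whose signature, certificate chain, and nonce have already been verified (all omitted here), SHA-384 is just an example since SPDM negotiates the hash algorithm, and the component names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List


def measure(image: bytes) -> str:
    """Hex SHA-384 digest of a firmware component (example algorithm only)."""
    return hashlib.sha384(image).hexdigest()


def appraise(reported: Dict[str, str], reference: Dict[str, str]) -> List[str]:
    """Return components whose reported digest is missing or differs from the reference.

    Assumes `reported` was taken from an SPDM measurement response that has
    ALREADY been cryptographically verified; that part is not shown here.
    """
    failures = []
    for component, expected in reference.items():
        actual = reported.get(component)
        # compare_digest does a constant-time comparison of the two hex strings.
        if actual is None or not hmac.compare_digest(actual, expected):
            failures.append(component)
    return failures


if __name__ == "__main__":
    # Hypothetical component names and firmware images, for illustration only.
    reference = {"bootloader": measure(b"bl-v1"), "dpu-os": measure(b"os-v1")}
    reported = {"bootloader": measure(b"bl-v1"), "dpu-os": measure(b"os-tampered")}
    print(appraise(reported, reference))
```

Here the tampered `dpu-os` image is flagged while the matching bootloader passes; a real verifier would then decide whether to quarantine the device or block onboarding.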
Q. When testing NVIDIA DPUs on some server models, the temperature of the DPU was often high because of insufficient server cooling, which resulted in the DPU shutting itself down. First question: is there an open API to read sensors from the DPU card itself? Second question: what happens when the DPU shuts down, then cools, and comes back to life again? Will the server be notified as per standards, and will the DPU be usable again?
A. Qualified DPU servers from major manufacturers integrate closed-loop thermal management to make sure that cooling is appropriate and that temperature readout is implemented. If a DPU is used in a non-supported server, you may see the challenges you experienced, with overheating and high temperatures causing DPU shutdowns. Since the server is still in charge of the chassis, PDUs, fans, and other components, it is the BMC’s responsibility to handle overall server cooling and temperature readouts. There are several ways to read temperature, such as SMBus and PLDM, that are already widely used with standard NICs, GPUs, and other devices. OPI is looking into which specification is best to adopt for handling temperature readout, DPU reboot, and overall thermal management. OPI is not looking to define any new standards in this area.
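The shutdown-and-recovery question can be reasoned about as a small state machine running on the BMC side. The sketch below is hypothetical throughout: the thresholds, state names, and recovery policy are assumptions rather than anything OPI or a vendor has specified, and the samples stand in for readings the BMC would obtain over a transport such as SMBus or PLDM. The key design choice is hysteresis: after a thermal shutdown, the DPU is only reported usable again once it has cooled well below the warning threshold, so it does not bounce between states.

```python
from enum import Enum
from typing import Iterable, List, Optional, Tuple


class DpuState(Enum):
    NORMAL = "normal"
    WARNING = "warning"
    SHUTDOWN = "shutdown"


# Hypothetical thresholds in degrees Celsius; real limits are vendor specific.
WARN_C = 95.0
SHUTDOWN_C = 105.0
RECOVER_C = 80.0  # hysteresis: must cool well below WARN_C before reuse


def next_state(state: DpuState, temp_c: Optional[float]) -> DpuState:
    """Compute the next state from one temperature sample (None = no reading)."""
    if temp_c is None:
        # No reading (e.g., card powered off): keep the last known state.
        return state
    if state is DpuState.SHUTDOWN:
        # Only declare the DPU usable again once it has cooled below RECOVER_C.
        return DpuState.NORMAL if temp_c < RECOVER_C else DpuState.SHUTDOWN
    if temp_c >= SHUTDOWN_C:
        return DpuState.SHUTDOWN
    if temp_c >= WARN_C:
        return DpuState.WARNING
    return DpuState.NORMAL


def monitor(samples: Iterable[Optional[float]]) -> List[Tuple[DpuState, DpuState]]:
    """Replay samples and return the transitions a BMC could report to the host."""
    state = DpuState.NORMAL
    transitions = []
    for temp_c in samples:
        new_state = next_state(state, temp_c)
        if new_state is not state:
            transitions.append((state, new_state))
            state = new_state
    return transitions


if __name__ == "__main__":
    trace = [70, 96, 106, None, 90, 78, 70]
    for old, new in monitor(trace):
        print(f"{old.value} -> {new.value}")
```

For the sample trace, the DPU goes from normal to warning to shutdown, stays down while it reads 90 °C, and is reported usable again only at 78 °C. Each transition is the kind of event a standards-based interface could surface to the host.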
If you are interested in learning more about DPUs/xPUs, SNIA has covered this topic extensively in the last year or so. You can find all the recent presentations at the SNIAVideo YouTube Channel.