InfiniBand for Dummies

Getting Started Tutorial for InfiniBand and Linux

Created by: spiderr, Last modification: 03 Jan 2018 (08:34 UTC)
I had no idea Ethernet had a competitor until 2017.

These are mostly my personal notes, culled from various holes on the Internet. I wish there were a basic "Noob's Guide to Networking beyond 1000BASE-T" (i.e. ye good ol' 1 Gigabit RJ-45 jack).

Some of this stuff may not be accurate; updates appreciated.

Hardware

Mellanox is the king of InfiniBand, though they are selling more Ethernet equipment than InfiniBand these days.

  • SR-IOV is hardware-based virtualization that lets one physical device appear as many to your base operating system (the hypervisor), so you can hand a piece of the physical hardware directly to a virtualized server. Most servers built after 2009 (PowerEdge Gen 11, i.e. R610 etc.) should support it, though you have to enable it in the BIOS.
  • Gotcha: ConnectX-2 cards suck because they do not support SR-IOV (which is required to get InfiniBand into KVM guests, see below).
  • ConnectX-3 cards are great! They can auto-detect InfiniBand or Ethernet (this is called VPI, Virtual Protocol Interconnect).
  • Gotcha: On ConnectX-3 MCX354A dual-port cards, SR-IOV virtual functions cannot be assigned per port. If you run one port on an InfiniBand network and the other on your Ethernet network (ib + eth), num_vfs has to be equal for both ports, unless you have Linux kernel 4.1 or higher (see comments). Also, Port 1 = eth, Port 2 = ib is unsupported according to the PDF manual; it must be Port 1 = ib, Port 2 = eth.
  • Low-end servers / chipsets (think Dell R4X0 and below) either cannot do SR-IOV or may not do it properly with the Mellanox cards, and you will get this annoying error: "vfio: error, group 1 is not viable"
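
To see where you stand before fighting with any of the gotchas above, you can ask the kernel directly. This is a minimal sketch in Python, not anything from the Mellanox docs: it assumes a Linux box with sysfs mounted at /sys, and the function names (sriov_capable_devices, iommu_groups) and the sysfs_root parameter are made up for illustration. It reads the standard sriov_totalvfs attribute to list SR-IOV-capable PCI devices, and walks /sys/kernel/iommu_groups to show which devices share an IOMMU group. As far as I understand it, the "not viable" error generally means the device you are passing through shares its group with other devices that are not bound to vfio-pci, so a crowded group is a hint.

```python
from pathlib import Path


def sriov_capable_devices(sysfs_root="/sys"):
    """Return {pci_address: total_vfs} for devices exposing sriov_totalvfs."""
    devices = {}
    pci_dir = Path(sysfs_root) / "bus" / "pci" / "devices"
    for attr in sorted(pci_dir.glob("*/sriov_totalvfs")):
        try:
            total = int(attr.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or malformed attribute: skip this device
        devices[attr.parent.name] = total
    return devices


def iommu_groups(sysfs_root="/sys"):
    """Return {group_number: [pci_address, ...]} as reported by the kernel."""
    groups = {}
    base = Path(sysfs_root) / "kernel" / "iommu_groups"
    for dev in base.glob("*/devices/*"):
        group = dev.parent.parent.name
        if group.isdigit():
            groups.setdefault(int(group), []).append(dev.name)
    return {g: sorted(devs) for g, devs in sorted(groups.items())}


if __name__ == "__main__":
    # sysfs_root defaults to /sys; it is only a parameter so the sketch
    # can be pointed at a fake directory tree for testing (an assumption
    # of this example, not a kernel convention).
    for addr, total in sriov_capable_devices().items():
        print(f"{addr}: up to {total} virtual functions")
    for group, devs in iommu_groups().items():
        print(f"IOMMU group {group}: {', '.join(devs)}")
```

If the first list comes back empty, SR-IOV is probably still disabled in the BIOS or the card firmware, or the card does not support it at all. If the second list is empty, the IOMMU is likely switched off (on Intel boxes that usually means adding intel_iommu=on to the kernel command line).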

Switches

You will need an InfiniBand switch. There are two flavors of Mellanox switches on eBay: the super cheap and the kinda pricey. The pricey ones are Mellanox branded and have firmware updates and support. The cheap ones are OEM'ed.
  • The EMC-OS **managed** switch is cheap, but sucks unless you have EMC hardware lying about (which you don't, 'cause you're reading this).
  • The super cheap (like under $200 eBay cheap) EMC branded SX6005 **unmanaged** switches are awesome and "just work" for a basic InfiniBand network.

Software

  • The Linux kernel has built-in InfiniBand support. On CentOS, you can do this: yum groupinstall "Infiniband Support"
  • If you want to use the Mellanox driver stack instead (called OFED), you will very likely want to install it with "mlnxofedinstall --add-kernel-support"
  • Gotcha: Linux KVM virtualization only supports Ethernet bridging, thus you MUST use SR-IOV if you want InfiniBand in your guest servers.
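
Once the packages are installed, a quick sanity check is to see what the kernel thinks your ports are. This is a minimal sketch, assuming the usual /sys/class/infiniband layout exposed by the in-kernel drivers; the infiniband_ports and read_attr names and the sysfs_root parameter are invented for this example. For each port it reads link_layer (which should say InfiniBand or Ethernet, handy with VPI cards), plus state and rate.

```python
from pathlib import Path


def read_attr(path):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def infiniband_ports(sysfs_root="/sys"):
    """Return one dict per port found under <sysfs_root>/class/infiniband."""
    ports = []
    base = Path(sysfs_root) / "class" / "infiniband"
    for port_dir in sorted(base.glob("*/ports/*")):
        ports.append({
            "device": port_dir.parent.parent.name,  # e.g. mlx4_0
            "port": port_dir.name,                  # e.g. 1
            "link_layer": read_attr(port_dir / "link_layer"),
            "state": read_attr(port_dir / "state"),
            "rate": read_attr(port_dir / "rate"),
        })
    return ports


if __name__ == "__main__":
    for p in infiniband_ports():
        print(f"{p['device']} port {p['port']}: "
              f"{p['link_layer']}, {p['state']}, {p['rate']}")
```

An empty result means no InfiniBand driver has registered a device yet, so it is worth checking that the card shows up on the PCI bus and that its kernel module is loaded.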