Tuesday, February 28, 2012

Infiniband - You Gotta Have Pretty Good Game

A long time ago I saw a t-shirt from an Infiniband company that said the following:

"Infiniband - Fast, Cheap, Easy"

My thought process when first reading this was:

  • Fast - Yup, it's fast.  Definitely faster than ethernet.
  • Cheap - Yeah, way cheaper than ethernet.
  • Easy - Ummm, no, you gotta have pretty good game ... and a wingman would help.

Now, I'm sure there are some market analysts at Cisco, Intel, IDC, etc. that did some fancy market analysis to figure out why Infiniband did not grow at the rates people predicted.  My guess has always been that it was because Infiniband isn't easy.  It's just way too different from ethernet, so many institutions never bothered with it: it wasn't worth the hassle of learning, retraining, coming up to speed, etc.

I sometimes like to picture the issue as an Infiniband expert trying to explain Infiniband to a knowledgeable ethernet user.

Ethernet Guy: So I installed all the hardware, loaded the drivers, but nothing is working.
Infiniband Expert: Did you run the subnet manager? The subnet manager sets up and routes the fabric.
Ethernet Guy: Is that an option on the switch?
Infiniband Expert: Maybe. It's a piece of software that may run on the switch or on a server.
Ethernet Guy: Where is it on my fabric?
Infiniband Expert: On your network, it's a daemon running on a server.
Ethernet Guy: Ugh, but server configuration is handled by a different group.
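For what it's worth, the "nothing is working" symptom tends to show up in ibstat as a port that is physically linked but stuck in the Initializing state, because no subnet manager has configured it yet.  Here is a rough sketch of that check in Python.  The sample text is made up and trimmed, the function name is my own invention, and it assumes ibstat prints "State" before "Physical state" for each port, so treat it as an illustration rather than a real diagnostic tool.

```python
# Made-up, trimmed sample resembling `ibstat` output on a host
# where no subnet manager has configured the fabric yet.
SAMPLE_NO_SM = """\
CA 'mlx4_0'
    Port 1:
        State: Initializing
        Physical state: LinkUp
        Base lid: 0
        SM lid: 0
"""

def needs_subnet_manager(ibstat_text):
    """Return True if any port is cabled (LinkUp) but still Initializing."""
    state = None
    for line in ibstat_text.splitlines():
        # Split "Key: value" lines; lines without a value give value == "".
        key, _, value = line.strip().partition(":")
        value = value.strip()
        if key == "State":
            state = value
        elif key == "Physical state":
            if value == "LinkUp" and state == "Initializing":
                return True
    return False
```

On a real host you would feed it the text captured from running ibstat instead of the sample string.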

or

Ethernet Guy: What are GUIDs in Infiniband?
Infiniband Expert: The GUIDs in Infiniband are like MAC addresses.  They are NIC specific identifiers.
Ethernet Guy: Ok, then what's a LID?
Infiniband Expert: A LID is sort of like an IP address.  It's the software-based identifier for a port.
Ethernet Guy: So why does my Infiniband NIC have a LID and an IP address?
Infiniband Expert: You get the IP address from IP over IB.  It's a separate driver.
Ethernet Guy: So I need to load 2 drivers for one NIC?
Infiniband Expert: Yup.
Ethernet Guy: So how do I see the LID for my NIC?
Infiniband Expert: You can use one of many tools, like ibstat.
Ethernet Guy: Why isn't it in ifconfig?
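To make the GUID vs. LID distinction a bit more concrete, here is a small Python sketch that pulls both out of ibstat-style text.  The sample output is made up and trimmed, the function name is hypothetical, and it assumes a single adapter (port numbers from several adapters would collide), so take it as a sketch and not a robust parser.

```python
# Made-up, trimmed sample resembling `ibstat` output for one adapter.
SAMPLE = """\
CA 'mlx4_0'
    Node GUID: 0x0002c903000e0b72
    Port 1:
        State: Active
        Base lid: 2
        SM lid: 1
        Port GUID: 0x0002c903000e0b73
"""

def port_identifiers(ibstat_text):
    """Map port number -> {'lid': int, 'guid': str} for a single adapter."""
    ports = {}
    current = None
    for line in ibstat_text.splitlines():
        key, _, value = line.strip().partition(":")
        value = value.strip()
        if key.startswith("Port ") and value == "":
            # A "Port N:" header line starts a new port section.
            current = ports.setdefault(int(key.split()[1]), {})
        elif current is not None and key == "Base lid":
            # The LID is assigned in software, like an address.
            current["lid"] = int(value, 0)
        elif current is not None and key == "Port GUID":
            # The GUID is burned into the hardware, like a MAC address.
            current["guid"] = value
    return ports
```

Calling port_identifiers(SAMPLE) gives one entry for port 1 holding its LID and its port GUID, which is roughly the information the Ethernet Guy was hoping to find in ifconfig.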

I could go on and on, but the point is that it's so different that the learning curve is quite steep.  There are new ways to debug problems, new ways to route, new advanced configuration, new tools to learn, etc.  For the majority of institutions, the performance gain of Infiniband must be immensely superior to justify the cost of retraining, transition, inefficiency, maintenance, etc.

How many institutions found Infiniband to be "immensely superior" for their needs?  It seems to be not many.  I can imagine this conversation happening in many companies:

Manager: Hey, Engineer, can you take a look at this Infiniband thing?  The sales people say it's super fast for the price.
Engineer: Sure thing, I'll play with it.
<1 week later>
Manager: So how is Infiniband?
Engineer: I can't figure any of this out.

and that's the end of Infiniband at that company.