Thursday, February 13, 2014

HDFS vs Lustre

There's been discussion out there about comparing the HDFS filesystem to a traditional parallel filesystem like Lustre.  The problem is it's really difficult to compare apples to apples.

As an example, I saw a white paper awhile back (sorry, I can't find it online) that compared HDFS to Lustre.  HDFS beat Lustre in this person's performance tests by a good margin.  After digging into the paper I saw why.  This fellow ran Lustre over a 1 GigE ethernet network. 

Is this a fair test?  On the one hand it isn't because Lustre is a network based filesystem.  If you simple choose to bottleneck Lustre, of course it will lose.   On the other hand, it's a fair test, because it uses the same hardware most use with HDFS.

So lets say we replaced the GigE with Infiniband.  Would it now be a fair test?  Perhaps its slightly fairer, but HDFS people can say HDFS wasn't designed for more expensive hardware and therefore doesn't take advantage of it.  In the case of Infiniband, HDFS isn't using RDMA during replication.

I don't know the right comparison.  However, HDFS vs Lustre may not be the correct comparison to think about.  At the end of the day, I could probably concoct an HDFS setup that will always beat a Lustre setup and vice versa.

I believe thinking about this as HDFS vs Lustre isn't the right approach.  It's really Hadoop Cluster vs HPC Cluster.  At the end of the day, while Hadoop is famous for handling large data, the reality is because of shuffle/sorting/scheduling/etc. in Hadoop, it also reads/writes tons of small files.  The memory for a Hadoop Cluster vs HPC cluster may also be different.  That affects spilling of data, page cache, etc.

Update: See "Big Data vs HPC" follow up.
Update: See "HPC vs Big Data" follow up.

Update 6/2/15:

Not so long ago I was talking to someone about the HDFS vs Lustre comparison.

Many people have done HDFS vs "Some Networked Filesystem" experiments.

However, I think these experiments are inherently flawed.  The experiments always look something like this.

8 nodes
4 SATA disks
8 core

Networked Storage
4 nodes
8 SATA disks
8 core each

with additional hardware details beyond this.

The comparison will be HDFS using the Datanodes for data & map reduce.  Then it'll be a comparison to the Networked Storage, also using the Datanodes as the computation facility.

Do you see the inherent problem in the above comparison?


It's staring you right in the eyes.


It's an 8 node test vs a 12 node test.

It's a 256G RAM test vs a 384G RAM test.

This isn't to say that the comparison is poorly done.  But this is part of the inherent problem of comparing HDFS vs Networked File systems.  What is a fair comparison?

1 comment:

  1. This is just the information I am finding everywhere. Thanks for your blog, I just subscribe your blog. This is a nice blog..