August 29, 2008

Comments

Duncan

I guess this goes for almost every benchmark done by a vendor. You can twist and turn figures and tests until the outcome is in your favor.

Chad Sakac

Duncan, I wholeheartedly agree. I hate it when EMC does it.

NetApp has been on this bandwagon for the last year at least (using performance testing to show not that they are great, but that they are better). With the SPC benchmark they recently did, which they tout (it's in the presentation they give at VMUGs - not saying "NetApp is great", but "NetApp is great, EMC sucks"), they crossed a line that we haven't crossed (at least to my knowledge). They brought a CX3 of their own, configured it, ran it alongside their own system, and submitted both results together.

Think about that - a competitor submitting a benchmark of your product. Imagine how credible it would be if Microsoft submitted a benchmark where they ran Hyper-V and VMware Infrastructure against each other.

At EMC, we're not angels, we're as guilty as any vendor, and I try as hard as I can to not play that game.

Vaughan is an honorable guy, as most people are at NetApp, EMC, and most vendors.

My personal advice and rule of thumb? There are two:

1) Don't trust performance benchmarks unless they show where the wheels came off the train. Duncan, as a VMware employee, you have access to Powerlink (as all our partners and customers do). Search for "Validation Test Report" + an app + VMware - and you'll see that that's the modus operandi of those - test to fail, then say "do this, don't do that". We don't post those or try to twist the outcome for benchmarketing, because it's genuinely done in the spirit of "let's find out how this works in practical use cases and test the envelope in those use cases before our customers do".

2) Performance on any of these platforms from strong companies is GREAT for almost any general use case. Are there corner cases, high end scenarios where percentiles matter? Sure. That's not the general case. When an EMC platform fails to perform (and the same goes for NetApp), it's usually misconfiguration, or poor knowledge transfer to the customer who self-configures (just like it is with VMware, and most other things), or poor design by the vendor or the reseller.

My suggestion - don't listen when a vendor (including EMC) says "X outperforms Y", and worse "Y explodes when you do A, our product X doesn't".

- Listen to the people who listen to you.
- Listen to people that build a configuration based on your requirements, and that spend the time to do analysis, whether it's via Capacity Planner or other tools.
- Listen to people that tell you the limits of their own solutions, in frank terms.
- Listen to people that build a solution that delivers the features you need.

What I was trying to point out is that TR-3697 (which of course I've seen used already in the competitive context) has a couple of glaring (at least to me) issues that (again, this is IMHO) strongly conflict with the noble goal stated as its purpose.

I'm going to try to see if I can rally the resources to do a broad-based test suite, broad I/O sizes, mixes of I/O types, utilization points, and under normal feature set use (local, remote replicas, etc.) I would be totally game to sit down with some of my respected peers, and map out the test plan in advance - and make it cross-platform, cross company. Perhaps we could add this to VMMark (which right now doesn't do a lot for IO).

Duncan

VMMark would be a great way to have Storage Vendors compete for the top! Develop a couple of preset tests which can't be modified and let the Vendor do everything they possibly can to get the most out of their systems. Everything is allowed as long as it's documented. This way everyone benefits, and especially the customers.

Thanks for the response!!

Mike Shea

Hi Chad -

Please, have a look at the upper right-hand corner of TR-3697. As you can see, this is the VMware logo. The only way a VMware logo gets on anyone's collateral is if they read and approve. Same with an EMC or NetApp logo.

Enough said - It has the VMware stamp of approval.

Next?

Chad Sakac

Mike - thanks for the comment, thanks for reading, and I hope you're enjoying it at NetApp, I'm sorry you left EMC. I agree with your comment. That's why I said (look in my post):

"Let me summarize (both TR-3697, EMC's docs, and VMware's docs agree!):

NFS and iSCSI are fantastic and near FC in throughput and IOPs on small (4/8K) IO workloads (though consistently higher in CPU utilization - though MHz are close to free these days), but they diverge (this is not bad, it's expected) at large IO sizes, and large throughput workloads - even with 2Gbps FC (which is the chart above from VMware) - which no one even buys anymore. This is different with 10GbE and jumbo frames of course."

TR-3697 tested the first three columns in the VMware chart. We all agree - NFS, and iSCSI are great choices for customers, along with FC. My point is that protocol performance and config performance, and platform performance all vary in complex ways.

I'm not trying to trivialize, or obfuscate - the test matrix for this is a big piece of work.

If the point of the paper is that NFS and iSCSI are as legitimate as FC for VMware storage, and to prove that NFS doesn't have a "performance penalty" compared with iSCSI (and in some cases CPU utilization fractional benefits) - hallelujah, I'm with you, and EMC is with you.

The primary reason why NFS and iSCSI perform so similarly (contrary to the "block is better" misconception) is that, somewhat like Oracle 11g, VMware's NFS client lives in the vmkernel, not in user space.

If the point of the paper is to indicate that iSCSI/NFS/FC perform within these envelopes of one another - I don't think the test suite was broad enough. We see the same data for our NFS server (the Celerra) for small block IO. I define small block IO as equal to or smaller than 8K - like an OLTP database workload, an Exchange 2007 steady state (but not backup) workload, or an IOmeter workload.

NFS has many strong intrinsic benefits in the VMware space (but so do the other protocols, every customer is unique).

I'm willing to jointly test with NetApp (and others if you're interested!) if you guys want - but I do think we would need to broaden it out:
- different IO sizes (4K-512K) - this VMware did better than yours
- different workload mixes (as I pointed out - you guys did this better in the doc than VMware did in theirs)
- and baseline + features testing (snapshots/replicas)
- and we would need to agree (this would likely be the sticking point) about utilization and test duration for a valid real-world use case.

This would be a lot of work (it would be a 5-dimension test just with the stuff above).
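To give a feel for how quickly such a test plan grows, here is a minimal Python sketch that enumerates the five dimensions listed above. Only the 4K-512K IO size range and the baseline/snapshots/replicas features come from the comment; the intermediate IO sizes, workload mixes, utilization points, and durations are hypothetical placeholders that the participating vendors would have to agree on.

```python
from itertools import product

# Five test dimensions. Only the 4K-512K range and the feature names come
# from the comment; every other value is a hypothetical placeholder.
DIMENSIONS = {
    "io_size_kib": [4, 8, 16, 32, 64, 128, 256, 512],
    "workload_mix": ["100% read", "50/50 read/write", "100% write"],  # assumed
    "features": ["baseline", "snapshots", "replicas"],
    "utilization_pct": [25, 50, 75, 90],                              # assumed
    "duration_hours": [1, 24, 168],                                   # assumed
}


def build_test_matrix(dimensions):
    """Return one dict per combination of dimension values."""
    names = list(dimensions)
    return [dict(zip(names, combo)) for combo in product(*dimensions.values())]


matrix = build_test_matrix(DIMENSIONS)
print(len(DIMENSIONS), "dimensions,", len(matrix), "test cases")
```

With these placeholder values the matrix already holds 864 cases, and that is before multiplying by the number of protocols and platforms under test.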

But, I'm game if you guys are!

Val Bercovici

Hi Chad,

Your colleague Chuck Hollis has had me "blog-u-pied" the past few days so I haven't been able to comment here :)

I appreciate your seemingly earnest attempt at a better class of competitive discussion, but your EMC-slanted overview of WAFL and classic FUD-oriented (I call it "imagineered") benchmark challenge betrays your bias which may be impossible for you to overcome.

FWIW - Mike already pointed out TR-3697 was jointly conducted with VMware, but since you insist on devaluing that with old WAFL "filesystem fragmentation" FUD in a FC-SAN context, let me share this public benchmark result with you. I'll be judging the true earnestness of your intentions by your response:

http://www.storageperformance.org/results/a00062_NetApp_FAS3040-48hr-sustain_executive-summary.pdf

Some quick NetApp highlights of this report:
- It covers a 4 week period during which the NetApp FAS array delivered perfectly consistent high performance
- Due to the limits of the SPC-1 auditing tools at the time, performance over only the final 48 of the full 672 hours was reported. As more vendors decide to publish results exceeding a standard 3 hour SPC-1 steady state, NetApp hopes to be able to publish results for the full 4 weeks
- Raw storage utilization was 75% (or 62% if you exclude snapshots to compare with lesser platforms that can’t deliver low latency with those enabled)
- Snapshots were continuously running with <3% measured impact over 4 weeks during the constantly-running intensive SPC-1 (random-write) workload
- Thin provisioning of LUN’s and snapshots was turned ON.
- Not that it matters for VMware, but this SPC-1 report was run using a FC-SAN networked storage configuration.
- More details here: http://partners.netapp.com/go/techontap/matl/spc1.html

EMC’s allergic reaction to SPC-1 is well documented, but for the purposes of your old WAFL FUD-inspired challenge it’s worth noting that:
- The SPC-1 workload is extremely intensive, including heavy cache-hostile mixes of random reads, writes and over-writes
- The highly respected SPC-1 workload was mutually designed and approved by a very wide spectrum of storage vendors (practically the whole industry, including EMC in the early stages) to ensure it didn’t favor any particular vendor
- All SPC-1 benchmark results are strictly and independently audited
- All published SPC-1 results are peer reviewed and can be retracted if a competitor identifies a flaw not discovered by the auditor.

So all in all there is no more transparent or respected storage workload out there in our industry! Using it, NetApp easily proved all of the folklore around “WAFL fragmentation” (in a filesystem or block context) and related performance over time or as the system fills up is just that - folklore. Actually it is carefully Imagineered folklore:
http://blogs.netapp.com/exposed/2007/12/benchmarking--1.html

It certainly is NOT fact.

I’ll try and see if I can squeeze in VMworld 2008 during all my busy activities that week, but for the record – you can order “Surf and Turf” for me accompanied by a fine Italian Brunello :)

Chad Sakac

Val, thanks for the comments - I guess I'm not a legitimate pro-EMC blogger (note - not anti-NetApp, and those are NOT synonymous!) without getting the Val attack dog treatment :-) I'm also glad to see that even though you've changed your gig there at NetApp, they can't declaw the "chief competitive guy".

Ok, on to the facts at hand - I never mentioned SPC-1, or anything about WAFL fragmentation. What I did say was:

1) You can't just show 4 and 8K IO sizes and assume that behavior is linear at 16, 32, 64, 128 and 512K - that's all. It's just not a good scientific method. I say we do the test TOGETHER, if we can find common ground.

2) ALL systems have differing behavior over time, and with differing utilization. On non-WAFL systems, this behavior exists since you're short-stroking. On WAFL systems, there are other factors - factors that I don't claim expertise on (although I own Storevault and have owned NetApp filers). You also can't use one test to validate another - it's scientific method - one variable at a time. So, SPC results can't be compared with the IOmeter results here. I'm not trying to be a competitive dink about this, I'm an Electrical Engineer/Comp Sci double major - I have a hard time computing conclusions not supported by the data, and want to fill in the gaps. I'm curious.

3) I brought up the "features not used and features used" use case not to imply that you guys suck with snapshots - you don't, in fact it's a great strength of the NetApp design. The reason I brought it up is because you guys always claim (in the tests you guys run with our equipment like at the SPC, or commission a third party to do with our equipment) that we explode outside a narrow band of use cases. Come on. EMC and NetApp would not have their mutual loyal customer bases if we weren't both doing something right. I brought it up to say "hey, I'm sure you guys would want to do the 'with snaps/no snaps' use case, and I would be happy to do that".
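Point 1 above can be illustrated with a toy model. This is a sketch under stated assumptions, not measured data: it assumes a system is limited by a fixed IOPS ceiling at small IO sizes and by link bandwidth at large ones. The function name, the 10,000 IOPS ceiling, and the 125 MB/s link figure (roughly a single 1 Gbps link, ignoring protocol overhead) are all hypothetical.

```python
def modeled_throughput_mb_s(io_size_kib, iops_ceiling, link_mb_s):
    """Toy model: throughput is capped by an IOPS ceiling or by the link.

    All inputs are hypothetical; this is not a model of any real array.
    """
    io_bytes = io_size_kib * 1024
    link_bytes_per_s = link_mb_s * 1_000_000
    # Achievable IOPS is whichever limit is hit first.
    iops = min(iops_ceiling, link_bytes_per_s / io_bytes)
    return iops * io_bytes / 1_000_000


for size in (4, 8, 16, 32, 64, 128, 512):
    mb_s = modeled_throughput_mb_s(size, iops_ceiling=10_000, link_mb_s=125)
    print(f"{size:>3} KiB: {mb_s:7.2f} MB/s")
```

In this model, throughput scales with IO size only until the link saturates and then goes flat, so results measured at 4K and 8K say little about behavior at 128K or 512K.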

Lastly - with blogs, like email and unlike face-to-face dialog, things sometimes don't come across the right way. I wasn't challenging Vaughan to show that BEFORE I would do dinner and drinks at VMworld, I was saying "ah screw all this, let's just have dinner and drinks no matter what".

Perhaps (and this would indeed be a sad thing) it's impossible for any of us to have a view that is even-handed. Perhaps it's impossible not to be polluted by the perspective of where you work.

I, on the other hand - DON'T BELIEVE THAT. I think it CAN be done. I think it is possible to see others' points of view, and learn from each other, even from fierce enemies.

I have the resources at my disposal where we might even be able to do this. My lab in RTP is close to the NetApp Kilo Client lab, and my Santa Clara lab is a hop, skip and a jump from Sunnyvale.

Imagine a joint VMware, NetApp/EMC whitepaper discussing all protocols, and common best practices, and not only the DOs, but the DON'Ts on each. Could it be possible? Could our respective orgs play nicely? The space-time continuum rupture might just destroy the universe - which means, dammit, we've got to try.

I'll keep away EMC's competitive team, and NetApp would have to agree to keep away theirs. We'd have to agree on some things, but it's not impossible.

Anyway, I'm happy to invite you to dinner and drinks along with Vaughan and me, and we can talk about it.

Val Bercovici

Sounds like you're channeling MLK's "I have a dream" speech here Chad :)

If you know anything about NetApp's culture, you know we're all about what you propose!

But I have to be realistic, I don't think EMC would ever formally approve this (we've tried & tried with V-Series on CX & DMX, as well as PowerPath support for FAS, and have been shot down by your execs at the 11th hour each and every time).

Chad Sakac

Thanks Val - I'm flattered by the comparison, and I AM perpetually hopeful about the intrinsic power of being positive (not in a funky new-age way, but rather a pragmatic optimistic way).

I cringe (really, a physiological cringe) when the competitive ad hominem fur starts to fly. I really think it serves no one, and it certainly wasn't my intent - and I hope it never will be.

Let's close our (you and me) dialog here temporarily (I don't believe in closing comments, so any comments including yours are welcome), and let me see what I can do. Hold my feet to the fire, and hold me accountable.

After the SPC-1 thing, there were multiple schools of thought on what we should do (as one would fully expect). Some won, and some lost. I lost - I thought we SHOULD be participating, but the arguments weighed were persuasive (though I still think I was right).

Let's see if I can win this one.

BUT - I'm not an idiot. For this to work, we would really have to make this an engineering activity, and competitive plays and marketing positioning would have to be left at home.

Personally, I'm confident enough in EMC and our stuff to tolerate that, and I'm sure NetApp would be also.

John Spiers

Hi Chad,

In a session at VMworld 2007: http://www.vmworld.com/vmworld/static/sessions/2007/TA51.html

VMware compared the throughput and latency of FC, iSCSI and NFS using the following IOmeter workloads:

Workload: IOmeter standard set based on request size 1k, 4k, 8k, 16k, 32k, 64k, 72k, 128k, 256k, 512k. Access mode 50% read/write, access pattern 100% sequential, 1 worker, and 16 Outstanding I/Os.
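For readers who want to reason about this workload, here is a minimal Python sketch of the access specifications described above. It is not IOmeter's own configuration format; the `AccessSpec` class and the `throughput_mib_s` helper are hypothetical names used for illustration. The helper applies Little's law (IOPS = outstanding I/Os / average latency), which is why, at a fixed queue depth of 16, lower latency and higher throughput go hand in hand. The 2 ms latency in the usage line is an assumed value, not a measured result.

```python
from dataclasses import dataclass

# Request sizes (KiB) as listed in the VMworld 2007 workload description.
REQUEST_SIZES_KIB = [1, 4, 8, 16, 32, 64, 72, 128, 256, 512]


@dataclass(frozen=True)
class AccessSpec:
    """Hypothetical container for one workload definition."""
    request_size_kib: int
    read_pct: int = 50          # 50% read / 50% write
    sequential_pct: int = 100   # 100% sequential access
    workers: int = 1
    outstanding_ios: int = 16


SPECS = [AccessSpec(request_size_kib=size) for size in REQUEST_SIZES_KIB]


def throughput_mib_s(spec, avg_latency_s):
    """Estimate throughput via Little's law: IOPS = outstanding / latency."""
    iops = spec.workers * spec.outstanding_ios / avg_latency_s
    return iops * spec.request_size_kib / 1024


# Usage with an assumed (not measured) 2 ms average latency at 8 KiB.
spec_8k = next(s for s in SPECS if s.request_size_kib == 8)
print(f"{throughput_mib_s(spec_8k, avg_latency_s=0.002):.1f} MiB/s")
```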

The results were comparable to the results published by NetApp in: “NetApp Performance Report: Multiprotocol Performance Test of VMware® ESX 3.5 on NetApp Storage Systems.” In the VMware presentation, using an iSCSI software initiator, iSCSI had slightly lower latency and slightly higher throughput than NFS at all request sizes.

Regarding the NetApp and EMC performance debate, how do you guys explain the chart on page 6 in the following document: http://media.netapp.com/documents/tr-3521.pdf

This chart was generated by NetApp and it shows that their Filer performance degrades close to 50% when it’s 50% full, and it gets worse from there. I can’t tell from this blog thread whether they are admitting to the problem, denying the problem, covering up the problem, or have claimed to fix the problem.

John

Chad Sakac

Thanks John - great to hear from you.

Your first point is the same one I pointed out: VMware, EMC, NetApp are all saying the same thing around NFS/iSCSI/FC, but the point I was making was that in their TR, they looked at only the small block (4/8K) scenarios, and then made a broad-brush conclusion. NFS is great for VMware, iSCSI is great for VMware, FC is great for VMware - customers should look to the right fit for them, and frankly figure out how they leverage all the other VMware goodness to get business value, back it up, manage it - and make their business more agile.

Re: filer performance as it fills, what I'd love is customers to comment on their experience. Do we have any readers that want to comment?

I'll tell you my experience in talking with NetApp customers:

- Generally, they are very happy.
- Generally, they love NetApp's features, and like the company.
- Generally, they acknowledge that your utilization varies based on reservations, with fewer reservations in NAS use cases, and more in block use cases. NetApp continues to innovate to automate and improve their capabilities to try to lessen this effect.
- Generally, they acknowledge performance non-linearity. This results in some good and some bad.
- Generally, and recently, they've been frustrated with more strong-arm sales processes designed to push upgrades when they are happy with what they have, or to extract additional software licenses.
- Sometimes the sales channel/SEs (but I bet this happens as much with EMC as NetApp - though I am hearing it more) have been accused of arrogance, or of being "holier than thou" (often VMware folks get the same feedback), by customers I talk to. That's a very dangerous path for any vendor in my opinion - EMC was the model of that in the early Symmetrix days and it almost killed the company when the dot com crash happened.

We picked up, adapted, and continued to grow and evolve, but it was dangerous. Pride goeth before the fall :-)

I REALLY don't like to comment, because vendors like you (Lefthand) and me (EMC) can't comment without incurring the wrath of the competitive brigade at NetApp and happy NetApp customers who think we're throwing them under the bus (I'm really trying not to).

As a NetApp customer as well as a competitor, the comments above have certainly been my experience.

*** ANY NETAPP CUSTOMERS WANT TO COMMENT? ***

BTW - most EMC customers are happy, most Lefthand (you) customers are happy too. We're all trying to do the best we can to serve our customers and be successful businesses.

Personally, I was happy to see the IDC data released today, we grew well in all categories, strong growth in NAS - we need to do better in iSCSI, EqualLogic was strong there.

All the details are here:

http://www.idc.com/getdoc.jsp;jsessionid=Q3IRWU1YQ20BQCQJAFICFGAKBEAUMIWD?containerId=prUS21411908

You can't grow the way we have without doing things right. We continued to outgrow all our larger competitors, even outgrowing those 1/3 our size (and it gets harder and harder as you get bigger and bigger). The only category that grew faster than EMC was "Other" - which includes Lefthand - congratulations!

The comments to this entry are closed.

Disclaimer

  • The opinions expressed here are my personal opinions. Content published here is not read or approved in advance by Dell Technologies and does not necessarily reflect the views and opinions of Dell Technologies or any part of Dell Technologies. This is my blog, it is not a Dell Technologies blog.