
September 10, 2008

Comments


Nick Triantos

Speaking of mudslinging and a disingenuous show of NetApp "affection," maybe I ought to draw your attention to the following statement of yours.

"WAFL's tradeoff is free snapshots in exchange for non-linear performance under normal workloads and normal utilization. It's your superpower, and your kyptonite. Kickass write perfomance when there are loads of free spaces to write to..."

Now, where have I seen comments like these before, and, more importantly, where have I seen comments like these crumble like a house of cards under the longest-running SPC-1 benchmark published in the industry? Mind you, this was published in January 2008, yet eight months later you're attempting to beat the same drum.

Now that's mudslinging.

Cheers

Chad Sakac

Nick, watch this - let's play a game, one I sincerely hope is a useful one.

I'm going to play the devil's advocate, and then you do the same.

If you don't, I'll play both sides.

I just can't see how any technology (any technology) has only good design tradeoffs and no negative ones.

OK, here's me pretending to work for NetApp (but I'm being sincere here, and I look forward to you responding in kind):

1) WAFL is good, because it eliminates write penalties under most circumstances
2) WAFL is good, because the NVRAM/write model can deliver excellent performance, and even better with RAM-based accelerator cards
3) WAFL is good, because it enables every write, as NVRAM destages during a journal event, to in effect be a snapshot - there are no "snapshots" per se, because nearly everything is a snapshot. This enables WAFL-based systems to have excellent snapshot scaling and performance - a core feature on which many other advanced features are based
4) WAFL and the fundamental design premise of using a filesystem as the underlying storage "container" means that common underlying mechanisms that apply at or below the filesystem can be applied to higher-level functions (i.e. a vehicle for delivering iSCSI and FC while maintaining common management models and functional capabilities), which has simplicity benefits.

All of the above are ABSOLUTELY true, and excellent design points from the early innovation of NetApp's founders.

Ok - now be intellectually honest - argue the other side - engineer to engineer.

I've got to do some more work tonight, but tomorrow, if you haven't, I'll do it. Others are welcome of course (I don't edit comments), but EMC'ers and other competitors, don't pile on - I don't think it would help.

NetApp folks, NetApp customers, you're welcome to add things that you think are design advantages, particularly if you think that they have no downside.

BTW - my intent here isn't to refute the advantages and somehow say that they don't have benefits, but instead to refute the position that any given design decision or implementation is somehow not a tradeoff.

Nick Triantos

Chad, we don't need to play intellectual games in order to reach a conclusion to a long-running saga regarding perceived and unproven WAFL deficiencies or inefficiencies, if you will.

The verdict has been in for some time now, and so has the supporting data at SPC1.org, which can be viewed by any and all interested parties, including yourself, since you don't seem to have studied it at all. In fact, as the NetApp user that you are, you ought to take a look. It may change your perspective and your assumptions.

Terry

Nick, I guess NetApp invented NFS as well? Why don't you guys build a solution that can scale beyond two heads, build a native Fibre Channel solution (maybe your numbers will go up), stop using these silly benchmarks, which have been discredited for years and to me are more of a marketing exercise than anything else, and settle your suit against Sun - which, by the way, invented NFS ;).

"Stop NetApp, you're killing me." (Do a Google search on that.)

I can't speak for Chad, but when my filer was in production we did see good performance at first - then it hit dirt within a few months. Who has time for weekend defrags?

Nick Triantos

Hi Terry,

A solution that can scale beyond two heads?

http://findarticles.com/p/articles/mi_m0EIN/is_2006_June_12/ai_n16463837

What does "native Fibre Channel" mean? There's only one way to implement the Fibre Channel protocol, Terry.

The "silly" benchmarks have been discredited mainly by those who dont have courage to run them...


Chad Sakac

OK, first, three apologies:

Apology 1: Nick, you're right that my comment (the "superpower/kryptonite" comment) was hyperbolic and inflammatory. I'm not being disingenuous; I just stepped over a line I try not to cross. Stated differently, in a less inflammatory way - my point was that for everything (technologies, companies, people), our strengths are intrinsically our weaknesses. I always get rankled by the "there is no downside to this engineering/technology decision" position.

Apology 2: for this subsequent list itself. I have to complete it - it's a matter of following through on a commit, and I try to predictably deliver against commits. I also hate that I can't figure out how to be more concise :-)

Apology 3: Terry, Nick can't see your email the same way I can, and I'm sorry. Terry isn't an EMC person, Nick - I think he is a customer. How about: "I'm sorry you had a negative experience; I want to know more to learn how we can do better." (I say this often, but far less often than I say "thank you for being an EMC and VMware customer".)


I've said it before: trust vendors that tell you where the solution stops working (saying this generally, not about either Nick's company or mine). Trust benchmarks that show where the solution broke. Trust partners who tell you what NOT to do with their own gear. Distrust those who focus on what is bad about others rather than on what they can do to help you.

Cringe... I'm breaking my self-commandment above, but to follow through with my commit, here we go.

Know that I will always try to openly state what we find as the limits of our own stuff. If there are factual errors in my commentary, folks, corrections are welcome. **I DO NOT CLAIM TO BE A NETAPP EXPERT**. Can I ask that we try to keep it on the level (i.e. technical corrections welcome, ravings not so welcome)? I'm trying to seal the wound, not reopen it. My point was purely to point out that everything has an upside and a downside.


1) "WAFL is good, because it eliminate write penalties unde most circumstances"

The downside of this design is twofold. The first is that the strength (reforming random writes into contiguous writes buffered by NVRAM) makes the link between NVRAM and the scale of features and performance intrinsic. The second is that sequential reads after random writes, once the WAFL layer lacks contiguous free blocks, can cause non-linearity in cases that are expected to be linear (for example, some databases expect locality of reference and use it as an optimization technique - so in some cases WAFL helps, in others it hinders; the essence of a tradeoff). That is not to say that NetApp filers catch fire and explode as WAFL has less contiguous space, or that this is a bad design choice - rather, these are TRADEOFFs. There is a reason why WAFL Iron and other utilities exist. Again - not bad, just a tradeoff.
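To make that second point a bit more concrete, here's a tiny toy model in Python - my own illustration for this comment, not NetApp's allocator or anything close to it - of how a write-anywhere layout turns logically sequential data into physically scattered blocks once random overwrites pile up:

# Toy model of a write-anywhere allocator (illustrative only - NOT WAFL).
# Logical blocks of a file start out contiguous; every random overwrite is
# redirected to the next free block instead of being written in place, so a
# later sequential read of the file touches increasingly scattered locations.
import random

DISK_BLOCKS = 1000
FILE_BLOCKS = 100

# Initial sequential layout: logical block i -> physical block i
block_map = {i: i for i in range(FILE_BLOCKS)}
next_free = FILE_BLOCKS  # naive "write anywhere" free-space cursor

def overwrite(logical_block):
    """Redirect the overwrite to a new physical location (no write-in-place)."""
    global next_free
    block_map[logical_block] = next_free % DISK_BLOCKS
    next_free += 1

def seek_distance_for_sequential_read():
    """Total physical distance jumped when reading logical blocks 0..N-1 in order."""
    physical = [block_map[i] for i in range(FILE_BLOCKS)]
    return sum(abs(b - a) for a, b in zip(physical, physical[1:]))

print("before random overwrites:", seek_distance_for_sequential_read())
for _ in range(500):
    overwrite(random.randrange(FILE_BLOCKS))
print("after random overwrites: ", seek_distance_for_sequential_read())

On rotating disks, that growing seek distance is exactly the kind of non-linearity I'm describing - and it's also why reallocation-style utilities exist for any write-anywhere design.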

2) "WAFL is good because the NVRAM/write model can deliver excellent peformance, and even better with RAM-based accelerator cards"

The role of NVRAM is analogous in some ways to a write cache, but very different in others. Its core function is not to act as a buffer (though it absolutely does buffer writes), but to serve as an intrinsic mechanism for ensuring filesystem consistency in the journaling action. The downside of this is that the NVRAM size is a core limiting factor for many NetApp features and envelopes. System memory is also an important factor (as it is on Celerra and CLARiiON and all arrays, and why we're all rearchitecting for massively multicore 64-bit procs and large addressable RAM) for other features - like some of the dialog you'll find if you follow the search Terry's previous comment suggested. That is not to say that NetApp filers catch fire and explode, or that this is a bad design choice - rather, these are TRADEOFFs.
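Same caveat as above - this is a toy I wrote for this comment, not how Data ONTAP (or DART, for that matter) actually works - but it sketches the shape of the tradeoff: writes are acknowledged as soon as they're journaled, and the journal size bounds how much incoming work can be absorbed before a flush has to happen.

# Toy model of an NVRAM-journaled write path (illustrative only).
# A write is acknowledged once it is logged; when the journal fills, a
# "consistency point" destages everything to disk. The journal size is
# therefore a first-order knob for how much write burst can be absorbed.

class ToyJournaledArray:
    def __init__(self, nvram_entries=1024):
        self.nvram_entries = nvram_entries   # the sizing limit the tradeoff is about
        self.journal = []                    # stand-in for the NVRAM log
        self.disk = {}                       # stand-in for on-disk blocks
        self.consistency_points = 0

    def write(self, block, data):
        self.journal.append((block, data))   # logged -> safe to acknowledge
        if len(self.journal) >= self.nvram_entries:
            self._consistency_point()
        return "ack"

    def _consistency_point(self):
        for block, data in self.journal:     # destage the whole journal
            self.disk[block] = data
        self.journal.clear()
        self.consistency_points += 1

array = ToyJournaledArray(nvram_entries=8)
for i in range(100):
    array.write(i % 16, b"x")
print("consistency points triggered:", array.consistency_points)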

3) "WAFL is good, because it enables every write as NVRAM destages during a journal event to in effect be a snapshot - there is not "snapshots per se", because nearly everything is a snapshot. This enables WAFL-based systems to have excellent snapshot scaling and performance - a core feature on which many other advanced features are based"

The downside here is the requirement for background reclamation, which NetApp has done great work to make more and more transparent, but which must be done. In a prior company, before EMC acquired us (Allocity), we used a WAFL-like block layout mechanism with a B+ tree pointer table, and it was REALLY hard to reallocate the blocks and restructure the B+ tree. NetApp is clearly better than Allocity at this :-) But we had plenty of smart folks, and it was HARD. This is one of those things that is intrinsic - you pay the piper before or after (you move the blocks at some point or another).

The other core issue is that while this architecture is excellent at snapshots, NetApp's clone capabilities are generally viewed as inferior to other vendors'. The response when a workload or customer - right or wrong - demands a clone (SyncMirror, which is also the way they do a RAID 1-type thing, better characterized as a mirror of a RAID-DP/4 container) is missing some features that most customers expect in that use case (consistency groups across objects spanning containers), because those are very hard to do if the files are in different filesystems. Though consistency is intrinsic if they are in the same filesystem, this can conflict with other best practices. The competitive response ("clones/BCVs are always bad, snapshots are always right") is as bad a response as EMC saying "snapshots are always bad" (and sadly, sometimes we do) - it's a bad response either way. So, once again, the strength is intrinsically a weakness - for both approaches.

Still, in genuine outreach - kudos to NetApp for driving the use cases of snapshots into the mass mainstream. Replicas of data are good for lots of reasons - period. The EMC view is that sometimes you want snapshots, sometimes you want clones. Calling a writeable snapshot a "FlexClone" (emphasis on Clone - which was widely used as a word describing a full copy) was genius marketing. I'm actually a marketing idiot. If it's not apparent, I can't say anything in a short, clear way :-) I would have likely called it a FlexWriteableSnapshot or something horrific like that.
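One more toy sketch (mine alone - not WAFL, not FlexClone, not Allocity) of why pointer-based snapshots are nearly free to take but leave the reclamation work for later:

# Toy model of pointer-based snapshots plus background space reclamation
# (illustrative only). A snapshot is just a frozen copy of the pointer table;
# a physical block can only be reclaimed once neither the live table nor any
# snapshot still points at it - the "pay the piper later" background work.

class ToyVolume:
    def __init__(self):
        self.live = {}          # logical block -> physical block
        self.snapshots = []     # frozen pointer tables
        self.allocated = set()  # physical blocks in use
        self._next_phys = 0

    def write(self, logical, _data):
        phys = self._next_phys        # never overwrite in place
        self._next_phys += 1
        self.live[logical] = phys
        self.allocated.add(phys)

    def snapshot(self):
        self.snapshots.append(dict(self.live))   # O(pointer table), not O(data)

    def reclaim(self):
        """Background job: free physical blocks no pointer table references."""
        referenced = set(self.live.values())
        for snap in self.snapshots:
            referenced |= set(snap.values())
        freed = self.allocated - referenced
        self.allocated -= freed
        return len(freed)

vol = ToyVolume()
for i in range(4):
    vol.write(i, b"v1")
vol.snapshot()
for i in range(4):
    vol.write(i, b"v2")       # old blocks stay pinned by the snapshot
print("freed while snapshot exists:", vol.reclaim())   # 0
vol.snapshots.clear()
print("freed after snapshot deleted:", vol.reclaim())  # 4

Taking the snapshot is just copying a pointer table; the piper gets paid in the reclaim pass, which has to prove no snapshot still references a block before freeing it.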


4) WAFL and the fundamental design premise of using a filesystem as the underlying storage "container" means that common underlying mechanisms that apply at or below the filesystem can be applied to higher-level functions (i.e. a vehicle for delivering iSCSI and FC while maintaining common management models and functional capabilities), which has simplicity benefits.

The downside here is twofold.

The first is that the block objects, even if you comply EXACTLY with the FC and iSCSI protocol standards - which NetApp of course does; they have many smart engineers there - inherit the behavior of the underlying filesystem. Many of the filer limits are a function of how filer failover behavior will occur as FlexVol count and capacity increase. EMC's Celerra is similar (datamover failover speed is a function of the size and number of filesystems). It is a very hard engineering problem. BTW, this is why we enforce the Celerra "usable capacity limits" as lower than what the backend can actually support (which of course gets used in a competitive context). Likewise, the stated raw capacity behind a NetApp filer is not a reflection of the usable capacity per se, but rather an upper maximum; sets of parameters (like the A-SIS notes you'll find if you do the Google search the previous comment suggests) are the functional limits. And remember - LIMITS ARE NOT BAD, SO LONG AS THE VENDOR DISCLOSES THEM TO YOU - and NetApp states them in their docs, so make sure you see them, just like you should see EMC's.

The challenge of accelerating filer failover and increasing its predictability is a long-term project, one where NetApp has made great strides, and I would fully expect them to continue to do so. NetApp has also made great strides in working with application and OS partners to build into applications the ability to extend the I/O timeouts, so that a failover is sustained with a pause but not a hard I/O failure. Customers should listen to the application vendors and NetApp and follow their best practices. Likewise, you will find these as common best practices with the Celerra.

BUT conversely, not using a filesystem as the core container, with iSCSI and FC LUNs as files in the filesystem, means that "pure" block devices have failover characteristics measured in milliseconds. For example, this is a really, really hard engineering problem on non-open systems. Those folks dictate the requirements and expect the storage subsystem to comply, and they laugh if you talk about extending timeouts - which is one of the reasons why getting into the ultra high-end is hard for NetApp, and not because of performance or drive counts.

Let me restate: our Celerra iSCSI implementation has the same design tradeoff (i.e. it's an iSCSI target that is a file in an underlying filesystem container) - but when we gave our Celerra FC, we (EMC) decided to trade off a single user interface and local/remote replication model in exchange for the "native" FC characteristics (i.e. the choice of FC isn't one purely of performance, but of other characteristics too). That's what Terry (I think) was referring to. It wasn't that we couldn't, and it wasn't that NetApp made the wrong choice - we simply made a different one. For example, here's the downside of our choice: we now need to invest in a higher-level management construct to make our management model more integrated, even though the underlying implementations are different (for the reasons stated). Each customer needs to look at the benefits and decide what's right for them.

The second is that scaling beyond two heads is harder. Yes, NetApp acquired Spinnaker, and yes, they are shipping ONTAP GX. The duration of the integration effort shows just how hard the engineering problem is. Delivering all the features and benefits of ONTAP Classic and WAFL (i.e. the 7G family) while merging with the Spinnaker model is very hard. I don't underestimate NetApp's ingenuity, and I'm sure that they are doing it and eventually will do it. I can only imagine the difficulty of the decision to merge or keep separate the different codepaths and philosophies. Whereas EMC has been that way since day one (for better AND worse), it must be anathema to some NetApp folks. What customers need that sort of scale? It's a narrow use case, but an important one. NetApp, EMC and others have designs aimed at different use cases (ONTAP GX clearly focused at the very high-end NFS single-namespace case, EMC's DMX clearly focused at linear performance, even in degraded hardware cases, and ultra-high system availability and scale). The market is ultimately the judge of the validity of our choices.

Closing thought - this has taken a lot of energy: mine to write, and yours to read. I think that the text above is a TOTAL NET ZERO addition to the value of the knowledge on the internet. It's useful for two engineers sitting down in a bar mulling this stuff over a pint. I don't think customers care, except insofar as the tradeoffs we all make express themselves in their use cases.

One thing useful for me.... this dialog has made **me** learn something.

When the SPC-1 stuff first happened, I was one of the voices crying out internally to respond in kind. And yes, I have read it, and looked at it in detail. I was outvoted, and we didn't respond via the SPC. It wasn't a matter of fear, or a lack of courage (hyperbolic/inflammatory?), or of being a "marketing company, not a technology company" (hyperbolic/inflammatory?). It restarted a broad dialog about benchmarking inside the company. We continue to participate, and will continue to participate, in public benchmarks where we see an even hand. Microsoft's ESRP (where we both have postings) and SpecNFS (where we both have postings) are examples. I don't know if these choices are right or wrong, but man - if this dialog is anything, there's something to be said for the argument that was lobbed against me by EMCers in the SPC debate: "Chad, you don't know how much work it will be just to fight the competitive benchmarks, and it's work that does little for anyone, and just spirals into Mutually Assured Destruction logic." Did we do some stuff that was bad in response? Sure - competitive teams everywhere have that as a job (man, I would hate that). That was them, not me.

Re "showing performance data even where it fails", I've posted examples from hundreds of examples availble to EMC, EMC partners, and customers in earlier posts. I'll be showing more at VMworld. I'm going to try to stay above the fray, and post as much useful knowledge as I can.

All this back and forth has made me learn: I crossed a line I don't want to cross. I want to spend my energies and this blog focused on what we're doing to help customers, not on going back and forth. I'm sure I'll occasionally fall off the wagon, but there it is. I'm going to try.


Mike Shea

Glad you hit being able to create 10K clients.

Means nothing. Absolutely nothing. Nada. Zilch. Zero. Zip.

Anyone can do it - as long as you are willing to dedicate the number of hosts required to do it in your *lab*.

It is a smoke screen - nothing more. Living with what you have decided to buy - that is the big thing, and NetApp does it simpler, with far fewer storage objects to manage. Virtualization is in part about moving from many to few. Anything else is silly, expensive and ultimately, not satisfying.

That is why EMC's biggest customers are moving to NetApp architectures. Ask us.


Period.

Jonas Irwin

Hey Mike and Nick,

I would love to see less emotion from you NetApp folks and more solid discourse here. As you guys may know, I have a unique perspective, having worked for both NetApp and EMC (now at EMC).

Chad has extended what I believe to be fair, honest and very well substantiated points around the general topic of engineering and design tradeoffs. If you took the time to read it, you should have seen that he included challenges inherent to filesystem-based architectures and mentioned BOTH the EMC Celerra and the NetApp NAS filer. Can you guys honestly say NetApp has no design tradeoffs? Why does it seem that all the posts that come from you guys can easily be summed up as:

"EMC is really, really BAD and NETAPP is really GOOD. Just look at the SPC-1 results we ran for EMC".

A little perspective:

At EMC, I work with roughly 100 customers a quarter and, shockingly, not one of them runs SPC-1 as a line-of-business app. Bottom line: customers simply do not care about marketing papers; all they care about is how the frame will work in their environment, with their own applications.

They also wonder why NetApp is constantly comparing itself to EMC. It reminds me of the Kia and Hyundai commercials where they say they have better features than the Accord or Toyota Camry for a lower price. If EMC has such poor storage systems, why is NetApp always comparing themselves to us?

A final word on performance:
In an attempt to help customers meet their needs in the field, I have headed up 5 large-scale, real-world benchmarks with real customer data and applications at EMC against NetApp since I've been here. The results: EMC 5, NTAP 0. If EMC performance is so bad, how is this possible? Here is a press release from one customer who was so happy with EMC after benchmarking their application with both EMC and NetApp that they wanted to jointly announce it to the Street: http://www.emc.com/about/news/press/2008/20080529.htm

-Jonas Irwin

goktugy

Hi,
Thanks for your great blog. It is fun to read, although it takes some time :)
I have a specific focus on your blog, especially the EMC datamover failover part. You say it is hard to predict a failover time as a function of size and filesystem count.
I have an NS40 which has 8 filesystems and 3TB total size at 90% usage. DART is the latest 5.6 release, freshly updated.
150 VMware machines are running over iSCSI and performance is quite good.
However, datamover failover to standby takes 600 seconds. When we follow with getreason, the standby waits for the active datamover to reboot and then activates immediately. I think 600 seconds is not the failover time but the time of the active's reboot process.
Any comment would be helpful. We are indeed thinking of moving to NetApp.

Thanks in advance.

Chad Sakac

Thanks goktugy, and thank you for being an EMC customer. Obviously I want to make sure you stay that way, and stay happy :-)

There was an issue with the earlier (5.6.3x) DART builds where failover was significantly longer than expected under some relatively rare conditions, but it looks like it may be affecting you.

Note that a lot of the best-practices guides recommend extending timeouts to 600 seconds (10 minutes) for various OSes - but that is absolutely NOT the target failover time; rather, it's a worst case. In general, with the configuration you describe, failover should be occurring in less than 2 minutes, and possibly in as little as 30 seconds (again, as I noted, with all filesystem-based devices, failover is very difficult to bound).
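For what it's worth, here's roughly what checking (and raising) the guest-side SCSI timeout looks like inside a Linux VM. This is a sketch only: the sysfs path is generic Linux behavior, the 180-second target is an illustrative assumption rather than an EMC or VMware recommendation, and on Windows the analogous knob is the Disk TimeoutValue registry setting - so follow the actual best-practices guide for your stack.

# Minimal sketch: inspect (and optionally raise) the SCSI command timeout inside
# a Linux guest so a datamover/filer failover pauses I/O instead of failing it.
# The 180-second target is an illustrative assumption - use the value your
# storage and OS vendors' best-practices guides actually recommend.
import glob
from pathlib import Path

TARGET_TIMEOUT_S = 180

for timeout_file in glob.glob("/sys/block/sd*/device/timeout"):
    path = Path(timeout_file)
    current = int(path.read_text())
    print(f"{timeout_file}: {current}s")
    if current < TARGET_TIMEOUT_S:
        # Requires root; a udev rule is the persistent way to do the same thing.
        path.write_text(str(TARGET_TIMEOUT_S))
        print(f"  raised to {TARGET_TIMEOUT_S}s")

(I believe VMware Tools already sets a similar value in many Linux guests; the Windows equivalent lives in the registry.)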

Can you tell me which DART version you are using? ("server_version server_2" at the CLI)

goktugy

Thanks for jumping in. I'd like to be happy as an EMC customer, in fact, but I wonder how to achieve that within the specified failover times :)

I know that this is not a customer blog, nor a support/complaint place. However, I believe a customer's eye might be useful for relevance. So let me put my thoughts in, please :)

I have 150 VMware machines and there are going to be more. If failover takes 2 minutes, most of my virtual machines are dead, which means a disaster for me :(
I checked the VMware timeouts, and its software iSCSI initiator (Cisco) timeout is something like forever. Should I also check the virtualized Windows and Red Hat timeouts as well?

EMC and NetApp have great features/performance, but if both have the potential for a disaster occurrence, none of the customers I know would invest in them. I wouldn't have if I had known...

Back to the subject, here is my DART version ;)
[nasadmin@emcnaccs42 ~]$ server_version server_2
server_2 : Product: EMC Celerra File Server Version: T5.6.40.3

Also, I need to correct myself: it is not 600 seconds but 300 seconds (5 minutes), as my colleagues informed me.

goktugy

Hi again,

EMC solved the case by setting EnableAptpl=1. This setting reduced the failover/failback time from 300 seconds to 60 seconds. The 150 iSCSI virtual machines pause I/O and continue normally.

Thanks for your support.

Chad Sakac

goktugy - our pleasure. FYI for anyone else reading this thread - the issue Goktugy worked around with that datamover parameter affected DART revisions NAS 5.6.39, 5.6.40, and 5.6.41 (all of which are behind the current rev now), in which datamover failover for datamovers with active iSCSI targets regressed.

An alternative is to do a DART upgrade to 5.6.42 or later, but the datamover parameter is a relatively easy workaround.

Its only downside is that applying the parameter requires a datamover failover.

Again - thank you for being an EMC and VMware customer!

Anonymous

A blog update from the future (2009)?! Please tell us if the economy is going to recover by September...

Chad Sakac

LOL - anon!

Yes, I've got good news for you. The economy comes to a fantastic rebound in mid-year, well ahead of September. By the end of the year, economists are so happily confused (i.e. the recovery is so good) that they go back and do detailed analysis. The turnaround turns out to be based on huge productivity improvements, power savings, and CapEx and OpEx reductions, starting mid-year with huge innovative products from VMware, Cisco and EMC!

In all seriousness - putting aside the plug - I really, really hope it gets better. Personally, I think we're at the bottom, and it will improve - but I tend to be an optimist.

On another note - I've got a weird date issue in my head - I struggle with my basic daily agenda; sometimes if it weren't for my BlackBerry, I wouldn't know what DAY it is :-)

The comments to this entry are closed.


Disclaimer

  • The opinions expressed here are my personal opinions. Content published here is not read or approved in advance by Dell Technologies and does not necessarily reflect the views and opinions of Dell Technologies or any part of Dell Technologies. This is my blog; it is not a Dell Technologies blog.