Archive for the ‘Datacenters’ Category

Network Virtual Appliances Are Silly

Wednesday, June 9th, 2010

In recent years the possibility of supporting complex network functionality in the form of a “virtual appliance” has become reality. The idea is to create software versions of traditionally “hardware-based” platforms, such as firewalls and load balancers (and their offspring, Application Delivery Controllers), running as a VM atop any hypervisor. I can see the appeal. However, I’m convinced that the virtual appliance approach doesn’t work in the long run.

For many of these appliances the conversion from hardware to virtual appliance isn’t hard; it’s not uncommon for network appliances to be based on off-the-shelf hardware with customized faceplates, so converting the software to a VM is straightforward enough. But therein lies a fundamental problem: in large part these off-the-shelf platforms don’t scale well. Vendors have been able to take advantage of the latest off-the-shelf hardware for their appliances, which can perform at multi-gigabit speeds, because many enterprise applications don’t require anything near that capacity. But any application that requires scale beyond the IO capacity of a single server will be disappointed by a virtual appliance.

To illustrate the point, imagine a virtual appliance running as a VM capable of forwarding 1 Gbps of traffic but attempting to support a dozen VM servers, each capable of generating the same 1 Gbps of traffic. A dedicated, and possibly hardware-based, approach is obviously a better-performing alternative. Now imagine a Cloud service provider’s environment, where the datacenter traffic is the sum of all these applications densely packed into compute clusters. Clearly a multi-tenant network appliance must be hardware-based to meet the aggregated demand.
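To put rough numbers on that example, here is a minimal Python sketch. It takes the traffic figures above as 1 Gbps each and assumes the worst case, where all twelve servers send at full rate at the same moment; the function names are mine and purely illustrative.

```python
def offered_load_gbps(servers: int, per_server_gbps: float) -> float:
    """Worst-case traffic if every server sends at its peak rate at once."""
    return servers * per_server_gbps


def oversubscription_ratio(appliance_gbps: float, servers: int,
                           per_server_gbps: float) -> float:
    """Worst-case offered load divided by the appliance's forwarding capacity."""
    if appliance_gbps <= 0:
        raise ValueError("appliance capacity must be positive")
    return offered_load_gbps(servers, per_server_gbps) / appliance_gbps


def fair_share_gbps(appliance_gbps: float, servers: int) -> float:
    """Capacity each server gets if the appliance is shared evenly."""
    if servers <= 0:
        raise ValueError("need at least one server")
    return appliance_gbps / servers


# The example from the text: one 1 Gbps appliance in front of a dozen servers.
ratio = oversubscription_ratio(appliance_gbps=1.0, servers=12, per_server_gbps=1.0)
share = fair_share_gbps(appliance_gbps=1.0, servers=12)
print(f"oversubscription {ratio:.0f}:1, fair share {share * 1000:.0f} Mbps per server")
```

Under those assumptions the appliance is oversubscribed 12:1, and each server gets well under a tenth of the bandwidth it could otherwise use. Real traffic is burstier than this worst case, so treat the ratio as an upper bound rather than a prediction.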

Even if that demand is to be divided amongst many virtual appliances (i.e. a multi-instance approach rather than multi-tenant) we have a problem: topology. Most often the network service needs to be in-line with traffic flows to work effectively. For instance, a firewall needs to see and control traffic flows, which is traditionally accomplished by embedding it within a network gateway. For the most part, network topology is defined by the Cloud service provider and the customer has no control over it. And even if the customer could control or manipulate topology, it is horribly inefficient to bounce traffic around inside the cloud on its way in or out, and doing so hurts performance. This “amplification” of traffic is often inevitable inside a datacenter, but can be minimized by placing network services closer to the core switches and/or backbone routers.

In other words, network services are best provided by the provider of the underlying infrastructure. Networks built to support these features will work better than trying to hack together a service layer on top of an unintelligent Cloud network.

At least, this is true for now.  Look for my next post, which will discuss the future of network services around hypervisor- and network-embedded capabilities.

Tiers Good, Unreasonable Expectations Bad

Wednesday, May 12th, 2010

There is a new post on Data Center Knowledge regarding tiered pricing by Equinix. To quote:

Equinix (EQIX) said last week that it expects to implement tiered pricing that acknowledges the broader range of facilities and markets the company now serves.

This is a good idea because there is a diversity of needs for datacenter space across different geographies at different levels of quality. Not every app needs to be supported by a tier 4 datacenter. But some apps do need to be local to a given geography. Offering a mix of environments across a wider footprint is a good move.

However, I was disappointed to see a rationalization by Jarrett Appleby (normally someone I think highly of) that potentially sets unreasonable expectations:

when it comes to reliability, he said not many changes are needed in the Switch and Data facilities. “We’re getting 5 nines (99.999 percent uptime) out of these sites,” Appleby said. “So from a performance standpoint, they’re already there.”

When it comes to something like reliability of complex systems, past performance is not an indicator of future performance. Systems break over time; components wear out. A newly built datacenter might operate at 100% availability but that doesn’t mean that you should expect consistent 100% uptime for the foreseeable future. Rather, it’s reasonable to set long-term performance expectations based on analysis of the components’ availability and redundancy. This is what the “tier” nomenclature refers to.
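As a rough illustration of that kind of analysis, the sketch below combines component availabilities using the standard series and parallel formulas. It assumes component failures are independent, which real facilities only approximate, and the component figures are hypothetical numbers chosen for the example, not data from any Equinix site.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60


def series(*availabilities: float) -> float:
    """All components must work: multiply their availabilities."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Redundant components: the group fails only if every member fails."""
    return 1 - prod(1 - a for a in availabilities)


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime implied by a long-run availability figure."""
    return (1 - availability) * MINUTES_PER_YEAR


# Hypothetical facility: two redundant power paths in series with one cooling plant.
power = parallel(0.999, 0.999)
facility = series(power, 0.9999)

print(f"facility availability: {facility:.6f}")
print(f"expected downtime: {downtime_minutes_per_year(facility):.1f} minutes/year")
print(f"five nines allows: {downtime_minutes_per_year(0.99999):.2f} minutes/year")
```

Five nines leaves room for only about five minutes of downtime a year, which is why a single non-redundant component in the chain can dominate the result no matter how well the site has run so far.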

So, if Equinix wants to sell lower-tier datacenter facilities then good for them. Seriously, it’s a good idea. But customers should be aware of what they’re buying into, and not misled about the quality of a given facility. Your app may need to be hosted in multiple datacenters if the uptime requirements are greater than a single facility can provide. Marketing doesn’t change that fact.
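To make the multiple-datacenter point concrete, here is a small Python sketch that estimates how many facilities an app would need to span to reach a target uptime. It assumes the sites fail independently and that the app fails over between them instantly and flawlessly, neither of which is guaranteed in practice; the function name and the figures are illustrative only.

```python
def sites_needed(site_availability: float, target_availability: float,
                 max_sites: int = 10) -> int:
    """Smallest number of independent sites whose combined availability meets the target."""
    if not 0 < site_availability < 1:
        raise ValueError("site availability must be between 0 and 1")
    if not 0 < target_availability < 1:
        raise ValueError("target availability must be between 0 and 1")
    for n in range(1, max_sites + 1):
        # The app is down only when all n sites are down at the same time.
        combined = 1 - (1 - site_availability) ** n
        if combined >= target_availability:
            return n
    raise ValueError(f"target not reachable with {max_sites} sites")


# Hypothetical figures: a three-nines facility and a five-nines requirement.
print(sites_needed(site_availability=0.999, target_availability=0.99999))
```

The sketch ignores the cost of keeping data in sync between sites and any shared dependencies (carriers, DNS, operators), so the real answer may well be larger than the number it returns.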

New EMC Product Enables VM Mobility

Tuesday, May 11th, 2010

Yesterday I read an interesting blog post about a new product from EMC. The benefit of the product is that it enables very fast long-distance VM (virtual machine) mobility. But at the core, it’s a storage replication engine that focuses on “Distributed Cache Coherency”.

The VM mobility is impressive. To quote the blog post:

At EMC World today we vMotioned 100 VMs between clusters in 4 minutes and 500 in 20 minutes. That’s 2.4 seconds per VM. Across the equivalent of 100KM.

Clearly this wouldn’t be possible if the VM had to go through a typical “VMotion”, which involves (in simplified terms) pausing the VM, writing active memory to a file, moving the disk image and memory file to a remote server, and resuming the VM. The pausing and resuming aren’t time-consuming, but the movement of data across a network can be the source of significant delay. That’s where EMC does a little “cheating” (or magic, depending on your point of view).

I don’t have first-hand knowledge of the platform. But I would guess that they’re keeping data mostly in-sync all the time, using background replication, so that the VM doesn’t need as much copied over the network at the point of migration. Even more importantly, they don’t even have to copy all of the data across the network before starting to use it. The platform copies a much smaller directory structure that points to the local drives for local data or the remote drives for data that hasn’t been copied yet. Upon accessing remote data it gets copied into local storage, which has a negative impact on performance. But the performance hit only affects that block at that moment; accessing any already-replicated blocks will run at native speeds.
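To make that guess concrete, here is a toy Python sketch of the fetch-on-first-access idea. It models only my speculation above, not how VPLEX actually works; the class, its methods, and the use of dictionaries as stand-ins for storage at each site are all hypothetical.

```python
from typing import Dict, Iterable


class LazyReplica:
    """Destination-side volume that pulls missing blocks from the source on first read."""

    def __init__(self, remote_blocks: Dict[int, bytes]) -> None:
        self._remote = remote_blocks          # stands in for storage at the source site
        self._local: Dict[int, bytes] = {}    # blocks already copied to this site
        self.remote_fetches = 0               # counts the slow, cross-site reads

    def replicate_in_background(self, block_ids: Iterable[int]) -> None:
        """Copy blocks ahead of time so later reads are served locally."""
        for block_id in block_ids:
            self._local.setdefault(block_id, self._remote[block_id])

    def read(self, block_id: int) -> bytes:
        """Serve from local storage, fetching from the remote site only on a miss."""
        if block_id not in self._local:
            self.remote_fetches += 1          # the one-time performance hit
            self._local[block_id] = self._remote[block_id]
        return self._local[block_id]


source_site = {0: b"boot", 1: b"data", 2: b"logs"}
replica = LazyReplica(source_site)
replica.replicate_in_background([0, 1])   # most data is already in sync

replica.read(0)   # local, no cross-site traffic
replica.read(2)   # not copied yet: fetched remotely, then kept locally
replica.read(2)   # now local as well
print(replica.remote_fetches)
```

Only the first read of block 2 crosses the network; every other read is served locally, which matches the idea that the performance hit applies to a block only at the moment it is first touched. The sketch deliberately ignores writes, which is where the conflict-resolution question comes in.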

Here is a diagram borrowed from the EMC posting to illustrate the idea:

[Diagram from the EMC post: VPLEX Distributed Cache Coherency]

As I said above, this is just my guess. The article goes into enough detail to get a rough picture of what happens. But I’m looking forward to hearing more details about this platform. I’m especially interested in hearing about conflict resolution. The article touches on this by saying:

The key is that VPLEX doesn’t solve the “what if two updates to the same block occur at the same moment?” problem (this is the core of the “did you solve the speed of light problem” question). The key is that MANY use cases ensure a single “writer” (single host writing to block at any given time), we don’t have to solve the “speed of light” problem (working on it :-) In the VMware use case, for example a single ESX host is the “writer” for a VM at any given moment, and a vMotion is an atomic operation – at one moment, one host is writing to a set of blocks, and then at next moment, another host is writing.

But the devil’s advocate in me still has to ask: but, what if?!? I’m sure I’ll have the chance to ask EMC someday soon. But regardless of their answer, I’m happy to see a platform like this one. EMC is uniquely positioned (given their relationship with / ownership of VMware etc) to make the assumptions and develop the tools that we’re going to need for any reasonable future in the cloud.

Reflections on MPLS 2009

Monday, December 14th, 2009

It has been well over a month since I attended the MPLS 2009 conference and participated in the panel on Emerging Technologies and Business Architectural Impact, and it is about time (overdue, in fact) that I posted my thoughts.

Foremost, I should say Thank You to Monique Morrow for organizing such a great panel and inviting me to participate. I’ve known Monique as a colleague and friend for a while now, and whenever we have a chance to meet she never fails to impress me. In the context of this panel, I was dumbfounded at the quality and breadth of the other participants that she secured. Monique moderated the discussion such that the panel’s large size was a benefit rather than a liability. As a panel we managed to cover multiple topics with decent depth, and were each allowed to illustrate our different perspectives. I’ve been told by several audience members that it was an excellent panel, and from my (admittedly biased) perspective I must agree.

As for the discussion itself, I very much enjoyed participating. Considering the quality of the other panelists, I am honored to have been included; each of the other panelists is recognizable for their contributions and role in the industry. Given my respect for the other panelists, I tried to enter the conversation prepared and did not hold back any of my significant thoughts during the discussion… for better or worse.

Some of my comments may have been controversial. For instance, one theme that ran through the entire discussion was the complex balance of cost vs. capacity vs. features in network devices. I challenged comments from Vijay Gill (of Google) and Donn Lee (of Facebook), who argued in favor of very large dumb switches (“dumb” is my word choice, but I suspect they would agree). From their perspectives, as engineers for large web properties, they need to scale out single-tenant environments to support Internet-scale traffic loads, and a simple L2 or L3 switch would enable their topology. But, I argued, they were “weird” in their requirements, which are unique to large web properties. Service providers and enterprise environments need more features in order to deal with the complexity and changing “customer” requirements they face daily.

After the panel I had the opportunity to chat with Vijay and Donn, and they had an interesting view of the cost / capacity / features debate.  Their comments deserve some focus, so look for a future post on this topic.

Another topic was the relevance of standards, which wasn’t particularly controversial but which caused some interesting comments.  My point was that standards are critical to the industry, but in the same way that fundamental research is critical to science and technology (broadly speaking).  We need to put effort into standards because it brings people together and promotes the state of the art.  But we also need to recognize that functioning interoperable implementations are what matter, regardless of the standards conformance, etc.  In other words, standards bodies should work diligently but not take themselves too seriously in the process.

Regardless, I hope to be included in future panels such as this one (at the MPLS conference and/or elsewhere) and I’m glad to have had the opportunity at MPLS 2009.  I would absolutely recommend that you attend future panels by Monique, at MPLS 2010 or otherwise, whether I’m taking part or not. Though, obviously, it would be better with my opinion included. ;)

CloudCamp St Louis – This Week!

Monday, December 7th, 2009

In the handful of weeks since we announced CloudCamp STL more than 100 people have registered to attend.  That is awesome!

Given the short notice and the holiday season, I expected a much smaller number.  But word of the event has spread, in no small part thanks to the marketing of Sam Charrington of Appistry.  He wrote a blog post about the upcoming event recently, and has sent messages to local user groups and industry contacts.  Others have spread the word, too, such as Alex Miller who also wrote a blog post.  Then, today I see that the Riverfront Times (RFT) also has a blog post about the upcoming CloudCamp STL meeting.  I couldn’t be happier with the publicity being generated by this CloudCamp.  It shows that St. Louis has what it takes to be a part of the global cloud community.  And when the meeting takes place, attendees / participants will see that St. Louis is, in fact, home to multiple industry thought leaders.

If you want to attend, please register at http://cloudcamp-stlouis-09.eventbrite.com/.  Registration is not required to attend but it helps us plan the space, food, and supplies.  And, of course, registration is free!

Finally, many thanks to our sponsors which currently include:

And also thanks to our Media Sponsors, which help promote the CloudCamp and spread the word to their communities:

St. Louis Java Users Group
Strange Loop Conference 2009
Lambda Lounge