Archive for the ‘Network Architecture’ Category

New EMC Product Enables VM Mobility

Tuesday, May 11th, 2010

Yesterday I read an interesting blog post about a new product from EMC. The benefit of the product is that it enables very fast long-distance VM (virtual machine) mobility. But at the core, it’s a storage replication engine that focuses on “Distributed Cache Coherency”.

The VM mobility is impressive. To quote the blog post:

At EMC World today we vMotioned 100 VMs between clusters in 4 minutes and 500 in 20 minutes. That’s 2.4 seconds per VM. Across the equivalent of 100KM.

Clearly this wouldn’t be possible if the VM had to go through a typical “vMotion”, which involves (in simplified terms) pausing the VM, writing active memory to a file, moving the disk image and memory file to a remote server, and resuming the VM. The pausing and resuming aren’t time-consuming, but moving data across a network can cause significant delay. That’s where EMC does a little “cheating” (or magic, depending on your point of view).
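To put rough numbers on that delay, here is a back-of-the-envelope sketch of a naive migration that copies everything before resuming. The VM size (4 GB of memory plus a 40 GB disk) and the link speeds are hypothetical illustrations, not figures from EMC or VMware, and the `efficiency` parameter is an assumed fudge factor for protocol overhead.

```python
# Back-of-the-envelope transfer time for a naive "copy everything, then resume"
# migration. All sizes and link speeds are hypothetical illustrations.

def transfer_seconds(gigabytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Seconds to move `gigabytes` (decimal GB) over a `link_gbps` link.

    `efficiency` is an assumed fraction (0 < efficiency <= 1) of the raw
    link rate actually achieved after protocol overhead.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    gigabits = gigabytes * 8
    return gigabits / (link_gbps * efficiency)


if __name__ == "__main__":
    memory_gb, disk_gb = 4, 40  # hypothetical VM
    for gbps in (1, 10):
        t = transfer_seconds(memory_gb + disk_gb, gbps)
        print(f"{gbps:>2} Gbps: {t:.1f} s to copy {memory_gb + disk_gb} GB")
```

Even under ideal conditions, that hypothetical 44 GB VM needs about 352 seconds at 1 Gbps and about 35 seconds at 10 Gbps, an order of magnitude or more above the 2.4 seconds per VM quoted above. That gap is why copying less data at migration time matters.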

I don’t have first-hand knowledge of the platform. But I would guess that they’re keeping data mostly in sync all the time, using background replication, so that less data needs to be copied over the network at the point of migration. More importantly, they don’t have to copy all of the data across the network before starting to use it. The platform copies a much smaller directory structure that points to the local drives for local data, or to the remote drives for data that hasn’t been copied yet. When remote data is accessed, it gets copied into local storage, which carries a performance penalty. But the penalty only affects that block at that moment; accessing any already-replicated blocks runs at native speed.
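Here is a toy model of that guess, to make the mechanism concrete. It assumes a per-block directory on the destination side: a read of a block that is still remote pulls it across once (the slow path) and serves it locally afterwards, while background replication pulls blocks ahead of demand. The `BlockDirectory` class and its methods are hypothetical; this is my guessed mechanism in miniature, not VPLEX's actual design or API.

```python
# Toy model of a migration-time block directory (hypothetical, not VPLEX's API).
# "remote" holds blocks not yet copied to this site; "local" holds copied blocks.

from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BlockDirectory:
    remote: Dict[int, bytes]                       # blocks still only at the source site
    local: Dict[int, bytes] = field(default_factory=dict)
    remote_fetches: int = 0                        # how many reads took the slow path

    def read(self, block_id: int) -> bytes:
        if block_id in self.local:                 # fast path: already replicated
            return self.local[block_id]
        data = self.remote.pop(block_id)           # slow path: cross-site fetch
        self.local[block_id] = data                # keep it so later reads are local
        self.remote_fetches += 1
        return data

    def write(self, block_id: int, data: bytes) -> None:
        # A full-block write makes any not-yet-copied remote version obsolete.
        self.remote.pop(block_id, None)
        self.local[block_id] = data

    def background_copy(self, max_blocks: int) -> int:
        """Pull up to `max_blocks` remote blocks ahead of demand; return how many moved."""
        moved = 0
        for block_id in list(self.remote)[:max_blocks]:
            self.local[block_id] = self.remote.pop(block_id)
            moved += 1
        return moved
```

For example, with three remote blocks, one background copy pre-pulls block 0, the first read of block 1 takes the slow path, and a second read of block 1 is served locally:

```python
d = BlockDirectory(remote={0: b"a", 1: b"b", 2: b"c"})
d.background_copy(1)
d.read(1)
d.read(1)
print(d.remote_fetches)  # 1
```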

Here is a diagram borrowed from the EMC posting to illustrate the idea:

[Diagram: VPLEX Distributed Cache Coherency (from the EMC post)]

As I said above, this is just my guess. The article goes into enough detail to get a rough picture of what happens. But I’m looking forward to hearing more details about this platform. I’m especially interested in hearing about conflict resolution. The article touches on this by saying:

The key is that VPLEX doesn’t solve the “what if two updates to the same block occur at the same moment?” problem (this is the core of the “did you solve the speed of light problem” question). The key is that MANY use cases ensure a single “writer” (single host writing to block at any given time), we don’t have to solve the “speed of light” problem (working on it :-) In the VMware use case, for example a single ESX host is the “writer” for a VM at any given moment, and a vMotion is an atomic operation – at one moment, one host is writing to a set of blocks, and then at next moment, another host is writing.

But the devil’s advocate in me still has to ask: what if?!? I’m sure I’ll have the chance to ask EMC someday soon. But regardless of their answer, I’m happy to see a platform like this one. EMC is uniquely positioned (given its relationship with, and ownership of, VMware, etc.) to make the assumptions and develop the tools that we’re going to need for any reasonable future in the cloud.
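The single-writer assumption in the quote can be sketched as an ownership registry: each VM's blocks have exactly one host allowed to write, and a migration hands that ownership over in one atomic step. The `WriterRegistry` class and the host and VM names are hypothetical; this models the assumption as I read it, not EMC's implementation.

```python
# Sketch of the single-writer model: one owning host per VM; a migration
# atomically transfers ownership. Names are hypothetical.

import threading
from typing import Dict


class WriterRegistry:
    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}   # vm_id -> host currently allowed to write
        self._lock = threading.Lock()      # makes each check-and-handoff atomic

    def register(self, vm_id: str, host: str) -> None:
        with self._lock:
            if vm_id in self._owner:
                raise ValueError(f"{vm_id} already owned by {self._owner[vm_id]}")
            self._owner[vm_id] = host

    def can_write(self, vm_id: str, host: str) -> bool:
        with self._lock:
            return self._owner.get(vm_id) == host

    def migrate(self, vm_id: str, src: str, dst: str) -> None:
        with self._lock:
            # Only the current owner may hand off. A second would-be writer is
            # refused here rather than reconciled: the "what if?" case.
            if self._owner.get(vm_id) != src:
                raise PermissionError(f"{src} does not own {vm_id}")
            self._owner[vm_id] = dst
```

Note what the sketch dodges: the lock only makes the handoff atomic within one process. Across 100 km, agreeing on who owns a block at a given instant requires some distributed protocol, and that is exactly where the "what if?" question lives.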

Level(3) Found God, Bought Metro Fiber

Wednesday, February 17th, 2010

I just read a great post on the Telecom Straight Shooter blog about Level(3)’s business.  Excerpt:

…all applications will originate and terminate in a metropolitan market with local access along with their associated revenues.  Long haul pipes are in vast quantity with plenty of inventory buried in the ground. In all fairness, however, if you are going to build a long haul network, you don’t undergo the expense to put only one pipe in the ground.

Metropolitan markets are 10x more expensive to build, operate and install than a long haul network. You actually require more fiber to be deployed in a metro setting in order to support stuffed, long haul dumb pipes from long haul networks dumping packets at a carrier hotel for metro distribution or third party interconnection facilities.

Reflections on MPLS 2009

Monday, December 14th, 2009

It has been well over a month since I attended the MPLS 2009 conference and participated in the panel on Emerging Technologies and Business Architectural Impact, and it is about time (overdue, really) that I posted my thoughts.

Foremost, I should say Thank You to Monique Morrow for organizing such a great panel and inviting me to participate.  I’ve known Monique as a colleague and friend for a while now, and whenever we have a chance to meet she never fails to impress me.  In the context of this panel, I was dumbfounded by the quality and breadth of the other participants that she secured.  Monique moderated the discussion such that the panel’s large size was a benefit rather than a liability.  As a panel we managed to cover multiple topics with decent depth, and each of us was allowed to illustrate our different perspectives.  I’ve been told by several audience members that it was an excellent panel, and from my (admittedly biased) perspective I must agree.

As for the discussion itself, I very much enjoyed participating.  Considering the quality of the other panelists, I am honored to have been included; each of the other panelists is recognizable for their contributions and role in the industry.  Given my respect for the other panelists, I tried to enter the conversation prepared and did not hold back any of my significant thoughts during the discussion… for better or worse.

Some of my comments may have included points that were controversial.  For instance, one theme that ran through the entire discussion was the complex balance of cost vs. capacity vs. features in network devices.  I challenged comments from Vijay Gill (of Google) and Donn Lee (of Facebook), who argued in favor of very large, "dumb" switches. ("Dumb" is my word choice, but I suspect they would agree.)  From their perspectives, as engineers for large web properties, they need to scale out single-tenant environments to support Internet-scale traffic loads, and a simple L2 or L3 switch would enable their topology.  But, I argued, they were "weird" in their requirements, which are unique to large web properties.  Service providers and enterprise environments need more features in order to deal with the complexity and changing "customer" requirements they face daily.

After the panel I had the opportunity to chat with Vijay and Donn, and they had an interesting view of the cost / capacity / features debate.  Their comments deserve some focus, so look for a future post on this topic.

Another topic was the relevance of standards, which wasn’t particularly controversial but which prompted some interesting comments.  My point was that standards are critical to the industry, but in the same way that fundamental research is critical to science and technology (broadly speaking).  We need to put effort into standards because they bring people together and advance the state of the art.  But we also need to recognize that functioning, interoperable implementations are what matter, regardless of standards conformance, etc.  In other words, standards bodies should work diligently but not take themselves too seriously in the process.

Regardless, I hope to be included in future panels such as this one (at the MPLS conference and/or elsewhere) and I’m glad to have had the opportunity at MPLS 2009.  I would absolutely recommend that you attend future panels by Monique, at MPLS 2010 or otherwise, whether I’m taking part or not. Though, obviously, it would be better with my opinion included. ;)

Broadcom to Acquire Dune Networks

Monday, November 30th, 2009

Interesting… Broadcom announced that they’re acquiring Dune Networks, makers of high-speed, high-density switch fabric chips.

IRVINE, Calif., Nov. 30 /PRNewswire-FirstCall/ — Broadcom Corporation (Nasdaq: BRCM), a global leader in semiconductors for wired and wireless communications, today announced that it has signed a definitive agreement to acquire Dune Networks, a privately-held company that develops switch fabric solutions for data center networking equipment. Data centers are scaling to provide significantly more bandwidth to meet the requirements of cloud computing, where computing resources, products and services, such as Software as a Service (SaaS), can be delivered real-time over the Internet. Dune Networks has developed a scalable chipset that supports bandwidth speeds of up to 100Gbps per port and can connect more than ten thousand servers (ports) in a single deployment.

Blade Networking Architecture: Cisco vs. Juniper

Monday, November 30th, 2009

My interest was piqued by news from Juniper Networks that they had licensed JunOS to BLADE Network Technologies.  Not that I’m surprised; I’ve suggested this approach (and some approaches more extreme…) to both Juniper and Cisco in the past.  But it did get me thinking about the differences in network architecture between Cisco’s UCS approach and the potential JunOS/BLADE approach.

For instance, a UCS system leverages a "fabric extender" module, transparently connecting multiple blades to a Top-of-Rack switch (TOR) such as the UCS 6120XP fabric interconnect.  When combined with the Nexus 1000V software (and VMware-integrated port profiles), an entire cluster of UCS chassis can be managed as a single resource pool.  Making a few design assumptions, the environment may look something like this:

This design provides a single logical interface to the external network (e.g., the data center backbone, a WAN, whatever).  However, it is a flat network within the environment.  This is great for VM mobility throughout the resource pool, but not so great for scaling the network to many VLANs.
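One concrete ceiling on "many VLANs" in a single flat layer-2 domain (not the only one) is the 802.1Q tag itself: the VLAN ID is a 12-bit field, and IDs 0 and 4095 are reserved, which leaves 4094 usable IDs shared by every segment in the domain. The allocator below is a hypothetical sketch, and the tenant counts are illustrative.

```python
# 802.1Q VLAN IDs are 12 bits; 0 and 4095 are reserved, leaving 1..4094.
# In one flat layer-2 domain every segment draws from this single pool.
# The allocator and tenant counts are hypothetical illustrations.

from typing import Dict, List

VLAN_ID_BITS = 12
USABLE_VLAN_IDS = range(1, 2**VLAN_ID_BITS - 1)   # 1..4094 inclusive


def allocate_vlans(tenants: int, vlans_per_tenant: int) -> Dict[int, List[int]]:
    """Hand out VLAN IDs from one shared pool; fail when the pool runs out."""
    needed = tenants * vlans_per_tenant
    if needed > len(USABLE_VLAN_IDS):
        raise RuntimeError(f"need {needed} VLANs, only {len(USABLE_VLAN_IDS)} IDs exist")
    ids = iter(USABLE_VLAN_IDS)
    return {t: [next(ids) for _ in range(vlans_per_tenant)] for t in range(tenants)}
```

With these assumptions, 1,000 tenants at four VLANs each (4,000 IDs) still fit, but five VLANs each (5,000 IDs) do not, no matter how much switching capacity the pool has.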

The alternative presented by Juniper provides an interesting comparison because it moves a switching function into the blade-server chassis.  Admittedly this doesn’t have to replace the TOR switch.  But just for argument’s sake, again making a few additional assumptions, the design could look something like this:

Note that I show a chassis with and without hypervisor-local switching.  This is just to illustrate the possibility, which admittedly could also exist in the Cisco UCS environment; this is not because I have some interesting point to make about it. ;)

Regardless, the Juniper/BLADE design may allow for better network scaling, depending on the features that are bundled into the chassis switch.  However, in contrast to the UCS design, there are more management touch-points.  And to really take advantage of the network scale possibilities, the network architecture itself has to be different: more oriented around layer 3, and less around a wide, flat layer 2.  This could be a problem for VM mobility depending on the overall application needs (mostly due to IP addressing and the need for shared broadcast domains).
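The IP addressing issue can be shown with a small sketch. It assumes that in the layer-3-oriented design each chassis switch routes its own subnet, so a VM can only move without re-addressing to a chassis whose subnet contains the VM's IP; in the flat design, one subnet spans every chassis. The chassis names and prefixes are hypothetical, and `mobility_targets` is an illustrative helper built on Python's standard `ipaddress` module.

```python
# Where can a VM move while keeping its IP address? Chassis names and
# prefixes are hypothetical illustrations.

import ipaddress
from typing import Dict, List


def mobility_targets(vm_ip: str, chassis_subnets: Dict[str, str]) -> List[str]:
    """Return the chassis whose subnet contains `vm_ip`."""
    ip = ipaddress.ip_address(vm_ip)
    return [name for name, prefix in chassis_subnets.items()
            if ip in ipaddress.ip_network(prefix)]


# Flat layer 2: one subnet spans every chassis.
flat = {"chassis-1": "10.0.0.0/16", "chassis-2": "10.0.0.0/16", "chassis-3": "10.0.0.0/16"}
# Layer 3 per chassis: each chassis routes its own subnet.
routed = {"chassis-1": "10.0.1.0/24", "chassis-2": "10.0.2.0/24", "chassis-3": "10.0.3.0/24"}

if __name__ == "__main__":
    print(mobility_targets("10.0.1.25", flat))    # all three chassis
    print(mobility_targets("10.0.1.25", routed))  # only chassis-1
```

In the routed case, moving the VM to `chassis-2` would mean changing its address (or stretching a broadcast domain across chassis), which is the mobility trade-off described above.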

The key next step in either approach, in my opinion, is to deploy new features in the TOR and/or chassis switch.  Specifically, data center networking doesn’t have to be an all-or-nothing L2 vs. L3 debate.  Please look for additional thoughts on this topic in a future post.