Archive for the ‘Network Architecture’ Category

Broadband vs. Internet Speed: Not So Fast

Wednesday, August 18th, 2010

An email I received this afternoon contained a forwarded link to an article entitled “Conflating broadband speed with Internet speed is misleading”. The article makes a valid point that access capacity (“Broadband speed”) isn’t the same thing as end-to-end throughput (“Internet speed”). Clearly this difference is valuable for consumers to understand, and is a critically important distinction in the Network Neutrality debate.

Which is why I’m disappointed in the article; sadly, it oversimplifies the issue to the point of obscuring critical details. The comparison to fax technology is imperfect, maybe even flawed, and it leads the reader to an incorrect conclusion. The practical result is that it sidesteps any discussion of the provider’s responsibility for effective bandwidth.

To be clear, end-to-end throughput across a network is affected by everything in between the two hosts (computers) that are communicating. It is affected by the equipment, configurations, and interconnects. It is also affected by the capability of the transport protocols, round-trip latency, packet overhead, and more. In this regard, the article is correct to say that effective bandwidth shouldn’t be compared directly to broadband access capacity. But likewise, to compare the effective bandwidth to the coding rates of fax machines is a vast oversimplification.
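To make the transport-protocol and latency factors concrete, here is a minimal sketch of the classic upper bound for a single TCP flow: it can have at most one receive window of data in flight per round trip, so throughput is at most window / RTT, further reduced by per-packet header overhead. The 50 Mbps access link, 64 KiB window, 1500-byte MTU, and 40 bytes of IPv4 + TCP headers are illustrative assumptions, not measurements of any real network.

```python
def tcp_window_limit_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound for one TCP flow: at most one window in flight per RTT."""
    return window_bytes * 8 / rtt_seconds


def payload_efficiency(mtu_bytes: int = 1500, header_bytes: int = 40) -> float:
    """Fraction of each packet carrying application data (IPv4 + TCP, no options)."""
    return (mtu_bytes - header_bytes) / mtu_bytes


def effective_throughput_bps(access_bps: float, window_bytes: int,
                             rtt_seconds: float) -> float:
    """Goodput: the slower of the access link and the window limit,
    reduced by header overhead (link-layer framing ignored)."""
    raw = min(access_bps, tcp_window_limit_bps(window_bytes, rtt_seconds))
    return raw * payload_efficiency()


access = 50e6          # hypothetical 50 Mbps broadband link
window = 64 * 1024     # 64 KiB receive window, no window scaling
for rtt_ms in (10, 50, 150):
    bps = effective_throughput_bps(access, window, rtt_ms / 1000)
    print(f"RTT {rtt_ms:>3} ms -> {bps / 1e6:5.1f} Mbps")
```

Under these assumptions the same 50 Mbps access link delivers only about 10 Mbps to a server 50 ms away, because the window, not the broadband connection, is the bottleneck.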

Looking at the factors that might affect end-to-end performance, a number of them are directly in the hands of the network provider. The access link (i.e. the broadband connection) to the customer is just the first component. It terminates in an edge/aggregation network that is probably oversubscribed. The edge networks may be interconnected across a backbone, with its own bandwidth constraints and inefficiencies due to physical distance. And Internet connectivity, whether to the backbone or directly to the edge network, is provided by a number of peering and/or transit connections that are not necessarily equal. This doesn’t even consider NAT, security, or bandwidth-management devices that might constrain effective throughput.
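The chain of provider-controlled segments can be sketched as a path whose end-to-end rate is set by its tightest hop. This is a simplified model under stated assumptions: each segment's capacity is split evenly among the flows currently active on it, and every capacity and flow count below is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    capacity_bps: float   # raw capacity of this hop
    active_flows: int     # flows currently sharing it (1 = dedicated)

    def fair_share_bps(self) -> float:
        # Assumes capacity is divided evenly among concurrently active flows.
        return self.capacity_bps / self.active_flows


def end_to_end_bps(path: list[Segment]) -> tuple[float, str]:
    """Return the bottleneck fair share along the path and the hop that sets it."""
    bottleneck = min(path, key=lambda s: s.fair_share_bps())
    return bottleneck.fair_share_bps(), bottleneck.name


# Hypothetical provider path from the customer out to the Internet.
path = [
    Segment("access link", 50e6, 1),
    Segment("aggregation uplink", 10e9, 400),
    Segment("backbone", 100e9, 5000),
    Segment("transit/peering", 10e9, 2000),
]
rate, hop = end_to_end_bps(path)
print(f"{rate / 1e6:.1f} Mbps, limited by {hop}")  # 5.0 Mbps, limited by transit/peering
```

In this made-up example the customer's 50 Mbps access link is never the constraint; a thinly provisioned transit connection is.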

When all is accounted for, there may be a considerable oversubscription ratio. Not that oversubscription is inherently bad; most users aren’t using 100% of their bandwidth at the same moment, which allows the provider to time-multiplex its users without degrading performance. And this oversubscription allows the provider to make money in an otherwise low-margin business. But because it’s hard to determine how oversubscribed a provider is, providers are often tempted to push costs lower by oversubscribing more. (This is evident when providers get irritated by their customers’ increasing usage, such as P2P traffic.) Further, the Internet transit connections might be acquired on the cheap, offering lower-quality network paths (read: more oversubscription, more latency). The effect of these choices counts directly against end-to-end performance.
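Why oversubscription usually works, and why heavier usage upsets it, can be shown with a toy statistical-multiplexing model. It assumes each subscriber is independently either idle or transmitting at full access rate with probability `p_active` (a binomial model); the subscriber count, access rate, and uplink size are hypothetical.

```python
from math import comb


def oversubscription_ratio(subscribers: int, access_bps: float, uplink_bps: float) -> float:
    """Capacity sold to subscribers divided by capacity provisioned upstream."""
    return subscribers * access_bps / uplink_bps


def congestion_probability(subscribers: int, p_active: float, max_concurrent: int) -> float:
    """P(more than max_concurrent subscribers transmit at full rate at once),
    assuming each is independently active with probability p_active."""
    return sum(
        comb(subscribers, k) * p_active**k * (1 - p_active) ** (subscribers - k)
        for k in range(max_concurrent + 1, subscribers + 1)
    )


subs, access, uplink = 200, 50e6, 1e9
ratio = oversubscription_ratio(subs, access, uplink)  # 200 x 50 Mbps / 1 Gbps = 10:1
fits = int(uplink // access)                           # 20 users fit at full rate
for p in (0.05, 0.10, 0.20):
    print(f"ratio {ratio:.0f}:1, p_active={p:.2f} -> "
          f"P(congested) = {congestion_probability(subs, p, fits):.4f}")
```

In this model the same 10:1 ratio is nearly harmless when subscribers are active 5% of the time, but the uplink is congested almost all the time once they are active 20% of the time.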

Now, to be clear, I’m not advocating regulation of how service providers build their networks. It should be up to each business to determine for itself what constitutes an effective network topology, interconnect strategy, oversubscription ratio, etc. But focusing the entire network debate on the access connections, while ignoring the complex network that connects them to the Internet, is misleading.

Private vs. Public Clouds

Wednesday, August 4th, 2010

I’m happy to see a post from @swardley about Private versus Public cloud terminology. I’ve never considered the issue clearly settled, so I’m glad to read in Private vs Public clouds:

I thought this argument has been settled a long time ago, seems not. So, once more dear friends I will put on my best impression of a stuck record.

Better to sound like a stuck record than delusional. ;) He goes on to note two dimensions of cloud definition: private vs. public, and internal vs. external.

First what is the difference between a public and a private cloud?

  • A public cloud (the clue is in the name) is open to the public.
  • A private cloud (the clue is in the name) is private to some set of people.

There is another side to this which is your relationship to the provider. It is either :

  • external to you and therefore controlled, operated and run by another party.
  • internal to you which means it is controlled, operated and run by yourself.

Unfortunately, this definition seems to be the commonly accepted one. I say “unfortunately” because it fails to recognize an important dimension: connectivity. Let me just accept the terminology here, rather than fight it, and suggest a new term to describe the nature of network connectivity:

  • Global – cloud resources are connected (directly or via firewall and/or NAT) to the Internet
  • Local – cloud resources are connected to a private network, such as a VPN or other internal network environment
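One way to see that connectivity is an independent third dimension is to model a cloud as a value along three separate axes. The class and field names below are hypothetical, chosen only for illustration; the point is that the Global/Local choice can vary while the other two stay fixed.

```python
from dataclasses import dataclass
from enum import Enum


class Audience(Enum):
    PUBLIC = "open to the public"
    PRIVATE = "private to some set of people"


class Operator(Enum):
    EXTERNAL = "controlled, operated and run by another party"
    INTERNAL = "controlled, operated and run by yourself"


class Connectivity(Enum):
    GLOBAL = "connected (directly or via firewall/NAT) to the Internet"
    LOCAL = "connected to a private network such as a VPN"


@dataclass(frozen=True)
class CloudProfile:
    audience: Audience
    operator: Operator
    connectivity: Connectivity

    def label(self) -> str:
        return (f"{self.audience.name.lower()}/{self.operator.name.lower()}/"
                f"{self.connectivity.name.lower()}")


# Hypothetical example: a hosted private cloud reachable only over a VPN.
hosted_vpn = CloudProfile(Audience.PRIVATE, Operator.EXTERNAL, Connectivity.LOCAL)
print(hosted_vpn.label())  # private/external/local
```

With two values per axis there are eight distinct profiles, twice as many as the public/private and internal/external dimensions alone can express.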

Now, I wouldn’t have picked these terms if I could have used Public/Private or Internal/External instead. But because people were stuck in yesterday’s way of thinking (that “Cloud” == “Internet Connected”), the good terms got used to describe control and ownership. Still, connectivity is an important distinction: a cloud isn’t any good without network connectivity, and the nature of that connectivity defines the baseline audience of users. Further, it implies a security paradigm, a QoS domain, etc.

The choice of “global” and “local” mirrors terminology used to describe IP addresses, which might be “globally unique” or “locally unique”. (Of course, most people refer to the latter category as “private”, but I digress…)  If you can think of a better term for the dimension of cloud connectivity, please let me know. But whatever you do please don’t forget that the nature of network connectivity is an important distinction for cloud usability, especially in enterprise organizations.
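The IP-address terminology this analogy borrows from is easy to inspect with Python’s standard `ipaddress` module, whose `is_global` property reflects the IANA special-purpose address registries. Note that this classifies addresses, not clouds: under the definition above, a resource with a locally unique address that reaches the Internet through NAT still counts as Global. The sample addresses are arbitrary.

```python
import ipaddress


def address_scope(address: str) -> str:
    """'global' for globally routable addresses; 'local' for everything else,
    e.g. RFC 1918 space, IPv6 unique-local (fc00::/7), loopback."""
    return "global" if ipaddress.ip_address(address).is_global else "local"


for addr in ("10.1.2.3", "192.168.0.10", "fd00::1", "8.8.8.8", "2001:4860:4860::8888"):
    print(f"{addr:>22} -> {address_scope(addr)}")
```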

Network Virtual Appliances Are Silly

Wednesday, June 9th, 2010

In recent years, supporting complex network functionality in the form of a “virtual appliance” has become a reality. The idea is to create software versions of traditionally “hardware-based” platforms, such as firewalls and load balancers (and their offspring, Application Delivery Controllers), running as a VM atop any hypervisor. I can see the appeal. However, I’m convinced that the virtual appliance approach won’t work in the long run.

For many of these appliances, the conversion from hardware to virtual appliance isn’t hard; it’s not uncommon for network appliances to be based on off-the-shelf hardware with customized faceplates, so converting the software to a VM is straightforward enough. But therein lies a fundamental problem: for the most part, these off-the-shelf platforms don’t scale well. Vendors have been able to use the latest off-the-shelf hardware for their appliances, which can perform at multi-gigabit speeds, because many enterprise applications don’t require anything near that capacity. But any application that needs to scale beyond the I/O capacity of a single server will be disappointed by a virtual appliance.

To illustrate the point, imagine a virtual appliance running as a VM capable of forwarding 1 Gbps of traffic, but attempting to support a dozen VM servers, each capable of generating the same 1 Gbps. A dedicated, and possibly hardware-based, approach is obviously the better-performing alternative. Now imagine a Cloud service provider’s environment, where the datacenter traffic is the sum of all these applications densely packed into compute clusters. Clearly a multi-tenant network appliance must be hardware-based to meet the aggregated demand.
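The dozen-VM example is simple arithmetic, sketched below. The lower bound on instances assumes load could be split perfectly across virtual appliances, which is optimistic given the topology problem discussed next.

```python
import math


def offered_load_ratio(clients: int, per_client_bps: float, appliance_bps: float) -> float:
    """Aggregate offered load as a multiple of one appliance's forwarding capacity."""
    return clients * per_client_bps / appliance_bps


def min_instances(clients: int, per_client_bps: float, appliance_bps: float) -> int:
    """Lower bound on appliance instances, assuming load could be split perfectly."""
    return math.ceil(clients * per_client_bps / appliance_bps)


# The example from the text: a dozen 1 Gbps VMs behind a 1 Gbps virtual appliance.
print(offered_load_ratio(12, 1e9, 1e9))   # 12.0
print(min_instances(12, 1e9, 1e9))        # 12
```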

Even if that demand is divided among many virtual appliances (i.e. a multi-instance approach rather than multi-tenant), we have a problem: topology. Most often the network service needs to be in-line with traffic flows to work effectively. For instance, a firewall needs to see and control traffic flows, which is traditionally accomplished by embedding it within a network gateway. For the most part, network topology is defined by the Cloud service provider, and the customer has no control over it. And even if the customer could control or manipulate topology, it is horribly inefficient to bounce traffic around inside the cloud on its way in or out, hurting performance. This “amplification” of traffic is often inevitable inside a datacenter, but it can be minimized by placing network services closer to the core switches and/or backbone routers.
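The “amplification” can be made concrete by counting how many links a flow crosses. The two paths below are hypothetical, in a simple ToR/core layout: one with a firewall embedded at the core, one detouring through a firewall VM in another rack. Every extra link traversal consumes fabric bandwidth and adds latency.

```python
def links_crossed(path: list[str]) -> int:
    """Number of links a flow traverses along a node-by-node path."""
    return len(path) - 1


def amplification(hairpin: list[str], inline: list[str]) -> float:
    """How many times more fabric capacity the detoured path consumes per byte."""
    return links_crossed(hairpin) / links_crossed(inline)


# Hypothetical paths from a VM to the border router.
inline_at_core = ["vm", "tor-a", "core (firewall embedded)", "border"]
via_firewall_vm = ["vm", "tor-a", "core", "tor-b", "firewall-vm",
                   "tor-b", "core", "border"]

flow_bytes = 10 * 10**9  # a hypothetical 10 GB transfer
for name, path in (("inline", inline_at_core), ("hairpin", via_firewall_vm)):
    print(f"{name:>7}: {links_crossed(path)} links, "
          f"{flow_bytes * links_crossed(path) / 1e9:.0f} GB of fabric traffic")
print(f"amplification: {amplification(via_firewall_vm, inline_at_core):.2f}x")
```

Here the detour more than doubles the fabric traffic for the same useful transfer (7 links instead of 3).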

In other words, network services are best provided by the provider of the underlying infrastructure. Networks built to support these features will work better than a service layer hacked together on top of an unintelligent Cloud network.

At least, this is true for now.  Look for my next post, which will discuss the future of network services around hypervisor- and network-embedded capabilities.

New EMC Product Enables VM Mobility

Tuesday, May 11th, 2010

Yesterday I read an interesting blog post about a new product from EMC. The benefit of the product is that it enables very fast long-distance VM (virtual machine) mobility. But at the core, it’s a storage replication engine that focuses on “Distributed Cache Coherency”.

The VM mobility is impressive. To quote the blog post:

At EMC World today we vMotioned 100 VMs between clusters in 4 minutes and 500 in 20 minutes. That’s 2.4 seconds per VM. Across the equivalent of 100KM.

Clearly this wouldn’t be possible if the VM had to go through a typical “VMotion”, which involves (in simplified terms) pausing the VM, writing active memory to a file, moving the disk image and memory file to a remote server, and resuming the VM. The pausing and resuming aren’t time-consuming, but moving the data across a network can cause significant delay. That’s where EMC does a little “cheating” (or magic, depending on your point of view).
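A quick back-of-the-envelope check shows why the quoted numbers imply that the data can’t be moving at migration time. The per-VM figure comes from the quote above; the VM size (a 20 GB disk image plus 4 GB of active memory) and the link speeds are hypothetical assumptions.

```python
def transfer_seconds(size_bytes: float, link_bps: float, efficiency: float = 1.0) -> float:
    """Time to push size_bytes across a link at a given fraction of line rate."""
    return size_bytes * 8 / (link_bps * efficiency)


# Figures from the EMC demo: 100 VMs in 4 minutes, 500 VMs in 20 minutes.
print(4 * 60 / 100, 20 * 60 / 500)  # 2.4 2.4 seconds per VM

# Hypothetical VM: 20 GB disk image plus 4 GB of active memory.
vm_bytes = (20 + 4) * 10**9
for gbps in (1, 10):
    print(f"{gbps:>2} Gbps: {transfer_seconds(vm_bytes, gbps * 1e9):.1f} s per VM")
```

Even at a full 10 Gbps, copying the whole image would take about eight times longer than the quoted average of 2.4 seconds per VM.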

I don’t have first-hand knowledge of the platform, but I would guess that they’re keeping data mostly in sync all the time using background replication, so that less data needs to be copied over the network at the point of migration. Even more importantly, they don’t have to copy all of the data across the network before starting to use it. The platform copies a much smaller directory structure that points to the local drives for local data, or to the remote drives for data that hasn’t been copied yet. When remote data is accessed, it gets copied into local storage, which incurs a performance penalty. But the performance hit only affects that block at that moment; accessing any already-replicated blocks will run at native speed.
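The guess above can be sketched as a toy copy-on-access volume. This is a hypothetical model of the behavior just described, not EMC’s design; the class name and the dictionaries standing in for the two sites’ storage are illustrative assumptions.

```python
class LazyReplicaVolume:
    """Toy model: a block map says whether each block is already local or still
    only at the remote site. Remote blocks are fetched on first read and cached,
    so later reads of the same block are served locally."""

    def __init__(self, remote_blocks: dict[int, bytes], prereplicated: set[int]):
        self._remote = remote_blocks                                   # source site's copy
        self._local = {b: remote_blocks[b] for b in prereplicated}     # background-replicated
        self.remote_fetches = 0                                        # slow-path reads

    def read(self, block: int) -> bytes:
        if block not in self._local:          # miss: pull across the WAN once
            self._local[block] = self._remote[block]
            self.remote_fetches += 1
        return self._local[block]

    def write(self, block: int, data: bytes) -> None:
        # Assumes a single writer, so a write simply lands in local storage.
        self._local[block] = data


source = {i: f"block-{i}".encode() for i in range(8)}
vol = LazyReplicaVolume(source, prereplicated={0, 1, 2, 3, 4, 5})
vol.read(1)   # already local
vol.read(6)   # first access: fetched from the remote site
vol.read(6)   # now local
print(vol.remote_fetches)  # 1
```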

Here is a diagram borrowed from the EMC posting to illustrate the idea:

[Diagram: VPLEX Distributed Cache Coherency, from the EMC post]

As I said above, this is just my guess. The article goes into enough detail to get a rough picture of what happens. But I’m looking forward to hearing more details about this platform. I’m especially interested in hearing about conflict resolution. The article touches on this by saying:

The key is that VPLEX doesn’t solve the “what if two updates to the same block occur at the same moment?” problem (this is the core of the “did you solve the speed of light problem” question). The key is that MANY use cases ensure a single “writer” (single host writing to block at any given time), we don’t have to solve the “speed of light” problem (working on it :-) In the VMware use case, for example a single ESX host is the “writer” for a VM at any given moment, and a vMotion is an atomic operation – at one moment, one host is writing to a set of blocks, and then at next moment, another host is writing.
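The single-writer assumption described in that quote can be sketched as an ownership rule: exactly one host may write a VM’s blocks at a time, and migration hands ownership over in one step. This is a hypothetical in-process toy with made-up host names, not how VPLEX enforces it.

```python
class SingleWriterVolume:
    """Toy single-writer rule: one host owns a VM's blocks at a time,
    and ownership moves in one step, as when a migration completes."""

    def __init__(self, owner: str):
        self.owner = owner
        self.blocks: dict[int, bytes] = {}

    def write(self, host: str, block: int, data: bytes) -> None:
        # Only the current owner may write, so two writers never race on a block.
        if host != self.owner:
            raise PermissionError(f"{host} is not the current writer ({self.owner})")
        self.blocks[block] = data

    def hand_off(self, from_host: str, to_host: str) -> None:
        # In this single-process toy, reassigning the owner is trivially atomic.
        if from_host != self.owner:
            raise PermissionError(f"{from_host} does not own this volume")
        self.owner = to_host


vol = SingleWriterVolume("esx-a")          # hypothetical host names
vol.write("esx-a", 0, b"before migration")
vol.hand_off("esx-a", "esx-b")
vol.write("esx-b", 0, b"after migration")
try:
    vol.write("esx-a", 0, b"stale write")
except PermissionError as err:
    print("rejected:", err)
```

The hard case this sketch sidesteps is a real two-site system, where making the hand-off atomic across a WAN, or detecting two hosts that both believe they own the volume, is exactly the difficult part.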

But the devil’s advocate in me still has to ask: what if?!? I’m sure I’ll have the chance to ask EMC someday soon. Regardless of their answer, I’m happy to see a platform like this one. EMC is uniquely positioned (given their relationship with, and ownership of, VMware, etc.) to make the assumptions and develop the tools that we’re going to need for any reasonable future in the cloud.

Level(3) Found God, Bought Metro Fiber

Wednesday, February 17th, 2010

I just read a great post on the Telecom Straight Shooter blog about Level(3)’s business.  Excerpt:

…all applications will originate and terminate in a metropolitan market with local access along with their associated revenues.  Long haul pipes are in vast quantity with plenty of inventory buried in the ground. In all fairness, however, if you are going to build a long haul network, you don’t undergo the expense to put only one pipe in the ground.

Metropolitan markets are 10x more expensive to build, operate and install than a long haul network. You actually require more fiber to be deployed in a metro setting in order to support stuffed, long haul dumb pipes from long haul networks dumping packets at a carrier hotel for metro distribution or third party interconnection facilities.