Can we get on with building the software yet?
As you evaluate different cloud providers, it is important to understand the different concepts providers can use to deploy multi-tenancy. Different concepts facilitate—or limit—the way in which a provider can respond to changes in the service needs of clients.
General Purpose Clouds:
For example, some vendors design their clouds as commodities. They focus on providing low-cost access to computing power in flat, homogeneous environments. This type of general purpose cloud can scale quickly and easily to support large numbers of similar users. As they become saturated, however, you may begin to see variations in performance, as some users expand their usage and experience spikes that place constraints on all other users.
Performance variations can affect computing power, storage and I/O or network traffic. Most providers already have solved performance problems associated with sharing VM RAM and CPU power, and most have deployed one or more of the many solutions for storage and I/O performance issues. Consequently, network performance is usually the first noticeable bottleneck. While it is important to know how your provider will handle performance variations wherever they appear, it is especially important to know how network issues will be handled.
The Concern: Network Latency:
Networks experience varying levels of latency based on where the users and their data reside and how much bandwidth has been allocated to each user. The easiest solution to network issues within a cloud is to physically separate heavy users from lighter users. This means moving the heavy user to a private cloud where resources can be adjusted to meet the requirements of peak periods, more users and new applications.
The Answer: Scalability and SLA:
To reduce your risk of incurring more costs from your cloud provider, look for an enterprise provider that has scalability at every level of the cloud—SaaS, PaaS and IaaS. And look, too, for a provider offering a Service Level Agreement that addresses the performance requirements for the services most important to your business. These are the attributes of an enterprise level provider with the elasticity to meet your future needs.
One of the largest benefits that an application developer can get out of a cloud-based infrastructure is the opportunity to design for variable scale. Specifically, you can start off small (with a limited number of virtual machines, using limited host resources), and then expand your environment as usage grows. Conversely, you are able to shrink your infrastructure consumption during non-peak times. While some of this flexibility can be applied to existing legacy applications, the real win can be for newly developed systems.
To get this benefit, there are some fundamental architectural principles that need to be followed: loose coupling of system components, distributed system design and automated application installation / configuration. A solid architecture should reasonably scale from fitting the entire application onto a single VM to sharing it among hundreds (or even thousands) of VMs supporting the users.
To achieve the loose coupling and distributed design goals, you need to decompose the architecture into units of functionality and think through how they will distribute work within the system. Each component should be designed to support multiple instances of that application service within the environment. By doing this, you can load balance the application load as you need to scale.
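As a rough illustration of running multiple instances of one component behind a balancer, here is a minimal round-robin sketch. The class name, method names and instance addresses are all made up for the example; a real deployment would more likely rely on a dedicated load balancer than on code like this.

```ruby
# Hypothetical round-robin pool: every name and address here is illustrative.
class RoundRobinPool
  def initialize(instances)
    raise ArgumentError, 'pool needs at least one instance' if instances.empty?

    @instances = instances.dup
    @next = 0
  end

  # Register a new instance when scaling out.
  def add(instance)
    @instances << instance
  end

  # Drop an instance when scaling in; the pool is never left empty.
  def remove(instance)
    raise ArgumentError, 'cannot remove the last instance' if @instances == [instance]

    @instances.delete(instance)
    @next %= @instances.size
  end

  # Return the next instance in rotation.
  def pick
    instance = @instances[@next]
    @next = (@next + 1) % @instances.size
    instance
  end
end

pool = RoundRobinPool.new(['app-1:8080', 'app-2:8080'])
3.times { puts pool.pick }   # requests alternate between the two instances
pool.add('app-3:8080')       # scale out; later picks include the new instance
```

The point is only that callers ask the pool for a target rather than hard-coding one, which is what lets the number of instances change underneath them.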
This decomposition should happen at all layers. It does no good to scale out web servers if a singular application server will become the performance bottleneck. And definitely be sure to think through a scaling strategy for your databases. If you plan on using a traditional relational database platform (RDBMS), consider setting up your identity columns in a way that will support future distribution of load through sharding techniques. Another alternative is to use multiple read replicas, with a single write-enabled database instance. If you plan on going the route of NoSQL, be sure that you understand the scaling dynamics of the selected platform.
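To make the identity column point a bit more concrete, here is one possible scheme, sketched under my own assumptions: each shard hands out IDs from an interleaved sequence, so the owning shard can be recovered from the ID alone. The names ShardIdAllocator and shard_for are hypothetical, and a fixed shard count is assumed; changing that count later would require a remapping strategy that this sketch does not cover.

```ruby
# Hypothetical interleaved ID allocator: shard i of n issues i+1, i+1+n, i+1+2n, ...
class ShardIdAllocator
  def initialize(shard_index, shard_count)
    unless (0...shard_count).cover?(shard_index)
      raise ArgumentError, 'shard_index must be in 0...shard_count'
    end

    @shard_count = shard_count
    @next_id = shard_index + 1
  end

  # Return the next ID owned by this shard.
  def next_id
    id = @next_id
    @next_id += @shard_count
    id
  end
end

# Recover the owning shard index from an ID produced by the allocator above.
def shard_for(id, shard_count)
  (id - 1) % shard_count
end

allocator = ShardIdAllocator.new(2, 4)   # third shard out of four
ids = 3.times.map { allocator.next_id }  # 3, 7, 11
puts ids.map { |id| "#{id} -> shard #{shard_for(id, 4)}" }
```

Because IDs from different shards never collide, rows can start out in a single database and be split across shards later without being renumbered.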
Achieving automated application installation and configuration builds on your distributed design. The key to ensuring that you can achieve this architectural goal is to classify virtual machines into roles. Role definitions will let you relate one server to the other servers in the environment. Using a “web server” role as an example, perhaps any server in that role needs to know what database server to connect to. And just to relate this idea back to the point about determining a database scaling design, that database target might be different for different web servers.
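A small sketch of what role definitions might look like in practice follows. The inventory, server names and the db_target key are all invented for illustration; the only idea taken from the text above is that each web server is related to a database target, and that the target can differ from one web server to the next.

```ruby
# Hypothetical inventory: names, roles and db_target values are illustrative.
SERVERS = [
  { name: 'web-1', role: :web, db_target: 'db-a' },
  { name: 'web-2', role: :web, db_target: 'db-b' },
  { name: 'db-a',  role: :database },
  { name: 'db-b',  role: :database }
].freeze

# List every server assigned to a role.
def servers_in_role(servers, role)
  servers.select { |s| s[:role] == role }
end

# Resolve which database a given web server should connect to.
def database_for(servers, web_name)
  web = servers.find { |s| s[:name] == web_name && s[:role] == :web }
  raise ArgumentError, "unknown web server: #{web_name}" unless web

  target = servers.find { |s| s[:name] == web[:db_target] && s[:role] == :database }
  raise ArgumentError, "no database named #{web[:db_target]}" unless target

  target[:name]
end

servers_in_role(SERVERS, :web).each do |web|
  puts "#{web[:name]} connects to #{database_for(SERVERS, web[:name])}"
end
```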
Once you have a good understanding of how you plan to deploy a highly-distributed version of your system, it's time to automate your installation and configuration. These are critical tasks if you want to achieve value from a dynamic infrastructure environment, because you need to match the speed that you can install and configure an application with the speed that you can provision new infrastructure. Your software should be installable via command line, and you should look at different options to automate the configuration of the installed applications.
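One way to think about automating the configuration step is to render each server's config from a template plus role-specific settings, so that a freshly provisioned VM needs no hand editing. The sketch below is an assumption-laden example: the template keys (role, db_host, port), the host name and the key=value file format are all hypothetical and not tied to any particular tool.

```ruby
# Hypothetical config template; the keys and file format are illustrative.
CONFIG_TEMPLATE = <<~TEMPLATE
  role=%<role>s
  database_host=%<db_host>s
  listen_port=%<port>d
TEMPLATE

# Fill the template from a settings hash; a missing key raises KeyError,
# which surfaces an incomplete role definition before anything is deployed.
def render_config(settings)
  format(CONFIG_TEMPLATE, settings)
end

puts render_config(role: 'web', db_host: 'db-a.internal', port: 8080)
```

A provisioning script could write the rendered text to disk right after the command-line install finishes, keeping configuration as fast as the infrastructure provisioning it follows.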
While you may want to take these concepts to the extreme, my best advice for a new application architecture is to start simple. Let these ideas guide your design, but remember, your main goal is to get the new application deployed for your users!
Here's a really simple application that can help explore the runtime environment of a Cloud Foundry provider (Sinatra ruby apps only, obviously):
I've been spending a good amount of my time in the evenings this week reading through the Cloud Foundry source code (available on github). I'll admit that I've caught the bug... the team at VMware has done a fantastic job of keeping the foundational system as simple as possible. I say that both from a user's perspective, and from looking through the internals of the code itself. To me, this is exactly the right way to solve a problem. Start with a simple solution to a generalized use case, and then make it work. From there, it's a matter of refinement and feature extension.
What I think is most powerful about the platform's approach is that it is fundamentally based on the idea that the app runtime environment, supporting service and platform provider options can (and should) grow independently. It's only been a few days, and the community has already provided the Cloud Foundry team with pull requests to add in JRuby and Erlang support. I have to imagine that new service support will quickly follow as well.
In terms of cloud providers, VMware made the right decision to host an instance of the platform in their own environment (which, combined with their new responsibility to host Mozy for parent EMC, is another topic altogether), but is fully expecting to see other providers offer differentiated versions of the platform. As Ezra Zygmuntowicz (@ezmobius) put it, "(VMware) want(s) this to be the kernel for the cloud, not only our cloud". Unlike vCloud, the openness of Cloud Foundry is what will make it more palatable to cloud providers, because it gives them numerous opportunities to establish differentiated solutions and offerings around the base platform.
Will VMware be able to avoid some of the governance and political issues that Rackspace / OpenStack have run into? Rackspace and OpenStack appear to have gotten through that little rough spot, but I certainly hope that VMware learned from their experience.
For a quick overview of the internals, take a look at @igrigorik's post on the Cloud Foundry architecture. I also found this presentation (shared by Dave McCrory - @mccroy) targeting developers to be quite useful:
Disclosure: I work for a cloud platform provider, but the views in this post are mine alone. They do not reflect those of my employer.
I got access to cloudfoundry.com late last night, but just had a chance to start playing around this morning. First off, I love the simplicity of the developer experience... at least for Hello World style applications. I'll have to dig into it further, to start exploring how the services are implemented and how application instances scale.
Coming from an IaaS development background, one of the first things I was interested in digging into was the runtime details for the platform. I decided to extend the VMware Hello World ruby app, and have it return some details about the base OS supporting the app instance.
Here's the simple code:
require 'sinatra'

get '/' do
  # Shell out to collect basic details about the OS hosting this app instance.
  processor = `head /proc/cpuinfo`
  memory = `head /proc/meminfo`
  # Note: Linux exposes swap details at /proc/swaps; /proc/swap does not exist,
  # which is why the swap section of the output further down is empty.
  swap = `head /proc/swap`
  linuxversion = `head /proc/version`
  disks = `head /proc/partitions`
  appuser = `whoami`
  net = `cat /etc/network/interfaces`
  # Assemble a simple HTML page from the collected values.
  output = 'local OS user: <br />' + appuser + '<br /><br />processor: <br /><pre>' + processor + '</pre><br /><br />memory: <br /><pre>' + memory + '</pre><br /><br />swap:<br />' + swap + '<br /><br />ver: <br />' + linuxversion + '<br /><br />disks:<br /><pre>' + disks + '</pre><br /><br />networking:<br /><pre>' + net + '</pre>'
  output
end

Here's the deployment process. The mem reservation property seems to be the only configuration that will affect the selection of an appropriate VM to host the app instance.
$ vmc push
Would you like to deploy from the current directory? [Yn]: Y
Application Name: test1
Application Deployed URL: 'test1.cloudfoundry.com'?
Detected a Sinatra Application, is this correct? [Yn]:
Memory Reservation [Default:128M] (64M, 128M, 256M, 512M, 1G or 2G) 64M
Creating Application: OK
Would you like to bind any services to 'test1'? [yN]:
Uploading Application:
Checking for available resources: OK
Packing application: OK
Uploading (0K): OK
Push Status: OK
Staging Application: OK
Starting Application: OK
And here's what I get out of it, after deploying. Assuming that I don't turn it off or break this over time, you can hit the application live at http://test1.cloudfoundry.com/.
local OS user:
vcap-user-11
processor:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 37
model name : Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
stepping : 1
cpu MHz : 2660.000
cache size : 12288 KB
fpu : yes
fpu_exception : yes
memory:
MemTotal: 16470448 kB
MemFree: 12623164 kB
Buffers: 217692 kB
Cached: 1959284 kB
SwapCached: 0 kB
Active: 2368488 kB
Inactive: 980064 kB
Active(anon): 1171648 kB
Inactive(anon): 164 kB
Active(file): 1196840 kB
swap:
ver:
Linux version 2.6.32-30-server (buildd@crested) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #59-Ubuntu SMP Tue Mar 1 22:46:09 UTC 2011
disks:
major minor #blocks name
7 0 131072 loop0
8 0 1049600 sda
8 1 999023 sda1
8 16 33554432 sdb
8 17 16474657 sdb1
8 18 17077095 sdb2
networking:
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet static
address 172.30.49.74
network 172.30.48.0
netmask 255.255.248.0
broadcast 172.30.55.255
gateway 172.30.48.1
Steve Jin (a great guy, who develops and runs the VI Java project, without which the vSphere API would have beaten me down more than once) has a great perspective on patentability of software architectures over on doublecloud. While I agree with his point about patent law, the main reason I point this post out is the value he places in "obvious" architecture being the right architecture.
This falls in line with my thoughts about building systems with an eye toward the future, but a focus on the present. We, as an industry, need to stop over-engineering things. Focus your time on achieving the system requirements, and just get it built already.
To me, if you want to patent something, make it a feature! Isn't that the real value we provide to customers? Sure, coming up with a scalable, robust, performant and extendable design is hard work. But the design constraints dictated by the features we want to build now (and in the future) should absolutely lead to obvious approaches to building the system.
Now this sentiment doesn't necessarily match every situation, but VERY few software projects warrant a non-obvious answer. In fact, if it's non-obvious, then that should really be due to the FEATURE being novel. Patent that.
If your high level architecture meets the patent law rule of being non-obvious, you're doin' it wrong.
I'm a big fan of designing systems to deal with component failures. But let's be honest, doing that perfectly is pretty darn hard.
In the research paper "Fundamental Concepts of Dependability", all possible sources of fault conditions have been classified into 16 different categories. In another paper, "Software Architecture Reliability Analysis using Failure Scenarios", an 8 step failure analysis process is proposed for how to understand a system's potential failure conditions. All this is about identifying and classifying fault conditions, not actually providing designs to resolve them.
I'm going to go out on a limb, and declare that nobody is doing that type of full and formal analysis for their cloud applications. (OK, perhaps somebody, but certainly not many.)
So that's the problem in a nutshell. How can you really say that you have fully designed for failure, given all of the possible failure conditions? And for the 90% of the cloud platform population that just want to get their apps built, how much time should they really be spending on solving this problem?
There's an awful lot of discussion going on in the Intertubes about the concept of Enterprise Clouds, but most of it is confusing private (single enterprise) clouds and the concept of a multi-tenant cloud platform that is built to help companies avoid the issues associated with “designing to fail”. One of the more useful descriptions of the difference between private and “enterprise” clouds (at least that I’ve found) is from Simon Wardley in his “Private vs. Enterprise Clouds” post. It’s worth taking the time to study his thinking, as I believe he has a firm handle on the situation.
As for the public vs. private debate, I just can’t get behind the private cloud solutions. The economics aren’t there, not to mention the very real challenge of finding enough multi-disciplinary talent to effectively build and maintain private cloud stacks. Fundamentally, I believe that the concept of a private cloud is based on the collision of “Not Invented Here” thinking within enterprise IT departments and the lack of adaptability of the companies providing them their IT infrastructure and service management products. Two wrongs don’t make a right.
To me, the main difference between the commodity cloud approach, and the enterprise cloud approach, is in the availability attributes of the utility service itself. It’s the economic conditions created by the commodity providers themselves that make it very difficult to offer anything other than an environment where tenants are strongly pushed to “design for failures”. This may change over time, either due to the commodity players deciding that they will take a significant margin hit to get a larger share of enterprise IT spend, or due to technical advancement in the platform fabrics themselves that improve reliability and availability of tenant VMs / data.
Unlike private clouds, I’m a firm believer that enterprise clouds have a role to play in today’s market. A window of opportunity for enterprise clouds exists right now, regardless of the commentary that points out their lack of adoption. I’d say that the lag in (not lack of) adoption is really caused by providers not having (until recently) the clarity of purpose to deliver viable and scalable enterprise cloud platforms. Now that adoption is starting to pick up, the providers are responding with appropriate investment in this area (with even AWS investing in features that better align it with the needs of the enterprise). This movement is what is going to give enterprise IT shops the ability to take advantage of many of the “cloud attributes”, while having a solid availability foundation on which to trust their critical systems.
While intended as an argument for believing only in the commodity clouds, pointing to CAGR charts for the commodity clouds only serves to support the value of the overall ecosystem of cloud providers. It doesn’t really make a conclusive argument that commodity clouds are the only model that will win. Even with the great success of AWS in the market, assumptions that Amazon will reach annual revenues in the range of $10B US by 2016 just show success in the market and the increasing rate of general growth in IT services. Gartner is estimating the global quarterly IT spend for 2011 to be $360B, with an average of 4% CAGR. Given the solid growth of commodity cloud services, and understanding the needs of the enterprise IT buyers, I believe that enterprise cloud adoption will follow the earlier market trends of the commodity clouds for the next several years. There's going to be a lot of room in the market for different types of cloud services...
All this is not to say that the market opportunity will exist forever, but the reality is that providing high availability for “legacy” applications in commodity clouds is a significant challenge with today’s features and technology. Just like the mainframes that just won’t go away, the relative usefulness of enterprise clouds will remain with us. Their importance has faded, but mainframes are still critical to many institutions. Enterprise clouds will be very similar, existing as a bridge solution for the lifetime of enterprise “legacy” applications. We're just not anywhere near the top of the growth curve yet.
I have finally created the most flexible and powerful architecture ever. This is the be all and end all of software design. I almost feel silly for not having thought of this before. I am the Buddha of design now. This is #winning.
Here it is:
-chip