In conversations with both executives and IT staff, there’s often confusion about how “true” infrastructure-as-a-service solutions such as Amazon Web Services or OpenStack differ from traditional hosting arrangements. By “traditional,” I’m referring to colocation facilities hosting one’s own (likely virtualized) servers, network gear, and storage equipment. So I decided to publish a high-level article that quickly enumerates the key differentiators between IaaS clouds (“cloud” being either public or private) and colo. For this article, I’ll use AWS to showcase cloud IaaS characteristics.
Before delving into the AWS nitty-gritty, let’s get the definition of cloud computing out of the way. The National Institute of Standards and Technology (NIST) defines cloud computing as having five essential characteristics:
- On-Demand Self-Service
- Broad Network Access
- Resource Pooling
- Rapid Elasticity
- Measured Service
(See the NIST Special Publication 800-145 for all the gory details.)
1) Efficiency and Scalability
At the top of the list is efficiency. Technically this section is about scalability as well, but I’m going to approach the subject from the angle of cost efficiency first. Unlike traditional environments, where you must purchase a minimum amount of infrastructure to become operational (a cabinet, a firewall, a couple of servers, etc.), Amazon allows you to start using a wide range of services through hourly billing. This goes beyond a simple CapEx vs. OpEx conversation: Amazon’s economies of scale lower the barrier to entry so that everyone can take advantage of enterprise-class equipment (which surely cost Amazon millions), all at a fraction of the price.
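To make the economics concrete, here’s a back-of-the-envelope sketch. All the dollar figures are made-up assumptions for illustration, not real AWS or colo pricing:

```python
# Back-of-the-envelope comparison of upfront colo cost vs. pay-as-you-go cloud.
# All dollar figures are illustrative assumptions, not real AWS pricing.

COLO_UPFRONT = 25_000.0   # cabinet, firewall, a couple of servers (assumed)
CLOUD_HOURLY = 0.50       # hourly rate for an equivalent small stack (assumed)

def cloud_cost(hours: int) -> float:
    """Metered billing: pay only for the hours actually consumed."""
    return CLOUD_HOURLY * hours

# A 90-day pilot (2,160 hours) costs a fraction of the colo buy-in:
pilot = cloud_cost(24 * 90)
print(f"90-day pilot: ${pilot:,.2f} vs. ${COLO_UPFRONT:,.2f} upfront")
```

The point isn’t the specific numbers; it’s that the cloud figure scales with actual usage and can stop accruing the moment you shut the stack down.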
Now, back to scalability. When we think about scalability we may envision taking a basic application and “scaling it up” to handle increasing demand. An example would be scaling a web server cluster from two servers to four. But, let’s analyze what happens before that surge of demand hits your workload. At first, you have two servers sitting relatively idle. As demand increases, those two servers automatically grow to four, six, eight, or to whatever limit you set. When demand subsides, they again automatically shrink back down to an appropriate size.
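The decision an auto scaling policy makes on your behalf can be sketched in a few lines. The thresholds and fleet sizes below are illustrative assumptions, not AWS defaults:

```python
# Sketch of a target-tracking scaling decision: grow or shrink the fleet so
# average utilization stays near a target, within the limits you set.
# All numbers are illustrative assumptions, not AWS defaults.

MIN_SERVERS, MAX_SERVERS = 2, 8   # the limits you set
TARGET_UTILIZATION = 0.60         # aim for ~60% average CPU

def desired_capacity(current: int, avg_utilization: float) -> int:
    """Scale the fleet so average load moves toward the target."""
    needed = round(current * avg_utilization / TARGET_UTILIZATION)
    return max(MIN_SERVERS, min(MAX_SERVERS, needed))

print(desired_capacity(2, 0.90))  # surge hits: the fleet grows
print(desired_capacity(4, 0.20))  # demand subsides: shrink back to the floor
```

In AWS this logic runs inside the Auto Scaling service; you declare the target and the limits, and the platform does the arithmetic continuously.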
Compare that to traditional hosting, where you must build for the high-water mark from the get-go. In other words, if you anticipate you may need eight web servers at some point, you build all eight servers right now and watch them sit idle until that day comes. Alternatively, you can build two servers now and, when demand increases, have your sysadmins scramble to add more nodes, update firewall rules, and so on. Either way, not very efficient.
2) Time to Market
In a traditional colo build, the longest pole in the project’s timeline is usually procurement. I’ve seen network gear purchases take upwards of twelve weeks. That’s twelve weeks before you can even think about powering on a single server.
Once you’re out of the woods with purchasing, you must still physically rack and configure the gear. Depending on the org structure, that can mean a frustrating volley between the server, storage, and network engineering teams. Even in the best-case scenario, when all gear is fully configured with plenty of headroom for new infrastructure requests, building new virtual machines along with their associated storage, firewall rules, and load balancer VIPs often requires numerous service request tickets, all of which take time.
With Amazon Web Services, you enter a credit card, push a few buttons or run a few scripts, and an application stack is online in minutes. Literally.
3) Automation
Automation is where Amazon Web Services truly shines. Virtually everything within AWS has an API and therefore can be controlled with code. This stands in stark contrast to traditional environments, where there’s either partial or no automation whatsoever. The real beauty of Amazon’s APIs is that they’re cross-functional, meaning they automate processes across compute, network, storage, load balancing, and security. For example, you can build a business intelligence application consisting of a Redshift database, a couple of reporting servers sitting behind a load balancer, storage for saved reports, and firewall rules, all from a single template. Imagine the complexity of trying to automate a similar software stack in a “best of breed” colo with hardware from five or six different vendors. It hurts just thinking about it.
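To give a feel for the “single template” idea, here’s a toy sketch in the spirit of AWS CloudFormation. The resource types and fields are simplified stand-ins, not the real AWS schema:

```python
# Toy "single template": one declarative document describing compute, load
# balancing, storage, and firewall rules together. Resource types and fields
# are simplified stand-ins for illustration, not the real CloudFormation schema.

template = {
    "Warehouse":   {"type": "Redshift", "nodes": 2},
    "Reporting":   {"type": "EC2", "count": 2, "behind": "ReportsLB"},
    "ReportsLB":   {"type": "LoadBalancer", "listeners": [443]},
    "ReportStore": {"type": "S3Bucket"},
    "WebIngress":  {"type": "SecurityGroup", "allow": ["443 from 0.0.0.0/0"]},
}

def provision(tmpl: dict) -> list[str]:
    """One call fans out across compute, network, storage, and security."""
    return [f"create {name} ({res['type']})" for name, res in tmpl.items()]

for step in provision(template):
    print(step)
```

The real service does vastly more (dependency ordering, rollback, drift detection), but the essence is the same: the whole cross-functional stack lives in one versionable document.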
4) Skillsets
Here’s where things get a little more complex. In terms of skillsets, both colo and cloud (AWS) still demand serious familiarity with infrastructure. This should come as no surprise, as one still needs to understand systems administration across various operating systems, load balancers, storage, and of course information security. I suppose you could say this is the common denominator between AWS and your traditional colo.
Yet Amazon Web Services are just that: web services. That means getting up to speed with things like RESTful and SOAP-based APIs, ticket-granting services, various authentication protocols like OAuth and SAML, and on and on. These are not your daddy’s data center protocols.
Of course, managing a traditional colo is no walk in the park, either. A typical data center has many different types of hardware and physical infrastructure technology. Learning how to manage Tripp Lite PDUs, Juniper switches, Check Point firewalls, NetApp filers, Dell chassis, and VMware ESXi doesn’t happen overnight. I suppose you could say the learning curves for AWS and the traditional colo are about on par with each other. But while both can be challenging to learn, the two skillsets are very different.
5) Non-Persistence
Allow me to explain non-persistence in a single phrase: “your servers can disappear.” Yes, your AWS servers, and perhaps the data stored on them locally, could simply vanish into thin air. Sounds scary, right? Believe it or not, it isn’t. In fact, this facet of AWS represents one of my biggest “a-ha!” moments with Amazon.
Back in the day (or in the present day, if you’re still using a traditional colo), we would build servers one at a time. Great care would be taken in provisioning these servers, patching them, and providing general TLC over their multi-year lifespan. If a server “got sick,” we’d go into panic mode and try our best to triage the machine. In total failure scenarios, our recovery technique consisted of recreating the server: building a new machine and restoring data from backups.
In the cloud, we don’t build systems this way, because servers can “go away.” So instead of treating a server like a long-lived resource, we treat it like raw power, electricity if you will. And just as the electricity in your home occasionally “goes away,” so too will your servers. Therefore we build servers to be as “dumb” as possible, and architecturally we decouple computing power from sensitive data that shouldn’t be lost. That way, if computing power is lost (or scaled up, down, or sideways), application data and logic remain neatly and safely decoupled. This is a very, very different design pattern from the one we grew up with, but it’s incredibly powerful, and one any critical workload should consider leveraging.
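The decoupling pattern can be sketched in miniature. Here the in-memory dict is a stand-in for a durable external store (think S3 or a managed database); the worker objects are the disposable servers:

```python
# Sketch of the "servers are disposable" pattern: application state lives in
# a durable external store, while workers hold nothing worth keeping.
# The dict below is a stand-in for S3/RDS/etc., purely for illustration.

durable_store: dict[str, int] = {}   # stand-in for the durable external store

class Worker:
    """A 'dumb' server: any instance can pick up where another left off."""
    def handle(self, key: str) -> int:
        durable_store[key] = durable_store.get(key, 0) + 1
        return durable_store[key]

w1 = Worker()
w1.handle("orders")          # a worker processes a request...
del w1                       # ...then "vanishes into thin air"

w2 = Worker()                # a replacement spins up; no data was lost
print(w2.handle("orders"))   # → 2
```

Because the workers carry no state of their own, losing one (or adding ten) is a non-event, which is precisely what makes the auto scaling described earlier safe.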
6) Astronomical Bills
Accounting is something everyone should become acutely aware of before commencing any major project in a public cloud. AWS has a great cost calculator that can be used to forecast your infrastructure costs, as well as billing alerts that warn you when spending crosses predefined thresholds. In fact, Amazon account representatives make it a point to ensure customers don’t go overboard with service consumption, as that’s a surefire way to lose those very customers.
Yet if alerts are missed or ignored, and you fail to heed the advice of your cloud account rep, you could receive some pretty shocking bills. Unlike conventional hosting companies, which can curtail service once you’ve surpassed certain limits, with public cloud the sky’s the limit. So be careful out there.
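The threshold-alert idea itself is simple enough to sketch. The tiers and the month-to-date figure here are made-up values, not anything AWS ships:

```python
# Minimal sketch of a spend check in the spirit of AWS billing alerts.
# The alert tiers and month-to-date spend are made-up illustrative values.

THRESHOLDS = [1_000, 5_000, 10_000]   # alert tiers in dollars (assumed)

def crossed_thresholds(month_to_date: float) -> list[int]:
    """Return every predefined threshold the current spend has passed."""
    return [t for t in THRESHOLDS if month_to_date >= t]

print(crossed_thresholds(6_200.0))   # → [1000, 5000]
```

The catch the paragraph above describes is exactly what this sketch can’t fix: the alert fires, but nothing stops the meter unless you act on it.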
7) Commercial Software Licensing
This area is of particular relevance to enterprise IT leaders. If you think you can simply forklift your off-the-shelf enterprise application, middleware, or database into Amazon or Azure, you may want to read the software vendor’s fine print first. Many end-user license agreements (EULAs) license the product based on the number of CPU cores in your server. And when your “server” is Amazon, well, you’ve got a hell of a lot of cores to license.
More specifically, many vendors require you to license “all CPUs on the physical machine or grid of machines in a cluster.” Even though you’re obviously not using each and every CPU at the same time, it doesn’t matter. Oh, and virtual CPUs don’t count, either.
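A quick worked example shows why this model stings in the cloud. The per-core price and core counts are hypothetical, chosen only to illustrate the math:

```python
# Worked example of "license all CPUs on the physical machine": under per-core
# licensing, the bill follows the host's physical cores, not the vCPUs you use.
# The price and core counts are hypothetical, for illustration only.

PRICE_PER_CORE = 2_000.0   # hypothetical per-core license fee

def license_cost(physical_cores: int, cores_in_use: int) -> float:
    """Vendors in this model charge for every physical core, used or not."""
    assert cores_in_use <= physical_cores
    return PRICE_PER_CORE * physical_cores   # cores_in_use doesn't matter

# An 8-core VM on a (hypothetical) 96-core host still licenses all 96 cores:
print(f"${license_cost(96, 8):,.0f}")   # → $192,000
```

Note that `cores_in_use` never enters the calculation; that asymmetry is the whole problem with running such licenses on shared cloud hardware.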
Fortunately, many of these same vendors have forged agreements with Amazon and offer special licensing arrangements to run their software on AWS, either in a bring-your-own-license configuration or by buying the software and computing power from Amazon as a bundle. Either way, the moral of the story is: understand your vendor’s licensing terms, or face the wrath of a very unfavorable software audit.