
[–]akallio9000 5 points6 points  (0 children)

And the biggest app in the cloud is... Wait for it... Conficker

http://farm1.static.flickr.com/87/240803829_9212773615_o.png

[–]Smallpaul 12 points13 points  (5 children)

Deploying 50 new instances in a cloud is easier than 50 new physical machines. But just because you can, doesn’t mean you should.

That much is common sense.

If it takes hours to install new machines, then you are doing your job wrong.

How can a small startup get to the point where it takes minutes to install 50(!) new machines in less than "hours"? I'd like to hear more about that.

If it takes weeks to get your machines, then you are using the wrong vendor.

Fine. But we were talking about minutes a second ago. What vendor can deliver 50 machines in an hour?

And most importantly if you suddenly realize that you need 50 new machines, then you simply didn’t do your job well. The cloud is not an excuse to avoid a business model. A business model includes a budget and a solid, implementable plan for growth based on thorough capacity planning. With that, you should see it coming.

On another, linked blog post, he says: "Lately, I see more sudden eyeballs and what used to be an established trend seems to fall into a more chaotic pattern that is the aggregate of different spike signatures around a smooth curve. The disturbing part is that this occurs even on larger sites now due to the sheer magnitude of eyeballs looking at today’s already popular sites. Long story short, this makes planning a real bitch.... This, in many ways, is like a tornado. Our ability to predict them sucks. Our responses are crude and they are quite damaging. However, predicting these Internet traffic events isn’t even possible — there are no building weather patterns or early warning signs."

So traffic patterns are totally unpredictable and can increase by up to 1000%, and yet we are not doing our job right if our resource usage suddenly quadruples or quintuples? The only way to be ready for a 1000% increase in demand without rapid provisioning is to be 1000% over-provisioned before the wave hits, right? (CDNs, virtual machines etc. are a form of rapid provisioning... and part of any smart cloud strategy)

Furthermore, in that linked blog post, he makes two arguably contradictory claims: 1. the traffic arrives too quickly for you to react, and 2. "be alert", "be ready to react".

  1. "These spikes happen inside 60 seconds. The idea of provisioning more servers (virtual or not) is unrealistic. Even in a cloud computing system, getting new system images up and integrated in 60 seconds is pushing the envelope and that would assume a zero second response time. This means it is about time to adjust what our systems architecture should support."

  2. "Be alert", "Perform triage", etc.

Let me rephrase what I think I'm reading: "I work at a company with a decade's worth of investment in software and hardware infrastructure. Everyone should run their brand-new startups as I run my mature and comparatively large company."

[–][deleted] 0 points1 point  (0 children)

Softlayer does pretty well for turnaround time, but they're still in the 2-4 hour range. I like them though because no matter the order quantity, they'll be able to provision it, most other vendors run out of equipment if you try ordering more than 10 boxes at a time.

[–]mpeters[S] -2 points-1 points  (3 children)

How can a small startup get to the point where it takes minutes to install 50(!) new machines in less than "hours"? I'd like to hear more about that.

A startup should probably look at automating their infrastructure from the very beginning. Tools like Chef and Puppet help.

[–]Smallpaul 8 points9 points  (2 children)

How would "automating infrastructure" allow me to get 50 machines from Dell into my colo center in "less than hours"? How would Chef and Puppet help with that problem?

Let's put aside the question of where I get the money....I'll just presume that I've got a huge credit line backed by my mortgage and am willing to lose my house if the traffic goes away before I've completely paid for my servers.

[–]neoice 1 point2 points  (1 child)

I wouldn't consider the purchasing of the machines as deployment. that's research/design. you figure out what kind of machines you want, buy the machines, wait a few days and THEN deploy them. the deployment process should be pretty easy. my shitty setup involves:

  • minimal OS (Debian Stable) from latest netinstall

  • add services to LDAP store (if they don't already exist)

  • install puppet on node and wait.

I've tested it and I can go from an empty drive to a functioning domain member (LDAP logins, SSH+keys, standardized security and logging) in 15 minutes. and this is just shit I built in my bedroom. if your job and business are based around deploying machines, I'd throw in PXE, custom install scripts, local package repositories and so much more. I bet if I started using Puppet's templating, I could automate a LOT more stuff.
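A minimal sketch of that bootstrap flow (the LDAP DNs, attributes, and exact commands below are illustrative assumptions, not the actual setup described above):

```python
import subprocess

def make_ldif(hostname, base_dn="dc=example,dc=com"):
    """Build an LDIF entry registering a new node (attributes are illustrative)."""
    return (
        f"dn: cn={hostname},ou=hosts,{base_dn}\n"
        "objectClass: device\n"
        f"cn: {hostname}\n"
    )

def bootstrap(hostname):
    """Run on a freshly netinstalled Debian Stable box."""
    # 1. register the node in the LDAP store (if it doesn't already exist)
    subprocess.run(
        ["ldapadd", "-x", "-D", "cn=admin,dc=example,dc=com", "-w", "secret"],
        input=make_ldif(hostname), text=True, check=True,
    )
    # 2. install puppet and let it converge the node to the desired state
    subprocess.run(["apt-get", "install", "-y", "puppet"], check=True)
    subprocess.run(["puppet", "agent", "--test"], check=False)
```

From there, PXE plus a local package mirror would remove the remaining manual steps.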

[–]jawbroken 3 points4 points  (0 children)

that's completely irrelevant when talking about the traffic spikes mentioned in the parent posts

[–]etcshadow 5 points6 points  (2 children)

Overall, an excellent and timely discussion of this.

[–][deleted] -3 points-2 points  (0 children)

Not only that, but this blog post is so incredibly awesome that everyone on the planet will finally sit up and take notice, create a universally accepted definition of the word "hype", and stop committing hype.

Either that or it's one more waste of bytes that will be praised by people who already agreed with it and ignored by everyone else. Hard to tell which way this one could go, really.

I wonder if we could get this guy to write an equally compelling definition of what does and does not count as "bloat" and then tell us all that "bloat" is bad.

[–]samlee 8 points9 points  (3 children)

tifa is pretty

[–]Deimorz 6 points7 points  (2 children)

Saw the article title, came for the samlee comment, was pleasantly surprised.

[–]clemesha 1 point2 points  (0 children)

Wow, that's crazy, I also saw the title and came for the samlee comment... pretty awesome to know I'm not the only one.

[–]ryeguy 0 points1 point  (0 children)

Needs more Jboss.

[–]13ren 1 point2 points  (0 children)

Hey! The cloud is the computer!

[–]G_Morgan 1 point2 points  (11 children)

What makes this different from a rack of servers? Very little, actually.

This is inaccurate. The major benefit of the cloud is that it allows applications to share spare capacity. If I have a blog which at peak uses 20 times as much bandwidth as normal operation, and peak time is unpredictable, I need to purchase 20 times as much bandwidth all the time to be reasonably certain it will stay up. If I have 20 blogs and the chance of peak times coinciding is remote, I can run all 20 in the space needed to guarantee uptime of 2 blogs. Effectively 18 blogs for free. Just share the costs on a per-bandwidth basis and chances are you've saved a lot of money.

Extrapolate this to a million blogs and you have a ridiculous amount of bandwidth saving.
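That insurance-pool arithmetic is easy to sanity-check with a toy simulation (the peak probability and the 20x magnitude are assumed numbers): provisioning each blog for its own peak costs 20 x 20 = 400 units, while a shared pool sized for the worst observed minute comes out far smaller.

```python
import random

def shared_capacity(n_blogs=20, peak_prob=0.01, base=1, peak=20,
                    trials=10_000, seed=42):
    """Capacity covering the worst total demand seen across the trials,
    with each blog peaking independently with probability peak_prob."""
    rng = random.Random(seed)
    worst = 0
    for _ in range(trials):
        demand = sum(peak if rng.random() < peak_prob else base
                     for _ in range(n_blogs))
        worst = max(worst, demand)
    return worst

solo = 20 * 20              # everyone buys their own peak: 400 units
pooled = shared_capacity()  # worst minute observed: dozens of units, not hundreds
```

The exact pooled figure depends on the assumed peak probability, but as long as peaks are rare and independent it stays well under the solo total.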

Yes, the other stuff is nice. Somebody else manages my servers. I could get that anyway. What is really cool is that I'm only paying for what I actually use and not for extremely speculative maximum capacity. We share a peak-time insurance pool that, while extremely large, is far smaller and thus cheaper than if we all bought our spare capacity independently.

[–]mpeters[S] 4 points5 points  (8 children)

No, this is a characteristic of virtualization, not the cloud. That's one of his main points.

[–]Smallpaul 2 points3 points  (7 children)

Think about it: what are you virtualizing? OmniTI is, among other things, a hardware/software/internet service provider. So they have a bunch of different customers that they can trade off traffic between. When they see a thousand percent spike (which he claims is common), they can steal resources from Peter to pay Paul.

Now imagine that you are responsible for a SINGLE such site. Where do you steal resources from?

[–]Mask_of_Destiny 0 points1 point  (6 children)

If you're a single such site you just buy a VPS instance if you're small enough. No need to go to a cloud provider just to be able to have access to peak bandwidth.

Above a certain size it doesn't make sense to get a VPS as colocation isn't all that expensive (though this depends on location). At this point, you have to have hardware to handle peak capacity, but you don't necessarily need to pay for peak bandwidth. Some places will give you a fixed width pipe that you can use 100% of, but many facilities will give you access to a larger pipe that you can use a certain portion of metered at the 95th percentile. So short spikes are "free" and for longer ones you pay the overage rate.
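For intuition, 95th-percentile billing works roughly like this (a common convention: sort the month's 5-minute samples, discard the top 5%, and bill at the highest remaining sample):

```python
import math

def billable_rate(samples):
    """95th-percentile billing: drop the top 5% of samples and bill at the
    highest remaining one, so short spikes don't affect the bill."""
    s = sorted(samples)
    kept = math.ceil(len(s) * 0.95)   # number of samples that count
    return s[kept - 1]

# Mostly 10 Mbps with a brief 100 Mbps spike: the spike is "free" because it
# covers fewer than 5% of the samples.
month = [10] * 97 + [100] * 3
```

With `month` above, `billable_rate(month)` returns 10; stretch the spike past 5% of the samples and the billable rate jumps to 100.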

[–]Smallpaul 1 point2 points  (4 children)

Above a certain size it doesn't make sense to get a VPS as colocation isn't all that expensive (though this depends on location). At this point, you have to have hardware to handle peak capacity, but you don't necessarily need to pay for peak bandwidth.

And why would I want to buy hardware to handle peak capacity, when the author of TFA says that peak capacity can be 100-10000% of regular capacity, and can spike in 30 seconds?

[–]Mask_of_Destiny 1 point2 points  (3 children)

Because hardware is cheap (relatively speaking). How fast can you realistically notice that traffic is spiking, spin up more instances and get them integrated into your cluster? 30 seconds is doable, but it takes a non-trivial amount of effort to get things set up so you can hit that target.

Now front-end servers are relatively straightforward to spin up, but what about cache servers or slave databases? How quickly can you replicate enough data to them for them to be useful in dealing with the load spike? These are solvable problems, but not without effort. Effort you could be spending on making your site better.

A single server can handle the peak load on a fairly large site. Stackoverflow only needed 2 servers for load reasons last year when they bought their hardware (they bought a second web server for redundancy for a total of 3).

Unless you plan on getting huge there's a lot to be said for saving yourself a lot of development headache and just getting a beefy enough server that you can handle peak load without spinning up new machines.

[–]Smallpaul 1 point2 points  (1 child)

Because hardware is cheap (relatively speaking). How fast can you realistically notice that traffic is spiking, spin up more instances and get them integrated into your cluster?

30-60 seconds.

http://aws.amazon.com/autoscaling/

30 seconds is doable, but it takes a non-trivial amount of effort to get things set up so you can hit that target.

So? If I get a huge traffic rush and I'm down for 30 seconds, that's a lot better than being down for 24 hours. It's ridiculous to hold the cloud to a standard that is far beyond what you could possibly achieve with colocation.
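As a back-of-the-envelope model (all numbers assumed), the exposure is roughly the detection delay plus the boot/integration time, capped by the spike's own duration:

```python
def overload_seconds(detect_s, boot_s, spike_s):
    """Seconds of degraded service while waiting for new capacity, assuming
    capacity is sufficient once the new instances have joined."""
    return min(detect_s + boot_s, spike_s)

# 10 s to notice the spike + 50 s to boot and integrate new instances,
# against an hour-long spike: roughly a minute of pain, not a day.
exposure = overload_seconds(10, 50, 3600)
```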

Now front-end servers are relatively straightforward to spin up, but what about cache servers or slave databases? How quickly can you replicate enough data to them for them to be useful in dealing with the load spike?

Whether you need those or not will depend a lot on the details of your site. Once again, you're presenting a false dichotomy. Perhaps I choose to over-buy on database servers and not on app servers. In the cloud, I have that option. Or perhaps my memcached cluster keeps load off the database.

These are solvable problems, but not without effort. Effort you could be spending on making your site better.

That's not really for you to determine. It depends on the size of my spikes, and the amount of overbuying I need.

A single server can handle the peak load on a fairly large site. Stackoverflow only needed 2 servers for load reasons last year when they bought their hardware (they bought a second web server for redundancy for a total of 3).

Is StackOverflow's site-wide traffic bursty?

Unless you plan on getting huge there's a lot to be said for saving yourself a lot of development headache and just getting a beefy enough server that you can handle peak load without spinning up new machines.

Yeah, the "unless" is key. Also obvious: "unless you need to scale, you don't need the cloud."

Nevertheless, I take your point that there are anecdotes out there about extremely efficient sites that get away with huge load on 2 or 3 pre-bought servers, but I suspect that those sites depend heavily on caching in ways that are not appropriate to all sites.

[–]Mask_of_Destiny 0 points1 point  (0 children)

So? If I get a huge traffic rush and I'm down for 30 seconds, that's a lot better than being down for 24 hours. It's ridiculous to hold the cloud to a standard that is far beyond what you could possibly achieve with colocation.

If you have the capacity to handle the spikes without spinning up extra machines, then you have no downtime. My argument is simply that for the majority of sites, you're probably better off getting a server or VPS that can handle your peak load and not worrying about making sure you can spin up and spin down instances to deal with variations in load.

Whether you need those or not will depend a lot on the details of your site. Once again, you're presenting a false dichotomy. Perhaps I choose to over-buy on database servers and not on app servers. In the cloud, I have that option. Or perhaps my memcached cluster keeps load off the database.

My point is simply that even if you can spin up new instances quickly, it is not necessarily trivial to get them helping with your spike in traffic, depending on what your bottleneck is.

That's not really for you to determine. It depends on the size of my spikes, and the amount of overbuying I need.

Certainly. My point is that there's a tradeoff involved. Being able to scale the number of machines up or down at a moment's notice requires development time and adds complexity. For some sites, I'm sure it's worth it. But I'd argue that the vast majority of sites do not need this.

Is StackOverflow's site-wide traffic bursty?

The reason I brought up StackOverflow is that they run a large high-traffic site on a small amount of hardware with what appears to be a quite straightforward setup. Unless you're running a site that has substantially higher peak load than they do or have substantially higher overhead per user, you probably don't need to be able to scale out to a bunch of machines.

Also obvious: "unless you need to scale, you don't need the cloud."

If you don't need to scale up and down at a moment's notice, you don't need the cloud. A regular VPS provider will probably meet your needs just fine until it becomes more cost-effective to either rent a dedicated server or colocate your own hardware. Certainly, if you want to scale beyond a certain point, your site will need to be designed so that you can scale out beyond 2 or 3 machines, at which point the additional effort required to scale up and down at a moment's notice might be worthwhile. But I'd argue that most sites won't need to scale beyond what can be handled by 2 or 3 reasonably beefy machines running a straightforward stack.

That said, Amazon's Reserved Instances have made their pricing more competitive with regular VPS solutions, even if you don't take advantage of being able to scale down during low-load periods. If you need a reasonable chunk of RAM (say, at least 1GB) and your bandwidth and storage requirements aren't too high, it could work out to less than going to someplace like Slicehost or Linode, as long as you don't mind paying part of your bill upfront.
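The break-even works out roughly like this (all prices below are made-up illustrations, not Amazon's or any VPS provider's actual rates):

```python
HOURS_PER_MONTH = 730

def reserved_monthly(upfront_1yr, hourly):
    """Effective monthly cost of a hypothetical 1-year reserved instance:
    amortized upfront fee plus the discounted hourly rate, running 24/7."""
    return upfront_1yr / 12 + hourly * HOURS_PER_MONTH

# Hypothetical: $227 upfront + $0.03/hr reserved vs. a $60/month VPS plan.
reserved = reserved_monthly(227.0, 0.03)   # about $40.82/month
vps = 60.0
```

Under these assumed numbers the reserved instance wins; with higher bandwidth or storage needs the comparison can easily flip.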

[–]neoice -1 points0 points  (0 children)

I'm not sure why you're getting downvoted. bandwidth is the most expensive component and the limiting factor. almost any machine on the market will be able to serve $x requests per second, where $x is capped by the bandwidth available.

[–]joesb 1 point2 points  (0 children)

What if the peak hit requires more CPU/memory than the single machine you bought?

[–]Rhoomba 0 points1 point  (1 child)

That is all well and good so long as you never have two blogs hit a peak at the same time. Cloud/virtualization savings are all based on oversubscribing hardware and hoping you don't get multiple peaks at the same time.

Unfortunately many people don't seem to realize that, and get all excited about cloud savings, even when they have SLAs that they must adhere to.

[–]G_Morgan 0 points1 point  (0 children)

The point is that not all blogs will hit peak at the same time, and the more you have, the more even the distribution becomes. Effectively, in my scenario, there is an ideal spot somewhere between 2 and 20 that gives both massive savings and a reasonably safe amount of leeway.

[–]asegura 0 points1 point  (4 children)

I was recently talking to someone running a web application [kind of SaaS I think] and he told me his customers would not trust his service if it were in the cloud. So he uses a hosting service with a precise location, and precise control over server access, content, software, transmissions, etc. He says with a cloud-like service you don't know where your data is or what the people running it do with it.

Would you run a site involving private or sensitive customer information on a cloud computing service like Amazon's or others? How reliable is that with respect to security and privacy?

[–]mpeters[S] 3 points4 points  (3 children)

This is something I've thought about too, but unless you have your own datacenter in a building you control, there will always be people who could access your machines and your data.

[–]neoice 0 points1 point  (2 children)

while this is true, datacenters are designed around physical security. I work in one and yes, I could physically access your server. however, I'd be on camera and there's a large audit trail surrounding me. and in a building full of thousands of computers, how do I know (or care) what's on yours? you pay my company for the rackspace; my company pays me to make sure your rack is locked.

similar things can be said of cloud services, but I'm guessing their contracts explicitly state they're not responsible for ANYTHING.

[–]bluesnowmonkey 1 point2 points  (1 child)

Those cameras and audit trails are not exposed to the client, though. And if they were, am I going to sit around watching the security cameras over my servers? So it doesn't change the situation -- the client has to trust the hosting company, same as with a cloud provider.

[–]luke_ 0 points1 point  (0 children)

You have to trust them to some extent, but a lot of cabinets have the option of intrusion detection and audit logs concerning when they've been opened. So if you're extra paranoid, there you go!

Additionally, a datacenter can provide a cage enclosing your racks, and you, not the datacenter staff, can control physical access to the cage.

[–]genpfault 0 points1 point  (1 child)

Is there something similar to encrypted remote storage, except for computation?

[–]hiddentalent 0 points1 point  (0 children)

It's possible, but quite difficult and mostly theoretical. Homomorphic encryption techniques can allow computation on encrypted data without the computing services being able to decrypt the data. That is, you define an encryption function F and decryption function ~F such that ~F(F(x) + F(y)) = x+y, replacing + with whatever computation you need done. There was a good introduction to homomorphic encryption on reddit not long ago; try searching for it.
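As a toy illustration of the idea: textbook RSA happens to be homomorphic under multiplication (F(x)·F(y) mod n = F(x·y)), so a host can combine values it cannot read. This uses tiny, well-known demo parameters and is completely insecure; it is for intuition only:

```python
# Tiny textbook RSA parameters (insecure, illustration only): n = 61 * 53.
n, e, d = 3233, 17, 2753

def F(m):        # "encrypt"
    return pow(m, e, n)

def inv_F(c):    # "decrypt" (the ~F in the comment above)
    return pow(c, d, n)

# The untrusted host multiplies ciphertexts without ever seeing 6 or 7...
product_ct = (F(6) * F(7)) % n
# ...and only the key holder can recover the result of the computation.
result = inv_F(product_ct)   # 42
```

Fully homomorphic schemes, which support arbitrary computations rather than a single operation, are far more expensive.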

For the foreseeable future, though, this isn't going to be possible. Instead, you should be pressuring your hosting provider -- whether "cloud" or traditional -- for a high degree of transparency and auditability of their actions so you can ensure they aren't looking at your data. Standards such as SAS 70 help certify that they're living up to their promises.

[–][deleted]  (1 child)

[deleted]

[–]bitter_cynical_angry 2 points3 points  (0 children)

Which, if anyone hasn't caught on yet, all you need to generate page views is to write a blog with a contrary title.

Funny, this pretty much holds true with reddit also.

[–]doubtingthomas -1 points0 points  (4 children)

Honest question: Where is the hype? Maybe I read the wrong blogs, but I haven't seen much cloud hype.

[–]ChiperSoft 1 point2 points  (2 children)

I unsubscribed from a few web dev blogs because I got sick of all the "cloud computing" posts. In certain circles it's thick as smog.

[–]neoice 0 points1 point  (1 child)

[–]ChiperSoft 0 points1 point  (0 children)

HAH! I didn't even mean to write that pun!

[–]brownmatt 0 points1 point  (0 children)

look at the marketing materials of most software or hardware companies these days: they treat "the cloud" as the newest product to sell to a (corporate) customer

[–]recursive -1 points0 points  (0 children)

Network management has to happen. Hardware management has to happen. You pay for it one way or you pay for it another. I’ve heard people say that it takes countless hours per month to run 40 systems including servers, switching equipment, routing, firewalls, etc. We manage around 1000 servers at OmniTI and from our immaculately maintained time tracking system I can tell you that less than 35 hours per month are spent on hardware provisioning, systems installation and concerns of space/power/cooling. That comes out to about 2 minutes per machine per month. Furthermore, I don’t have any reason to believe that a cloud provider can do a significantly better job.

It may be true that this is the same overhead as a cloud provider would offer. But it's still at least an order of magnitude or two lower than what my company manages to achieve.
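The per-machine figure in the quote checks out:

```python
hours_per_month = 35
servers = 1000
minutes_per_machine = hours_per_month * 60 / servers   # 2.1 minutes per month
```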