Amazon apologizes for Netflix outage
December 31, 2012 02:41 PM
Amazon Web Services today apologized for a disruption in its Internet-based services that left many Netflix Inc. customers without access to streaming videos for several hours on Christmas Eve, and said it was taking steps to prevent similar problems in the future.
Amazon Web Services, a unit of Amazon.com Inc. that provides “cloud” or Internet-based computing and data storage services, posted a lengthy and highly technical explanation on the AWS web site today detailing what went wrong on Dec. 24. The message ended with an apology and a promise to work hard to prevent such disruptions in the future. “We want to apologize,” Amazon said. “We know how critical our services are to our customers’ businesses, and we know this disruption came at an inopportune time for some of our customers.”
The Dec. 24 service outage also affected the online operations of e-commerce technology and services provider Heroku and social networking technology company Scope. It started shortly after noon Pacific time and interrupted video streaming on many of the devices Netflix customers use to view streaming videos, a Netflix spokesman said last week. He added that Netflix worked with Amazon throughout the day on Dec. 24 to restore service by that evening for most Netflix customers.
Amazon said last week that the problem occurred with the Elastic Load Balancing system in the company’s U.S.-East region. The system, which is designed to regulate the volume of traffic in web applications, was back in full operation on Dec. 25, the company said.
A spokesman for Amazon Web Services says the outage did not “significantly” impact Amazon’s own Instant Video service, which didn’t require Elastic Load Balancing services during the outage period. Amazon Instant Video, which competes with Netflix, offers video streaming to customers of the Amazon Prime expedited delivery service. Netflix is No. 9 in the Internet Retailer Top 500; Amazon is No. 1.
In its nearly 1,200-word statement today, Amazon Web Services attributed the Dec. 24 outage to a step taken during a maintenance operation that inadvertently deleted data required to properly manage the Elastic Load Balancing system. To help prevent such inadvertent steps in the future, the company said it has implemented a procedure that requires software developers to obtain Change Management approval from Amazon Web Services each time they access the Elastic Load Balancing system.
In addition, the company said that, as a result of what it learned on Dec. 24, it has improved its Elastic Load Balancing data-recovery process and expects to be able to recover any lost data “significantly faster” if a similar problem occurs in the future.
“We will do everything we can to learn from this event and use it to drive further improvements in the ELB service,” the company said.
Netflix said on its Tech Blog on Dec. 24 that the outage primarily affected video streaming on TV-connected devices, such as game consoles, in the United States, Canada and Latin America, but that it did not affect Netflix service in the United Kingdom, Ireland or the Nordic countries. It noted that, of the hundreds of Elastic Load Balancers that Netflix uses to support video streaming, only “a handful failed.” It added that Netflix.com remained online during the streaming outage, continuing to support streaming to PCs and Macs.
Netflix also said it has plans in 2013 to build more resiliency into its cloud-based streaming services to avoid future problems.
Scope did not immediately return a request for comment; Heroku says its policy is not to comment on matters involving its technology vendors.