The latest generations of server processors from Intel and Advanced Micro Devices don't deliver the promised gains in performance, according to the head of technical operations at Facebook, a massive consumer of servers.
The social networking company is constantly trying to upgrade its infrastructure to keep up with growth in users and data, while trying to minimize power consumption to save money, said Jonathan Heiliger, vice president of technical operations. He was interviewed on stage by GigaOm Network founder Om Malik at GigaOm's Structure conference in San Francisco on Thursday. Malik asked him what unexpected problems Facebook had run into while trying to keep up.
“The biggest thing (that) surprised us is … less-than-anticipated performance gains from new microarchitectures — so, new CPUs from guys like Intel and AMD. The performance gains they're touting in the press, we're not seeing in our applications,” Heiliger said. “And we're, literally in real time right now, trying to figure out why that is.”
The hardware industry has also fallen short when it comes to delivering very power-efficient servers that carry out a limited set of functions for companies such as Facebook and Amazon, Heiliger said. He had some pointed words for server OEMs (original equipment manufacturers).
“You guys don't get it,” Heiliger said. “To build servers for companies like Facebook, and Amazon, and other people who are operating fairly homogeneous applications, the servers have to be cheap, and they have to be super power-efficient.” That means not just an efficient power supply but an efficient system as a whole, all the way down to the processor, he said. Google has done a great job designing and building its own servers for this kind of use, Heiliger added.
Facebook is still working with server makers on the issue and doesn't know why they continue to fail, Heiliger said. He hopes to see cooperation among organizations deploying large computing clusters to develop a set of common standards that vendors can design for.
Heiliger had one piece of advice for anyone building an infrastructure to handle large-scale Internet-based services.
“There's a pretty simple answer for scaling infrastructure. It's, 'Don't be cheap,'” Heiliger said. He added that Facebook does drive hard bargains with its hardware and software infrastructure suppliers and is careful not to overbuy.
The best way to scale up a system, he said, is to look at its three layers of infrastructure (application, software and hardware), pick one to focus on and build that out first. Facebook focuses on application infrastructure and upgrades the other two layers to keep up with it.
Testing is another key to the operational success of Facebook, which has more than 200 million users and frequently introduces new features. Heiliger credited extensive testing for the smooth launch of Facebook's personalized usernames earlier this month, despite an explosive response when the feature first went live.
It took about two months to roll out the new feature, from concept to availability, he said. When the personalized usernames became available on a first-come, first-served basis, at 9 p.m. on a Friday, local time at Facebook's Silicon Valley headquarters, users claimed 1 million names in the first hour without slowing down the service as a whole, he said.