How Facebook Moved 20 Billion Instagram Photos Without You Knowing
Your Instagram photos aren’t where they used to be.
This spring, even as some 200 million people were using Instagram on their smartphones, a small team of engineers moved the photo-sharing service from Amazon’s cloud computing service, where it was built in 2010, into a data center operated by Facebook, which bought Instagram in 2012. “The users are still in the same car they were in at the beginning of the journey,” says Instagram founder Mike Krieger, “but we’ve swapped out every single part without them noticing.”
Facebook calls it the “Instagration,” and it was an unprecedented undertaking for Mark Zuckerberg and company. Facebook has moved other acquired properties like FriendFeed into its data centers, but typically, they were small projects that involved shutting a service down before moving it into the Facebook universe. The Instagram switch was the live migration of an enormous, and enormously popular, operation. “The service couldn’t take any disruption,” says Facebook engineer George Cabrera. Facebook won’t say how many virtual machines were needed to run Instagram on Amazon, but it was in “the thousands.” And the service now stores over 20 billion digital photos.
For Instagram, the move was a way of more effectively plugging into a wide range of computing tools that have long helped drive Facebook’s vast online empire. And for the engineers overseeing Facebook’s worldwide network of data centers, it’s a template for merging their operation with applications the company may acquire in the years to come. “We were patient zero,” Krieger says. But the “Instagration” also provides a lesson or two for the broader tech community as it builds more and more apps atop cloud computing services like Amazon—apps they might one day migrate to private data centers. The key to the migration was a specialized Amazon service known as the Virtual Private Cloud.
In April 2013, about a year after acquiring Instagram for $1 billion, Facebook vice president of engineering Jay Parikh said the company planned to move the photo-sharing service to its own computing facilities, and the project started around the same time. The migration took about a year, and although it was a huge undertaking, it was handled by a small team. Eight engineers oversaw Instagram’s infrastructure in 2013, a number that has since expanded to 20. Cabrera says the team spent the better part of a year preparing for a month of data migration.
Since 2010, Instagram had run atop Amazon EC2, the seminal cloud computing service that lets anyone build and run software without setting up their own computer servers. To seamlessly move Instagram into an east coast Facebook data center–likely the one in Forest City, North Carolina–Cabrera’s team first created what essentially was a copy of the software underpinning the photo-sharing service. Once this was up and running in the Facebook facility, the team could transfer the data—including those 20 billion photos.
The process was trickier than you might expect. It involved building a single private computer network that spanned the Facebook data center and the Instagram operation on Amazon’s cloud–the best way of securely moving all of the data from one place to another–but the team couldn’t build such a network without moving Instagram to another part of the Amazon cloud. In other words, Krieger’s crew had to move Instagram once and then move it again. “We had to completely replace the car twice in the last year,” he says.
First, they moved it into Amazon’s Virtual Private Cloud, or VPC, a tool that let Krieger and his crew create a logical network that reached beyond Amazon into the Facebook data center. Creating this network was particularly important because it gave Facebook complete control over the internet addresses used by the machines running Instagram. If they hadn’t moved Instagram onto the VPC, they wouldn’t have been able to define their own addresses on Amazon, he says, which would mean dealing with myriad address conflicts as they moved software into the data center.
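To see why control over addresses matters, here is a minimal sketch of the kind of conflict check involved. It assumes made-up address ranges throughout; none of them come from Facebook or Amazon, and the function name `find_conflicts` is hypothetical. It uses only Python’s standard `ipaddress` module.

```python
import ipaddress


def find_conflicts(cloud_blocks, datacenter_blocks):
    """Return every (cloud, datacenter) pair of address blocks that overlap."""
    conflicts = []
    for cloud in cloud_blocks:
        for dc in datacenter_blocks:
            if ipaddress.ip_network(cloud).overlaps(ipaddress.ip_network(dc)):
                conflicts.append((cloud, dc))
    return conflicts


# All ranges below are hypothetical, chosen only to illustrate the problem.
datacenter = ["10.0.0.0/16", "10.1.0.0/16"]

# Addresses handed out by a provider may land inside ranges already in use.
assigned_by_provider = ["10.0.128.0/17", "10.200.0.0/16"]

# Addresses the team picks itself can be chosen to avoid the data center's ranges.
chosen_by_team = ["10.64.0.0/16", "10.65.0.0/16"]

print(find_conflicts(assigned_by_provider, datacenter))
print(find_conflicts(chosen_by_team, datacenter))
```

In this toy case the first call reports one overlapping pair and the second reports none, which is the situation a team wants before joining two networks into one.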
But things were even more complicated than that. The added wrinkle was that, in order to first move Instagram from EC2 to VPC, they also needed to build a common network across those two environments. Amazon doesn’t offer a way of doing that. So, as a temporary fix, Facebook built its own networking tool, something it calls Neti. The long and the short of Neti is that it was yet another extensive step in this year-long process–and therein lies the biggest lesson for those who might build atop Amazon and other cloud services.
VPC didn’t exist when Instagram was founded in 2010. Today, if other startups build on VPC from the beginning, they can avoid the extra steps that complicated Instagram’s migration. VPC also can help if you want to move just part of your infrastructure from the cloud into a private data center. “If I was starting a new startup or service from scratch today,” Krieger says, “I would totally just start on VPC.”
Once Krieger and his engineers were ready to actually move software and data from place to place, they turned to an increasingly popular tool called Chef. This is a way of writing automated “recipes” for loading and configuring digital stuff on a vast array of machines. They wrote recipes, for instance, that could automatically load the appropriate software onto machines running in the Amazon VPC. Then they used similar recipes to load much the same software on machines inside the Facebook data center. They built recipes for installing software on each flavor of Instagram database server, others for configuring what are called caching servers, which are used to more quickly serve up particularly popular photos, and so on.
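The recipe idea can be sketched in a few lines. What follows is not Chef syntax (Chef recipes are written in a Ruby-based language) and not Instagram’s actual configuration; it is a Python illustration under assumed names. The roles, package names, and the `converge` function are all hypothetical. The point it shows is that one recipe per server role, applied to machines in either location, leaves them with the same software, and that running it again changes nothing.

```python
# Hypothetical recipes: one list of packages per server role.
RECIPES = {
    "database": ["example-db", "example-backup-agent"],
    "cache": ["example-cache"],
}


def converge(machine, role):
    """Install whatever the role's recipe lists and the machine lacks.

    Returns the packages that were added on this run.
    """
    added = []
    for package in RECIPES[role]:
        if package not in machine["installed"]:
            machine["installed"].add(package)
            added.append(package)
    return added


# Two hypothetical cache machines, one per environment.
vpc_cache = {"site": "amazon-vpc", "installed": set()}
dc_cache = {"site": "facebook-dc", "installed": set()}

first_run = converge(vpc_cache, "cache")
converge(dc_cache, "cache")
second_run = converge(vpc_cache, "cache")  # nothing left to do

print(first_run, second_run)
print(vpc_cache["installed"] == dc_cache["installed"])
```

Because the recipe describes the desired end state rather than a sequence of one-off commands, the same description can be reused for every machine of a given role, wherever it runs.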
The last of the software and data arrived in Facebook’s data center by the end of April. In the middle of the month, Instagram was plagued by an outage that affected users across the globe, but the company says this was unrelated to the migration. Though the move was lengthy and complicated, it all happened, according to Krieger and others, without the service’s 200 million users realizing what was going on.
Now, Instagram runs on its own dedicated machines inside the Facebook facility. According to Facebook engineer Pedro Canahuati, this makes the service more efficient. It uses one server for every three it used on the Amazon cloud, he says, and because the Instagram and Facebook teams could share various techniques for moving data back and forth, Instagram’s “data fetching” times dropped 80 percent.