Yesterday over on the Netflix Tech Blog, Rajiv Aggarwal and David Temkin provided an excellent view into how they enabled IPv6 support for Netflix’s video streaming.
A couple of points in the article are worth calling out. First, they encountered an issue with the immaturity of some IPv6 code:
We found a reference leak in the IPv6 code that wasn’t apparent until you had processed 2^32 packets. Once this counter rolled over it would free active memory and cause the panic. Because we process large amounts of traffic we noticed this almost immediately.
These are the kind of “growing pains” that we should expect to see as more and more production services move to IPv6. Much of the IPv6-related code out there has been developed and extensively tested… but not necessarily in all the cases that actual production usage will expose.
Their rollout strategy was also interesting, in that it let them ease gradually into full deployment while keeping an easy way to revert should a problem arise:
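To see why a bug like this stays hidden until roughly 2^32 packets have gone by, here is a minimal sketch in Python. It is not the actual code Netflix hit (the post does not show it); the names `refcount_after` and `release` are hypothetical, and the sketch simply assumes an unsigned 32-bit reference counter that gains one leaked reference per packet.

```python
MASK32 = 0xFFFFFFFF  # emulate an unsigned 32-bit counter (assumption)


def refcount_after(leaked_packets, initial_refs=1):
    """Counter value once every packet has taken a reference it never released."""
    return (initial_refs + leaked_packets) & MASK32


def release(refcount):
    """Drop one reference; return (new_count, freed)."""
    new_count = (refcount - 1) & MASK32
    return new_count, new_count == 0


# A modest amount of traffic: the count is inflated, but nothing is freed.
print(release(refcount_after(1000)))     # (1000, False)

# After 2**32 leaked references the counter has wrapped back to its
# starting value, so one ordinary release now "frees" a live object.
print(release(refcount_after(2 ** 32)))  # (0, True)
```

A test suite that pushes a few million packets would never reach the wraparound, which fits the observation that only a very high-traffic service noticed the problem quickly.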
Our DNS provider enables us to resolve hostnames based on the geo-location of the caller. We used this during testing and rollout of IPv6 by starting with a specific geographic region and then expanding. We started with the state of California and monitored metrics for requests coming to us via IPv4 vs. IPv6. We specifically looked for any significant dips in IPv4 traffic that wasn’t accounted for in new IPv6 traffic. In addition, we watched to see if requests arriving via IPv6 were failing in similar or different ways than those via IPv4.
As they note in the post, this DNS-based solution didn’t work perfectly, but it worked well enough for them to be able to accomplish a successful rollout.
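The check they describe, looking for IPv4 dips that are not accounted for by new IPv6 traffic, can be sketched roughly as follows. This is only an illustration: the function names, the request counts, and the 2% tolerance are all assumptions of mine, not details from the Netflix post.

```python
def unaccounted_ipv4_drop(baseline_v4, current_v4, current_v6):
    """Requests that left IPv4 but did not show up on IPv6."""
    v4_drop = max(baseline_v4 - current_v4, 0)
    return max(v4_drop - current_v6, 0)


def rollout_looks_healthy(baseline_v4, current_v4, current_v6, tolerance=0.02):
    """True if the unexplained loss is within `tolerance` of baseline traffic.

    The 2% default is an arbitrary, hypothetical threshold.
    """
    if baseline_v4 <= 0:
        raise ValueError("baseline_v4 must be positive")
    missing = unaccounted_ipv4_drop(baseline_v4, current_v4, current_v6)
    return missing / baseline_v4 <= tolerance


# Hypothetical per-region request counts before and after enabling IPv6.
print(rollout_looks_healthy(10_000, 9_000, 950))  # True: only 50 requests unexplained
print(rollout_looks_healthy(10_000, 9_000, 200))  # False: 800 requests went missing
```

Running a comparison like this one region at a time matches the staged approach in the quote: if a region fails the check, the DNS change for that region can be reverted before expanding further.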
The end result is that, according to at least one study, they now have the second-largest domain taking IPv6 traffic!
Congratulations to the entire Netflix team – and thank you, too, for providing this technical report!