Developer Productivity Engineering at Netflix

This year the tech industry has been obsessed with developer productivity metrics. But measuring in isolation can cause more harm than good. Developers often distrust productivity tools and, as layoffs linger, they may feel compelled to game the system out of job insecurity.

And since research suggests that happy workers are more productive, anything that detracts from developer joy puts productivity at risk.

Rather than just measuring individual output, the industry is moving toward software delivery enablement, which focuses on the efficiency and velocity of teams, departments and even whole divisions. This team-based lens contributes to organizational psychological safety and scales much better. With this welcome trend comes the rise of the developer productivity engineering team, which prioritizes developer experience (DevEx) and enablement over measurement.

“Developer productivity is the universal definition of how do we enable our technical community to focus on their day jobs and not have to worry about all the different Netflix-isms to get up and running and all the way through the software development lifecycle,” Kathryn Koehler, director of productivity engineering at Netflix, told The New Stack.

Engineering is, of course, a science, which is why high-performing organizations like Netflix, Google, LinkedIn, Spotify and Atlassian are all measuring their platform teams’ impact on developer productivity. The difference is that they are focused on measurement not for change management’s sake, but rather for continuous improvement of their own craft.

Here’s what Koehler shared about how Netflix, an engineering exemplar, organizes and then measures its own flavor of platform engineering and developer productivity engineering, so you can find ways to enable your own software engineering teams too.

The Unique Platform and Productivity Organization at Netflix

Unsurprisingly, Netflix has a complex set of what Team Topologies would call enabling teams, complicated-subsystem teams and platform teams. (Yes, that’s plural.)

First, there’s the centralized platform team of 150 people, which “creates the tools, the platforms and the infrastructure to handle abstracting away all of the isms so that our developer community, who are our customers, our internal developer community, can do their best work and focus on their domain of excellence,” Koehler said. In a bit of a “hub and spoke model,” there are two teams within it. One is Koehler’s 80-person developer productivity engineering team, which owns the inner development loop (code, build, test and continuous integration, all the way up to but not including deploy) as well as the end-to-end developer experience, including source control and dependency management.

“We’re building an end-to-end developer front door, which [is] a command center or control center for people to own and operate their software,” she explained. Her counterpart owns delivery, observability and site reliability engineering.

In addition, the greater Netflix platform engineering team includes a cloud infrastructure team and a data platform team. Koehler said, “We abstract away productivity from our cloud infra which should be a separation of concerns where you shouldn’t have to get down and dirty with understanding compute and networking and storage details.”

Effectively, “that stuff should only be limited to the super user,” she continued. “But if you’re an application developer or if you’re generating a service, you should just be able to stand these things up with a couple of parameters along the way [such as] application configurations, and not really know the nasty business under the hood.”

Remarking that an organizational design typically lasts only about 18 months, Koehler observed that most of the current incarnation has been in place for about three years, with pieces of centralized infrastructure dating back several years before that.

At its core, the greater Netflix productivity and platform division looks to abstract out anything that distracts developers from their flow state.

The broader Netflix platform organization, including product management and internal customer support, is about 450 people supporting an engineering org of 2,500, plus another 500 on the data team. That’s 15% in enabling roles, whereas most organizations with established platform engineering programs that The New Stack has interviewed have less than 10%.
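The 15% figure is simple arithmetic. The sketch below assumes the ratio is platform headcount divided by the engineers it supports (2,500 + 500 = 3,000); the article does not spell out the denominator, and the constant and function names are illustrative only.

```python
# Headcounts as reported above. Assumption: the supported population is the
# 2,500-engineer org plus the 500-person data team.
PLATFORM_HEADCOUNT = 450
ENGINEERING_ORG = 2_500
DATA_TEAM = 500


def enabling_ratio(platform: int, supported: int) -> float:
    """Return platform headcount as a fraction of the engineers it supports."""
    return platform / supported


ratio = enabling_ratio(PLATFORM_HEADCOUNT, ENGINEERING_ORG + DATA_TEAM)
print(f"{ratio:.0%}")  # 15%
print(ratio > 0.10)    # True: above the <10% typical of other orgs interviewed
```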

What Internal Customer Support Looks Like

A centralized team handles Tier 1 and Tier 2 customer support for the platform teams which, Koehler explained, “helps leave our customers, the engineering talent, to only dig in on the really hairy, complicated support issues. Otherwise, our ‘keep the lights on’ time, in terms of overall support, would be too onerous. So it’s a way for us to scale better and keep our engineering footprint smaller.”

Netflix is the first Platform as a Product organization The New Stack has spoken with that provides such official internal platform customer support, so we asked Koehler to dig deeper. Netflix has several internal customer support engineers, onboarded by subject matter experts, who respond to Tier 1 requests (helping developers find and use the best tool for the job) and Tier 2 requests, which go deeper into debugging and helping people with more complicated issues. Tier 3 kicks in for special circumstances that require deeper knowledge of the underlying infrastructure.
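The three-tier escalation model can be sketched as a simple triage function. This is a hypothetical illustration, not Netflix’s tooling: the two boolean flags are assumptions standing in for whatever judgment support engineers actually apply when routing a request.

```python
from enum import Enum


class Tier(Enum):
    TIER_1 = 1  # Support engineers: finding and using the right tool for the job
    TIER_2 = 2  # Support engineers: deeper debugging of more complicated issues
    TIER_3 = 3  # Subject matter experts: deep knowledge of underlying infrastructure


def route(needs_infra_expertise: bool, needs_debugging: bool) -> Tier:
    """Pick the lowest tier that can handle a request (hypothetical triage rules)."""
    if needs_infra_expertise:
        return Tier.TIER_3
    if needs_debugging:
        return Tier.TIER_2
    return Tier.TIER_1


print(route(needs_infra_expertise=False, needs_debugging=False))  # Tier.TIER_1
print(route(needs_infra_expertise=False, needs_debugging=True))   # Tier.TIER_2
print(route(needs_infra_expertise=True, needs_debugging=True))    # Tier.TIER_3
```

Routing to the lowest capable tier is what keeps subject matter experts free for the “really hairy” issues Koehler describes.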

Koehler said these support engineers tend to be technical people who really like helping others. But “it’s really complicated to hit that balance of how big does this team need to be versus how big does the subject matter expert team need to be,” she reflected. “We’ve got a lot of downstream effects too, where they can spot trends better.”

Of course, not every organization has the platform budget and scale for this, but this strategy is an important way for Netflix to enable its engineers, most of whom are so-called 10x engineers.

Measuring Developer Productivity at Netflix

While Netflix tracks DORA and other quantitative metrics, Koehler said many of its supplemental developer productivity metrics are qualitative. These include user surveys “and figuring out what does toil look like for you,” she said, and “how hard is it for you to do your job.”

Netflix leverages the 2021 SPACE developer productivity framework, which suggests 25 sociotechnical factors that fall into the five buckets of the acronym:

  • Satisfaction and wellbeing
  • Performance
  • Activity
  • Communication and collaboration
  • Efficiency and flow

The Netflix developer productivity engineering team also asks some very specific questions about sentiment toward tooling:

  • Do you find the tooling delightful?
  • If you were to leave Netflix, would you be sad to leave the tooling behind?
  • Would you recommend your friends work at Netflix because the tooling is so good?
  • How effective do you feel in your job because of your tooling?
  • How frequently can you deploy and feel confident about what you’re deploying?
  • How buggy do you think the tooling is?

Netflix’s platform team tracks how many internal support calls it receives. It also asks specific questions depending on the teams and platforms it serves. For instance, it asks Java developers:

  • What’s the startup time for application generation?
  • What’s the build time?
  • How long does it take to execute a test suite?
  • How confident do you feel about the observability tooling? At a glance, can you determine the health of your systems? Do you know that things are off before customers do?

The team used to run these surveys twice a year but, Koehler said, the cadence has evolved to whenever engineering leaders think best.

“We kind of leave it up to the engineering leaders to have great relationships with our customers — which is great because they’re all internal,” she said. Roughly once a quarter, the platform team still pings the leads, asking for feedback on what they’ve just been building.

Netflix plans to build dashboards that integrate qualitative feedback, alongside more quantitative reports and dashboards. The purpose of the latter, Koehler said, is to “teach a person to fish and they can see how their own systems are doing and how effective they’re being right now, like: How healthy is the fleet? Are they on the latest and greatest library versions?”
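A self-serve dashboard answering “Are they on the latest and greatest library versions?” could start from a metric like the one below. It is a minimal sketch under stated assumptions: the fleet data and service names are hypothetical, and it assumes plain dotted-integer versions; real version strings (with pre-release tags, for example) would need a proper parser.

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Turn a dotted version string like '2.10.1' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))


def share_on_latest(fleet: dict[str, str], latest: str) -> float:
    """Fraction of services whose library version is at least `latest`."""
    target = parse_version(latest)
    current = sum(1 for v in fleet.values() if parse_version(v) >= target)
    return current / len(fleet)


# Hypothetical fleet: service name -> version of one shared library
fleet = {"svc-a": "2.10.1", "svc-b": "2.9.4", "svc-c": "2.10.1", "svc-d": "1.8.0"}
print(f"{share_on_latest(fleet, '2.10.1'):.0%} of services on the latest version")
# 50% of services on the latest version
```

Comparing tuples of integers rather than raw strings matters here: as strings, "2.9.4" would sort after "2.10.1".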

Her team is also trying to push developers to communicate more proactively and continuously, instead of relying on a big bang survey four times a year, which, she said, can be “funky” and open to interpretation. “It’s really hard to ask insightful survey questions. It also depends on who’s responding and did they have their coffee?”

Netflix’s company values include sharing information openly, broadly and deliberately, and communicating candidly and directly. So it’s not surprising that none of the developer responses are anonymous, which makes for a tighter DevEx feedback loop.

“It’s important that we know who is providing this feedback, and people don’t hold back,” she said. It also allows her team to close the loop with their internal developer customers: “Hey, you were really complaining about this before, and we’ve done this work here, so we wanted to check back with you and see if that’s improved.”

Staying on Netflix’s Golden Path

Netflix calls its platforms a “paved path.” This smoother road includes first-class supported infrastructure, languages and tooling. There is also a queue of feature requests for the parts not yet paved.

Over the last two years, the Netflix infrastructure teams have had to extend their remit well beyond studio and streaming. That meant paving the way for the entirely new business and technical requirements of gaming, advertising and live streaming. Gaming demands fast developer onboarding and tooling that makes it easy to be creative, while live streaming requires high availability, low latency and resiliency.

“We have all these different concerns and we basically need to pave enough runway so that these planes can take off and that very much does hit the infrastructural side of the fence,” Koehler said.

Netflix also has platform teams within different customer organizations, which build out the platform offering or even build net-new capabilities for specific user groups. “It’s not super scalable if we’re doing everything for everyone all at once,” she said, which is why there are managed platform extensions for groups such as consumer and content.

The productivity team is currently responsible for dozens of tools and platforms, which is why it is working to bring many of them together behind a centralized front door, or internal developer portal, that offers a consistent user interface and developer experience. It won’t be a catchall, Koehler said, because not everything makes sense to include, but it’s a move toward “more of a center of mass.”

Always Room for Improvement

Documentation is the top thing that developers want, but they find it really hard to make a habit of contributing to it. Netflix is no different.

“We have a very strong freedom and responsibility culture and our engineers can at times lean into freedom over responsibility in some cases when it comes to writing comprehensive documentation,” Koehler said. Echoing the common refrain that platform engineers tend to think they know best, she added, “Sometimes we have challenges writing documentation from the platform provider perspective, rather than the customer’s perspective.”

The platform team is now partnering with the Netflix education team to help level up the docs, looking to answer:

  • Do we have the right information architecture?
  • Is it searchable?
  • Is it discoverable?
  • Is it consolidated into one tool?

This internal documentation strategy is committed to bringing everything back into one tool that’s indexed, searchable, canonical and usable.

“I believe very strongly that discovery should really be a part of the tooling. So if we’re getting people into managed environments, or this portal, how do we surface contextual documentation and information at the moment in time when they need it?” Like many organizations, Netflix is looking at ways to leverage AI, Koehler said: something like the Clippy of the ’90s, but much more useful and with organizational context, because “discoverability is a big problem when you have so many different products under management.” The team is figuring out how to integrate documentation into the software development lifecycle.

Her department is also working out how to make individual platform engineers feel accountable for documentation, by making docs and runbooks a part of the Definition of Done.
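Making docs and runbooks part of the Definition of Done amounts to a checklist that must be fully satisfied before work counts as finished. The sketch below is a hypothetical illustration, not Netflix’s process: the `Change` record and its three fields are assumptions chosen to mirror the items named above.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """A hypothetical record of what a unit of platform work has completed."""
    code_merged: bool
    docs_updated: bool
    runbook_updated: bool


def missing_for_done(change: Change) -> list[str]:
    """List Definition of Done items that are still outstanding."""
    checks = {
        "code merged": change.code_merged,
        "docs updated": change.docs_updated,
        "runbook updated": change.runbook_updated,
    }
    return [item for item, ok in checks.items() if not ok]


change = Change(code_merged=True, docs_updated=True, runbook_updated=False)
print(missing_for_done(change))  # ['runbook updated']
```

Treating documentation as a gating item, rather than a follow-up task, is what ties accountability to the individual engineer.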

Internal Platform Marketing

Another thing Netflix’s platform team sometimes struggles with is very common in platform engineering: communicating and marketing the benefits of staying on the paved road.

“Running these campaigns to get people on to the latest library, tool or framework is challenging. Because our customers have day jobs, and to find the budget in their schedule to do migrations, which could be complicated or painful, is hard,” Koehler said. Her team’s answer, she said, is “having us double down on making that easier for people, making those migrations as seamless as possible because we always be migrating. There’s always going to be the next of something that we ourselves are building or something that we leverage as third-party tooling and our customers should adopt it.”

It is crucial that application and data teams don’t go past a tool’s long-term support window, or they can begin to introduce risks like security vulnerabilities. Healthy organizations, she said, constantly rebuild and redeploy their entire fleet at a regular cadence, which relies on keeping the fleet up to date. It comes down to a battle over technical debt.

“People never allocate the appropriate amount of time to tech debt,” Koehler said. She wishes teams would keep their “keeping the lights on” work (bug fixing, modernization and the like) below 30%, which means steadily paying down technical debt. “The only way to get out of a hole is to start filling in the hole.”
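Checking a team against that 30% ceiling is a one-line ratio. This sketch assumes the threshold applies to the share of team hours spent on “keeping the lights on” work; the article does not specify the unit, and the sprint figures are hypothetical.

```python
KTLO_CEILING = 0.30  # Koehler's suggested upper bound for "keeping the lights on" work


def ktlo_share(ktlo_hours: float, total_hours: float) -> float:
    """Fraction of a team's time spent on bug fixing, modernization and similar upkeep."""
    return ktlo_hours / total_hours


# Hypothetical sprint: 100 of 400 team-hours went to upkeep
share = ktlo_share(100, 400)
print(f"KTLO share: {share:.0%}, within target: {share <= KTLO_CEILING}")
# KTLO share: 25%, within target: True
```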

But there’s hope, not just for Netflix but for the whole industry, as developer productivity engineering is on the rise. Koehler is even starting to see it on resumes.

“People think about these tools from a high quality, delightful experience perspective that never really got that attention in the past,” she continued, “creating tools that delight, that are polished, that are solid, that have designers working on it — this is great. That first class or that first tier of importance. And it’s such a great space to be into because our customers are internal. How can we be any closer to the people that we’re trying to serve?”
