LEAP DAY 2: DevOps and SRE at VSTS Team

January 26, 2018 by rdagumampan

DevOps and SRE at VSTS Team
Sam Guckenheimer, Product Owner (PO) for the VSTS Team

Sam shares his team’s experiences in building and delivering Visual Studio Team Services and shows their internal development process at VSTS DevLabs. It’s interesting to see the team’s actual backlog, issue tracking and reporting dashboards. And it’s not very different from ours at F5@Ørsted. Coolness! What’s most striking: the VSTS dev team runs 69k unit tests in 29 minutes! OK, that’s something to beat. We’re nowhere near these stats.
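To put that number in perspective, here is a quick back-of-the-envelope calculation. It assumes the 29 minutes is total wall-clock time for the whole run; the talk didn't say how the run is parallelized, so the "serial" figure is only an upper bound on the average time each test could take if they ran one after another.

```python
# Back-of-the-envelope throughput for the figure quoted in the talk:
# 69,000 unit tests finishing in 29 minutes (assumed to be wall-clock time).
TESTS = 69_000
WALL_CLOCK_MINUTES = 29

tests_per_minute = TESTS / WALL_CLOCK_MINUTES
tests_per_second = TESTS / (WALL_CLOCK_MINUTES * 60)
# Average time budget per test if the tests ran serially (one after another).
ms_per_test_serial = WALL_CLOCK_MINUTES * 60 * 1000 / TESTS

print(f"{tests_per_minute:.0f} tests/min")
print(f"{tests_per_second:.1f} tests/s")
print(f"{ms_per_test_serial:.1f} ms per test if serial")
```

That works out to roughly 2,400 tests per minute, or about 40 per second, leaving an average of about 25 ms per test if nothing ran in parallel.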

Sam also talks about the metrics the team doesn’t watch, and I agree with most of them. The team pays little to no attention to:

  • original estimates
  • completed hours
  • lines of code
  • team capacity
  • team burn down

Site Reliability Engineering (SRE)

Another key take-away is the rise of Site Reliability Engineering (SRE). In a typical enterprise, dev teams work on new projects while having little capacity reserved for operations, incidents and continuous improvement. While this ensures that we keep building new and exciting stuff, production load keeps increasing, performance degrades, databases get fragmented, indexes need rebuilding, and newly reported vulnerabilities need to be fixed. These are often overlooked and only get attention when customers report issues.

To continue the story… the team delivers a new service to production, the business starts using it, the PM celebrates (wheeewww!), and everyone is happy. Then the team moves on to another project. Sounds good so far? Well, the biggest casualty of this model is innovation. Software services need to mature over time. I have always believed the best versions are the odd-numbered ones: v3, v5, v7. If a service never reaches these versions, it’s just a matter of time before it blows up.

An SRE capability could be the answer to the long-standing Dev vs. Ops battle. An SRE engineer’s goal is to keep services in tip-top condition and drive innovation in the service. An SRE must be able to write code in C# or PowerShell, kick off the CI/CD pipeline, optimize application servers, monitor and optimize databases, and perform chaos engineering tasks. I know this may overlap with the roles of server admins and DBAs, but I believe it is long overdue. We need to rethink the way we operate production systems.

At Google, SRE teams are made up of roughly 50% software engineers and 50% engineers with a strong systems/operations background, all committed to improving both the features and the operational environment. We are not Google, nor do we have Google scale, but I think we should give this a try. #JFTI

Is SRE DevOps? IMO, DevOps is a culture, while SRE is a capability with a well-defined role. A team can embrace DevOps better by hiring an SRE engineer.

Action Items

  • Try out an SRE (dev/ops) role for a new hire on the team

Credits

  • Thanks to fellow LEAP attendee @henrihie for the picture

