Staff Site Reliability Engineer, Fleets and Audio Spaces


March 10, 2021

Seattle, WA, US

Company Description
Twitter is what’s happening and what people are talking about right now. For us, life's not about a job, it's about purpose. We feel real change starts with conversation. Here, your voice matters. Come as you are and together we'll do what's right (not what's easy) to serve the public conversation.
Job Description

Who We Are

Twitter is seeking an experienced Site Reliability Engineer to work within the Fleets and Spaces products. We recently rolled out Fleets and Spaces globally and our journey is just beginning.
Twitter’s purpose is to serve the public conversation while amplifying sought-after voices within groups. Fleets and Spaces are some of our most ambitious bets in this space.
Fleets gives our customers an avenue for ephemeral sharing on Twitter. Spaces creates the opportunity for authentic verbal conversations where a host can start a conversation and others can join in real-time to discuss a variety of topics. Both products allow us to explore innovative creation and conversation experiences such as voice chatting, Live, Music, and more!
As SREs in this space, we work across both online and offline systems, and with different media serving technologies and content delivery systems. We work across on-prem and cloud-based clusters.
How You’ll Work
  • You’ll embed deeply with your Software Engineering (SWE) counterparts and take an active role as a co-owner of production services to ensure services are built, maintained, and operated in a reliable and scalable way. You will be part of the successful delivery of new features and services, as well as the day-to-day successful operation of existing services.
  • You’ll collaborate with your SWE partners to drive operational health improvements, root cause analysis, postmortem discussions, and their associated remediations that serve to improve reliability and sub-linearly scale operations.
  • Partner with both SWE and SRE to use tools, processes, and techniques to reduce business risk. Perform infrastructure & configuration management, deploys, capacity modeling & planning, and incident mitigation.
  • Identify common patterns in the challenges of operating services in production, and partner with others to design and implement reusable solutions and/or other multi-functional work that drives down the complexity, difficulty, costs, and risks of operating the business.
  • You’ll be a member of a service on-call team, in the same on-call group as your SWE partners.

What You'll Do
We are looking for SREs who are passionate about highly performant user-facing services, have a desire to grow themselves and learn new technologies, love working in collaborative teams, and are committed to serving their customers!
Your responsibilities include:
  • Playing a leading role within engineering partner teams for AWS Infrastructure adoption, integration, and best practices.
  • Traditional SRE/Operational support scopes like tooling and automation, monitoring, workflow management, maintaining and improving data pipelines, CI/CD, etc.
  • Partnering with and supporting existing Content teams with operational guidance and expertise on various project initiatives.
  • Capacity Planning and scaling.
  • Ensuring media is served in a highly performant way that can also handle viral traffic surges.
  • Championing automation at every stage of the application lifecycle.


Who You Are

  • 8+ years of handling services in a large-scale distributed systems environment.
  • Expert knowledge of Linux operating system internals, filesystems, disk/storage technologies, storage protocols, and the networking stack.
  • Expert knowledge of systems programming (bash and shell tools) and practical, proven knowledge of at least one higher-level language (Python, Go, or Scala). This position will be primarily centered around Go.
  • Comfortable working with both on-prem and AWS in terms of deployment, support, monitoring, administration, and troubleshooting.
  • Track record of practical problem solving, excellent communication, and documentation skills.
  • Proven understanding of systems and application design, including the operational trade-offs of various designs.
  • Comfortable operating as a member of a team and working well with a myriad of personalities at all levels.
  • Solid understanding of distributed systems design, scaling, durability, and security.

Additional Information

A few other things we value:

  • Challenge - We solve some of the industry’s hardest problems. Come to be challenged, learn, and thrive as an engineer.
  • Diversity - Diversity makes us a better organization and team. We value diverse backgrounds, ideas, and experiences.
  • Work-Life Balance - We work hard, but we believe with hard work should come balance.

We will ensure that individuals with disabilities are provided a reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request an accommodation.