Staying Focused In The Busy Life Of A Site Reliability Engineer

June 28, 2022

Life At Redpanda

Introduction

I was born in Poland and now live in the Czech Republic. Some people find my background interesting because I didn’t start out in computer science or an engineering field. Instead, I got my master’s degree in cognitive science.

I was always interested in the brain and how our minds work. While that field tangentially touched on computing and programming — working with artificial intelligence, for example — I worked in several industries before deciding that technology was the path I wanted to pursue.

Today, I’m a Site Reliability Engineer (SRE) at Redpanda.

What does a site reliability engineer do?

I enjoy how wide-ranging site reliability engineering is. You get to work on a variety of projects at any given time. This means that, on one hand, you need the technical skills and knowledge to recognize and troubleshoot potential problems, and on the other, you need good communication and business skills to work with vendors and providers. It’s a unique mix of skill sets and knowledge. Because I have that mix of experience in my professional background, and I’m curious about different industries and technologies, I enjoy the SRE life.

My work mostly involves juggling two main objectives: ensuring the reliability of Redpanda’s products, and responding to urgent engineering tasks. One day I might be drafting plans to make sure our users can rely on highly available products, and the next I could be tweaking our monitoring so that it catches more of the surprising behavior in our systems. In typical engineering work, you have time to focus and get deep into programming specific features or platforms. As an SRE, you need to be able to do that, but also respond to emergencies and unplanned requests for help. As such, it’s important to prioritize where your focus goes during the work day.

Four ways to stay focused in the hectic world of site reliability engineering

Accept that you must task-switch quickly

It can be challenging to switch between development work and firefighting. You need to be able to change tasks and pick up context very quickly. I don’t have a magic piece of advice for doing this better or more efficiently, but what works for me is a mix of coffee, pancakes, and constantly working to improve my productivity.

Work early (or whenever you’re able to get the most done)

My work hours help me stay focused and productive. Living in the Czech Republic, I wake up and log on to start my work day before many of my team members on the West Coast of the U.S. do. This gives me some quiet hours for deep, focused work before the whole team is signed on to Slack.

Have a daily routine

I stick to a few daily routines that help me stay productive. I don’t work in my pajamas; I put on real clothes. I wake up at a similar time each day and have my coffee, then I go into my office and start working.

Have a space dedicated solely to work

I’m lucky in that I have a room in my home that is a dedicated office space. I invested in my office to make sure it’s a good working environment for me: I have big monitors, a nice keyboard, and a standing desk. I miss working with people sometimes, but regular Zoom calls with my colleagues partially make up for that.

My journey into site reliability engineering

After growing up in Poland, I eventually moved to Hungary, and I also lived in Indonesia for a short time. I’ve worked in communications, marketing, sales, and logistics roles. However, when I thought about the kinds of challenges I wanted to solve, I realized the tech industry was where I wanted to be.

My plan was to become a product or project manager, something on the business side of the tech world. I went through a developer bootcamp in 2017 just to catch up with modern development. Someone showed me how to use a command line interface (CLI) and I got hooked. I was fascinated by the fact that a CLI gives you the power of automation, because I had burned out in previous jobs that lacked it. I saw how, if you spend just a little bit of time thinking things through and writing scripts, you can automate hours of manual work and free up time to focus on more creative and challenging projects.

I put my goal of working on the business side of tech on hold and focused my attention on learning how to code, along with all the technologies that go with it. I sought out the most challenging work that anyone would hire me for, and essentially went straight into site reliability engineering, which is possibly one of the hardest fields within software engineering.

I became a Junior SRE at Apiary.io, which provided tooling for people designing and building APIs. I worked there for several years and really enjoyed it, but at some point I felt that it was time for new challenges. That’s when I joined Redpanda.

Conclusion

While there are multiple skills that we look for in new members of our team, I think some of the most important right now are deep expertise in Kubernetes, monitoring experience, and deep expertise in public cloud vendors. In general, though, we’re looking for kind, smart people who are open to learning new things and solving problems.

I’m particularly interested in monitoring and observability. While tools dedicated to monitoring systems have been around for a long time, this is still a surprisingly niche field within engineering, and I’m excited to see how it, and SRE in general, evolves in the future.

Daria

P.S. Many thanks to my colleague Kayla Minguez for putting this post together and making sense of my stream of consciousness, and to Redpanda Data for allowing me to re-share it. See the original post on Redpanda’s website.
