Company Description
Sitech disrupts the norm by creating digital products, services, and experiences that matter to people. As a global company, we hire talented specialists across various fields to join our dedicated and certified digital talent pool. Sitech has helped global enterprises and startups alike create life-changing products through product discovery, innovative experience and product design, and custom software development.
Job Overview
We are looking for a proactive and skilled Site Reliability Engineer (SRE) to join our growing team. In this role, you will focus on improving the reliability, scalability, and performance of our systems by applying automation, engineering best practices, and collaborative problem-solving. You will work closely with development, DevOps, and operations teams to ensure that our platforms are secure, resilient, and highly available.
Key Responsibilities
- Maintain and improve the availability, performance, and scalability of critical services.
- Implement and manage infrastructure automation using Infrastructure as Code (IaC) tools.
- Monitor systems proactively and create alerts so that issues are detected early and resolved efficiently.
- Conduct root cause analysis of incidents and participate in post-incident reviews.
- Collaborate with development teams to embed reliability practices into the software development lifecycle.
- Enhance observability through effective monitoring, logging, and metrics collection.
- Manage and optimize CI/CD pipelines to ensure smooth, reliable deployments.
- Assist in disaster recovery planning, testing, and continuous improvements.
- Support compliance with security standards and best practices across systems.
- Participate in on-call rotations and contribute to incident response efforts.
Qualifications
- 3–5 years of experience as an SRE, DevOps Engineer, or Systems Engineer in production environments.
- Strong background in Linux systems administration, networking, and cloud platforms (AWS, GCP, or Azure).
- Hands-on experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
- Proficiency in scripting or programming (Python, Bash, or Go preferred).
- Practical experience with Docker and Kubernetes for containerization and orchestration.
- Familiarity with CI/CD processes and Infrastructure as Code tools (Terraform, Ansible, etc.).
- Strong troubleshooting, problem-solving, and communication skills.
- Team player with the ability to collaborate with cross-functional teams.
Preferred Qualifications
- Experience working with distributed systems at scale.
- Knowledge of security best practices and compliance standards.
- Familiarity with service level objectives (SLOs), error budgets, and operational metrics.