Company Description
Cognitive Flow is an AI and software consulting firm that helps enterprises and governments accelerate digital transformation. Using AI, cloud, automation, and agile delivery, we turn complex challenges into scalable solutions. Our goal is to streamline our clients' processes and drive their innovation and growth.
Role Summary
The Senior DevOps Engineer will design, implement, and maintain scalable, secure, and highly available cloud infrastructure on Google Cloud Platform (GCP). The role involves automating deployments, managing CI/CD pipelines, optimizing cloud costs, and ensuring operational excellence for AI/ML and SaaS workloads across multiple environments (Dev, UAT, Prod, DR).
Key Responsibilities
Infrastructure & Cloud Management
- Design, deploy, and maintain cloud infrastructure using Terraform or Deployment Manager, following IaC best practices.
- Build and manage CI/CD pipelines using Cloud Build, GitHub Actions, or Jenkins.
- Manage Kubernetes (GKE) clusters, Cloud Run, and Compute Engine workloads.
- Ensure security, scalability, and compliance across all GCP services (IAM, VPC, Firewall, Secrets, Cloud Armor).
- Optimize costs by monitoring usage and right-sizing resources (BigQuery, Cloud Storage, Vertex AI, etc.).
Automation & Reliability
- Implement infrastructure-as-code (IaC) and automate environment provisioning.
- Manage logging, monitoring, and alerting through Cloud Logging, Cloud Monitoring, Prometheus, Grafana, or Splunk.
- Establish SLOs, SLIs, and SLAs for services and ensure high reliability through proactive monitoring.
- Lead incident management and root cause analysis (RCA) for production issues.
CI/CD & DevOps Practices
- Standardize CI/CD workflows across projects and environments.
- Implement blue-green and canary deployments and other zero-downtime release strategies.
- Maintain container build pipelines and Docker image registries (Artifact Registry, legacy Container Registry).
- Integrate automated testing, linting, and vulnerability scans into pipelines.
Collaboration & Governance
- Work closely with software engineers, ML engineers, and data teams to streamline model deployment and infrastructure operations.
- Define and enforce DevOps standards, governance policies, and environment naming conventions.
- Mentor junior engineers and review IaC and pipeline code for adherence to best practices.
- Participate in architecture reviews, DR drills, and cloud security audits.
Required Skills & Experience
- 5+ years of experience in DevOps, Cloud, or Infrastructure Engineering.
- 3+ years of hands-on experience with GCP (Compute, Networking, IAM, BigQuery, GKE, Cloud Build, Cloud Storage, Cloud Functions).
- Strong experience with Terraform, Docker, Kubernetes, and CI/CD pipelines.
- Solid understanding of networking concepts (VPC, peering, load balancers, DNS, Cloud NAT).
- Experience with monitoring and observability tools (Cloud Monitoring and Cloud Logging, formerly Stackdriver; Prometheus; Grafana; or equivalent).
- Experience with Linux system administration and shell scripting.
- Familiarity with Python or Go for automation scripting.
- Proven experience managing multi-environment (Dev/UAT/Prod) infrastructure.
Preferred Qualifications
- GCP Professional Cloud DevOps Engineer or Cloud Architect certification.
- Experience supporting AI/ML workloads on Vertex AI or similar platforms.
- Background in GitOps (Argo CD, Flux) or service mesh (Istio, Anthos Service Mesh).
- Exposure to security and compliance standards (ISO 27001, PDPL, GDPR).
- Prior experience in a SaaS or enterprise-scale environment.
Soft Skills
- Excellent problem-solving and analytical skills.
- Strong collaboration and communication with cross-functional teams.
- Passion for automation, optimization, and continuous improvement.
- Self-driven, with the ability to take ownership and deliver in fast-paced environments.