Resume
Experience
-
ZS Associates
Site Reliability Engineer | Jan 2023 - Present (Full Time)
Proactively monitor service performance for enterprise-level distributed systems serving 10,000+ users, improving response times by 45% and reducing resource consumption by 30%.
Lead incident response as primary on-call engineer with average MTTR of 15 minutes, reducing recurring issues by 70% through detailed root cause analysis.
Develop automation using Python, Bash, and PowerShell, reducing manual intervention by 65% and saving 25+ hours weekly.
Manage production Kubernetes clusters with Docker, implementing auto-healing and Horizontal Pod Autoscaling (HPA), and achieving zero-downtime deployments through blue-green and canary strategies.
Deploy monitoring infrastructure using Prometheus, Grafana, Datadog, and Splunk, reducing alert noise by 50% through intelligent thresholding.
Implement IaC using Terraform, CloudFormation, and Ansible, achieving 100% infrastructure reproducibility.
Define and monitor SLIs/SLOs for 50+ microservices, establishing error budgets and driving data-driven reliability decisions.
-
Samyak Softwares
Industrial Trainee, DevOps & Cloud Infrastructure | Jun 2022 - Jul 2022
Assisted with cloud infrastructure deployments, monitoring setup, automation scripting, and incident response procedures in AWS environments.
Participated in CI/CD pipeline development, configuration management initiatives, and the creation of operational documentation.
Education
-
MIT Academy of Engineering, Pune
B.Tech in Information Technology | CGPA: 9.20/10 [2019 - 2023]
Graduated with distinction, specializing in Cloud Computing and Distributed Systems.
Certifications
-
AWS Certified Developer — Associate
Amazon Web Services
-
HashiCorp Certified: Terraform Associate
HashiCorp
-
Microsoft Azure Fundamentals (AZ-900)
Microsoft
-
AWS Cloud Practitioner
Amazon Web Services
Projects
-
Enterprise SRE Automation Platform
Python, Ansible, Terraform, Prometheus, Grafana | Ongoing
Architected a comprehensive SRE automation platform, reducing operational toil by 70%. Implemented self-healing mechanisms and automated incident response workflows, achieving an 80% reduction in manual intervention and 99.95% uptime. Developed automated runbooks with rollback capabilities, integrated with Slack and PagerDuty.
-
Distributed Microservices Platform
Kubernetes, Docker, Istio, AWS, Terraform | May 2023
Led reliability engineering for 50+ services across multi-region AWS infrastructure with an Istio service mesh. Designed a monitoring strategy using Prometheus, Grafana, and Datadog, achieving 99.9% availability and sub-200ms P95 latency. Applied chaos engineering using AWS Fault Injection Simulator.
-
Cloud Migration & Optimization Initiative
AWS, Azure, Python, Terraform, CloudFormation | Aug 2022
Led migration of 100+ applications from on-premises to AWS with zero data loss and minimal downtime. Implemented cost optimization strategies, reducing cloud spend by 35% while improving performance.
My Skills
-
SRE & Incident Response
90%
-
AWS / Azure / GCP
85%
-
Kubernetes & Docker
85%
-
Terraform & IaC
85%
-
Python & Automation
85%
-
Prometheus, Grafana & Observability
80%
-
Linux & System Administration
80%
-
CI/CD & DevOps
80%