Resume

ZS Associates
Site Reliability Engineer | Jan 2023 — Present (Full Time)
Proactively monitor service performance for enterprise-level distributed systems serving 10,000+ users, improving response times by 45% and reducing resource consumption by 30%.

Lead incident response as primary on-call engineer with an average MTTR of 15 minutes, reducing recurring issues by 70% through detailed root cause analysis.

Develop automation using Python, Bash, and PowerShell, reducing manual intervention by 65% and saving 25+ hours weekly.

Manage production Kubernetes clusters with Docker, implementing auto-healing and HPA, and achieving zero-downtime deployments through blue-green and canary strategies.

Deploy monitoring infrastructure using Prometheus, Grafana, Datadog, and Splunk, reducing alert noise by 50% through intelligent thresholding.

Implement IaC using Terraform, CloudFormation, and Ansible, achieving 100% infrastructure reproducibility.

Define and monitor SLIs/SLOs for 50+ microservices, establishing error budgets and driving data-driven reliability decisions.

Samyak Softwares
Industrial Trainee — DevOps & Cloud Infrastructure | Jun 2022 — Jul 2022
Assisted with cloud infrastructure deployments, monitoring setup, automation scripting, and incident response procedures in AWS environments.
Participated in CI/CD pipeline development, configuration management initiatives, and the creation of operational documentation.

MIT Academy of Engineering, Pune
B.Tech in Information Technology — CGPA: 9.20/10 [2019 — 2023]
Graduated with distinction, specializing in Cloud Computing and Distributed Systems.

Enterprise SRE Automation Platform
Python, Ansible, Terraform, Prometheus, Grafana | Ongoing
Architected a comprehensive SRE automation platform, reducing operational toil by 70%. Implemented self-healing mechanisms and automated incident response workflows, achieving an 80% reduction in manual intervention and 99.95% uptime. Developed automated runbooks with rollback capabilities, integrated with Slack and PagerDuty.

Distributed Microservices Platform
Kubernetes, Docker, Istio, AWS, Terraform | May 2023
Led reliability engineering for 50+ services across multi-region AWS infrastructure with an Istio service mesh. Designed a monitoring strategy using Prometheus, Grafana, and Datadog, achieving 99.9% availability and sub-200ms P95 latency. Applied chaos engineering using AWS Fault Injection Simulator.

Cloud Migration & Optimization Initiative
AWS, Azure, Python, Terraform, CloudFormation | Aug 2022
Led the migration of 100+ applications from on-premises to AWS with zero data loss and minimal downtime. Implemented cost optimization strategies that reduced cloud spend by 35% while improving performance.