// Site Reliability | DevOps | Platform | Infrastructure
Currently working as a Staff Site Reliability Engineer. Driven by curiosity and a relentless pursuit of knowledge.
balbir@googlealumni.com
January 2023 - Present
> Program lead for Guidewire Production Readiness (GPR), assessing new services and provisioning them to production
> Migrating legacy workloads to the next‑gen Kubernetes platform (LHS to RHS)
> Built a DR solution that reduced RTO (Recovery Time Objective) from 6h → 30min
> Ensured 99.99% uptime across 200+ microservices by optimizing Kubernetes clusters and implementing automated health checks
> Mentor and guide a distributed team of senior SREs (US, EMEA & APAC)
August 2015 - January 2023
> Program lead for load balancer infrastructure processing 10K RPS
> Leading the effort to migrate on‑prem services to our cloud partner, AWS
> Building and maintaining a resilient, secure, and efficient SaaS application platform to meet established SLAs
> Analyze system performance, identify problems, design, develop, and implement solutions
April 2014 - July 2015
> Working with Amazon Web Services (AWS) as the cloud computing infrastructure
> Mentor for new joiners and junior members of the team
> Ensure scalability and reliability of production servers
> Automated the infrastructure using Chef and AWS CloudFormation
November 2011 - February 2014
> Assist in the roll‑out and deployment of new product features and installations to facilitate our rapid iteration and constant growth
> Develop tools to improve our ability to rapidly deploy and effectively monitor custom applications in a large‑scale UNIX environment
> Served as Technical Escalation Manager during India time-zone hours
> Built a team of 10 and started NOC operations from the Bangalore office
September 2010 - November 2011
> SME for messaging infrastructure on the TIBCO implementation of JMS; also supported the business environment, including BPIM and WSG
> Migrated a business-critical application from the SJ to the RCDN data center
> Designed and automated DR, reducing failover time from hours to minutes in the RTP data center
March 2008 - August 2010
> Installing, configuring, and deploying Red Hat and FreeBSD servers
> Ensuring uninterrupted data flow across pipelines and delivering data to customers within SLA
> Providing end-to-end support: system builds, application installation, testing, cluster management, and validation of delivered data
July 2007 - January 2008
> LTSP: Deployed LTSP as the Linux thin-client solution for the training department, replacing Citrix and Windows Terminal Server to reduce cost and simplify license agreements
> Installation, upgrade (2.0.54 to 2.2.4), administration, and packaging of Apache Web Server for the corporate intranet web server