Site Reliability Engineer
United States, Michigan, Detroit
Site Reliability Engineer, IBM Corporation, Detroit, MI (Up to 40% telecommuting permitted): Manage cloud environments for stability, security, and satisfaction to ensure optimal customer experience. Set up and maintain multiple cloud staging/production environments in AWS and other major cloud providers. Collaborate with application and development teams and the Global Support Organization. Establish and enforce standards and procedures for the installation and maintenance of systems and data. Oversee and operate global customer environments to meet industry-leading targets for availability and quality. Coordinate with cross-functional teams (e.g., product management, engineering, solution architects) to deliver cloud-readiness capabilities and cross-product architectures. Observe and understand relevant cloud market trends and services to support the transformation and operation of full-stack enterprise applications in the cloud. Collect and review customer requirements to translate them into feature backlogs managed by product management. Partner with cloud hyperscalers (Azure, AWS, GCP) to jointly engineer and document cloud architectures, guiding customers on developing, deploying, and running application environments under mission-critical conditions, including security, high availability, recovery, sizing, scalability, and performance. Monitor cloud infrastructure, applications, and services to ensure high availability and performance. Manage backups and disaster recovery planning and execution to protect data integrity. Develop and maintain scripts to automate cloud operations tasks such as provisioning, configuration, and scaling. Diagnose and troubleshoot cloud infrastructure and service issues to maintain reliability. Build and manage CI/CD pipelines to streamline software delivery. Generate reports on cloud cost trends and provide recommendations to assist stakeholders in decision-making.
Utilize: Kubernetes, API Gateway, Developer Portal, Cloudflare, monitoring tools (Prometheus, Grafana, and Elasticsearch), Terraform, Python, and cloud technology. Required: Master's degree or equivalent in Computer Science, Computer Engineering or related (employer will accept a Bachelor's degree plus five (5) years of progressive experience in lieu of a Master's degree) and one (1) year of experience as an ETL Developer or related. One (1) year of experience must include utilizing Kubernetes, API Gateway, Developer Portal, Cloudflare, monitoring tools (Prometheus, Grafana, and Elasticsearch), Terraform, Python, and cloud technology. $167,421 per year. Please send resumes to recruitad@us.ibm.com. Applicants must reference V214 in the subject line.