As a Senior DevOps Engineer, you will play a crucial role in designing, implementing, and maintaining our cloud infrastructure and services to ensure high availability, performance, and reliability. You will be responsible for implementing best practices, automation, and monitoring solutions to support our DevOps initiatives and drive continuous improvement across our technology stack.

Key Responsibilities:

  • Infrastructure Automation: Design and implement infrastructure as code (IaC) solutions using tools such as Terraform or Pulumi to automate the provisioning and configuration of cloud resources and environments.
  • Continuous Integration/Continuous Deployment (CI/CD): Implement and maintain CI/CD pipelines using tools such as Jenkins, GitLab CI/CD, or CircleCI to automate the build, test, and deployment processes for our applications and services.
  • Monitoring and Alerting: Implement and configure monitoring and alerting solutions using tools such as Prometheus, Grafana, or Datadog to monitor the health, performance, and availability of our infrastructure and applications.
  • Incident Response and Post-Mortems: Participate in incident response activities and post-mortem reviews to identify root causes, implement corrective actions, and drive improvements to prevent recurrence of incidents and outages.
  • Capacity Planning and Performance Optimization: Perform capacity planning and performance tuning activities to ensure that our infrastructure and services can support current and future workload demands efficiently and cost-effectively.
  • Security and Compliance: Implement security best practices and compliance controls to protect our infrastructure and data assets, including identity and access management (IAM), network security, and encryption solutions.


Qualifications:

  • At least 5 years of experience in DevOps, with a focus on designing, implementing, and managing cloud infrastructure and services.
  • Strong proficiency in cloud platforms such as Alibaba Cloud, AWS, Azure, or Google Cloud Platform, with experience in infrastructure automation, networking, and security. Experience with Alibaba Cloud is a plus.
  • Experience with container orchestration platforms such as Kubernetes, including deployment, scaling, and management of containerized applications.
  • Knowledge of scripting languages such as Python, Bash, or PowerShell, and experience with version control systems such as Git.
  • Familiarity with monitoring and observability tools such as Prometheus, Grafana, ELK Stack, or Splunk.
  • Excellent problem-solving and analytical skills, with the ability to troubleshoot complex issues and drive resolution in a timely manner.
  • Effective communication and collaboration skills, with the ability to work independently and as part of a cross-functional team.