SRE Engineer

About the Role

You will manage and maintain AWS infrastructure to ensure 24/7 availability and security. You will optimize system architecture and performance, deploy and maintain monitoring and logging solutions, automate operational tasks, and support high-availability application and database setups. You will also troubleshoot incidents, perform backups, document operations and incidents, collaborate with engineering and product teams to design scalable systems, and participate in the on-call rotation.

Requirements

  • 3+ years of experience in Linux system administration and 24/7 operations
  • Proficient in MySQL and AWS RDS management and automation, including hot and cold database architectures
  • Deep expertise in AWS services such as EC2, Lambda, Aurora, Beanstalk, VPC, IAM, CloudWatch, EKS, and CloudFormation
  • Skilled in automation and configuration management tools such as Chef and Ansible
  • Strong understanding of system and network management, monitoring, and troubleshooting in Linux
  • Experience with high-availability systems and large-scale web architectures, including EKS, MongoDB, and Kafka
  • Solid knowledge of cybersecurity best practices and incident handling
  • Excellent communication, collaboration, and problem-solving skills

Responsibilities

  • Manage and maintain AWS infrastructure to ensure 24/7 system availability and security
  • Optimize system architecture and performance by identifying and resolving bottlenecks
  • Deploy and maintain monitoring and logging tools such as Zabbix, Nagios, and ELK, and create custom scripts as needed
  • Support high-availability setups for applications and databases, and handle backups and troubleshooting
  • Collaborate with backend, product, and infrastructure teams to design scalable systems
  • Document operations and incidents and support internal IT needs
  • Participate in the on-call rotation and incident response

SRE Engineer at Pontem Network | JobStash