SRE Engineer

About the Role

You will manage and maintain AWS infrastructure to ensure 24/7 availability and security. You will identify and remove performance bottlenecks and optimize system architecture. You will deploy, configure and maintain monitoring stacks (Zabbix, Nagios, ELK) and write automation scripts as needed. You will design and support high-availability setups for applications and databases, handle backups, troubleshoot incidents, participate in on-call rotation, and document operations and incidents.

Requirements

  • 3+ years of experience in Linux system administration and 24/7 operations
  • Proficient in MySQL and AWS RDS management and automation, including hot and cold database architecture
  • Deep expertise in AWS, including EC2, Lambda, Aurora, Beanstalk, VPC, IAM, CloudWatch, EKS, and CloudFormation
  • Skilled in automation tools such as Chef and Ansible
  • Strong understanding of system and network management, monitoring, and troubleshooting in Linux
  • Experience with high-availability systems (including EKS, MongoDB, and Kafka) and large-scale web architecture
  • Solid knowledge of cybersecurity best practices and incident handling
  • Excellent communication, collaboration, and problem-solving skills

Responsibilities

  • Manage and maintain AWS infrastructure to ensure 24/7 system availability and security
  • Optimize system architecture and performance by identifying and removing bottlenecks
  • Deploy and maintain monitoring tools such as Zabbix, Nagios, and ELK, and create custom scripts as needed
  • Support high-availability setups for applications and databases; handle backups and troubleshooting
  • Collaborate with backend, product, and infrastructure teams to design scalable systems
  • Document operations and incidents, and support internal IT needs
  • Participate in on-call rotation

SRE Engineer at Pontem Network | JobStash