Infrastructure Engineer
About the Role
You will operate and improve production infrastructure with a focus on reliability, scalability, and performance. You will manage incidents and take part in on-call rotations, design and implement hybrid and multi-cloud Kubernetes solutions using Infrastructure as Code, and automate repetitive operational tasks. You will also build and refine monitoring and alerting, diagnose system failure modes and drive long-term fixes, write clear documentation that explains why decisions were made, and share domain knowledge with the team. Expect to work with containers, orchestration platforms, and public cloud providers, and to use Python or Go for tooling and automation.
Requirements
- Experience operating mission-critical services with responsibility for reliability and uptime
- Availability to work during Asia-Pacific time zone hours
- Production experience with containers and orchestration platforms (Kubernetes, Nomad, etc.)
- Experience with infrastructure automation tools (Helm, Terraform, Terragrunt, Ansible)
- Experience with monitoring solutions (Grafana, Prometheus, VictoriaMetrics, BetterUptime, ELK)
- Public cloud experience (AWS, GCP, Azure)
- Programming skills in Python and/or Go
- Experience with relational databases (RDBMS) and writing raw SQL queries
- Proficiency in Linux and shell scripting
- Strong systems thinking, including reasoning about edge cases, failure modes, and system behavior
- Enthusiastic, proactive attitude and strong documentation habits
- Interest in and drive to learn blockchain technologies
Responsibilities
- Maintain and improve reliability and scalability of services
- Participate in on-call rotation and manage incidents
- Develop and manage hybrid multi-cloud infrastructure using Infrastructure as Code
- Improve monitoring and alerting of the platform
- Automate repetitive operational processes
- Plan, design, and execute agreed technical solutions
- Identify scalability bottlenecks and drive long-term resolution
- Improve and maintain documentation
- Share deep domain knowledge with team members
Benefits
- Stock options
- Flexible schedule
- Paid weekend on-call shifts
- Remote work
- Multinational team
