Blockchain Site Reliability Engineer
About the Role
You will ensure the reliability, availability, and performance of blockchain nodes and related infrastructure. You will monitor, troubleshoot, and resolve production incidents, and you will participate in the on-call rotation to restore services quickly. You will also develop automation and maintenance tools in languages such as Golang, Shell, and Python; build and maintain monitoring, alerting, and logging systems; collaborate with protocol engineers and open-source communities on network upgrades; and produce clear technical documentation.
Requirements
- Bachelor's degree in Computer Science, Engineering, or equivalent experience
- Strong Linux system administration skills including networking, performance tuning, debugging, and security
- Expertise in at least one mainstream programming language, such as Golang, Python, JavaScript, or Rust
- Experience with monitoring, alerting, and logging tools such as Prometheus, Grafana, and the ELK stack
- On-call and incident response experience with the ability to respond quickly under pressure
- Solid technical documentation skills
Responsibilities
- Deploy, monitor, and maintain blockchain nodes across multiple networks
- Ensure system reliability and uptime by actively managing incidents and resolving node failures
- Develop automation and maintenance tools using Golang, Shell, Python, etc.
- Build and maintain monitoring, alerting, and logging systems
- Collaborate with engineering teams and solution architects on reliability improvements and incident prevention
- Participate in the on-call rotation to provide timely incident response and resolution
Benefits
- Medical insurance
- Vision insurance
- Dental insurance
- Short-term disability insurance
- Long-term disability insurance
- 401(k) plan with company matching
- Flexible spending account (FSA)
- Flexible paid time off
- Sick days
- Holidays
