Production Systems Expert - Kraków - beBeeReliability
beBeeReliability, Kraków
1 month ago
Job title: Site Reliability Engineer @
Description
System Reliability Position
We are seeking a System Reliability Engineer to join our team. This individual will be responsible for ensuring the reliability and high availability of production systems used in global credit risk management.
Key Responsibilities:
- Evaluate, monitor, and troubleshoot incidents in distributed systems running in cloud and hybrid environments.
- Implement monitoring and alerting strategies using tools such as Grafana, Prometheus, and Loki.
- Conduct root cause analysis (RCA) and post-incident reviews to improve resilience and operational efficiency.
- Collaborate with developers, DevOps engineers, and global support teams to implement best practices for system reliability engineering.
- Develop CI/CD automation, deployment pipelines, and security/vulnerability remediation processes.
Requirements:
- Java
- Grafana
- Prometheus
- Loki
- Splunk
- Unix
- Linux
- Cloud infrastructure
- RDBMS
- CI/CD
- Ansible
- Jenkins
- GitHub Actions
- GCP
- Control-M
About Us
We strive to create a work environment that is welcoming and inclusive to all employees. We offer competitive compensation and benefits packages to ensure the well-being of our staff.