Founding Cloud SRE — AI/ML Platform & GPU Compute

Company: Icehouseventures

Location: London

Posted: May 19th, 2026

Icehouseventures is seeking a Staff Cloud Site Reliability Engineer to shape the reliability of large-scale AI systems and GPU compute infrastructure. This founding role involves building and scaling the reliability foundations of the AI cloud platform and keeping the cloud infrastructure resilient. Responsibilities include operationalizing SLOs, improving incident response, and automating operations. The position is hybrid: collaboration in the London office is encouraged, and remote work is allowed.