Job Description
Join us to shape the future of AI/ML data platforms, where your expertise will help create resilient and market-leading solutions. You will have the opportunity to collaborate with innovators across our global network, driving strategic change and mentoring others. We value your skills in solving complex challenges and fostering a culture of reliability and growth. At JPMorganChase, your impact will reach far beyond your team, opening doors to career advancement and meaningful relationships.
Responsibilities
- Demonstrate expertise in application development and support across technologies such as Databricks, Snowflake, AWS, and Kubernetes
- Coordinate incident management coverage to ensure effective resolution of application issues
- Collaborate with cross-functional teams to perform root cause analysis and implement production changes
- Develop and support AI/ML solutions for troubleshooting and incident resolution
- Mentor and guide team members to foster growth and drive strategic change
- Build and maintain scalable, resilient, and market-leading data solutions
- Support budgetary and staffing considerations to optimize team performance
- Engage in operational stability and disaster recovery planning
- Implement automation tools to reduce toil and improve efficiency
- Ensure compliance with risk controls and company-wide standards
- Build meaningful relationships across teams to achieve common goals
Required Qualifications, Capabilities, and Skills
- Proficient in site reliability engineering (SRE) culture and principles, with experience applying them to applications or platforms
- Skilled in running production incident calls and managing incident resolution
- Experienced in observability, including white-box and black-box monitoring, service level objective (SLO) alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
- Strong understanding of SLIs, SLOs, SLAs, and error budgets
- Proficient in Python or PySpark for AI/ML modeling
- Able to reduce toil by building automation tools for repeated tasks
- Hands-on experience in system design, resiliency, testing, operational stability, and disaster recovery
- Awareness of risk controls and compliance with departmental and company-wide standards
- Collaborative team player with the ability to build meaningful relationships
Preferred Qualifications, Capabilities, and Skills
- Experience in an SRE or production support role with AWS Cloud, Databricks, Snowflake, or similar technologies
- AWS and Databricks certifications
- Advanced knowledge of AI/ML troubleshooting and incident resolution
- Familiarity with budgetary and staffing optimization
- Experience mentoring and guiding team members
- Strong communication and interpersonal skills
- Demonstrated ability to drive strategic change across teams
About Us
J.P. Morgan is a global leader in financial services, providing strategic advice and products to the world's most prominent corporations, governments, wealthy individuals and institutional investors. Our "first-class business in a first-class way" approach to serving clients drives everything we do. We strive to build trusted, long-term partnerships to help our clients achieve their business objectives.
We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as mental health or physical disability needs.