Rootly AI Labs

Pushing the boundaries of AI in incident management & system reliability

Building the future of reliability and operational excellence

The Rootly AI Labs is a fellow-led community designed to redefine reliability engineering. We develop innovative prototypes, create open-source tools, and produce research that’s shared to advance the standards of operational excellence.

Some of our projects:

  • Event Or Outage: helps SREs determine whether a drop in traffic is due to an external event (holiday, election, sporting event...) rather than an outage.
  • Incident Diagram: generates a diagram highlighting what happened during an incident by ingesting the retrospective and associated codebase. LLM-powered.
  • DeepSeek log analysis benchmark: distilled DeepSeek R1 and benchmarked it against other models on system log analysis.
  • Reliability Engineer AI benchmark: developing an industry-standard benchmark to evaluate LLMs on system operations and incident management tasks.

Our fellows

  • Jeba Emmanuel – Ex-LinkedIn Staff Engineer
  • Casey Brown – Head of Platform Engineering at Venmo
  • Laurence Liang – McGill Engineering
  • Allan Parsons – Sr Staff Engineer at Venmo
  • Sylvain Kalache – Head of the Rootly AI Labs

About the AI Labs

The complexities of modern technology demand a paradigm shift in how we fundamentally approach reliability.

Rootly began in 2021 by building a category-defining on-call and incident response platform, trusted by thousands, including Replit, NVIDIA, LinkedIn, and Figma.

Now, GenAI is simultaneously introducing new complexities and unlocking opportunities to redefine reliability forever.

To stay at the forefront of this transformation, we’re launching Rootly AI Labs—a dedicated initiative exploring uncharted territories such as cognitive fault prediction, quantum-inspired optimization, self-evolving autonomic digital infrastructure, and advanced digital twin simulations.

Rootly AI Labs will operate as an open-source incubator, fostering collaboration, experimentation, and rapid prototyping. We're committed to ensuring our research benefits the entire community.

We're always eager to welcome new partners and fellows because we believe that together, we can go further, move faster, and push the boundaries of AI-driven reliability and operational excellence.

Popular repositories

  1. Rootly-MCP-server Public

    Rootly MCP server

    Python 54 1

  2. logs-dataset Public

    A collection of logs used for training AI-powered Incident Management & SRE Automation

    33 1

  3. AIOpsLab Public

    Forked from microsoft/AIOpsLab

    A holistic framework to enable the design, development, and evaluation of autonomous AIOps agents.

    Python 24

  4. IncidentDiagram Public

    A tool for creating diagrams from Incident Reviews/PostMortems using LLMs and AI

    Python 20

  5. EventOrOutage Public

    EventOrOutage is leveraging LLMs to help SREs understand if a drop in traffic is due to an external event (holiday, election, sport event...) instead of an outage.

    Python 12 1

  6. .github Public

    1

Repositories

Showing 6 of 6 repositories
  • .github Public
    1 0 0 0 Updated Mar 31, 2025
  • Rootly-MCP-server Public

    Rootly MCP server
    Python 54 Apache-2.0 1 0 1 Updated Mar 25, 2025
  • AIOpsLab Public Forked from microsoft/AIOpsLab

    A holistic framework to enable the design, development, and evaluation of autonomous AIOps agents.
    Python 24 MIT 76 0 0 Updated Mar 20, 2025
  • logs-dataset Public

    A collection of logs used for training AI-powered Incident Management & SRE Automation
    33 Apache-2.0 1 0 0 Updated Mar 19, 2025
  • IncidentDiagram Public

    A tool for creating diagrams from Incident Reviews/PostMortems using LLMs and AI
    Python 20 0 0 0 Updated Mar 13, 2025
  • EventOrOutage Public

    EventOrOutage is leveraging LLMs to help SREs understand if a drop in traffic is due to an external event (holiday, election, sport event...) instead of an outage.
    Python 12 GPL-3.0 1 0 0 Updated Mar 13, 2025