AI Data Analyst Agent


A multi-agent AI system that automates ETL, data cleaning, and analysis, with a human always in the loop and self-correcting capabilities. Users can query and clean their data using natural language.

The user ingests raw datasets. The system then profiles their schemas, proposes deterministic cleaning strategies, executes the transformations securely in a sandbox once a human approves them, and provides conversational SQL and visualization interfaces for working with the data in natural language.

System Architecture

Architecture Diagram

The core workflow is orchestrated with LangGraph as a supergraph composed of independently compiled subgraphs.

Execution Safety

All LLM-generated code is executed inside an isolated Docker sandbox. Strict timeouts and filesystem isolation guard against system-level risks.
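One way such a sandbox could be driven from Python is sketched below, using only the standard library and the `docker` CLI. The image name, resource limits, mount path, and the helper names `build_sandbox_command` and `run_in_sandbox` are assumptions made for illustration; the project's actual sandbox configuration is not described in this README.

```python
import subprocess
import uuid
from pathlib import Path


def build_sandbox_command(
    script_dir: Path,
    container_name: str,
    image: str = "python:3.11-slim",  # assumed image, not taken from the project
) -> list[str]:
    """Build a `docker run` command with no network and a read-only filesystem."""
    return [
        "docker", "run", "--rm",
        "--name", container_name,
        "--network", "none",        # no network access from generated code
        "--read-only",              # container root filesystem is read-only
        "--tmpfs", "/tmp",          # scratch space that vanishes with the container
        "--memory", "256m",         # illustrative resource limits
        "--cpus", "1",
        "-v", f"{script_dir.resolve()}:/work:ro",  # script mounted read-only
        image, "python", "/work/script.py",
    ]


def run_in_sandbox(
    code: str, workdir: Path, timeout_s: float = 30.0
) -> subprocess.CompletedProcess:
    """Write the generated code to disk and run it in a throwaway container."""
    workdir.mkdir(parents=True, exist_ok=True)
    (workdir / "script.py").write_text(code, encoding="utf-8")
    name = f"sandbox-{uuid.uuid4().hex[:12]}"
    cmd = build_sandbox_command(workdir, name)
    try:
        return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The timeout only stops the docker client, so kill the container too.
        subprocess.run(["docker", "kill", name], capture_output=True)
        raise


example_cmd = build_sandbox_command(Path("."), "sandbox-example")
```

Calling `run_in_sandbox("print(1 + 1)", Path("/tmp/job"))` requires a local Docker daemon; the returned object exposes `stdout`, `stderr`, and `returncode` for the caller to inspect.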

LangGraph Graph

State Management

The system supports human-in-the-loop workflows through PostgreSQL-based checkpointing with AsyncPostgresSaver. Nested graph states are recursively flattened so that a run can resume seamlessly across stateless HTTP calls.
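The flattening step can be pictured with a small, dependency-free sketch. It assumes nested state is represented as plain dictionaries and that dotted key paths are an acceptable flat form; both are assumptions for illustration, and the function name `flatten_state` and the sample keys are hypothetical rather than taken from the project.

```python
from typing import Any, Mapping


def flatten_state(
    state: Mapping[str, Any], prefix: str = "", sep: str = "."
) -> dict[str, Any]:
    """Recursively flatten nested mappings into a single-level dict of dotted keys."""
    flat: dict[str, Any] = {}
    for key, value in state.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, Mapping) and value:
            # Descend into non-empty nested state (e.g. a subgraph's state).
            flat.update(flatten_state(value, path, sep))
        else:
            # Leaves (including lists and empty dicts) are stored as-is.
            flat[path] = value
    return flat


nested = {
    "dataset_id": "demo",
    "cleaning": {
        "status": "awaiting_approval",
        "plan": {"drop_nulls": ["age"]},
    },
}
flat = flatten_state(nested)
```

A flat view like this lets an HTTP handler return or update a paused run's state without knowing how deeply the subgraphs are nested.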

Performance Metrics

Langfuse Monitoring

Frontend Interface

Frontend Preview

Tech Stack

FastAPI, LangChain, LangGraph, Groq Llama 3 models, Pandas, DuckDB, Docker, PostgreSQL, Redis, Langfuse