Project

RAGBench

RAG evaluation platform — FastAPI, Qdrant, PostgreSQL, Next.js, Docker

Code ↗

The problem

Every RAG system has the same painful iteration loop: change the chunking strategy, re-embed everything, run eval queries, compare results manually. There's no fast way to know whether BM25 or dense retrieval is better for your corpus, or what chunk size maximizes recall without hurting precision.

RAGBench is a no-code platform for this experimentation loop — upload a corpus, configure a pipeline, run it, see the numbers, compare against a previous run.

Hybrid retrieval

After benchmarking keyword vs. semantic search on several document types, I found neither dominates: keyword search (BM25) is better for precise term lookups, dense search is better for semantic paraphrases. The platform implements Reciprocal Rank Fusion (RRF) to merge the two result lists, consistently outperforming either alone on mixed-query test sets.
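As a rough illustration of the fusion step, here is a minimal sketch of Reciprocal Rank Fusion in Python. It is not the platform's actual code: the function name, the document IDs, and the choice of k = 60 (the constant commonly used with RRF) are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one fused ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    where rank starts at 1. k=60 is an assumed default, not a value
    taken from the project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two retrievers.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

Because RRF only looks at ranks, it needs no score normalization between BM25 and cosine similarity, which is what makes it a convenient way to combine the two retrievers.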

Evaluation

The platform integrates DeepEval's RAG Triad (faithfulness, answer relevancy, context recall), which runs automatically after each pipeline execution. Results are stored in PostgreSQL and surfaced in a run-comparison dashboard, so you can diff two pipeline configs side by side.
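The side-by-side comparison can be pictured as a per-metric diff between two stored runs. The sketch below is an assumption about how such a diff might be computed, not the dashboard's real logic: the function name, the dictionary layout, and the metric values are hypothetical placeholders rather than measured results.

```python
def diff_runs(baseline, candidate):
    """Return (metric, baseline, candidate, delta) rows for shared metrics.

    Both arguments are plain dicts mapping metric name to score. This
    layout is assumed for illustration; the real PostgreSQL schema is
    not described in the write-up.
    """
    rows = []
    for metric in sorted(baseline.keys() & candidate.keys()):
        delta = round(candidate[metric] - baseline[metric], 4)
        rows.append((metric, baseline[metric], candidate[metric], delta))
    return rows


# Placeholder scores for two hypothetical pipeline configs.
run_a = {"faithfulness": 0.75, "answer_relevancy": 0.5, "context_recall": 0.625}
run_b = {"faithfulness": 0.875, "answer_relevancy": 0.5, "context_recall": 0.5}

rows = diff_runs(run_a, run_b)
for metric, before, after, delta in rows:
    print(f"{metric:<18} {before:>6.3f} {after:>6.3f} {delta:>+7.3f}")
```

A positive delta means the candidate run scored higher on that metric; metrics present in only one run are skipped rather than guessed.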

Dev experience

The entire stack is Dockerized. Running make up starts Qdrant, PostgreSQL, the FastAPI backend, and the Next.js frontend. There is no manual setup and no environment drift between machines.