StrataSearch

Documentation RAG search engine specialized in navigating legacy vs modern codebases

StrataSearch is a documentation search engine that uses an Advanced RAG (Retrieval-Augmented Generation) pipeline to understand code evolution. It automatically distinguishes between deprecated patterns and modern best practices in its responses, making it ideal for navigating legacy and modern codebases.

The system implements a multi-stage RAG pipeline designed for high-precision technical retrieval, runs entirely offline with local models, and provides a command-line-driven workflow for easy data management.

  • Legacy vs Modern Code Analysis: Automatically identifies deprecated code patterns and suggests modern alternatives.
  • Advanced RAG Pipeline: Implements a multi-stage retrieval process including multi-query generation and cross-encoder re-ranking for high-precision results.
  • Command-Line Driven Workflow: Includes Django management commands for data scraping, knowledge base ingestion, and RAG pipeline evaluation.
  • Private & Local-First: Runs entirely offline using local models with Ollama, ensuring data privacy and control.

Screenshots

1 2

Getting Started

Prerequisites

  • Python 3.10+
  • Ollama running locally

1. Setup & Installation

git clone https://github.com/joshuaglaz/stratasearch.git
cd stratasearch
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

2. Build Knowledge Base

First, scrape the documentation you want to index.

python manage.py scrape https://docs.sqlalchemy.org/en/20/ --depth 2 --max 20

Then, ingest and index the scraped data.

python manage.py ingest

3. Run the Server

python manage.py runserver

Tech Stack

ComponentTechnologyDescription
BackendDjango 5.0Async Views & Management Commands
PipelineLangChainRAG Orchestration
InferenceOllamaLocal LLM (Qwen 2.5)
SearchFAISS + FlashRankVector Search + Cross-Encoder Re-ranking
FrontendHTMX + TailwindReactive UI without heavy frameworks