Context Engine

The Context Engine is the platform's persistent-memory layer. It is the hordago-knowledge-graph engine, which stores biomedical entities, claims, and their provenance as a queryable graph and serves them back as graph-grounded evidence context. Hordago positions it as the Tier 2a knowledge_query engine and routes to it through the hordago-kg MCP category.

Three Distinct Context Surfaces

The platform runs three separate context/retrieval surfaces. They are complementary, not interchangeable. This page describes the Context Engine and draws the boundary against the other two so that audits do not conflate them.

| Surface | Retrieval model | Role |
| --- | --- | --- |
| Context Engine (hordago-knowledge-graph) | Persistent graph + GraphRAG + vector/hybrid | Reasoning over a durable biomedical knowledge graph |
| context-discovery-mcp | BM25 + Reciprocal Rank Fusion (RRF) | Stateless skill/tool discovery index over repo surfaces |
| biocontext7 | Skill catalog (bioinformatics tools) | Tool-discovery MCP over 47K+ bioinformatics tools |

The Context Engine is the only one of the three that persists a knowledge graph; context-discovery-mcp is a stateless lexical index and biocontext7 is a skill catalog.
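
To make the contrast concrete, the sketch below shows Reciprocal Rank Fusion in its common textbook form, where each item scores the sum of 1 / (k + rank) over the input rankings. It is not taken from context-discovery-mcp: the function name, the constant k = 60, and the two input rankings (imagined as BM25 results over two different repo surfaces) are assumptions for illustration only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list of (id, score) pairs.

    Each id earns 1 / (k + rank) per list it appears in (rank is 1-based).
    k = 60 is a commonly used default, assumed here rather than sourced.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical BM25 rankings from two indexed surfaces (illustrative ids).
ranking_by_name = ["graph-search", "graph-query", "graph-explore"]
ranking_by_description = ["graph-search", "graph-status", "graph-query"]

fused = reciprocal_rank_fusion([ranking_by_name, ranking_by_description])
for item_id, score in fused:
    print(f"{item_id}: {score:.5f}")
```

Because the fusion only needs rank positions, nothing has to persist between queries, which fits the stateless role described above.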

Storage & Retrieval Stack

The engine layers a graph store, an analytical store, and a full-text/vector index so a single scientific question can be answered by lexical lookup, vector similarity, hybrid fusion, or multi-hop graph traversal.

| Layer | Technology | Purpose |
| --- | --- | --- |
| Graph store | Neo4j | Nodes/edges for genes, variants, pathways, drugs; Cypher query |
| Analytical store | DuckDB | Columnar analytics over graph-derived tables |
| Full-text index | SQLite FTS5 | Lexical search over node/claim text |
| Vector / hybrid search | HNSW embeddings | Semantic and hybrid (lexical + vector) retrieval |
| Community detection | Louvain | Community summaries for DRIFT-style exploration |
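
As a rough illustration of the full-text layer, the sketch below runs a lexical lookup with SQLite FTS5 through Python's built-in sqlite3 module. The table name claim_text, its columns, the node ids, and the sample rows are all hypothetical and are not the engine's actual schema. It also assumes the local SQLite build has FTS5 enabled.

```python
import sqlite3

# In-memory database with a hypothetical FTS5 table over claim text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE claim_text USING fts5(node_id UNINDEXED, body)")
conn.executemany(
    "INSERT INTO claim_text (node_id, body) VALUES (?, ?)",
    [
        # Made-up illustrative rows, not data from the knowledge graph.
        ("gene:BRCA1", "BRCA1 loss impairs homologous recombination repair"),
        ("drug:olaparib", "Olaparib inhibits PARP in BRCA1 deficient tumors"),
        ("gene:TP53", "TP53 is frequently mutated across tumor types"),
    ],
)


def lexical_search(connection, query, limit=10):
    """Return node ids whose text matches the FTS5 query, best match first.

    bm25() yields lower (more negative) values for better matches, so an
    ascending sort puts the strongest hit at the top.
    """
    rows = connection.execute(
        "SELECT node_id FROM claim_text WHERE claim_text MATCH ? "
        "ORDER BY bm25(claim_text) LIMIT ?",
        (query, limit),
    ).fetchall()
    return [row[0] for row in rows]


hits = lexical_search(conn, "brca1")
print(hits)
```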

Embeddings are produced with BGE and specter2 models (the embedding work consumed from hordago-knowledge-graph#162), giving the vector and hybrid search paths biomedical-tuned representations.
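
The vector path can be pictured with the minimal sketch below, which ranks nodes by cosine similarity to a query vector. It makes two deliberate simplifications that should not be read as the engine's behaviour: a brute-force scan stands in for the HNSW index, and hand-written 3-dimensional vectors stand in for real BGE or specter2 embeddings. All names and values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_nodes(query_vec, index, top_k=2):
    """Brute-force nearest neighbours; an HNSW index would approximate this scan."""
    scored = [(node_id, cosine_similarity(query_vec, vec)) for node_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy stand-ins for embedding vectors (illustrative values only).
TOY_INDEX = {
    "gene:BRCA1": [0.9, 0.1, 0.0],
    "drug:olaparib": [0.7, 0.6, 0.1],
    "gene:TP53": [0.1, 0.2, 0.9],
}

top = nearest_nodes([1.0, 0.2, 0.0], TOY_INDEX)
for node_id, score in top:
    print(f"{node_id}: {score:.3f}")
```

A hybrid path would combine a ranking like this one with a lexical ranking; the fusion method used by the engine is not stated here, so none is assumed.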

GraphRAG & DRIFT Reasoning

Retrieval is exposed as GraphRAG: neighborhood exploration, global summaries, and Microsoft GraphRAG-style DRIFT paths across the graph. The reasoning layer roadmap (entity linking, query decomposition, DRIFT path scoring, Bayesian evidence aggregation, answer synthesis) is tracked as epic E-02.
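
The multi-hop traversal that neighborhood and path exploration rely on can be sketched as below. This is plain breadth-first path enumeration over a toy in-memory graph, not an implementation of DRIFT and not the engine's Cypher. The edge triples, relation names, and function names are hypothetical, and the path scoring and evidence aggregation named in the roadmap are out of scope.

```python
from collections import deque

# Hypothetical (source, relation, target) triples for illustration.
EDGES = [
    ("drug:olaparib", "INHIBITS", "gene:PARP1"),
    ("gene:PARP1", "PARTICIPATES_IN", "pathway:dna-repair"),
    ("gene:BRCA1", "PARTICIPATES_IN", "pathway:dna-repair"),
]


def build_adjacency(edges):
    """Map each source node to its outgoing (relation, target) pairs."""
    adjacency = {}
    for source, relation, target in edges:
        adjacency.setdefault(source, []).append((relation, target))
    return adjacency


def paths_from(start, adjacency, max_hops=2):
    """Enumerate simple directed paths of 1..max_hops edges starting at `start`.

    A path is a flat list alternating node, relation, node, ...
    """
    results = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        hops = len(path) // 2  # number of edges already on this path
        if hops == max_hops:
            continue
        for relation, target in adjacency.get(path[-1], []):
            if target in path[::2]:  # nodes sit at even positions; skip cycles
                continue
            extended = path + [relation, target]
            results.append(extended)
            queue.append(extended)
    return results


ADJACENCY = build_adjacency(EDGES)
for path in paths_from("drug:olaparib", ADJACENCY):
    print(" -> ".join(path))
```

Capping the hop count keeps the enumeration bounded. A path-scoring step would then rank the returned paths rather than treat them as equally informative.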

MCP Tool Surface

The engine ships as an MCP server exposing roughly 20 MCP tools grouped into five skills:

| Skill | Coverage |
| --- | --- |
| graph-query | Query genes, variants, pathways, drugs, and ad-hoc Cypher |
| graph-ingest | Ingest external nodes/edges and sources into the KG |
| graph-search | Full-text and hybrid search |
| graph-explore | Neighborhoods, global summaries, and DRIFT-style paths |
| graph-status | Graph health, schema, cache, and namespace audit |

Source Pointers

  • references/plugins/hordago-knowledge-graph.md
  • references/kg-reasoning-epic.md
  • docs/adr/003-kg-reasoning-layer.md
  • references/engine-catalog.md
  • src/hordago/kg_ingest.py
  • src/hordago/route_intent.py