Core Concepts

Knowledge Graph

How SourcePress ingests documents, extracts entities, builds the graph, and exposes it for querying.

Overview

The knowledge graph is the foundation of the SourcePress pipeline. Documents, URLs, and transcripts are ingested via the API, classified, and passed through an entity extraction pipeline. The resulting entities and relations are stored in a graph that AI generation runs against. Content is never generated from thin air — it is generated against this graph.

The pipeline for a single document:

  1. POST /api/knowledge receives the document
  2. The engine classifies the content type and scores its quality
  3. Entities and relations are extracted and deduplicated
  4. GraphBuilder clusters entities using union-find and stores the result
  5. The graph is available immediately for querying
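Step 4's clustering can be illustrated with a minimal union-find (disjoint-set) sketch. This is not the GraphBuilder implementation; the `UnionFind` class and the case-insensitive alias rule are assumptions made for illustration.

```typescript
// Minimal union-find with path compression, used here to cluster entity
// mentions that refer to the same underlying entity.
class UnionFind {
  private parent = new Map<string, string>();

  find(x: string): string {
    if (!this.parent.has(x)) this.parent.set(x, x);
    const p = this.parent.get(x)!;
    if (p === x) return x;
    const root = this.find(p);
    this.parent.set(x, root); // path compression
    return root;
  }

  union(a: string, b: string): void {
    const ra = this.find(a);
    const rb = this.find(b);
    if (ra !== rb) this.parent.set(ra, rb);
  }
}

// Hypothetical dedup rule: treat case-insensitive name matches as aliases
// of one entity. The real pipeline may use richer signals.
function clusterEntities(names: string[]): Map<string, string[]> {
  const uf = new UnionFind();
  const byKey = new Map<string, string>();
  for (const name of names) {
    const key = name.toLowerCase();
    const canonical = byKey.get(key);
    if (canonical) uf.union(name, canonical);
    else byKey.set(key, name);
  }
  const clusters = new Map<string, string[]>();
  for (const name of names) {
    const root = uf.find(name);
    clusters.set(root, [...(clusters.get(root) ?? []), name]);
  }
  return clusters;
}
```

With this rule, `"SourcePress"` and `"sourcepress"` collapse into one cluster while `"eval loop"` stays separate.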

Ingesting Knowledge

Send a document to the ingestion endpoint. The path field is a unique identifier for the document within your knowledge base. The body field contains the raw text content. The source field records where the content originated.

curl -X POST http://localhost:3001/api/knowledge \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "path": "knowledge/products/platform-overview.md",
    "body": "SourcePress is a knowledge backend for the AI era...",
    "source": "internal"
  }'


The response confirms ingestion and returns the extracted entities:

{
  "ingested": true,
  "path": "knowledge/products/platform-overview.md",
  "type": "documentation",
  "quality": "high",
  "quality_score": 0.91,
  "entities": [
    { "name": "SourcePress", "type": "product" },
    { "name": "knowledge graph", "type": "concept" },
    { "name": "eval loop", "type": "concept" }
  ]
}

  • type — the classified content type (e.g., documentation, transcript, article)
  • quality — a human-readable quality band (high, medium, low)
  • quality_score — a numeric score between 0 and 1
  • entities — the entities extracted from this document
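The response shape can be captured in a small type, together with a sketch of how a numeric score might map to a quality band. The `0.8` and `0.5` cutoffs below are illustrative assumptions, not documented thresholds.

```typescript
// Shape of the ingestion response shown above.
interface IngestResponse {
  ingested: boolean;
  path: string;
  type: string;
  quality: "high" | "medium" | "low";
  quality_score: number;
  entities: { name: string; type: string }[];
}

// Hypothetical score-to-band mapping; the cutoffs are assumptions.
function qualityBand(score: number): IngestResponse["quality"] {
  if (score >= 0.8) return "high";
  if (score >= 0.5) return "medium";
  return "low";
}
```

Under these assumed cutoffs, the example score of 0.91 lands in the high band.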

Rebuilding the Graph

After bulk ingestion or when the graph is out of sync, trigger a full rebuild. This re-runs entity deduplication and union-find clustering across all stored knowledge.

curl -X POST http://localhost:3001/api/graph/rebuild \
  -H "Authorization: Bearer YOUR_API_KEY"

The rebuild is synchronous. When the response returns, the graph reflects all currently ingested documents.

Querying Entities

Retrieve a specific entity and its relations by name.

curl http://localhost:3001/api/graph/entity/SourcePress \
  -H "Authorization: Bearer YOUR_API_KEY"

The entity name is matched against the graph index. Names are case-sensitive and correspond to the values returned in the entities array during ingestion.

Configuration

The knowledge graph backend is set in your SourcePress config file. Three backends are supported.

import { defineConfig } from "@sourcepress/core";

export default defineConfig({
  knowledge: {
    path: "./knowledge",
    graph: {
      backend: "local", // "local" | "vectorize" | "turso"
    },
  },
});

  Backend     Description
  local       In-memory graph. No persistence between restarts. Use for development.
  vectorize   Cloudflare Vectorize. Use for production deployments on Cloudflare.
  turso       SQLite edge database via Turso. Use for persistent, low-latency edge storage.

  • knowledge.path — directory SourcePress watches and reads knowledge files from
  • knowledge.graph.backend — determines where entity and relation data is stored and queried
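Since local is meant for development and the other backends for production, one pattern is to pick the backend from the runtime environment so a single config file serves both. This is a sketch; the environment-variable names are assumptions, not part of SourcePress.

```typescript
type GraphBackend = "local" | "vectorize" | "turso";

// Hypothetical helper: choose a backend per environment. NODE_ENV and
// SOURCEPRESS_GRAPH_BACKEND are assumed variable names for this sketch.
function backendFor(env: Record<string, string | undefined>): GraphBackend {
  if (env.NODE_ENV !== "production") return "local";
  return env.SOURCEPRESS_GRAPH_BACKEND === "turso" ? "turso" : "vectorize";
}
```

The returned string would then be passed as `knowledge.graph.backend` in the config above.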

Examples

Ingest a transcript

curl -X POST http://localhost:3001/api/knowledge \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "path": "knowledge/interviews/cto-interview-2026-03.md",
    "body": "Interviewer: How does the eval loop decide when content is ready?...",
    "source": "interview"
  }'

Ingest a scraped URL

curl -X POST http://localhost:3001/api/knowledge \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "path": "knowledge/external/pricing-page.md",
    "body": "Starter plan: $0/month. Pro plan: $49/month...",
    "source": "https://example.com/pricing"
  }'

Query an entity after ingestion

curl http://localhost:3001/api/graph/entity/eval%20loop \
  -H "Authorization: Bearer YOUR_API_KEY"

URL-encode entity names that contain spaces or special characters.
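When building these URLs in code, `encodeURIComponent` handles the encoding. The `entityUrl` helper below is a hypothetical convenience function, not part of a SourcePress SDK.

```typescript
// Hypothetical helper: build the entity query URL, encoding spaces and
// other reserved characters in the entity name.
function entityUrl(base: string, name: string): string {
  return `${base}/api/graph/entity/${encodeURIComponent(name)}`;
}
```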

Rebuild after bulk ingestion

# Ingest multiple documents, then rebuild once
curl -X POST http://localhost:3001/api/graph/rebuild \
  -H "Authorization: Bearer YOUR_API_KEY"

Rebuilding once after a batch of ingestion calls is more efficient than relying on incremental updates alone, particularly when entity deduplication across documents is important.
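The batch pattern above can be sketched as a request planner: one ingestion call per document, then a single rebuild at the end. The `Doc` and `PlannedRequest` shapes are assumptions for this sketch; only the two endpoints come from the examples above.

```typescript
interface Doc { path: string; body: string; source: string }
interface PlannedRequest { method: "POST"; url: string; body?: string }

// Hypothetical planner: ingest every document, then rebuild once so
// entity deduplication runs across the whole batch.
function planBatch(base: string, docs: Doc[]): PlannedRequest[] {
  const ingests: PlannedRequest[] = docs.map((d) => ({
    method: "POST",
    url: `${base}/api/knowledge`,
    body: JSON.stringify(d),
  }));
  return [...ingests, { method: "POST", url: `${base}/api/graph/rebuild` }];
}
```

Executing the planned requests in order (with the usual auth headers) reproduces the curl workflow shown above.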