Knowledge Graph
How SourcePress ingests documents, extracts entities, builds the graph, and exposes it for querying.
Overview
The knowledge graph is the foundation of the SourcePress pipeline. Documents, URLs, and transcripts are ingested via the API, classified, and passed through an entity extraction pipeline. The resulting entities and relations are stored in a graph, and AI generation runs against that graph. Content is never generated from thin air; it is always generated against the graph.
The pipeline for a single document:
- POST /api/knowledge receives the document
- The engine classifies the content type and scores its quality
- Entities and relations are extracted and deduplicated
- GraphBuilder clusters entities using union-find and stores the result
- The graph is available immediately for querying
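The clustering step can be illustrated with a generic union-find (disjoint-set) structure. This is a minimal sketch, not the GraphBuilder implementation: the names UnionFind and clusterEntities are hypothetical, and it assumes an earlier step has already decided which pairs of names refer to the same entity.

```typescript
// Minimal union-find (disjoint-set) sketch for clustering entity names.
// Hypothetical illustration; not SourcePress's GraphBuilder implementation.
class UnionFind {
  private parent = new Map<string, string>();

  // Return the representative of the set containing `x`, compressing paths.
  find(x: string): string {
    if (!this.parent.has(x)) this.parent.set(x, x);
    let root = x;
    while (this.parent.get(root) !== root) root = this.parent.get(root)!;
    // Path compression: point every node on the path directly at the root.
    let cur = x;
    while (cur !== root) {
      const next = this.parent.get(cur)!;
      this.parent.set(cur, root);
      cur = next;
    }
    return root;
  }

  // Merge the sets containing `a` and `b`.
  union(a: string, b: string): void {
    const ra = this.find(a);
    const rb = this.find(b);
    if (ra !== rb) this.parent.set(rb, ra);
  }
}

// Group names into clusters, given pairs already judged to be the same entity.
function clusterEntities(names: string[], samePairs: [string, string][]): string[][] {
  const uf = new UnionFind();
  for (const n of names) uf.find(n); // register every name as its own set
  for (const [a, b] of samePairs) uf.union(a, b);
  const clusters = new Map<string, string[]>();
  for (const n of names) {
    const root = uf.find(n);
    const members = clusters.get(root) ?? [];
    members.push(n);
    clusters.set(root, members);
  }
  return Array.from(clusters.values());
}
```

Union-find suits deduplication because equivalence is transitive: if A matches B and B matches C, all three land in one cluster without comparing A and C directly.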
Ingesting Knowledge
Send a document to the ingestion endpoint. The path field is a unique identifier for the document within your knowledge base. The body field contains the raw text content. The source field records where the content originated.
curl -X POST http://localhost:3001/api/knowledge \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "path": "knowledge/products/platform-overview.md",
    "body": "SourcePress is a knowledge backend for the AI era...",
    "source": "internal"
  }'
The response confirms ingestion and returns the extracted entities:
{
  "ingested": true,
  "path": "knowledge/products/platform-overview.md",
  "type": "documentation",
  "quality": "high",
  "quality_score": 0.91,
  "entities": [
    { "name": "SourcePress", "type": "product" },
    { "name": "knowledge graph", "type": "concept" },
    { "name": "eval loop", "type": "concept" }
  ]
}
- type: the classified content type (e.g., documentation, transcript, article)
- quality: a human-readable quality band (high, medium, low)
- quality_score: a numeric score between 0 and 1
- entities: the entities extracted from this document
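For TypeScript clients, the response fields could be modelled and checked at runtime roughly as follows. This is a sketch based only on the sample response: IngestResponse, ExtractedEntity, and isIngestResponse are hypothetical names rather than an official SDK, and the real API may return fields or values not shown here.

```typescript
// Hypothetical types mirroring the sample response; not an official SDK type.
interface ExtractedEntity {
  name: string;
  type: string;
}

interface IngestResponse {
  ingested: boolean;
  path: string;
  type: string;
  quality: "high" | "medium" | "low";
  quality_score: number;
  entities: ExtractedEntity[];
}

// Runtime check that an unknown JSON value has the fields shown in the sample.
function isIngestResponse(value: unknown): value is IngestResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const quality = v["quality"];
  const score = v["quality_score"];
  const entities = v["entities"];
  return (
    typeof v["ingested"] === "boolean" &&
    typeof v["path"] === "string" &&
    typeof v["type"] === "string" &&
    (quality === "high" || quality === "medium" || quality === "low") &&
    typeof score === "number" &&
    score >= 0 &&
    score <= 1 &&
    Array.isArray(entities) &&
    entities.every(
      (e: unknown) =>
        typeof e === "object" &&
        e !== null &&
        typeof (e as Record<string, unknown>)["name"] === "string" &&
        typeof (e as Record<string, unknown>)["type"] === "string",
    )
  );
}
```

Validating the parsed JSON before use keeps a malformed or partial response from propagating into later steps such as entity lookups.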
Rebuilding the Graph
After bulk ingestion or when the graph is out of sync, trigger a full rebuild. This re-runs entity deduplication and union-find clustering across all stored knowledge.
curl -X POST http://localhost:3001/api/graph/rebuild \
  -H "Authorization: Bearer YOUR_API_KEY"
The rebuild is synchronous. When the response returns, the graph reflects all currently ingested documents.
Querying Entities
Retrieve a specific entity and its relations by name.
curl http://localhost:3001/api/graph/entity/SourcePress \
  -H "Authorization: Bearer YOUR_API_KEY"
The entity name is matched against the graph index. Names are case-sensitive and correspond to the values returned in the entities array during ingestion.
Configuration
The knowledge graph backend is set in your SourcePress config file. Three backends are supported.
import { defineConfig } from "@sourcepress/core";

export default defineConfig({
  knowledge: {
    path: "./knowledge",
    graph: {
      backend: "local", // "local" | "vectorize" | "turso"
    },
  },
});
| Backend | Description |
|---|---|
| local | In-memory graph. No persistence between restarts. Use for development. |
| vectorize | Cloudflare Vectorize. Use for production deployments on Cloudflare. |
| turso | SQLite edge database via Turso. Use for persistent, low-latency edge storage. |
- knowledge.path: directory SourcePress watches and reads knowledge files from
- knowledge.graph.backend: determines where entity and relation data is stored and queried
Examples
Ingest a transcript
curl -X POST http://localhost:3001/api/knowledge \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "path": "knowledge/interviews/cto-interview-2026-03.md",
    "body": "Interviewer: How does the eval loop decide when content is ready?...",
    "source": "interview"
  }'
Ingest a scraped URL
curl -X POST http://localhost:3001/api/knowledge \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "path": "knowledge/external/pricing-page.md",
    "body": "Starter plan: $0/month. Pro plan: $49/month...",
    "source": "https://example.com/pricing"
  }'
Query an entity after ingestion
curl http://localhost:3001/api/graph/entity/eval%20loop \
  -H "Authorization: Bearer YOUR_API_KEY"
URL-encode entity names that contain spaces or special characters.
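When building the lookup URL in code, the standard encodeURIComponent function handles the escaping. The helper below is a small sketch: entityUrl is a hypothetical name, the default base URL is the local host used in the examples above, and how the server treats encoded reserved characters such as slashes is an assumption that has not been verified here.

```typescript
// Hypothetical helper: build the entity lookup URL for a given entity name.
function entityUrl(name: string, baseUrl = "http://localhost:3001"): string {
  // encodeURIComponent escapes spaces, slashes, and other reserved characters.
  // Letter case is left untouched, since entity names are matched case-sensitively.
  return `${baseUrl}/api/graph/entity/${encodeURIComponent(name)}`;
}
```

For example, entityUrl("eval loop") produces the same path as the curl command above.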
Rebuild after bulk ingestion
# Ingest multiple documents, then rebuild once
curl -X POST http://localhost:3001/api/graph/rebuild \
  -H "Authorization: Bearer YOUR_API_KEY"
Rebuilding once after a batch of ingestion calls is more efficient than relying on incremental updates alone, particularly when entity deduplication across documents is important.
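The ingest-then-rebuild pattern might look like this in a script. This is a sketch under stated assumptions: ingestBatchThenRebuild, KnowledgeDoc, and FetchLike are hypothetical names, only the two endpoints documented above are called, and the fetch function is injected so the flow can be exercised without a running server.

```typescript
// Hypothetical document shape, matching the request fields used above.
interface KnowledgeDoc {
  path: string;
  body: string;
  source: string;
}

// Minimal subset of the fetch signature, so a stub can be injected in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number }>;

// Ingest every document in order, then trigger a single graph rebuild.
async function ingestBatchThenRebuild(
  docs: KnowledgeDoc[],
  apiKey: string,
  fetchFn: FetchLike,
  baseUrl = "http://localhost:3001",
): Promise<void> {
  const auth = { Authorization: `Bearer ${apiKey}` };
  for (const doc of docs) {
    const res = await fetchFn(`${baseUrl}/api/knowledge`, {
      method: "POST",
      headers: { "Content-Type": "application/json", ...auth },
      body: JSON.stringify(doc),
    });
    if (!res.ok) throw new Error(`Ingest failed for ${doc.path}: HTTP ${res.status}`);
  }
  // One rebuild after the whole batch, rather than one per document.
  const rebuild = await fetchFn(`${baseUrl}/api/graph/rebuild`, {
    method: "POST",
    headers: auth,
  });
  if (!rebuild.ok) throw new Error(`Rebuild failed: HTTP ${rebuild.status}`);
}
```

In runtimes that provide a global fetch, it can likely be passed as fetchFn. The sketch stops at the first failed ingestion so that a rebuild is not triggered on a partially ingested batch; whether that is the right policy depends on the use case.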