Getting Started

Install SourcePress, create a project, define collections, start the engine, and ingest your first knowledge files.

SourcePress ingests documents, URLs, and transcripts, extracts a typed entity graph, and generates content against that graph. This guide covers installation through first ingest.


Requirements

  • Node.js >= 20
  • pnpm (workspace-aware install)

Install

Scaffold a new project with the create-sourcepress initializer:

npx create-sourcepress my-project
cd my-project
pnpm install

To add SourcePress to an existing project manually:

pnpm add @sourcepress/core @sourcepress/knowledge @sourcepress/ai @sourcepress/server

Create a Project Config

SourcePress reads sourcepress.config.ts at the project root. Use defineConfig, collection, field, and relation from @sourcepress/core to describe your content schema.

// sourcepress.config.ts
import { defineConfig, collection, field, relation } from "@sourcepress/core";

export default defineConfig({
  collections: [
    collection("articles", {
      fields: {
        title: field("string", { required: true }),
        slug: field("string", { required: true }),
        body: field("text", { required: true }),
        publishedAt: field("date"),
      },
      relations: {
        topics: relation("topics", { cardinality: "many" }),
      },
    }),
    collection("topics", {
      fields: {
        name: field("string", { required: true }),
        slug: field("string", { required: true }),
      },
    }),
  ],
  evals: {
    threshold: 0.8,
  },
  github: {
    repository: {
      owner: "your-org",
      repo: "your-repo",
    },
    branch: "main",
  },
});

defineConfig validates the schema with Zod at startup. Slug values are checked against /^[a-z0-9-]+$/ before they are used in file paths.
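The slug rule can be illustrated with a small standalone sketch. Only the regular expression comes from this guide; the helper names isValidSlug and contentPath, and the content/<collection>/<slug>.md layout, are hypothetical and not part of the SourcePress API.

```typescript
// Pattern quoted in this guide. Everything else here is illustrative only.
const SLUG_PATTERN = /^[a-z0-9-]+$/;

// Hypothetical helper: true when the slug contains only lowercase letters,
// digits, and hyphens (and is non-empty).
function isValidSlug(slug: string): boolean {
  return SLUG_PATTERN.test(slug);
}

// Hypothetical helper: refuse to build a file path from an unvalidated slug.
// The directory layout is an assumption made for this example.
function contentPath(collection: string, slug: string): string {
  if (!isValidSlug(slug)) {
    throw new Error(`Invalid slug: ${JSON.stringify(slug)}`);
  }
  return `content/${collection}/${slug}.md`;
}
```

Because the pattern excludes dots and slashes, a value such as ../secrets is rejected before it can reach a path join, which is presumably why the check runs before file paths are built.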


Define Collections

Collections map to content types in the output graph. Each collection declares:

Key         Purpose
fields      Typed scalar properties (string, text, date, number, boolean)
relations   Named edges to other collections with cardinality: "one" or "many"

Collections are the schema contract between the knowledge pipeline and generated content. The @sourcepress/core type system generates Zod validators from this schema automatically.
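To make the idea of schema-derived validation concrete, here is a rough, dependency-free sketch of how a field map could drive runtime checks. It does not use Zod and is not how @sourcepress/core is implemented; FieldSpec, checkValue, and validateRecord are hypothetical names, and treating dates as parseable strings is an assumption.

```typescript
// The five scalar field types listed in the table above.
type FieldType = "string" | "text" | "date" | "number" | "boolean";

// Hypothetical shape for a declared field.
interface FieldSpec {
  type: FieldType;
  required?: boolean;
}

// Check a single value against a declared field type.
function checkValue(type: FieldType, value: unknown): boolean {
  switch (type) {
    case "string":
    case "text":
      return typeof value === "string";
    case "number":
      return typeof value === "number" && Number.isFinite(value);
    case "boolean":
      return typeof value === "boolean";
    case "date":
      // Assumption: dates arrive as strings that Date.parse can read.
      return typeof value === "string" && !Number.isNaN(Date.parse(value));
  }
}

// Validate a record against a field map and return a list of error messages.
function validateRecord(
  fields: Record<string, FieldSpec>,
  record: Record<string, unknown>,
): string[] {
  const errors: string[] = [];
  for (const [name, spec] of Object.entries(fields)) {
    const value = record[name];
    if (value === undefined || value === null) {
      if (spec.required) errors.push(`${name}: required`);
      continue;
    }
    if (!checkValue(spec.type, value)) {
      errors.push(`${name}: expected ${spec.type}`);
    }
  }
  return errors;
}
```

An empty array means the record satisfies the declared fields; a real Zod-backed validator would also report nested paths and produce typed output.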


Start the Engine

The CLI ships as @sourcepress/cli with 10 commands. Start the local development server with:

pnpm sourcepress dev

This starts the Hono REST API server (from @sourcepress/server) with 19 endpoint groups including content, knowledge, graph, schema, eval, and approval routes. Auth middleware, rate limiting, CORS, and security headers are active by default.

To check engine status:

pnpm sourcepress status

Ingest Knowledge Files

Place source documents in the knowledge/ directory at the project root. Supported inputs: documents, URLs, and transcripts.

knowledge/
  product-overview.md
  interview-transcript.txt
  https-import.url

Run the ingest command:

pnpm sourcepress sync

The KnowledgeEngine runs a classify → extract pipeline. GraphBuilder then deduplicates entities and applies union-find clustering to produce the knowledge graph. Output is reported as entity, relation, and cluster counts:

Ingested: knowledge/product-overview.md
Entities: 7  Relations: 4  Clusters: 1
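For readers unfamiliar with union-find, the clustering step can be sketched as follows. This is a generic illustration of the technique, not GraphBuilder's actual code: the UnionFind class, the clusterEntities function, and the choice to deduplicate by trimmed, lowercased name are all assumptions.

```typescript
// Minimal union-find (disjoint set) over string keys, with path compression.
class UnionFind {
  private parent = new Map<string, string>();

  find(x: string): string {
    if (!this.parent.has(x)) this.parent.set(x, x);
    // Walk up to the root of x's set.
    let root = x;
    while (this.parent.get(root) !== root) root = this.parent.get(root)!;
    // Path compression: point every node on the walk directly at the root.
    let cur = x;
    while (cur !== root) {
      const next = this.parent.get(cur)!;
      this.parent.set(cur, root);
      cur = next;
    }
    return root;
  }

  union(a: string, b: string): void {
    const ra = this.find(a);
    const rb = this.find(b);
    if (ra !== rb) this.parent.set(ra, rb);
  }
}

// Assumption: entities are deduplicated by normalized name.
const normalize = (name: string): string => name.trim().toLowerCase();

// Group entities into clusters of names connected by at least one relation path.
function clusterEntities(
  entities: string[],
  relations: Array<[string, string]>,
): string[][] {
  const uf = new UnionFind();
  const unique = Array.from(new Set(entities.map(normalize)));
  for (const e of unique) uf.find(e); // register each entity as its own set
  for (const [a, b] of relations) uf.union(normalize(a), normalize(b));

  const groups = new Map<string, string[]>();
  for (const e of unique) {
    const root = uf.find(e);
    const members = groups.get(root) ?? [];
    members.push(e);
    groups.set(root, members);
  }
  return Array.from(groups.values());
}
```

Under these assumptions, the Clusters figure in the output above would correspond to the number of groups returned: entities that are linked by any chain of relations end up in the same cluster.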

To inspect the resulting graph:

pnpm sourcepress graph

To detect gaps in knowledge coverage:

pnpm sourcepress gaps

Minimal Working Config

The smallest valid sourcepress.config.ts that will pass validation and start the engine:

import { defineConfig, collection, field } from "@sourcepress/core";

export default defineConfig({
  collections: [
    collection("posts", {
      fields: {
        title: field("string", { required: true }),
        slug: field("string", { required: true }),
        body: field("text", { required: true }),
      },
    }),
  ],
  evals: {
    threshold: 0.8,
  },
  github: {
    repository: {
      owner: "your-org",
      repo: "your-repo",
    },
    branch: "main",
  },
});

With this config and at least one file in knowledge/, pnpm sourcepress dev starts the server and pnpm sourcepress sync runs the first ingest.


Next Steps

  • Eval loop — configure evals.threshold and max_iterations to control generate-judge-improve cycles
  • GitHub approval — configure GitHubPRApprovalProvider to route approved content to PRs with full provenance
  • MCP server — connect AI agents via the 14-tool MCP interface in @sourcepress/mcp
  • Astro integration — use @sourcepress/astro for schema codegen and content syncing in Astro projects
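
As a preview of the eval loop item above, the generate-judge-improve cycle can be sketched as a bounded loop that stops once a draft's score reaches the threshold. This is a conceptual sketch only: runEvalLoop, the callback signatures, and the result shape are hypothetical, and maxIterations here stands in for the max_iterations setting.

```typescript
// Hypothetical result shape for one run of the loop.
interface EvalResult {
  draft: string;
  score: number;
  iterations: number;
  passed: boolean;
}

// Generate a draft, judge it, and retry with the judge's feedback until the
// score reaches the threshold or the iteration budget is spent.
function runEvalLoop(
  generate: (previous: string | null, feedback: string | null) => string,
  judge: (draft: string) => { score: number; feedback: string },
  config: { threshold: number; maxIterations: number },
): EvalResult {
  let draft: string | null = null;
  let feedback: string | null = null;
  let score = 0;

  for (let i = 1; i <= config.maxIterations; i++) {
    draft = generate(draft, feedback);
    const verdict = judge(draft);
    score = verdict.score;
    if (score >= config.threshold) {
      return { draft, score, iterations: i, passed: true };
    }
    feedback = verdict.feedback; // feed the critique into the next attempt
  }

  // Budget exhausted: return the last draft, flagged as not passing.
  return { draft: draft ?? "", score, iterations: config.maxIterations, passed: false };
}
```

In this framing, a higher threshold trades more iterations for stricter output, and the iteration cap bounds the cost of a draft that never satisfies the judge.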