Project scanning

The std/project module includes a deterministic L0/L1 project scanner. It gathers lightweight “what kind of project is this?” evidence without making any LLM calls.

Import it with:

import "std/project"

What it returns

project_scan(path, options?) resolves path to a directory and returns a dictionary describing exactly that directory:

let ev = project_scan(".", {tiers: ["ambient", "config"]})

Typical fields:

  • path: absolute path to the scanned directory
  • languages: stable, confidence-filtered language IDs such as ["rust"]
  • frameworks: coarse framework IDs when an anchor is obvious
  • build_systems: coarse build systems such as ["cargo"] or ["npm"]
  • vcs: currently "git" when the directory is inside a Git checkout
  • anchors: anchor files or directories found at the project root
  • lockfiles: lockfiles found at the project root
  • confidence: coarse per-language/per-framework scores
  • package_name: root package/module name when it can be parsed deterministically

When tiers includes "config", the scan also fills in:

  • build_commands: default or discovered build/test commands
  • declared_scripts: parsed package.json scripts
  • makefile_targets: parsed Makefile targets
  • dockerfile_commands: parsed RUN, CMD, and ENTRYPOINT commands
  • readme_code_fences: fenced-language labels found in the README

Tiers

  • ambient: anchor files, lockfiles, coarse build system detection, VCS, and confidence scoring. No config parsing.
  • config: deterministic config reads for files already found by ambient.

If tiers is omitted, project_scan(...) defaults to ["ambient"].

Polyglot repos

Single-directory scans stay leaf-scoped on purpose. For polyglot repos and monorepos, use project_scan_tree(...) and let callers decide how to combine sub-project evidence:

let tree = project_scan_tree(".", {tiers: ["ambient"], depth: 3})
// {".": {...}, "frontend": {...}, "backend": {...}}

project_scan_tree(...):

  • always includes "." for the requested base directory
  • walks subdirectories deterministically
  • honors .gitignore by default
  • skips standard vendor/build directories such as node_modules/ and target/ by default

You can override those defaults with:

  • respect_gitignore: false
  • include_vendor: true
  • include_hidden: true
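Because combining sub-project evidence is left to the caller, one possible merge policy is sketched below. This is a Python model, not Harn code and not part of std/project: it assumes the tree is a dictionary keyed by relative path, as in the comment above, and that each value carries a languages list. The function name languages_by_directory and the sample data are hypothetical.

```python
from collections import defaultdict


def languages_by_directory(tree):
    """Map each language ID to the sorted sub-project paths that report it.

    `tree` is assumed to look like the project_scan_tree(...) result:
    {relative_path: evidence_dict}, where evidence_dict may have "languages".
    """
    found = defaultdict(list)
    for rel_path, evidence in tree.items():
        for lang in evidence.get("languages", []):
            found[lang].append(rel_path)
    # Sort both keys and values so the merged view is deterministic.
    return {lang: sorted(paths) for lang, paths in sorted(found.items())}


# Hypothetical data for illustration only.
example_tree = {
    ".": {"languages": []},
    "frontend": {"languages": ["typescript"]},
    "backend": {"languages": ["rust"]},
    "tools": {"languages": ["rust"]},
}
combined = languages_by_directory(example_tree)
```

Keeping the directory list per language, rather than a flat union, lets a caller still tell which sub-project a piece of evidence came from.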

Enrichment

project_enrich(path, options) layers an L2, caller-owned enrichment pass on top of deterministic project_scan(...) evidence. The caller supplies the prompt template and the output schema; Harn owns prompt rendering, bounded file selection, schema-retry plumbing, and content-hash caching.

Typical use:

let base = project_scan(".", {tiers: ["ambient", "config"]})
let enriched = project_enrich(".", {
  base_evidence: base,
  prompt: "Project: {{package_name}}\n{{ for file in files }}FILE {{file.path}}\n{{file.content}}\n{{ end }}\nReturn JSON.",
  schema: {
    type: "object",
    required: ["framework", "indent_style"],
    properties: {
      framework: {type: "string"},
      indent_style: {type: "string"},
    },
  },
  budget_tokens: 4000,
  model: "auto",
  cache_key: "coding-enrichment-v1",
})

Bindings available to the template:

  • path: absolute project path
  • base_evidence / evidence: the supplied or auto-scanned L0/L1 evidence
  • every top-level key from base_evidence
  • files: deterministic bounded file context as {path, content, truncated}

Behavior:

  • the cache lookup key is derived from the caller-supplied cache_key, the path, the schema, the rendered prompt, and the content hash of the selected files
  • cached hits surface _provenance.cached == true
  • when the rendered prompt would exceed budget_tokens, the call returns the base evidence with budget_exceeded: true instead of failing
  • schema-retry exhaustion returns an envelope with validation_error and base_evidence instead of raising
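Since over-budget prompts and schema-retry exhaustion return data instead of raising, callers need to inspect the result. The sketch below is a Python model of that check, not Harn code. It relies only on the field names mentioned above (budget_exceeded, validation_error, _provenance.cached); the function name classify_enrichment and the returned labels are hypothetical.

```python
def classify_enrichment(result):
    """Return a coarse label for a project_enrich-style result dictionary."""
    if result.get("budget_exceeded") is True:
        # Rendered prompt was over budget; only base evidence came back.
        return "budget_exceeded"
    if "validation_error" in result:
        # Schema retries were exhausted; the envelope carries base_evidence.
        return "validation_error"
    if result.get("_provenance", {}).get("cached") is True:
        return "cached"
    return "fresh"


# Hypothetical results for illustration only.
labels = [
    classify_enrichment({"budget_exceeded": True, "languages": ["rust"]}),
    classify_enrichment({"validation_error": "missing framework", "base_evidence": {}}),
    classify_enrichment({"framework": "axum", "_provenance": {"cached": True}}),
    classify_enrichment({"framework": "axum", "_provenance": {"cached": False}}),
]
```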

By default, cache entries live under .harn/cache/enrichment/ inside the project root. Override that with cache_dir when a caller wants a different location.
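To illustrate why a cache entry is invalidated when any of the listed inputs changes, here is a minimal Python sketch of a content-hash cache key. It is not Harn's implementation: the use of SHA-256, the length-prefixed encoding, and the function names are all assumptions made for this example.

```python
import hashlib
import json


def _digest(parts):
    """Hash a list of strings, length-prefixing each to avoid ambiguity."""
    h = hashlib.sha256()
    for part in parts:
        data = part.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def files_content_hash(files):
    """Hash the selected files in path order so the result is deterministic."""
    ordered = sorted(files, key=lambda f: f["path"])
    return _digest([x for f in ordered for x in (f["path"], f["content"])])


def enrichment_cache_key(cache_key, path, schema, rendered_prompt, files):
    """Combine the five inputs named above into one hex key (assumed scheme)."""
    schema_text = json.dumps(schema, sort_keys=True)  # key order must not matter
    return _digest(
        [cache_key, path, schema_text, rendered_prompt, files_content_hash(files)]
    )


# Hypothetical inputs for illustration only.
files = [{"path": "src/main.rs", "content": "fn main() {}"}]
k1 = enrichment_cache_key("coding-enrichment-v1", "/repo", {"a": 1, "b": 2}, "p", files)
k2 = enrichment_cache_key("coding-enrichment-v1", "/repo", {"b": 2, "a": 1}, "p", files)
edited = [{"path": "src/main.rs", "content": "fn main() { run(); }"}]
k3 = enrichment_cache_key("coding-enrichment-v1", "/repo", {"a": 1, "b": 2}, "p", edited)
```

Under this scheme, bumping cache_key (for example to a "-v2" suffix) or editing any selected file produces a different key, so stale entries are simply never looked up again.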

Cached deep scans

project_deep_scan(path, options?) layers a cached per-directory tree on top of the metadata store. It is intended for repeated L2/L3 repo analysis where callers want stable hierarchical evidence instead of re-running enrichment on every turn.

Typical shape:

let tree = project_deep_scan(".", {
  namespace: "coding-enrichment-v1",
  tiers: ["ambient", "config", "enriched"],
  incremental: true,
  max_staleness_seconds: 86400,
  depth: nil,
  enrichment: {
    prompt: "Return valid JSON only.",
    schema: {purpose: "string", conventions: ["string"]},
    provider: "mock",
    budget_tokens_per_dir: 1024,
  },
})

Notes:

  • namespace is caller-owned, so multiple agents can keep separate trees for the same repo without collisions.
  • incremental: true reuses the cached entry for any directory whose structure_hash and content_hash still match.
  • depth: nil means unbounded traversal.
  • The filesystem backend persists namespace shards under .harn/metadata/<namespace>/entries.json.
  • project_deep_scan_status(namespace, path?) returns the last recorded scan summary for that scope: {total_dirs, enriched_dirs, stale_dirs, cache_hits, last_refresh, ...}.
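The incremental reuse rule can be modeled as a comparison of two hash pairs per directory. The Python sketch below is an illustration under assumptions, not the actual deep-scan logic: it assumes each directory record exposes structure_hash and content_hash as strings, it ignores max_staleness_seconds, and the name plan_refresh is hypothetical.

```python
def plan_refresh(cached, current):
    """Split directories into (reuse, refresh) lists.

    Both arguments map a relative directory path to a record holding
    "structure_hash" and "content_hash".
    """
    reuse, refresh = [], []
    for path in sorted(current):  # sorted for a deterministic walk order
        old = cached.get(path)
        new = current[path]
        unchanged = (
            old is not None
            and old["structure_hash"] == new["structure_hash"]
            and old["content_hash"] == new["content_hash"]
        )
        (reuse if unchanged else refresh).append(path)
    return reuse, refresh


# Hypothetical hashes for illustration only.
cached = {
    ".": {"structure_hash": "s1", "content_hash": "c1"},
    "backend": {"structure_hash": "s2", "content_hash": "c2"},
}
current = {
    ".": {"structure_hash": "s1", "content_hash": "c1"},
    "backend": {"structure_hash": "s2", "content_hash": "c2-edited"},
    "frontend": {"structure_hash": "s3", "content_hash": "c3"},
}
reuse, refresh = plan_refresh(cached, current)
```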

project_enrich(path, options) is the single-directory building block that deep scan uses when the enriched tier is requested.

Catalog

project_catalog() returns the authoritative built-in catalog that drives ambient detection. Each entry includes:

  • id
  • languages
  • frameworks
  • build_systems
  • anchors
  • lockfiles
  • source_globs
  • default_build_cmd
  • default_test_cmd

The catalog lives in crates/harn-vm/src/stdlib/project_catalog.rs. Adding a new language should be a table entry plus a test, not a new custom code path.
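The "table entry, not code path" idea can be sketched as a table-driven detector. The Python model below is not the Rust catalog in project_catalog.rs: it reuses a subset of the field names listed above, but the two entries, their IDs, and the detect function are hypothetical and chosen only for illustration.

```python
# Hypothetical catalog entries using a subset of the documented field names.
CATALOG = [
    {
        "id": "rust-cargo",
        "languages": ["rust"],
        "build_systems": ["cargo"],
        "anchors": ["Cargo.toml"],
        "lockfiles": ["Cargo.lock"],
    },
    {
        "id": "node-npm",
        "languages": ["javascript"],
        "build_systems": ["npm"],
        "anchors": ["package.json"],
        "lockfiles": ["package-lock.json"],
    },
]


def detect(root_entries):
    """Match file names at the project root against every catalog entry."""
    names = set(root_entries)
    result = {"languages": [], "build_systems": [], "anchors": [], "lockfiles": []}
    for entry in CATALOG:
        anchors = [a for a in entry["anchors"] if a in names]
        if not anchors:
            continue  # no anchor present, so this entry contributes nothing
        result["anchors"] += anchors
        result["lockfiles"] += [f for f in entry["lockfiles"] if f in names]
        for key in ("languages", "build_systems"):
            for value in entry[key]:
                if value not in result[key]:
                    result[key].append(value)
    return result


rust_project = detect(["Cargo.toml", "Cargo.lock", "src"])
```

With this shape, supporting another ecosystem means appending one dictionary to the table and adding a test for it; the detect loop itself does not change.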

Existing helper

project_root_package() now delegates to the scanner’s config tier after checking metadata enrichment, so existing callers keep the same package-name surface while the manifest parsing logic stays centralized.