MatrixArk Product Manual | Context APIs, Timeouts, Metrics

Public API surface

Customers should not model TemporalStore schemas directly in v1. MatrixArk owns extraction, canonicalization, indexes, summaries, embeddings, timeout handling, and context-pack construction.

API	Required input	What MatrixArk does
`ingest`	`tenant_id`, text, optional hints	Extracts event fields, creates or reuses context nodes, writes events, indexes, summaries, and embeddings.
`batch_ingest`	List of ingest items	Runs the same idempotent write path for bulk signals, resources, tool outputs, or migrated history.
`stream_ingest`	Stream id, offset, payload	Accepts ordered agent or workflow streams while avoiding replayed offsets.
`ingest_resource`	`tenant_id`, raw URI, optional hints	Parses Markdown, TXT, or PDF, stores chunks and source refs, and writes L0/L1 summaries plus embeddings.
`retrieve`	`tenant_id`, raw query, max context tokens	Plans the query, embeds it, traverses TemporalStore, filters by time and metadata, and returns a token-budgeted ContextPack.
`feedback`	Context pack id or query id, final answer, accepted/rejected refs	Stores accepted memory, corrections, rejected refs, and confirmation signals for future retrieval.
`audit` / `replay`	Context pack id or query id	Returns the plan, selected refs, dropped refs, timeout notes, token counts, and decision trace.
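
The replay-avoidance behavior listed for `stream_ingest` can be illustrated with a small offset tracker. This is a minimal sketch, not MatrixArk's implementation: the `StreamOffsetTracker` class and its `accept` method are hypothetical names, and the sketch assumes offsets are non-negative integers that increase within each stream.

```python
class StreamOffsetTracker:
    """Hypothetical helper: remembers the highest offset accepted per stream."""

    def __init__(self) -> None:
        self._last_offset: dict[str, int] = {}

    def accept(self, stream_id: str, offset: int) -> bool:
        """Return True if the payload is new, False if the offset was replayed."""
        last = self._last_offset.get(stream_id, -1)
        if offset <= last:
            return False  # already seen: skip the write instead of duplicating it
        self._last_offset[stream_id] = offset
        return True


tracker = StreamOffsetTracker()
# Offsets 1 and 0 are replayed in this sequence and should be rejected.
results = [tracker.accept("agent-1", offset) for offset in (0, 1, 1, 2, 0)]
print(results)
```

Tracking only the highest offset keeps the state small, but it would also drop late out-of-order offsets; whether MatrixArk tolerates those is not stated here.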

Deployment modes: hook or standalone, cloud or on-premise

MatrixArk can run as an invisible hook inside an AI agent loop or as a standalone context service that any application calls directly. Both modes use the same APIs, ContextPack audit format, TemporalStore-backed memory, summaries, embeddings, and retrieval pipeline.

Mode	What the customer integrates	Best fit
Hook mode	Before-LLM query hook, optional tool/resource hooks, after-LLM feedback hook, and confirmation hook.	Cursor-like products, enterprise assistants, IDE agents, workflow copilots, and vertical AI harnesses.
Standalone mode	Direct calls to ingest, batch ingest, stream ingest, resource ingest, retrieve, feedback, audit, and replay.	Enterprises that want one central context infrastructure layer across many agents and apps.

Hook + cloud

Fastest integration for AI harness vendors. Hooks call a managed MatrixArk endpoint while the agent keeps its own local context.

Hook + on-premise

Best for sensitive enterprise agents. Hooks call MatrixArk inside the customer VPC/VNET or data center.

Standalone + cloud

One managed context API for many apps when cloud data residency and governance are acceptable.

Standalone + on-premise

Customer-controlled context, audit, model-provider config, and TemporalStore durability for strict governance.

The contract should not change across deployment shapes. Only auth, network boundary, model provider, durability mode, observability sink, and data-residency policy change.

Minimal request envelopes

MatrixArk runs its own extraction on every request unless a trusted AI harness supplies a first-pass query plan or extracted event. Session ids are strongly recommended; when no session id is available, user-level scope can be used instead.

{
  "tenant_id": "company_a",
  "messages": [
    {"role": "user", "content": "Alice approved the GPU request up to $80k."}
  ],
  "scope": {"user_id": "u_123", "session_id": "s_456", "team": "infra", "project": "project_1"},
  "metadata": {"source": "cursor_hook", "event_time_ms": 1781500000000}
}

{
  "tenant_id": "company_a",
  "query": "Can we buy another GPU batch for Project 1?",
  "scope": {"user_id": "u_123", "session_id": "s_456", "team": "infra", "project": "project_1"},
  "max_context_tokens": 1800,
  "hints": {"retrieval_timeout_ms": 5000}
}
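
To show how a caller might assemble the retrieve envelope above when no session id is available, here is a minimal sketch. The helpers `build_scope` and `build_retrieve_request` are hypothetical and not part of any MatrixArk SDK; only the field names are taken from the sample envelopes.

```python
import json
from typing import Any, Optional


def build_scope(user_id: str, session_id: Optional[str] = None, **extra: str) -> dict[str, str]:
    """Build a scope dict; session_id is included only when it is known."""
    scope: dict[str, str] = {"user_id": user_id}
    if session_id is not None:
        scope["session_id"] = session_id
    scope.update(extra)  # e.g. team, project
    return scope


def build_retrieve_request(tenant_id: str, query: str, scope: dict[str, str],
                           max_context_tokens: int,
                           retrieval_timeout_ms: Optional[int] = None) -> dict[str, Any]:
    """Assemble a retrieve envelope using the field names from the samples above."""
    request: dict[str, Any] = {
        "tenant_id": tenant_id,
        "query": query,
        "scope": scope,
        "max_context_tokens": max_context_tokens,
    }
    if retrieval_timeout_ms is not None:
        request["hints"] = {"retrieval_timeout_ms": retrieval_timeout_ms}
    return request


request = build_retrieve_request(
    tenant_id="company_a",
    query="Can we buy another GPU batch for Project 1?",
    scope=build_scope("u_123", team="infra", project="project_1"),  # user-level scope
    max_context_tokens=1800,
    retrieval_timeout_ms=5000,
)
print(json.dumps(request, indent=2))
```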

Prior context and confirmation policy

MatrixArk resolves prior context before every ingest extraction. This lets agent hooks send simple messages while MatrixArk decides whether a short reply is a confirmation, correction, new event, or noise.

1. Audit first

If context_pack_id or query_id is present, MatrixArk loads the prior ContextPackAudit and selected refs.

2. Summary next

If no audit exists, MatrixArk looks for the scoped node/session L0 summary when hints can resolve a node path.

3. Recent window

If no summary exists, MatrixArk fetches up to 8 recent same-session or same-user events, capped at 4 KB.

4. Replayable result

The event stores prior_context_source, prior_ref_count, and prior_refs so replay can explain the extraction decision.

Short text such as yes, correct, approved, or looks good becomes confirmation only when prior context exists. Without prior context, MatrixArk stores it as noise instead of inventing what the user confirmed.
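
The resolution order and the confirmation rule above can be sketched as follows. This is an illustration under stated assumptions, not the actual extraction code: `PriorContext`, `resolve_prior_context`, and `classify_short_reply` are hypothetical names, events are modeled as plain strings ordered newest first, and the phrase list contains only the four examples given in the text.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_RECENT_EVENTS = 8          # "up to 8 recent ... events"
MAX_RECENT_BYTES = 4 * 1024    # "capped at 4 KB"
CONFIRMATION_PHRASES = {"yes", "correct", "approved", "looks good"}


@dataclass
class PriorContext:
    source: str                              # "audit", "summary", "recent_window", or "none"
    refs: list[str] = field(default_factory=list)


def resolve_prior_context(audit_refs: Optional[list[str]],
                          summary: Optional[str],
                          recent_events: list[str]) -> PriorContext:
    """Apply the order: audit first, then L0 summary, then the recent window."""
    if audit_refs is not None:
        return PriorContext("audit", list(audit_refs))
    if summary:
        return PriorContext("summary", [summary])
    window: list[str] = []
    used = 0
    for event in recent_events[:MAX_RECENT_EVENTS]:
        size = len(event.encode("utf-8"))
        if used + size > MAX_RECENT_BYTES:
            break  # stop once the byte cap would be exceeded
        window.append(event)
        used += size
    return PriorContext("recent_window", window) if window else PriorContext("none")


def classify_short_reply(text: str, prior: PriorContext) -> str:
    """Short confirmations count only when there is something to confirm."""
    normalized = text.strip().lower().rstrip(".!")
    if normalized in CONFIRMATION_PHRASES:
        return "confirmation" if prior.source != "none" else "noise"
    return "needs_extraction"  # assumption: everything else goes to full extraction
```

For example, `classify_short_reply("Yes.", PriorContext("none"))` yields `"noise"`, while the same text with an audit-backed prior context yields `"confirmation"`.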

Retrieval timeout budget

MATRIXARK_RETRIEVAL_TIMEOUT_MS is an end-to-end context retrieval budget. It includes MatrixArk query understanding, optional LLM extraction, query embedding, TemporalStore tree traversal, event/resource filtering, temporal compression lookup, and context-pack construction.

Default

MATRIXARK_RETRIEVAL_TIMEOUT_MS=5000 (5 seconds). The default is intentionally generous because OSS model planning and embeddings can be slower than TemporalStore reads.

Traversal sub-budget

TemporalStore traversal should normally target hundreds of milliseconds. It receives only whatever remains of the retrieval budget after the earlier stages have run.

Per-request override

Use hints["retrieval_timeout_ms"] or hints["context_retrieval_timeout_ms"] for heavy workflows or very strict latency paths.

Provider timeouts

Production model providers should still enforce their own network and inference timeouts. The Python MVP checks the budget between pipeline stages.

Fallback rule: if the deadline is reached, MatrixArk returns a normal replayable ContextPack using only context already fetched. It never fabricates context to fill the prompt.
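
One way to picture an end-to-end budget that is checked between pipeline stages is the sketch below. It is not the Python MVP's code: `resolve_timeout_ms`, `RetrievalBudget`, and `run_stages` are hypothetical names, and the precedence between the two hint keys is an assumption, since the text does not say which one wins.

```python
import os
import time
from typing import Callable, Mapping, Optional

DEFAULT_TIMEOUT_MS = 5000


def resolve_timeout_ms(hints: Optional[Mapping[str, int]] = None,
                       env: Optional[Mapping[str, str]] = None) -> int:
    """Per-request hint first, then the environment variable, then the default."""
    hints = hints or {}
    env = os.environ if env is None else env
    # Assumption: retrieval_timeout_ms takes precedence if both keys are present.
    for key in ("retrieval_timeout_ms", "context_retrieval_timeout_ms"):
        if key in hints:
            return int(hints[key])
    return int(env.get("MATRIXARK_RETRIEVAL_TIMEOUT_MS", DEFAULT_TIMEOUT_MS))


class RetrievalBudget:
    """Tracks one shared deadline for the whole retrieve call."""

    def __init__(self, timeout_ms: int, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._deadline = clock() + timeout_ms / 1000.0

    def remaining_ms(self) -> float:
        return max(0.0, (self._deadline - self._clock()) * 1000.0)

    def expired(self) -> bool:
        return self.remaining_ms() <= 0.0


def run_stages(stages: list[tuple[str, Callable[[float], None]]],
               budget: RetrievalBudget) -> list[str]:
    """Run stages in order, checking the budget between them (not inside them)."""
    completed: list[str] = []
    for name, stage in stages:
        if budget.expired():
            break  # deadline reached: keep only what earlier stages produced
        stage(budget.remaining_ms())  # each stage sees only the remaining budget
        completed.append(name)
    return completed
```

Because the check happens only between stages, a single slow stage can still overrun the deadline, which is why provider-level timeouts remain necessary.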

Fallback content order

When retrieval times out, the response remains safe and auditable. The fallback pack draws on partial content in the order below, and only on content that was fetched before the cutoff.

Priority	Returned content	Reason
1	Current `ContextEvent` records	Fresh timestamped facts are usually the most useful prompt evidence.
2	L1 summaries	Compact overviews help the agent continue with lower token cost.
3	Resource chunks	Exact source details are included only when already selected and token budget allows.
4	Compressed cold windows	Older history remains available as summary evidence without stuffing raw history.
5	Empty pack	If nothing safe was fetched, return no context instead of stale or unverified content.
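
The priority table above can be read as a simple ordered fill under a token budget. The sketch below assumes each fetched item already carries a token count; `FetchedItem`, `build_fallback_pack`, and the `kind` labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Kinds in fallback priority order (priorities 1 to 4 in the table above).
FALLBACK_PRIORITY = ("context_event", "l1_summary", "resource_chunk", "cold_window")


@dataclass(frozen=True)
class FetchedItem:
    kind: str     # one of FALLBACK_PRIORITY
    ref: str      # reference id recorded for audit and replay
    tokens: int   # assumed to be precomputed


def build_fallback_pack(fetched: list[FetchedItem], max_context_tokens: int) -> list[FetchedItem]:
    """Fill the pack by priority using only items fetched before the cutoff."""
    rank = {kind: i for i, kind in enumerate(FALLBACK_PRIORITY)}
    # sorted() is stable, so items of the same kind keep their fetch order.
    ordered = sorted((item for item in fetched if item.kind in rank),
                     key=lambda item: rank[item.kind])
    pack: list[FetchedItem] = []
    used = 0
    for item in ordered:
        if used + item.tokens > max_context_tokens:
            continue  # does not fit: skip it rather than truncate
        pack.append(item)
        used += item.tokens
    return pack  # priority 5: an empty list when nothing was fetched or fits
```

Skipping an oversized item instead of stopping lets a smaller, lower-priority item still use the remaining budget; a stricter variant could stop at the first item that does not fit.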

Metrics and replay

MatrixArk emits dependency-free Prometheus text through service.prometheus_metrics(). The key histogram is matrixark_pipeline_stage_latency_ms with operation, stage, and status labels.

# Examples
matrixark_pipeline_stage_latency_ms_count{operation="retrieve",stage="query_understanding",status="ok"} 12
matrixark_pipeline_stage_latency_ms_count{operation="retrieve",stage="tree_traversal",status="ok"} 12
matrixark_pipeline_stage_latency_ms_count{operation="retrieve",stage="hard_timeout",status="timeout"} 1

Every returned ContextPack should be replayable. The audit record stores the query plan, candidate nodes, selected refs, token count, timeout notes, fallback notes, and the decision trace that explains why each piece of context was included in or left out of the final prompt.
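
When debugging timeouts, it can help to pull the per-stage counters out of the Prometheus text shown earlier. The parser below is a minimal sketch that handles only simple label values (no escaped quotes or braces); `stage_counts` is a hypothetical helper, and the input is assumed to be the string returned by service.prometheus_metrics().

```python
import re

SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_PAIR = re.compile(r'(\w+)="([^"]*)"')
COUNT_METRIC = "matrixark_pipeline_stage_latency_ms_count"


def stage_counts(metrics_text: str) -> dict[tuple[str, str, str], float]:
    """Map (operation, stage, status) to the histogram sample count."""
    counts: dict[tuple[str, str, str], float] = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment/HELP/TYPE lines
        match = SAMPLE_LINE.match(line)
        if not match or match.group("name") != COUNT_METRIC:
            continue
        labels = dict(LABEL_PAIR.findall(match.group("labels")))
        key = (labels.get("operation", ""), labels.get("stage", ""), labels.get("status", ""))
        counts[key] = float(match.group("value"))
    return counts


sample = '''# Examples
matrixark_pipeline_stage_latency_ms_count{operation="retrieve",stage="query_understanding",status="ok"} 12
matrixark_pipeline_stage_latency_ms_count{operation="retrieve",stage="tree_traversal",status="ok"} 12
matrixark_pipeline_stage_latency_ms_count{operation="retrieve",stage="hard_timeout",status="timeout"} 1
'''
counts = stage_counts(sample)
timeouts = counts[("retrieve", "hard_timeout", "timeout")]
```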

Storage and durability modes

TemporalStore should default to async context serving for low latency, while supporting explicit sync durability for high-value enterprise facts. Customers can also choose shared-store mode or Raft mode.

Mode	Default use	Sync meaning
Shared-store mode	Serverless/cloud and lower operational burden	Wait for local durable or shared-store durable commit when requested.
Raft mode	On-prem, compliance, approvals, audit, and stricter replicated durability	Wait for quorum commit when `replicated_durable` is requested.

{
  "temporalstore": {
    "mode": "shared_store",
    "durability": "async_ack"
  }
}

{
  "temporalstore": {
    "mode": "raft",
    "durability": "replicated_durable",
    "raft": {"peers": ["ts-1:9010", "ts-2:9010", "ts-3:9010"]}
  }
}

Recommended production defaults

Use async ingestion for ordinary chat, tool, summary, and embedding events. Use sync or replicated durability for confirmed memory, approvals, policy facts, audit records, and compliance-sensitive events. Keep model/query-understanding timeout separate from the TemporalStore traversal target, and always inspect replay when debugging context quality.
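
The recommendation above amounts to a small policy that maps an event's kind to a durability level. The sketch below is illustrative only: the event-kind labels and the `choose_durability` function are hypothetical, and `"sync_durable"` is a placeholder for the shared-store sync setting, whose real value is not given in this manual. Only `"async_ack"` and `"replicated_durable"` appear in the sample configs.

```python
# Hypothetical labels for the high-value event kinds named in the text.
SYNC_EVENT_KINDS = {
    "confirmed_memory",
    "approval",
    "policy_fact",
    "audit_record",
    "compliance",
}


def choose_durability(event_kind: str, store_mode: str) -> str:
    """Pick a durability level for one write.

    store_mode is "shared_store" or "raft", as in the sample configs.
    """
    if store_mode not in ("shared_store", "raft"):
        raise ValueError(f"unknown TemporalStore mode: {store_mode!r}")
    if event_kind not in SYNC_EVENT_KINDS:
        return "async_ack"  # ordinary chat, tool, summary, and embedding events
    if store_mode == "raft":
        return "replicated_durable"  # wait for quorum commit
    return "sync_durable"  # placeholder name, see the note above


print(choose_durability("chat", "shared_store"))
print(choose_durability("approval", "raft"))
```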
