Product Manual
MatrixArk API reference for context serving.
MatrixArk gives AI harnesses one product surface for context ingestion, retrieval, feedback, replay, and operations. The caller can stay simple: send raw messages, resources, tool results, final answers, lightweight scope hints, and token budgets. MatrixArk handles extraction, TemporalStore writes, tree traversal, freshness checks, packing, metrics, and replay.
Public API surface
Customers should not model TemporalStore schemas directly in v1. MatrixArk owns extraction, canonicalization, indexes, summaries, embeddings, timeout handling, and context-pack construction.
| API | Required input | What MatrixArk does |
|---|---|---|
| ingest | tenant_id, text, optional hints | Extracts event fields, creates or reuses context nodes, writes events, indexes, summaries, and embeddings. |
| batch_ingest | List of ingest items | Runs the same idempotent write path for bulk signals, resources, tool outputs, or migrated history. |
| stream_ingest | Stream id, offset, payload | Accepts ordered agent or workflow streams while avoiding replayed offsets. |
| ingest_resource | tenant_id, raw URI, optional hints | Parses Markdown, TXT, or PDF, stores chunks and source refs, and writes L0/L1 summaries plus embeddings. |
| retrieve | tenant_id, raw query, max context tokens | Plans the query, embeds it, traverses TemporalStore, filters by time and metadata, and returns a token-budgeted ContextPack. |
| feedback | Context pack id or query id, final answer, accepted/rejected refs | Stores accepted memory, corrections, rejected refs, and confirmation signals for future retrieval. |
| audit / replay | Context pack id or query id | Returns the plan, selected refs, dropped refs, timeout notes, token counts, and decision trace. |
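The stream_ingest row says replayed offsets are avoided, but not how. One way a caller or test harness might model that behavior is sketched below. It assumes offsets are integers that increase within a stream; the class name `StreamOffsetTracker` and its `accept` method are hypothetical and not part of the MatrixArk API.

```python
from typing import Dict


class StreamOffsetTracker:
    """Hypothetical helper: remembers the highest offset seen per stream id."""

    def __init__(self) -> None:
        self._last_offset: Dict[str, int] = {}

    def accept(self, stream_id: str, offset: int) -> bool:
        """Return True for a new offset, False for a replayed or older one."""
        last = self._last_offset.get(stream_id, -1)
        if offset <= last:
            # Already seen (or older than what was seen): treat as a replay.
            return False
        self._last_offset[stream_id] = offset
        return True


tracker = StreamOffsetTracker()
for stream_id, offset in [("agent-1", 0), ("agent-1", 1), ("agent-1", 1), ("agent-2", 0)]:
    status = "accepted" if tracker.accept(stream_id, offset) else "skipped replay"
    print(stream_id, offset, status)
```

This sketch accepts gaps between offsets. A stricter variant could require each offset to be exactly one greater than the last.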
Deployment modes: hook or standalone, cloud or on-premise
MatrixArk can run as an invisible hook inside an AI agent loop or as a standalone context service that any application calls directly. Both modes use the same APIs, ContextPack audit format, TemporalStore-backed memory, summaries, embeddings, and retrieval pipeline.
| Mode | What the customer integrates | Best fit |
|---|---|---|
| Hook mode | Before-LLM query hook, optional tool/resource hooks, after-LLM feedback hook, and confirmation hook. | Cursor-like products, enterprise assistants, IDE agents, workflow copilots, and vertical AI harnesses. |
| Standalone mode | Direct calls to ingest, batch ingest, stream ingest, resource ingest, retrieve, feedback, audit, and replay. | Enterprises that want one central context infrastructure layer across many agents and apps. |
Hook + cloud
Fastest integration for AI harness vendors. Hooks call a managed MatrixArk endpoint while the agent keeps its own local context.
Hook + on-premise
Best for sensitive enterprise agents. Hooks call MatrixArk inside the customer VPC/VNET or data center.
Standalone + cloud
One managed context API for many apps when cloud data residency and governance are acceptable.
Standalone + on-premise
Customer-controlled context, audit, model-provider config, and TemporalStore durability for strict governance.
Minimal request envelopes
MatrixArk always runs its own extraction unless a trusted AI harness also supplies a first-pass query plan or extracted event. Session ids are strongly recommended; when a session id is unavailable, user-level scope can be used instead.
{
  "tenant_id": "company_a",
  "messages": [
    {"role": "user", "content": "Alice approved the GPU request up to $80k."}
  ],
  "scope": {"user_id": "u_123", "session_id": "s_456", "team": "infra", "project": "project_1"},
  "metadata": {"source": "cursor_hook", "event_time_ms": 1781500000000}
}
{
  "tenant_id": "company_a",
  "query": "Can we buy another GPU batch for Project 1?",
  "scope": {"user_id": "u_123", "session_id": "s_456", "team": "infra", "project": "project_1"},
  "max_context_tokens": 1800,
  "hints": {"retrieval_timeout_ms": 5000}
}
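The two envelopes above (a messages payload for ingest and a query payload for retrieve) can be assembled with small helpers. The sketch below is illustrative only: the function names `build_ingest_envelope` and `build_retrieve_envelope` are hypothetical, and only the JSON field names are taken from the examples above.

```python
import json
import time
from typing import Any, Dict, Optional


def build_ingest_envelope(
    tenant_id: str,
    text: str,
    user_id: str,
    session_id: Optional[str] = None,
    source: str = "cursor_hook",
    event_time_ms: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build an ingest envelope like the first example."""
    scope: Dict[str, str] = {"user_id": user_id}
    if session_id is not None:
        # Session ids are recommended; user-level scope is the fallback.
        scope["session_id"] = session_id
    if event_time_ms is None:
        event_time_ms = int(time.time() * 1000)
    return {
        "tenant_id": tenant_id,
        "messages": [{"role": "user", "content": text}],
        "scope": scope,
        "metadata": {"source": source, "event_time_ms": event_time_ms},
    }


def build_retrieve_envelope(
    tenant_id: str,
    query: str,
    scope: Dict[str, str],
    max_context_tokens: int = 1800,
    retrieval_timeout_ms: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a retrieve envelope like the second example."""
    envelope: Dict[str, Any] = {
        "tenant_id": tenant_id,
        "query": query,
        "scope": scope,
        "max_context_tokens": max_context_tokens,
    }
    if retrieval_timeout_ms is not None:
        # Only send the hint when the caller wants a per-request override.
        envelope["hints"] = {"retrieval_timeout_ms": retrieval_timeout_ms}
    return envelope


scope = {"user_id": "u_123", "session_id": "s_456"}
print(json.dumps(build_retrieve_envelope("company_a", "Can we buy another GPU batch?", scope), indent=2))
```

How the envelope is sent (HTTP client, SDK, or in-process call) is left out because the source does not specify a transport.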
Prior context and confirmation policy
MatrixArk resolves prior context before every ingest extraction. This lets agent hooks send simple messages while MatrixArk decides whether a short reply is a confirmation, correction, new event, or noise.
1. Audit first
If context_pack_id or query_id is present, MatrixArk loads the prior ContextPackAudit and selected refs.
2. Summary next
If no audit exists, MatrixArk looks for the scoped node/session L0 summary when hints can resolve a node path.
3. Recent window
If no summary exists, MatrixArk fetches up to 8 recent same-session or same-user events, capped at 4 KB.
4. Replayable result
The event stores prior_context_source, prior_ref_count, and prior_refs so replay can explain the extraction decision.
A short reply such as "yes", "correct", "approved", or "looks good" becomes a confirmation only when prior context exists. Without prior context, MatrixArk stores it as noise instead of inventing what the user confirmed.
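The resolution order and the confirmation rule can be sketched as follows. This is a simplified model, not MatrixArk's implementation: the function and class names are hypothetical, the recent events are assumed to arrive newest first as strings, and the phrase matching is a plain lookup standing in for real extraction. The limits of 8 events and 4 KB come from the text above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_RECENT_EVENTS = 8          # "up to 8 recent ... events"
MAX_RECENT_BYTES = 4 * 1024    # "capped at 4 KB"
CONFIRMATION_PHRASES = {"yes", "correct", "approved", "looks good"}


@dataclass
class PriorContext:
    source: str                      # "audit", "summary", "recent_window", or "none"
    refs: List[str] = field(default_factory=list)


def resolve_prior_context(
    audit_refs: Optional[List[str]],
    summary: Optional[str],
    recent_events: List[str],
) -> PriorContext:
    """Try audit first, then the L0 summary, then a capped recent window."""
    if audit_refs:
        return PriorContext("audit", list(audit_refs))
    if summary:
        return PriorContext("summary", [summary])
    window: List[str] = []
    used_bytes = 0
    for event in recent_events[:MAX_RECENT_EVENTS]:
        size = len(event.encode("utf-8"))
        if used_bytes + size > MAX_RECENT_BYTES:
            break
        window.append(event)
        used_bytes += size
    if window:
        return PriorContext("recent_window", window)
    return PriorContext("none")


def classify_short_reply(text: str, prior: PriorContext) -> str:
    """A confirmation phrase only counts when there is something to confirm."""
    normalized = text.strip().lower().rstrip(".!")
    if normalized in CONFIRMATION_PHRASES:
        return "confirmation" if prior.source != "none" else "noise"
    # Corrections and new events would need full extraction; not modeled here.
    return "needs_extraction"


prior = resolve_prior_context(None, None, ["Alice approved the GPU request up to $80k."])
print(prior.source, classify_short_reply("Looks good!", prior))
print(classify_short_reply("yes", PriorContext("none")))
```

The `source` and `refs` fields mirror prior_context_source and prior_refs, which the event stores so that replay can explain the decision.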
Retrieval timeout budget
MATRIXARK_RETRIEVAL_TIMEOUT_MS is an end-to-end context retrieval budget. It includes MatrixArk query understanding, optional LLM extraction, query embedding, TemporalStore tree traversal, event/resource filtering, temporal compression lookup, and context-pack construction.
Default
MATRIXARK_RETRIEVAL_TIMEOUT_MS=5000. The default is intentionally generous because open-source (OSS) model planning and embedding can be slower than TemporalStore reads.
Traversal sub-budget
TemporalStore traversal should normally target hundreds of milliseconds. It receives only the remaining retrieval budget.
Per-request override
Use hints["retrieval_timeout_ms"] or hints["context_retrieval_timeout_ms"] for heavy workflows or very strict latency paths.
Provider timeouts
Production model providers should still enforce their own network and inference timeouts. The Python MVP checks the budget between pipeline stages.
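A budget that is checked between stages can be modeled as a single deadline that each stage consults. The sketch below is an assumption-laden illustration, not the Python MVP's code: `Budget`, `resolve_timeout_ms`, and `run_stages` are hypothetical names, and the precedence between the two hint keys is assumed because the source does not define it.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

DEFAULT_TIMEOUT_MS = 5000
Stage = Callable[[float], None]  # receives the remaining budget in ms


def resolve_timeout_ms(hints: Optional[Dict[str, int]], env: Dict[str, str]) -> int:
    """Per-request hint wins over the environment, which wins over the default."""
    hints = hints or {}
    # Assumed precedence: the first key listed here is checked first.
    for key in ("retrieval_timeout_ms", "context_retrieval_timeout_ms"):
        if key in hints:
            return int(hints[key])
    return int(env.get("MATRIXARK_RETRIEVAL_TIMEOUT_MS", DEFAULT_TIMEOUT_MS))


class Budget:
    """End-to-end deadline shared by every pipeline stage."""

    def __init__(self, timeout_ms: int, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._deadline = clock() + timeout_ms / 1000.0

    def remaining_ms(self) -> float:
        return max(0.0, (self._deadline - self._clock()) * 1000.0)

    def expired(self) -> bool:
        return self.remaining_ms() <= 0.0


def run_stages(stages: List[Tuple[str, Stage]], budget: Budget) -> Tuple[List[str], bool]:
    """Run stages in order; stop before any stage once the budget is spent."""
    completed: List[str] = []
    for name, stage in stages:
        if budget.expired():
            return completed, True
        # Each stage receives only what is left of the overall budget.
        stage(budget.remaining_ms())
        completed.append(name)
    return completed, False


budget = Budget(resolve_timeout_ms({"retrieval_timeout_ms": 50}, {}))
stages = [
    ("query_understanding", lambda remaining: time.sleep(0.06)),
    ("tree_traversal", lambda remaining: None),
]
print(run_stages(stages, budget))
```

Checking only between stages means a slow stage can overrun the deadline, which is why the provider-level timeouts mentioned above still matter.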
Fallback content order
When retrieval times out, the response remains safe and auditable. The fallback pack draws on partial content in the order below, and uses a given item only if it was already fetched before the cutoff.
| Priority | Returned content | Reason |
|---|---|---|
| 1 | Current ContextEvent records | Fresh timestamped facts are usually the most useful prompt evidence. |
| 2 | L1 summaries | Compact overviews help the agent continue with lower token cost. |
| 3 | Resource chunks | Exact source details are included only when already selected and token budget allows. |
| 4 | Compressed cold windows | Older history remains available as summary evidence without stuffing raw history. |
| 5 | Empty pack | If nothing safe was fetched, return no context instead of stale or unverified content. |
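The priority table can be read as a packing rule over whatever was fetched before the cutoff. The sketch below shows one possible interpretation. The `Item` type, the kind labels, and the greedy "skip what does not fit" policy are assumptions, and token counts are taken as precomputed.

```python
from dataclasses import dataclass
from typing import List

# Priority order from the table; the string labels themselves are hypothetical.
FALLBACK_ORDER = ["context_event", "l1_summary", "resource_chunk", "cold_window"]


@dataclass
class Item:
    kind: str     # one of FALLBACK_ORDER
    ref: str      # source reference kept for audit and replay
    tokens: int   # precomputed token cost


def build_fallback_pack(fetched: List[Item], max_context_tokens: int) -> List[Item]:
    """Pack already-fetched items by priority without exceeding the budget."""
    rank = {kind: position for position, kind in enumerate(FALLBACK_ORDER)}
    eligible = [item for item in fetched if item.kind in rank]
    # sorted() is stable, so items of the same kind keep their fetch order.
    eligible = sorted(eligible, key=lambda item: rank[item.kind])
    pack: List[Item] = []
    used = 0
    for item in eligible:
        if used + item.tokens <= max_context_tokens:
            pack.append(item)
            used += item.tokens
    # An empty list corresponds to priority 5: return no context at all.
    return pack


fetched = [
    Item("cold_window", "c1", 500),
    Item("resource_chunk", "r1", 900),
    Item("context_event", "e1", 400),
    Item("l1_summary", "s1", 600),
]
print([item.ref for item in build_fallback_pack(fetched, 1800)])
```

With an 1800-token budget, the event and summary fit first, the 900-token resource chunk is skipped, and the cold window fills the remaining space.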
Metrics and replay
MatrixArk emits dependency-free Prometheus text through service.prometheus_metrics(). The key histogram is matrixark_pipeline_stage_latency_ms with operation, stage, and status labels.
# Examples
matrixark_pipeline_stage_latency_ms_count{operation="retrieve",stage="query_understanding",status="ok"} 12
matrixark_pipeline_stage_latency_ms_count{operation="retrieve",stage="tree_traversal",status="ok"} 12
matrixark_pipeline_stage_latency_ms_count{operation="retrieve",stage="hard_timeout",status="timeout"} 1
Every returned ContextPack should be replayable. The audit record stores the query plan, candidate nodes, selected refs, token count, timeout notes, fallback notes, and the decision trace that explains why each piece of context was included in or left out of the final prompt.
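The sample lines above are plain Prometheus text, so they can be inspected without a client library. The sketch below is a deliberately simplified reader: it handles the `name{labels} value` shape shown in the examples but not escaped quotes or braces inside label values. The function name `parse_stage_counts` is hypothetical.

```python
import re
from typing import Dict, Tuple

SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')
COUNT_METRIC = "matrixark_pipeline_stage_latency_ms_count"


def parse_stage_counts(text: str) -> Dict[Tuple[str, str, str], float]:
    """Map (operation, stage, status) to the histogram's _count value."""
    counts: Dict[Tuple[str, str, str], float] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment/HELP/TYPE lines
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != COUNT_METRIC:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        key = (labels.get("operation", ""), labels.get("stage", ""), labels.get("status", ""))
        counts[key] = float(match.group("value"))
    return counts


sample = """# Examples
matrixark_pipeline_stage_latency_ms_count{operation="retrieve",stage="query_understanding",status="ok"} 12
matrixark_pipeline_stage_latency_ms_count{operation="retrieve",stage="tree_traversal",status="ok"} 12
matrixark_pipeline_stage_latency_ms_count{operation="retrieve",stage="hard_timeout",status="timeout"} 1
"""
for key, value in parse_stage_counts(sample).items():
    print(key, value)
```

In production, a Prometheus server or an established parser would normally consume this output; the reader here is only meant for quick checks.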
Storage and durability modes
TemporalStore should default to async context serving for low latency, while supporting explicit sync durability for high-value enterprise facts. Customers can also choose shared-store mode or Raft mode.
| Mode | Default use | Sync meaning |
|---|---|---|
| Shared-store mode | Serverless/cloud and lower operational burden | Wait for local durable or shared-store durable commit when requested. |
| Raft mode | On-prem, compliance, approvals, audit, and stricter replicated durability | Wait for quorum commit when replicated_durable is requested. |
{
  "temporalstore": {
    "mode": "shared_store",
    "durability": "async_ack"
  }
}
{
  "temporalstore": {
    "mode": "raft",
    "durability": "replicated_durable",
    "raft": {"peers": ["ts-1:9010", "ts-2:9010", "ts-3:9010"]}
  }
}
Recommended production defaults
Use async ingestion for ordinary chat, tool, summary, and embedding events. Use sync or replicated durability for confirmed memory, approvals, policy facts, audit records, and compliance-sensitive events. Keep model/query-understanding timeout separate from the TemporalStore traversal target, and always inspect replay when debugging context quality.
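These defaults amount to a small policy that maps an event's kind to a durability level. The sketch below illustrates that idea. The event-kind labels and the function name are hypothetical, and "shared_store_durable" is an assumed identifier for the sync option in shared-store mode, which the source describes but does not name. Only "async_ack" and "replicated_durable" appear in the configuration examples above.

```python
# Hypothetical event-kind labels for facts that the text says deserve sync durability.
SYNC_EVENT_KINDS = {
    "confirmed_memory",
    "approval",
    "policy_fact",
    "audit_record",
    "compliance",
}


def choose_durability(event_kind: str, store_mode: str) -> str:
    """Pick a durability level for one event under the recommended defaults."""
    if event_kind not in SYNC_EVENT_KINDS:
        # Ordinary chat, tool, summary, and embedding events stay async.
        return "async_ack"
    if store_mode == "raft":
        # Raft mode: wait for quorum commit.
        return "replicated_durable"
    # Assumed name for a sync commit in shared-store mode.
    return "shared_store_durable"


for kind, mode in [("chat", "shared_store"), ("approval", "raft"), ("policy_fact", "shared_store")]:
    print(kind, mode, choose_durability(kind, mode))
```

Keeping this decision in one function makes it easier to audit which event kinds pay the latency cost of a sync write.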