LLM context engineering
Use the full stack when LLM memory becomes production state.
Choose this path when context is not just memory anymore. TemporalStore owns time and low-latency context reads; MatrixDB adds serverless hot state; MatrixKV protects permissions, approvals, leases, and committed agent truth.
In simple words
In one sentence: this page explains how we keep LLM context reliable in production by storing time, meaning, and trust signals so each request uses only the context that is still valid now.
What to remember
Context is more than a search result. It is timeline, freshness, permissions, and what already happened.
How to use it
Build your product flow first, then let MatrixArk assemble prompt-ready context, route fast paths, and keep stale context out.
What you get
Fewer wrong-time answers, better cost control, and cleaner reuse of stable prompt parts.
Why the full stack matters
The full MatrixArk stack turns context from an application-side integration problem into a production state platform. TemporalStore answers time and speed. MatrixDB gives serverless hot state and Redis-compatible adoption. MatrixKV protects truth, ownership, approvals, leases, and committed actions. Together they give Cursor-like vertical builders and enterprise AI teams one context surface instead of a pile of fragile glue.
What changes from the one-store path
Open-source TemporalStore is the clean starting point for timelines, replay, freshness, low-latency reads, and prompt-ready memory. The full stack is for production platforms that also need serverless hot state, Redis-compatible integration, permissions, approvals, leases, committed truth, deployment, and operational boundaries.
| Layer | Open-source TemporalStore alone | Full MatrixArk stack |
|---|---|---|
| Temporal context | Core product: timelines, replay, freshness, counters, sequences. | Still the primary engine, with managed operations and routing. |
| Hot current state | Possible when small or time-oriented. | MatrixDB handles serverless profiles, sessions, cache metadata, Redis-compatible access, scans, and exports. |
| Committed truth | Can be logged as events for audit. | MatrixKV handles permissions, document versions, approvals, leases, ownership, and committed actions. |
| Customer promise | One open-source Rust store for LLM context and memory. | One platform surface for context, memory, hot state, truth, runtime reuse signals, and production boundaries. |
The product thesis
Cursor works because it understands a developer's project context. Every vertical needs the same idea for its own operational world: support tickets, legal matters, incidents, sales accounts, insurance claims, compliance evidence, patient administration, field service, and finance workflows.
What the AI harness owns vs what MatrixArk owns
Cursor-like product teams and enterprise AI teams should keep owning the user experience, local context, model choice, agent workflow, and final prompt style. MatrixArk should own the infrastructure decisions that are easy to get wrong at scale: what context is fresh, what is stale, which store to query, which sections fit the token budget, and what should be written back after the LLM answer.
| Layer | AI harness owns | MatrixArk owns |
|---|---|---|
| User query | Raw request, UI state, selected entity, optional first-pass intent plan. | Validation, schema mapping, safe query plan, token budget, and fallback route. |
| Local context | Open files, visible page, selected ticket, current draft, active tool state. | Durable cross-session memory, time validity, stale blocking, replay, and source freshness. |
| Retrieval | Domain preferences and UX-specific ranking signals. | VectorDB/S3 coordination, TemporalStore freshness, MatrixKV permissions, MatrixDB hot state. |
| Write-back | User acceptance, tool outcomes, corrections, final answer, new local state. | Memory updates, commitments, rejected suggestions, context-pack replay ids, cache invalidation hints. |
The production API surface
The full stack should feel small to customers. A vertical AI harness calls a handful of APIs while MatrixArk hides extraction, schema compilation, storage routing, time filtering, token budgets, and replay.
| API | Purpose | Primary backing layer |
|---|---|---|
| /context/ingest | Write messages, tool outputs, approvals, costs, docs, confirmations, and final answers. | TemporalStore first; VectorDB/S3 for chunks and objects. |
| /context/retrieve | Turn raw query plus hints into a token-budgeted context pack. | TemporalStore for time-aware context; MatrixDB/MatrixKV as needed. |
| /context/feedback | Record accepted answers, rejected context, user corrections, and commitments. | TemporalStore audit/events; MatrixKV for committed truth. |
| /context/audit | Explain selected refs, blocked refs, freshness checks, and token budget use. | TemporalStore ContextAudit. |
| /context/replay | Rebuild what the model saw at a historical request time. | TemporalStore timelines plus source refs. |
How MatrixArk helps KV-cache and LMCache
LMCache-style systems and remote KV-cache services are model-runtime infrastructure. They help reuse cached prefixes, attention KV state, and repeated prompt segments. MatrixArk is the application-state layer beside that runtime: it decides which fresh context should be assembled, which facts are trusted, which memories are stale, and which actions committed. That makes cache reuse safer because the application can mark which context packs are stable, which source objects changed, which sections are reusable, and which memories must be refreshed.
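The cache-eligibility signal described above can be pictured as a check that compares the source versions a section was built from against the current ones. This is a minimal sketch, not MatrixArk's API: `Section`, `split_for_cache`, and the version maps are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One section of a context pack (hypothetical shape, for illustration)."""
    name: str
    # Source object id -> version the section was built from.
    source_versions: dict = field(default_factory=dict)
    volatile: bool = False  # e.g. timelines or permissions that change often


def split_for_cache(sections, current_versions):
    """Return (reusable, refresh) lists of section names.

    A section is reusable only if it is not volatile and every source it
    was built from is still at the same version.
    """
    reusable, refresh = [], []
    for s in sections:
        unchanged = all(
            current_versions.get(src) == ver
            for src, ver in s.source_versions.items()
        )
        (reusable if (unchanged and not s.volatile) else refresh).append(s.name)
    return reusable, refresh


sections = [
    Section("system_instructions"),
    Section("policy_summary", {"policy_doc": 3}),
    Section("account_timeline", {"ticket_812": 7}, volatile=True),
    Section("pricing_facts", {"price_sheet": 2}),
]
current = {"policy_doc": 3, "ticket_812": 7, "price_sheet": 4}

reusable, refresh = split_for_cache(sections, current)
print(reusable)  # ['system_instructions', 'policy_summary']
print(refresh)   # ['account_timeline', 'pricing_facts']
```

In this sketch the runtime cache never has to know why a section changed; it only receives the list of sections that are safe to reuse.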
Diagram: select state and assemble request; MatrixArk context state (time, hot state, transactional truth); VectorDB + S3 (semantic recall and raw objects); prefix and KV-cache reuse; LLM runtime (vLLM, SGLang, TensorRT-LLM style serving); response and tool events (write back memory and commits).
Why existing solutions do not satisfy production customers
Existing LLM context tools solve important slices, but vertical customers need the whole context decision path. Vector DBs retrieve chunks. Prompt tools manage instructions. Observability tools record traces. Caches reduce latency. Feature stores organize offline feature data. None of those layers alone owns time-aware memory, permissions, source validity, open commitments, prompt replay, and committed agent actions together.
| Existing layer | What it solves | What customers still need |
|---|---|---|
| VectorDB | Semantic recall over embeddings | Freshness, authority, temporal validity, permissions, and replayable context packs. |
| Prompt management | Templates, versions, eval cases | Live request-time context assembly from governed state, not just better instructions. |
| LangGraph / LlamaIndex | Agent orchestration, retrieval workflows, memory abstractions | Production temporal storage, freshness, replay, cache-control signals, and storage boundaries below the framework. |
| Mem0 / Letta-style memory | Personalized or stateful agent memory | Consolidated infrastructure for temporal serving, hot current state, canonical truth, multi-tenant operations, and auditability. |
| Runtime KV-cache | Prefix and attention-state reuse | Application memory, source selection, workflow state, cache eligibility, and durable audit trails. |
| Redis / app DB glue | Fast key-value state and custom logic | Redis-compatible adoption and integration through MatrixDB, including LMCache metadata and cache-control keys, while MatrixArk hides the distributed serving, storage, and placement model. |
| Feature store | Feature registry, training sets, materialization | LLM-specific context packs with memories, tools, citations, permissions, and token budgets. |
| LLM observability | Traces, latency, cost, evals, and post-hoc debugging | Pre-call context governance: what enters the prompt, why it is fresh, what is blocked, and how it can be replayed. |
| DynamoDB / cloud KV | Managed application key-value state | LLM-specific split between temporal memory, Redis-compatible hot state, and strongly consistent agent truth. |
What we adopt from Zep, Graphiti, and Mem0
Zep/Graphiti and Mem0 are useful signals for the category. They prove that applications are willing to call a context layer for memory ingestion and search when the API is simple. MatrixArk should adopt that low-friction developer experience while differentiating on TemporalStore-backed serving, enterprise deployment, and context-pack governance.
| Market pattern | Why it matters | MatrixArk version |
|---|---|---|
| Zep-style business data ingestion | Apps can send messages, text, and JSON business events without building memory pipelines themselves. | MatrixArk accepts raw events, final answers, tool outputs, docs, and lightweight hints, then owns extraction and TemporalStore indexing. |
| Graphiti-style temporal facts | Entities, relationships, source episodes, and time validity help prevent stale or contradictory memory. | TemporalStore becomes the default time-aware serving layer; a graph layer can remain optional for customers needing multi-hop relationship reasoning. |
| Mem0-style add/search simplicity | Developers like a small API that extracts memories and retrieves relevant context without a custom RAG stack. | MatrixArk exposes ingest_context_event() and get_context_pack(), with token budgets, time windows, replay ids, and policy results built in. |
| Hybrid retrieval | Semantic vectors alone are not enough; keyword, metadata, entity, and recency signals improve retrieval. | MatrixArk coordinates VectorDB, S3, TemporalStore, MatrixDB, and MatrixKV, then returns one prompt-ready context pack. |
| Online plus batch ingestion | Customers need to bootstrap history and keep live workflows fresh. | Support batch backfill for Markdown, PDFs, tickets, docs, and logs, plus live hooks before the LLM call and after the model answer. |
What VikingMem and OpenViking validate
VikingMem validates the database shape for LLM memory: extract important events, update entity state, compress old timelines, and retrieve memories with time and business importance. OpenViking validates the user experience: agents like hierarchy, L0 abstracts, optional overviews, and detail loading on demand.
| System direction | What to learn | MatrixArk architecture |
|---|---|---|
| VikingMem | Event memory, entity memory, temporal compression, time-weighted recall. | Model events, state, summaries, freshness rules, and ContextPacks on TemporalStore without exposing paper-style operator names to customers. |
| OpenViking | Path hierarchy, L0/L1/L2 context loading, semantic navigation. | Keep the intuitive graph/tree hierarchy, but compile it into TemporalStore prefix hashes, filter-first traversal, optional VectorDB recall, and selected L2 evidence chunks. |
| ByteRover | Hierarchical memory tree, provenance, lifecycle, recency, file-like agent memory, and simple memory intents. | Keep hierarchy as metadata, but serve hot requests through compiled scope hashes, validity windows, and TemporalStore serving functions. |
| memU | Agent memory filesystem over chats, docs, files, tool logs, and multimodal inputs. | Support batch and real-time ingestion while object storage holds raw artifacts and TemporalStore holds replayable metadata. |
The product stance: memory systems prove the database need; hierarchy-first systems prove the UX. MatrixArk combines both with a temporal serving model that can meet enterprise latency, audit, token-budget, and on-prem requirements.
MatrixArk plans each query, then reuses the plan after the answer
The first step is not retrieval. MatrixArk creates a ContextPlan from the raw query, tenant hints, session state, and optional harness-provided understanding. The plan decides intent, scope, time window, token budget, retrieval routes, and post-answer extraction targets. Cursor-like products can send either only the raw query or their own first-pass LLM interpretation of it; MatrixArk validates both paths against tenant policy before querying TemporalStore.
Before model call
ContextPlan becomes TemporalStore queries, optional vector searches, and a token-budgeted ContextPack.
Inside prompt
The harness combines MatrixArk context with local files, current UI state, tool instructions, and the user query.
After model call
The same plan guides ingestion of final answers, tool calls, new commitments, user corrections, and failed steps.
For follow-ups
Stored plans and ContextPacks make follow-up questions, replay, evals, and cache invalidation easier.
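To make the plan-then-validate step concrete, here is a rough sketch of a ContextPlan being checked against tenant policy before any store is queried. The field names, `TenantPolicy`, and `validate_plan` are assumptions made for this example; the source names only the plan's contents (intent, scope, time window, token budget, retrieval routes, extraction targets), not its schema.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ContextPlan:
    intent: str
    scope: str              # e.g. an entity or project id
    time_window_days: int
    token_budget: int
    routes: tuple           # retrieval routes, e.g. ("temporal", "vector")
    extract_after: tuple    # post-answer extraction targets


@dataclass(frozen=True)
class TenantPolicy:
    allowed_scopes: frozenset
    allowed_routes: frozenset
    max_token_budget: int
    max_window_days: int


def validate_plan(plan: ContextPlan, policy: TenantPolicy) -> ContextPlan:
    """Reject out-of-scope plans and clamp the rest to tenant limits."""
    if plan.scope not in policy.allowed_scopes:
        raise PermissionError(f"scope {plan.scope!r} not allowed for tenant")
    return replace(
        plan,
        token_budget=min(plan.token_budget, policy.max_token_budget),
        time_window_days=min(plan.time_window_days, policy.max_window_days),
        routes=tuple(r for r in plan.routes if r in policy.allowed_routes),
    )


policy = TenantPolicy(
    allowed_scopes=frozenset({"customer_acme"}),
    allowed_routes=frozenset({"temporal", "vector"}),
    max_token_budget=6000,
    max_window_days=90,
)
# A harness-provided plan that asks for more than the tenant allows.
harness_plan = ContextPlan(
    intent="draft_customer_reply",
    scope="customer_acme",
    time_window_days=365,
    token_budget=20000,
    routes=("temporal", "graph"),
    extract_after=("commitments", "corrections"),
)
plan = validate_plan(harness_plan, policy)
print(plan.token_budget, plan.time_window_days, plan.routes)
# 6000 90 ('temporal',)
```

Because the validated plan is immutable, the same object can be stored and reused after the model call to drive extraction of the listed targets.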
L0 plus selected L2 chunks beats mandatory L1
OpenViking-style L0/L1/L2 loading is useful, but MatrixArk should not require an L1 overview for every node. L0 is mandatory for navigation. L2 is mandatory as source-level evidence, but the prompt usually receives only selected chunks, not the full object. L1 should be generated only for large, ambiguous, or frequently selected branches.
| Layer | MatrixArk policy | Where it lives |
|---|---|---|
| L0 | Short summary for branch scoring and category navigation. | TemporalStore node metadata, optionally embedded directly for exact similarity. |
| L1 | Optional overview when a branch is broad, high-volume, or ambiguous. | TemporalStore summary record or object-store ref. |
| L2 | Raw source and chunked evidence; only selected chunks enter the prompt. | S3/object store plus optional VectorDB chunk embeddings. |
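The "L1 only when needed" policy could be expressed as a small predicate over branch statistics. The sketch below is illustrative: the `Node` fields and all three thresholds are assumptions, not documented MatrixArk defaults.

```python
from dataclasses import dataclass


@dataclass
class Node:
    path: str
    l0_summary: str           # always present: used for navigation
    child_count: int
    selections_last_30d: int  # how often retrieval picked this branch
    top2_score_gap: float     # small gap between the best children = ambiguous


# Illustrative thresholds (assumptions, tune per tenant).
LARGE_BRANCH = 50
FREQUENT = 100
AMBIGUOUS_GAP = 0.05


def needs_l1(node: Node) -> bool:
    """Generate an L1 overview only for large, hot, or ambiguous branches."""
    return (
        node.child_count >= LARGE_BRANCH
        or node.selections_last_30d >= FREQUENT
        or node.top2_score_gap <= AMBIGUOUS_GAP
    )


small = Node("support/acme/billing", "Billing tickets for Acme", 4, 12, 0.40)
broad = Node("support/acme", "All Acme support history", 180, 30, 0.30)
print(needs_l1(small), needs_l1(broad))  # False True
```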
Why TemporalStore first instead of graph first
Graph memory is valuable when the main problem is relationship reasoning. But many production context questions are time/scope/filter questions: what changed, what is current, what is still open, what was already tried, and what fits the token budget now. Those questions need a fast temporal serving engine before they need a graph database.
Use TemporalStore first
Project timelines, approvals, cost events, support promises, incident steps, source versions, prompt replay, stale blocking, and account roaming.
Add graph later
Complex relationship traversal, alias/entity resolution across many systems, dependency chains, influence maps, and ontology-heavy reasoning.
Keep vector DB optional
Use semantic retrieval for open-ended cross-tree search and very large L2 chunk sets, but allow TemporalStore-only traversal when the hierarchy has few layers and leaf timelines are filtered by time, status, event type, and limit.
Keep object store explicit
Store raw PDFs, Markdown, transcripts, screenshots, and large artifacts in S3-compatible storage while TemporalStore stores serving metadata and summaries.
This is the product stance: memory APIs taught the market how easy context ingestion should feel; TemporalStore lets MatrixArk make that context fast, replayable, budgeted, and safe enough for enterprise serving.
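The time/scope/filter questions above reduce to a bounded scan over a leaf timeline, with no vector search involved. The following is a minimal in-memory sketch of that filter; `Event` and `filter_timeline` are hypothetical names, and a real store would run the equivalent query server-side.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    at: datetime
    event_type: str
    status: str
    text: str


def filter_timeline(events, *, since, until, statuses, event_types, limit):
    """Return matching events, newest first, capped at `limit`."""
    matches = [
        e for e in events
        if since <= e.at <= until
        and e.status in statuses
        and e.event_type in event_types
    ]
    matches.sort(key=lambda e: e.at, reverse=True)
    return matches[:limit]


now = datetime(2025, 1, 31, 12, 0)
events = [
    Event(now - timedelta(days=40), "promise", "open", "Refund promised"),
    Event(now - timedelta(days=3), "promise", "open", "Callback promised"),
    Event(now - timedelta(days=2), "tool_call", "failed", "Reset link bounced"),
    Event(now - timedelta(days=1), "promise", "closed", "Invoice resent"),
]

recent = filter_timeline(
    events,
    since=now - timedelta(days=30),
    until=now,
    statuses={"open", "failed"},
    event_types={"promise", "tool_call"},
    limit=5,
)
print([e.text for e in recent])
# ['Reset link bounced', 'Callback promised']
```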
Why three products win in production
The hard part is not storing one memory. TemporalStore alone can already handle many memory and replay workloads. The harder production problem is deciding, at request time, which memories are fresh, which sources changed, which prompt sections can be reused, which facts are canonical, which actions committed, and which tenant policy applies. MatrixArk keeps those concerns in one infrastructure model instead of scattering them across framework code, cache keys, vector metadata, observability logs, and service databases.
TemporalStore
Owns timelines, memory deltas, tool history, freshness, replay, counters, sequences, and cache eligibility.
MatrixDB
Owns Redis-compatible hot session summaries, active profiles, cached retrieval results, TTL state, context-pack metadata, LMCache metadata, eligibility keys, and invalidation hints.
MatrixKV
Owns canonical facts, permissions, document versions, approvals, leases, checkpoints, and committed actions.
One platform
One place for SDKs, deployment, observability, recovery, cache policy, tenant policy, and prompt-context operations.
Context is more than vector search
A vector database can find semantically similar chunks. It cannot, by itself, decide which fact is current, which promise is still open, what the agent already tried, what the user is allowed to see, or what context was valid at a previous point in time.
Diagram: task, user, entity, time; Context manager (retrieve, filter, rank, compress); Prompt builder (assemble trusted context pack); semantic chunks and embeddings; S3 / object store (documents, audio, images, transcripts); MatrixArk engines (time, hot state, transactional truth); timelines, events, sequences, counters; MatrixDB (hot sessions, profiles, summaries, cache); MatrixKV (permissions, versions, locks, commits).
The output is not a raw search result. It is a context pack: latest facts, relevant timeline, retrieved sources, permissions, stale-memory warnings, and citations.
Workflow with external VectorDB and S3
VectorDB and S3 are not competitors to MatrixArk. They are external retrieval and object layers. VectorDB finds semantically similar chunks. S3 stores the canonical source objects. MatrixArk then reads time, truth, and hot state before the prompt is assembled, deciding which retrieved chunks, raw objects, memories, permissions, and time-valid facts should enter the prompt now.
Diagram: task, entity, tenant, time; Context orchestrator (plan retrieval and state reads); Context pack builder (rank, filter, compress, cite); what happened, changed, failed, stayed open; VectorDB (semantic candidates and chunk ids); S3 / object store (full docs, PDFs, transcripts, media); permissions, versions, approvals, commits; MatrixDB (serverless hot state and cache metadata); LLM runtime (trusted prompt, tool call, answer).
| Layer | Question it answers | Why time-aware context matters |
|---|---|---|
| VectorDB | Which chunks are semantically similar? | Similarity is not enough; a similar chunk may be stale, unauthorized, or superseded. |
| S3 / object store | Where is the full source object? | The prompt may need the approved version, exact citation, transcript, or original file. |
| TemporalStore | What happened over time? | It adds recent actions, open commitments, failed attempts, freshness, replay, and stale-memory blocking. |
| MatrixKV | What is approved and committed? | It prevents the model from mixing old drafts, revoked permissions, and uncommitted actions. |
| MatrixDB | What hot state should be fetched fast? | It keeps profiles, session summaries, retrieval cache, and LMCache metadata close to the request path. |
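The table's central point, that a similar chunk may still be stale, unauthorized, or superseded, can be sketched as a governance pass over vector candidates. All names here (`Candidate`, `govern`, the reason strings) are hypothetical; the sketch assumes each chunk carries a validity window, an approval flag, and a required role.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    chunk_id: str
    score: float                  # similarity from the vector search
    valid_from: datetime
    valid_to: Optional[datetime]  # None = still valid
    approved: bool
    required_role: str


def govern(candidates, *, as_of, roles):
    """Split candidates into (selected, blocked), with a reason per block."""
    selected, blocked = [], []
    for c in sorted(candidates, key=lambda c: c.score, reverse=True):
        expired = c.valid_to is not None and as_of >= c.valid_to
        if c.required_role not in roles:
            blocked.append((c.chunk_id, "unauthorized"))
        elif not c.approved:
            blocked.append((c.chunk_id, "unapproved"))
        elif as_of < c.valid_from or expired:
            blocked.append((c.chunk_id, "not_valid_at_time"))
        else:
            selected.append(c.chunk_id)
    return selected, blocked


d = datetime
candidates = [
    Candidate("policy_v1#2", 0.93, d(2024, 1, 1), d(2024, 6, 1), True, "support"),
    Candidate("policy_v2#2", 0.91, d(2024, 6, 1), None, True, "support"),
    Candidate("policy_v3_draft#1", 0.90, d(2024, 9, 1), None, False, "support"),
    Candidate("finance_memo#4", 0.88, d(2024, 1, 1), None, True, "finance"),
]
selected, blocked = govern(candidates, as_of=d(2024, 8, 15), roles={"support"})
print(selected)  # ['policy_v2#2']
print(blocked)
# [('policy_v1#2', 'not_valid_at_time'), ('policy_v3_draft#1', 'unapproved'),
#  ('finance_memo#4', 'unauthorized')]
```

Note that the highest-scoring chunk is the one that gets blocked: similarity ranking alone would have put a superseded policy at the top of the prompt.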
Storage responsibilities
Temporal context
MatrixArk can route session timelines, customer events, tool calls, memory diffs, prompt replay data, open promises, behavior sequences, and windowed counters to TemporalStore.
Hot state
MatrixArk can route hot profiles, active session summaries, cached retrieval results, ranked context lists, TTL memories, LMCache metadata, cache eligibility, invalidation hints, and Redis-compatible operational state to MatrixDB.
Committed truth
MatrixArk can route canonical facts, document versions, permissions, workflow checkpoints, locks, routing decisions, and committed agent actions to MatrixKV.
VectorDB + S3
VectorDB owns semantic search over embeddings and chunk ids. S3-compatible object storage owns raw documents, transcripts, prompts, responses, media, and large blobs. MatrixArk governs which of those candidates are fresh, permissioned, time-valid, and worth spending tokens on.
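The storage responsibilities above amount to a routing table from record kind to owning store. A minimal sketch follows; the record-kind labels are hypothetical shorthand for items in the lists above, and `route` is not a documented MatrixArk function.

```python
from enum import Enum


class Store(Enum):
    TEMPORAL = "TemporalStore"
    HOT = "MatrixDB"
    TRUTH = "MatrixKV"
    VECTOR = "VectorDB"
    OBJECT = "S3"


# Record kinds are illustrative labels, not a fixed schema.
ROUTES = {
    "session_timeline": Store.TEMPORAL,
    "tool_call": Store.TEMPORAL,
    "windowed_counter": Store.TEMPORAL,
    "session_summary": Store.HOT,
    "retrieval_cache": Store.HOT,
    "lmcache_metadata": Store.HOT,
    "permission": Store.TRUTH,
    "document_version": Store.TRUTH,
    "committed_action": Store.TRUTH,
    "chunk_embedding": Store.VECTOR,
    "raw_document": Store.OBJECT,
}


def route(kind: str) -> Store:
    """Look up the owning store; fail loudly on unknown record kinds."""
    try:
        return ROUTES[kind]
    except KeyError:
        raise ValueError(f"no storage route for record kind {kind!r}") from None


print(route("tool_call").value)         # TemporalStore
print(route("committed_action").value)  # MatrixKV
```

Failing on unknown kinds, rather than defaulting to a cache, keeps new record types from silently landing in a store with the wrong consistency guarantees.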
Online prompt assembly
The application should call a context API, not hand-build prompts from random cache keys and vector hits. The context API pulls the right source for each kind of information.
Diagram: task and entity; MatrixKV (ACLs and truth); TemporalStore (timeline and recent actions); VectorDB (semantic candidates); S3 (source objects); MatrixDB (hot summaries and cache); Prompt builder (rank within token budget); LLM (answer or tool call).
get_context_pack(
    vertical = "support",
    task = "draft_customer_reply",
    user_id = "agent_17",
    entity_id = "customer_acme",
    as_of_time = "now",
    token_budget = 6000
)

returns:
    latest_facts
    relevant_timeline
    open_commitments
    retrieved_sources
    source_objects
    blocked_context
    stale_memories
    permissions
    cache_policy
    prompt_sections
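The token_budget parameter implies some ranking step that decides which sections make it into the pack. One simple way to do that is a greedy, priority-ordered fill, sketched below. The priorities, the `pack_sections` helper, and the whitespace token counter are all assumptions; a real system would use the model's tokenizer and its own ranking.

```python
def pack_sections(sections, token_budget,
                  count_tokens=lambda text: len(text.split())):
    """Greedily keep the highest-priority sections that fit the budget.

    `sections` is a list of (name, priority, text); a lower priority number
    means more important. The whitespace token counter is a stand-in for a
    real tokenizer.
    """
    kept, dropped, used = [], [], 0
    for name, _priority, text in sorted(sections, key=lambda s: s[1]):
        cost = count_tokens(text)
        if used + cost <= token_budget:
            kept.append(name)
            used += cost
        else:
            dropped.append(name)
    return kept, dropped, used


sections = [
    ("retrieved_sources", 3, "word " * 50),
    ("latest_facts", 1, "word " * 10),
    ("open_commitments", 1, "word " * 5),
    ("relevant_timeline", 2, "word " * 30),
]
kept, dropped, used = pack_sections(sections, token_budget=60)
print(kept)     # ['latest_facts', 'open_commitments', 'relevant_timeline']
print(dropped)  # ['retrieved_sources']
print(used)     # 45
```

Returning the dropped section names alongside the kept ones gives an audit trail for why a source did not reach the prompt.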
Specific context engineering upgrades
TemporalStore changes the inputs the prompt builder can rely on. The prompt can include compact sections such as open_commitments, already_tried, valid_sources_as_of, freshness_warnings, and cache_policy instead of asking the model to infer those facts from a pile of text.
Before: support prompt
Use the latest ticket and retrieved docs to answer politely. Risk: repeats failed steps and forgets an unresolved refund promise.
After: support prompt
Use the account timeline, open promise, failed-action list, current entitlement, and policy-as-of-now before drafting the reply.
Before: policy prompt
Answer from relevant documents. Risk: old policy chunks and unapproved drafts can enter the same prompt.
After: policy prompt
Answer only from approved sources valid at the requested time, then call out newer conflicting drafts as separate context.
Before: cache prompt
Reuse a long prompt prefix when it looks similar. Risk: stale memory and permission-sensitive details get cached together.
After: cache prompt
Reuse stable prompt sections while TemporalStore invalidates volatile timeline, permission, and source-version sections.
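The "after" cache prompt can be illustrated by ordering stable sections first and keying the cache on that prefix only. This is a sketch under assumptions: `build_prompt` and the section tuples are hypothetical, and it relies on the general behavior that prefix caches in serving runtimes match on the leading part of the prompt.

```python
import hashlib


def build_prompt(sections):
    """Order stable sections first and hash that prefix.

    `sections` is a list of (name, text, stable). The returned key changes
    only when a stable section changes, so volatile edits leave it intact.
    """
    stable = [(n, t) for n, t, is_stable in sections if is_stable]
    volatile = [(n, t) for n, t, is_stable in sections if not is_stable]
    prefix = "\n\n".join(f"## {n}\n{t}" for n, t in stable)
    suffix = "\n\n".join(f"## {n}\n{t}" for n, t in volatile)
    prefix_key = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
    return prefix + "\n\n" + suffix, prefix_key


base = [
    ("instructions", "Answer politely and cite sources.", True),
    ("account_timeline", "Refund promised on Monday.", False),
    ("policy_summary", "Refunds within 30 days.", True),
]
# Same stable sections, but the volatile timeline has moved on.
later = [
    ("instructions", "Answer politely and cite sources.", True),
    ("account_timeline", "Refund issued on Wednesday.", False),
    ("policy_summary", "Refunds within 30 days.", True),
]

prompt_a, key_a = build_prompt(base)
prompt_b, key_b = build_prompt(later)
print(key_a == key_b)        # True: the stable prefix is unchanged
print(prompt_a == prompt_b)  # False: only the volatile tail differs
```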
What breaks without this layer
| Problem | Common workaround | MatrixArk answer |
|---|---|---|
| Prompt context gets stale | More vector filters and application logic | TemporalStore serves recent timelines, freshness, counters, and filters online. |
| Agent memory is hard to debug | Logs plus ad hoc replay scripts | TemporalStore keeps ordered tool events and decisions as queryable sequences. |
| Profiles and tenant state sprawl | Redis keys, service databases, and one-off caches | MatrixDB gives durable multi-tenant KV serving behind familiar Redis-compatible APIs, so apps stay agnostic to placement and storage internals. |
| Workflow actions need correctness | Flags in cache or best-effort service state | MatrixKV stores permissions, routing, leases, and committed actions in a transactional KV database. |
| Model cache is confused with app state | Remote KV-cache becomes the catch-all | LMCache handles model-runtime reuse; MatrixArk handles application state, cache eligibility, source freshness, and context control. |
Strong LLM use cases
Support memory
Customer timeline, open promises, escalations, account facts, prior replies, and current issue state before every response.
Agent time travel
Replay the exact context, memories, tool outputs, permissions, and prompt sections used before a bad answer.
Prompt replay and evals
Test new prompts or models against historical context packs instead of synthetic examples only.
Policy-time RAG
Answer using only documents, permissions, and facts valid at the requested time.
Multi-agent handoff
Store what each agent did, what failed, what remains open, and what assumptions should carry forward.
Memory governance
Detect stale, conflicting, unauthorized, low-confidence, or superseded memories before they enter the prompt.
Security operations
Alert timelines, identity events, asset state, analyst actions, containment steps, and prior incident memory.
Insurance claims
Claim chronology, policy version, adjuster notes, missing evidence, documents received, and coverage state.
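Several of these use cases (agent time travel, prompt replay, policy-time RAG) rest on the same primitive: an as-of lookup that returns the value in effect at a requested time. A minimal sketch using Python's standard `bisect` module is shown below; `VersionedFact` is a hypothetical in-memory stand-in for a temporal store, and it assumes ISO-formatted date strings so that string order matches time order.

```python
from bisect import bisect_right


class VersionedFact:
    """History of one fact, queried as of a point in time."""

    def __init__(self):
        self._times = []   # sorted effective times (ISO date strings)
        self._values = []  # value that took effect at the matching time

    def set(self, effective_at: str, value):
        # Insert in time order so history may arrive out of order.
        i = bisect_right(self._times, effective_at)
        self._times.insert(i, effective_at)
        self._values.insert(i, value)

    def as_of(self, when: str):
        """Return the value in effect at `when`, or None if none existed yet."""
        i = bisect_right(self._times, when)
        return self._values[i - 1] if i else None


refund_window = VersionedFact()
refund_window.set("2024-06-01", "14 days")
refund_window.set("2024-01-01", "30 days")  # backfilled older version

print(refund_window.as_of("2024-03-15"))  # 30 days
print(refund_window.as_of("2024-06-01"))  # 14 days
print(refund_window.as_of("2023-12-31"))  # None
```

Replaying a bad answer then means running the same lookups with `when` set to the original request time instead of now.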
Target production teams, not generic chatbot users
The strongest customer is not an individual consumer looking for another chatbot. It is a Cursor-like vertical AI company, enterprise AI team, platform team, or SaaS product team building an AI workspace for a high-value workflow where stale or unauthorized context creates real business risk. End users and developers can still experiment locally; production value appears when a team owns the domain workflow and its reliability bar.
Support platforms
Ticket timelines, account facts, entitlements, refunds, open promises, escalation state, and policy-time answers.
Legal and compliance
Matter history, contract versions, evidence timelines, citations, approvals, and permission-aware drafting context.
Security operations
Incident timelines, alert sequences, analyst actions, asset context, policy versions, and post-incident replay.
Insurance and healthcare ops
Claim or admin timelines, documents, coverage or benefit facts, approvals, stale-context protection, and audit trails.
Cursor-style context for every vertical
The application surface can look like a vertical Cursor or an enterprise Cursor-style workspace: an AI surface that edits, drafts, investigates, answers, and takes action inside a domain. MatrixArk owns the context substrate underneath that workspace.
Why this can be a company
Existing LLM tools often cover one slice: vector retrieval, agent memory, prompt testing, tracing, object storage, or model-runtime caching. TemporalStore is the standalone starting point for time-aware prompt context. The full MatrixArk stack adds hot state and trusted correctness when production copilots need permissions, current facts, replay, and vertical-specific context rules.
MatrixArk should not be positioned as another vector database. The stronger position is the context state layer for production LLM agents: the infrastructure that decides what the model should know, trust, ignore, cite, remember, and forget.