# MetaHarness
Self-learning, self-evolving agent runtime with pluggable model-serving. The infrastructure layer under dossbot.
MetaHarness is the runtime dossbot runs on. It provides pluggable model serving (OAI-compatible APIs for any model), a tool-calling loop that treats every tool as a durable ZFlow step, feedback capture on every action, and a self-tuning layer that adjusts routing, prompts, and tool definitions based on observed performance. Models improve over time; so should the harness around them.
## What makes it a harness, not just a wrapper
### Tool = ZFlow step
Every tool the agent can call is registered as a durable ZFlow function: retries are safe, outputs are journaled, and long-running tools don't block the conversation.
### OAI-compatible everywhere
All model backends — Anthropic, OpenAI, Gemini, self-hosted on Baseten/Together — speak one API surface. Routing policy decides which provider sees which request.
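A minimal sketch of that idea, with placeholder endpoints, model names, and routing rules invented for the example: because every backend exposes the same chat-completions surface, routing reduces to swapping `base_url` and `model`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    base_url: str   # OAI-compatible endpoint (placeholder URLs below)
    model: str      # model identifier at that endpoint (placeholders)

ROUTES = {
    "frontier":   Route("https://frontier.example/v1", "frontier-model"),
    "structured": Route("https://structured.example/v1", "extractor-model"),
    "bulk":       Route("https://oss.example/v1", "oss-70b"),
}

def pick_route(task_type: str, est_cost_usd: float) -> Route:
    # Invented policy: structured extraction goes to the structured-output
    # specialist, cheap low-stakes calls go to OSS serving, everything
    # else defaults to the frontier agent.
    if task_type == "extraction":
        return ROUTES["structured"]
    if est_cost_usd < 0.01:
        return ROUTES["bulk"]
    return ROUTES["frontier"]
```

The caller never changes: the same request body goes to whichever `base_url`/`model` pair the policy selects.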
### Feedback as first-class data
Every step records a reward signal: did the tool succeed, did the user accept the output, did a follow-up run correct it? These signals feed the self-tuning layer.
### Budget-aware routing
Per-tenant policies constrain cost and latency. A cheap model for low-stakes queries, a frontier model for migrations, a local model for anything touching PHI.
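A sketch of such a policy check; the thresholds, tier names, and the PHI rule are invented for the example:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TenantPolicy:
    max_cost_per_call_usd: float   # budget ceiling per model call
    max_latency_ms: int            # latency target for interactive use
    phi_allowed_offsite: bool      # may PHI leave the tenant boundary?

def choose_tier(policy: TenantPolicy, touches_phi: bool, stakes: str) -> str:
    if touches_phi and not policy.phi_allowed_offsite:
        return "local"                          # PHI stays on local models
    if stakes == "high":                        # e.g. a migration
        return "frontier"
    if policy.max_cost_per_call_usd < 0.01:     # tight budget: cheap model
        return "cheap"
    return "cheap" if stakes == "low" else "frontier"

# Hypothetical regulated tenant: moderate budget, PHI never leaves.
clinic = TenantPolicy(0.05, 2000, phi_allowed_offsite=False)
```

The PHI check runs first on purpose: data-residency constraints override cost and quality preferences.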
## Provider matrix
All providers expose an OAI-compatible API. MetaHarness picks the target per request based on task type, tenant budget, and latency target.
| Provider | Hosting | Strengths | Typical use |
|---|---|---|---|
| Anthropic Claude | API | Long context, tool use, writing | Default frontier agent |
| OpenAI | API | Structured output, function calling | Structured extraction |
| Google Gemini | API + Vertex | Multimodal, competitive pricing | Doc / screenshot parsing |
| Baseten | Self-host on GCP | Custom fine-tunes | CENTARI serving |
| Vertex AI | GCP | Enterprise sovereignty | Regulated-data tenants |
| Together / Fireworks | Self-host | OSS models at cost | Bulk, low-stakes calls |
## How the harness tunes itself
1. Collect. Every run emits a trace: model, tools, tokens, latency, cost, user feedback, downstream corrections.
2. Aggregate. Traces land in ClickHouse. Rollups by tool, model, task type, tenant, and time window show what's working.
3. Propose. A tuning agent suggests changes: demote a poorly performing tool description, reroute a task class to a different model, tighten a prompt that produces inconsistent output.
4. Evaluate. Proposed changes run against an ENT-Bench-style replay harness on historical traces. Only wins with statistical headroom get promoted.
5. Promote. Canary the new config on a small tenant slice. Roll forward if metrics hold; roll back if they don't.
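The aggregate and evaluate steps above can be sketched on synthetic traces. Here `aggregate()` stands in for the ClickHouse rollup and `evaluate()` for the replay gate; the headroom threshold and trace shape are invented for the example:

```python
import statistics

def aggregate(traces: list[dict]) -> float:
    """Mean reward over a trace set (one rollup dimension of many)."""
    return statistics.mean(t["reward"] for t in traces)

def evaluate(baseline: list[dict], replayed: list[dict],
             headroom: float = 0.02) -> bool:
    """Promote only if the proposed config beats the baseline with
    headroom, not merely by any positive margin."""
    return aggregate(replayed) - aggregate(baseline) > headroom

# Synthetic historical traces and their replay under a proposed change.
baseline = [{"reward": r} for r in (0.60, 0.70, 0.65)]
replayed = [{"reward": r} for r in (0.70, 0.75, 0.72)]
```

Requiring headroom rather than any improvement is what keeps noise-level wins from churning the config.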
## In the stack
- CENTARI is served through MetaHarness like any other model — same routing, same budget policy, same feedback capture.
- Tools pull grounded context from Tapestry before generating answers.
- ENT-Bench is the offline eval the self-evolving loop uses to promote or reject config changes.
- Every tool call runs as a durable ZFlow step.