Why
Agentic search breaks down into three problems:
- Query planning. Given an input query, figure out what to search for and in what order.
- Iterative reasoning. Read the documents that come back, decide what to query next, and know when to stop.
- Context management. Keep the context window useful throughout the first two steps without letting it fill up with noise.
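The three problems above form one loop. Here is a rough sketch of that loop; `search` and `llm` are hypothetical stand-ins for a retrieval backend and a model client, not part of any real API:

```python
# Minimal sketch of the agentic search loop: plan a query, read results,
# decide what to search next, and keep only compact notes in context.
# `search` and `llm` are hypothetical stand-ins, not a real API.

def agentic_search(question, search, llm, max_steps=5):
    notes = []  # context management: keep summaries, drop raw documents
    query = llm(f"Plan the first search query for: {question}")  # query planning
    for _ in range(max_steps):  # iterative reasoning
        docs = search(query)
        notes.append(llm(f"Summarize what these docs say about {question!r}: {docs}"))
        decision = llm(f"Given findings {notes}, answer DONE or propose the next query.")
        if decision.strip() == "DONE":  # know when to stop
            break
        query = decision
    return llm(f"Answer {question!r} using: {notes}")
```

The key context-management move is that only per-step summaries survive into later turns; the raw documents are discarded after each read.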
See how the same search plays out with a general-purpose model vs. a specialized one.
[Interactive comparison: GPT-5 vs. Charcoal on the same search, with reasoning, queries, token, cost, and latency counters — illustrative example, not a benchmark]
How it works
Task generation
We generate training scenarios from your data and existing search traces. Each scenario is a search task grounded in your actual corpus.
Training
We train using CISPO. For each scenario, the model generates several search trajectories in parallel. A reward function scores each one, and the model updates toward higher-scoring approaches.
Reward functions
A judge model scores each trajectory relative to its group across multiple dimensions:
- Relevance: are the findings specific and useful?
- Search strategy: does the agent construct effective query plans and build on previous searches with diverse, complementary queries?
- Efficiency: how directly does the agent reach its final results, and does it avoid redundant search paths?
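Group-relative scoring can be sketched as follows. The dimensions mirror the list above, but the weights, the judge scores, and the helper itself are illustrative assumptions, not the production reward:

```python
# Illustrative sketch of group-relative reward scoring. Each trajectory
# in a group gets a weighted sum of judge scores, then is centered on
# the group mean, so updates favor above-average trajectories for the
# same scenario. Weights and dimension names are assumptions.

def group_relative_rewards(judge_scores, weights=None):
    """judge_scores: one dict per trajectory, mapping dimension -> score."""
    weights = weights or {"relevance": 1.0, "strategy": 1.0, "efficiency": 1.0}
    totals = [sum(weights[d] * s[d] for d in weights) for s in judge_scores]
    mean = sum(totals) / len(totals)
    return [t - mean for t in totals]  # positive = better than the group
```

Because rewards are relative within a group, the judge only has to rank trajectories against each other rather than produce calibrated absolute scores.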
Base models
We train on open-weight models such as Qwen3-14B, Qwen3-30B, and gpt-oss-20B. The choice of base model depends on your latency and accuracy requirements. All models are served on dedicated GPU infrastructure.
Continuous improvement
Once deployed, the model continues to improve:
- Eval generation from production queries: we continuously generate new evaluation tasks from real search traffic hitting your namespace, ensuring the model is tested against the queries your users actually run.
- Ongoing checkpoint creation: new model checkpoints are trained and evaluated on a regular cadence. When a checkpoint outperforms the current deployment on held-out evals, it’s promoted to serve your namespace.
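The promotion rule above reduces to a simple comparison on held-out evals. A minimal sketch, where `eval_score` is a hypothetical helper returning a checkpoint's mean score on the held-out task set:

```python
# Sketch of checkpoint promotion: a candidate replaces the deployed
# checkpoint only if it scores higher on held-out evals. `eval_score`
# is a hypothetical scoring helper, not a real API.

def maybe_promote(current, candidate, eval_score):
    if eval_score(candidate) > eval_score(current):
        return candidate  # promote the new checkpoint
    return current        # keep serving the existing deployment
```

The strict inequality means ties keep the current deployment, so serving only changes when a candidate is measurably better.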
Getting started
RL training is a managed engagement. We work with you to:
- Snapshot your corpus and generate training scenarios
- Run training and evaluate checkpoints against held-out tasks
- Deploy the trained model to serve your namespace