Saturday, June 13, 2026

Unlocking reliable responses with Gemini Enterprise Agent Platform’s Agentic RAG

Experiments and outcomes

We evaluated agentic RAG on FramesQA, which is based on the FRAMES paper. An example multi-hop query is:

“Of the top two most watched television season finales (as of June 2024), which finale ran the longest in length and by how much?”

The RAG system must perform multiple steps to arrive at the right answer. First, it has to identify that the two most watched finales are from the shows M*A*S*H and Cheers. Then, it has to find their running times and calculate the difference in length. In many RAG settings (Vanilla RAG or agentic RAG without sufficient context), we could end up in a situation where the model says something like:

“Despite multiple scans, I found no explicit runtimes for M*A*S*H or Cheers. The documents provide viewership data, but not the duration in minutes or hours.”

This doesn’t answer the question.

Fortunately, our agentic RAG can resolve this by first searching for the TV shows, then using the Query Rewriter and Sufficient Context Agent to run a targeted search for the run times of M*A*S*H and Cheers. Then, Gemini can easily determine which finale ran the longest and by how much:

“The M*A*S*H finale ran for 150 minutes, making it the longest of the top two. It was 52 minutes longer than the Cheers finale, which ran for about 98 minutes.”
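The retrieve, check, rewrite loop described above can be sketched roughly as follows. This is a minimal illustration, not the platform's implementation: `agentic_answer`, `toy_retrieve`, `toy_is_sufficient`, `toy_rewrite`, and `TOY_INDEX` are all hypothetical names, and the toy functions are simple stand-ins for the retrieval engine, the Sufficient Context Agent, and the Query Rewriter.

```python
from typing import Callable, List


def agentic_answer(
    question: str,
    retrieve: Callable[[str], List[str]],
    is_sufficient: Callable[[str, List[str]], bool],
    rewrite: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> List[str]:
    """Gather context over several retrieval rounds until it looks sufficient."""
    context: List[str] = []
    query = question
    for _ in range(max_rounds):
        # Add newly retrieved passages, skipping ones we already hold.
        for passage in retrieve(query):
            if passage not in context:
                context.append(passage)
        # Stop as soon as the checker says the question is answerable.
        if is_sufficient(question, context):
            break
        # Otherwise ask for a more targeted query and try again.
        query = rewrite(question, context)
    return context


# Toy stand-ins (assumptions for illustration only).
TOY_INDEX = {
    "most watched finales": [
        "The two most watched finales are from M*A*S*H and Cheers.",
    ],
    "finale runtime in minutes": [
        "The M*A*S*H finale ran for 150 minutes.",
        "The Cheers finale ran for about 98 minutes.",
    ],
}


def toy_retrieve(query: str) -> List[str]:
    # Exact-match lookup in place of a real retrieval engine.
    return TOY_INDEX.get(query, [])


def toy_is_sufficient(question: str, context: List[str]) -> bool:
    # Sufficient once both runtimes have been found.
    return sum("minutes" in p for p in context) >= 2


def toy_rewrite(question: str, context: List[str]) -> str:
    # A real rewriter would be an LLM call; here the follow-up query is fixed.
    return "finale runtime in minutes"


context = agentic_answer(
    "most watched finales", toy_retrieve, toy_is_sufficient, toy_rewrite
)
```

In this toy run the first round finds only the show names, the sufficiency check fails, and the rewritten query brings in both runtimes, so `context` ends up holding three passages. The `max_rounds` cap is a design choice of the sketch that keeps the loop from running indefinitely when no sufficient context exists.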

We ran an experiment to test this capability at scale (FramesQA has 824 queries along with a corpus containing 2,676 PDF documents). In the “Vanilla” RAG setting, we use Google’s RAG Engine (which has an advanced retrieval engine, LLM parser, and re-ranker). We compared this with our agentic RAG in two settings. In the single-corpus setting, we retrieve from the FramesQA documents. In the cross-corpus setting, we also include three other distracting datasets, and the Planner Agent must determine where to retrieve from. This cross-corpus setting mimics use cases where companies have databases managed by separate teams. We compute accuracy by using an LLM-as-a-judge to compare the system responses to the ground truth answers in the dataset.
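As a rough sketch of the accuracy computation, the score is the fraction of responses that the judge marks as matching the ground truth. The names `judged_accuracy` and `stub_judge` are hypothetical, and the stub below is a plain substring check standing in for a real LLM judge, which is an assumption made only to keep the example runnable.

```python
from typing import Callable, Iterable, Tuple

# Each example is (question, system_response, ground_truth_answer).
Example = Tuple[str, str, str]


def judged_accuracy(
    examples: Iterable[Example],
    judge: Callable[[str, str, str], bool],
) -> float:
    """Return the fraction of responses the judge accepts as correct."""
    items = list(examples)
    if not items:
        raise ValueError("no examples to score")
    correct = sum(1 for q, response, truth in items if judge(q, response, truth))
    return correct / len(items)


def stub_judge(question: str, response: str, truth: str) -> bool:
    # Stand-in for an LLM judge: accept if the ground truth appears verbatim.
    return truth.lower() in response.lower()


examples = [
    ("Which finale ran longest?", "The M*A*S*H finale ran the longest.", "M*A*S*H"),
    ("By how many minutes?", "I found no explicit runtimes.", "52"),
]
score = judged_accuracy(examples, stub_judge)
```

Here one of the two toy responses is accepted, so `score` is 0.5. In practice the judge would be a prompted model comparing each response to its reference answer, since free-form answers rarely match the ground truth string exactly.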

In the cross-corpus setting, our system nearly matches its single-corpus accuracy. Even when the Planner Agent must select the correct corpus out of four possibilities, we successfully route the search queries and answer 90.1% of questions correctly. Also, the latency of the single- and cross-corpus versions is about the same (within 3% on average). This demonstrates that our Agentic RAG system can reason over multiple, unrelated data sources, which opens up possibilities for more flexible retrieval scenarios.
