research-agent-optimization
// Optimize the research agent for rate limit handling, API call efficiency, web search integration fixes, and improved streaming UX with granular progress updates and source attribution.
$ git log --oneline --stat
stars:194
forks:37
updated:March 4, 2026
SKILL.mdreadonly
SKILL.md Frontmatter
nameresearch-agent-optimization
descriptionOptimize the research agent for rate limit handling, API call efficiency, web search integration fixes, and improved streaming UX with granular progress updates and source attribution.
Research Agent Optimization
Scope
- Project root:
/home/bender/classwork/Thesis - Backend:
backend/news_research_agent.py,backend/app/api/routes/research.py,backend/app/services/news_research.py - Frontend:
frontend/app/search/page.tsx,frontend/lib/api.ts - Configuration:
backend/app/core/config.py
Problem Statement
- Rate Limiting: Gemini API hits 429 quota exceeded errors during research and article analysis
- Web Search: DuckDuckGo tool integration has naming issues (not properly initialized)
- Unclear Progress: Research streaming shows generic "Still working..." instead of specific tool calls
- JSON in Response: Results show raw JSON blocks instead of formatted source cards
- Redundant API Calls: Multiple internal search calls without caching/deduplication
Required Outcomes
- Graceful rate limit handling with exponential backoff and quota monitoring
- Working web search tool with proper DuckDuckGo initialization
- Verbose streaming events showing real tool execution (web_search, news_search, internal_news_search)
- Research results rendered with inline source cards (not JSON blocks)
- Optimized API calls: batch searches, cache semantic results, reuse internal knowledge base
- Clear error messages when quota is exceeded
Workflow
1. API Call Optimization
- Implement request batching in
search_internal_newstool - Add caching layer for semantic search results (avoid duplicate queries within 5min window)
- Combine web_search + news_search into single result set
- Track API call counts per session and warn before quota exhaustion
- Add exponential backoff retry logic (1s, 2s, 4s, 8s max)
Files:
backend/news_research_agent.py- tools and cachingbackend/app/services/news_research.py- request batching helpers
2. Rate Limit & Quota Handling
- Add try/catch wrapper around Gemini calls
- Detect 429 errors and return user-friendly message ("API Rate Limit: ...please wait a moment...")
- Add optional
--skip-gemini-analysismode for article analysis when quota is low - Log quota usage and remaining tokens
- Set model to
gemini-2.0-flash(faster, lower token cost) instead ofgemini-2.0-flash-exp
Files:
backend/app/core/config.py- error handling wrapper, model selectionbackend/app/api/routes/research.py- HTTP error responsesbackend/news_research_agent.py- LLM call error handling
3. Web Search Tool Fix
- Verify DuckDuckGo import:
from duckduckgo_search import DDGS(notddgsorDuckDuckGo) - Ensure
web_searchandnews_searchtools are properly bound to LLM - Add fallback to internal search if web search fails
- Log tool execution with query and result count
Files:
backend/news_research_agent.py- tool definitions and error handling- Use
exa-codeto verify current DuckDuckGo API patterns
4. Streaming Progress Clarity
- Expand SSE event types:
tool_startincludes tool name + query parameters - Map tool events to user-friendly messages:
web_search("climate change")→ "Searching web for: climate change..."news_search(keywords="COP30")→ "Searching news for: COP30..."search_internal_news(query)→ "Searching internal knowledge base..."fetch_article_content(url)→ "Reading article: [title/domain]..."
- Add timestamps and tool execution duration
- Emit status updates every 3-5 seconds if no tool activity
Files:
backend/news_research_agent.py- streaming generatorbackend/app/api/routes/research.py- SSE formatting
5. Frontend Result Rendering
- Remove JSON blocks from response text
- Render referenced articles in a "Sources" section below the answer
- Use article cards: title, source, date, image thumbnail
- Make cards clickable to open article detail modal
- Group sources by retrieval method (semantic, web search, internal)
Files:
frontend/app/search/page.tsx- message rendering and sources gridfrontend/lib/api.ts- response parsing
6. Error Handling & User Feedback
- Detect and handle:
- 429 quota exceeded → "API Rate Limit: The AI service has reached its rate limit. Please wait a moment and try again."
- Connection timeout → "Request Timeout: The research took too long. Try a simpler query."
- Tool execution failure → "Tool [name] failed: [reason]. Continuing with alternative search..."
- Add retry prompt on error (not automatic, user chooses)
- Log all errors with request ID for debugging
Files:
backend/app/api/routes/research.py- error formattingfrontend/app/search/page.tsx- error UI and retry logic
Checks
API Optimization
- Verify semantic search results are cached (no duplicate calls)
- Check web_search and news_search return results (not empty)
- Confirm tool execution logs show cache hits for repeated queries
Rate Limit Handling
- Trigger 429 error and verify graceful fallback message displays
- Confirm no stack traces shown to user
- Check logs show quota status and retry timing
Web Search
- Query "climate change" and verify web_search returns 5+ results
- Confirm DuckDuckGo DDGS class is properly instantiated
- Check news_search returns recent news articles
Streaming Clarity
- Monitor SSE events for tool_start with query details
- Verify timestamps increment correctly
- Confirm "Still working..." message only shows after 30s inactivity
Frontend Rendering
- Verify research answer is plain text (no JSON)
- Check "Sources" section appears with article cards
- Confirm card click opens article detail modal
- Verify no duplicate sources (de-duplication working)
Error Scenarios
- Submit invalid query and verify doesn't crash
- Test with network disconnect and check timeout message
- Simulate quota exceeded (403) and verify user sees rate limit message
Implementation Checklist
- Add retry decorator with exponential backoff to Gemini client
- Implement request cache in
search_internal_newswith 5min TTL - Fix DuckDuckGo tool initialization (verify DDGS import)
- Update
research_stream()to emit granular tool start/result events - Map tool events to human-readable status messages in API endpoint
- Remove JSON block from final answer text
- Add "Sources" section with article cards to frontend
- Update error handling for 429 quota exceeded
- Add streaming status animation to UI
- Write tests for quota handling and web search integration