Comprehensive web research powered by Firecrawl and LangGraph
- Firecrawl: Multi-source web content extraction
- OpenAI GPT-4o: Search planning and follow-up generation
- Next.js 15: Modern React framework with App Router
| Service | Purpose | Get Key |
|---|---|---|
| Firecrawl | Web scraping and content extraction | firecrawl.dev/app/api-keys |
| OpenAI | Search planning and summarization | platform.openai.com/api-keys |
- Clone this repository
- Create a .env.local file with your API keys:
  FIRECRAWL_API_KEY=your_firecrawl_key
  OPENAI_API_KEY=your_openai_key
- Install dependencies: npm install or yarn install
- Run the development server: npm run dev or yarn dev
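Both keys have to be present before the app can reach either service. A minimal sketch of a startup check is shown here; `missingKeys` and `requireEnv` are hypothetical helpers that are not part of this repository, and the environment is passed in as a parameter so the check can be exercised without touching the real process environment.

```typescript
// Hypothetical helpers for illustration; not part of this repository.
type Env = Record<string, string | undefined>;

// The two variables named in the .env.local step above.
const REQUIRED_KEYS = ["FIRECRAWL_API_KEY", "OPENAI_API_KEY"] as const;

// Returns the names of required keys that are missing or blank.
function missingKeys(env: Env): string[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// Throws a descriptive error if any required key is absent.
function requireEnv(env: Env): void {
  const missing = missingKeys(env);
  if (missing.length > 0) {
    throw new Error(`Missing in .env.local: ${missing.join(", ")}`);
  }
}
```

In a Node.js context this could be called as `requireEnv(process.env)` before the first API request.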
flowchart TB
Query["'Compare Samsung Galaxy S25<br/>and iPhone 16'"]:::query
Query --> Break
Break["Break into Sub-Questions"]:::primary
subgraph SubQ["Search Queries"]
S1["iPhone 16 Pro specs features"]:::search
S2["Samsung Galaxy S25 Ultra specs"]:::search
S3["iPhone 16 vs Galaxy S25 comparison"]:::search
end
Break --> SubQ
subgraph FC["🔥 Firecrawl API Calls"]
FC1["Firecrawl /search API<br/>Query 1"]:::firecrawl
FC2["Firecrawl /search API<br/>Query 2"]:::firecrawl
FC3["Firecrawl /search API<br/>Query 3"]:::firecrawl
end
S1 --> FC1
S2 --> FC2
S3 --> FC3
subgraph Sources["Sources Found"]
R1["Apple.com ✓<br/>The Verge ✓<br/>CNET ✓"]:::source
R2["GSMArena ✓<br/>TechRadar ✓<br/>Samsung.com ✓"]:::source
R3["AndroidAuth ✓<br/>TomsGuide ✓"]:::source
end
FC1 --> R1
FC2 --> R2
FC3 --> R3
subgraph Valid["✅ Answer Validation"]
V1["iPhone 16 specs ✓ (0.95)"]:::good
V2["S25 specs ✓ (0.9)"]:::good
V3["S25 price ❌ (0.3)"]:::bad
end
Sources --> Valid
Valid --> Retry
Retry{"Need info:<br/>S25 pricing?"}:::check
subgraph Strat["🧠 Alternative Strategy"]
Original["Original: 'Galaxy S25 price'<br/>❌ No specific pricing found"]:::bad
NewTerms["Try: 'Galaxy S25 MSRP cost'<br/>'Samsung S25 pricing leak'<br/>'S25 vs S24 price comparison'"]:::strategy
end
Retry -->|Yes| Strat
subgraph Retry2["Retry Searches"]
Alt1["Galaxy S25 MSRP retail"]:::search
Alt2["Samsung S25 pricing leak"]:::search
Alt3["S25 vs S24 price comparison"]:::search
end
Strat --> Retry2
subgraph FC2G["🔥 Retry API Calls"]
FC4["Firecrawl /search API<br/>Alt Query 1"]:::firecrawl
FC5["Firecrawl /search API<br/>Alt Query 2"]:::firecrawl
FC6["Firecrawl /search API<br/>Alt Query 3"]:::firecrawl
end
Alt1 --> FC4
Alt2 --> FC5
Alt3 --> FC6
Results2["SamMobile ✓ ($899 leak)<br/>9to5Google ✓ ($100 more)<br/>PhoneArena ✓ ($899)"]:::source
FC4 --> Results2
FC5 --> Results2
FC6 --> Results2
Final["All answers found ✓<br/>S25 price: $899"]:::good
Results2 --> Final
Synthesis["LLM synthesizes response"]:::synthesis
Final --> Synthesis
FollowUp["Generate follow-up questions"]:::primary
Synthesis --> FollowUp
Citations["List citations [1-10]"]:::primary
FollowUp --> Citations
Answer["Complete response delivered"]:::answer
Citations --> Answer
%% No path - skip retry and go straight to synthesis
Retry -->|No| Synthesis
classDef query fill:#ff8c42,stroke:#ff6b1a,stroke-width:3px,color:#fff
classDef subq fill:#ffd4b3,stroke:#ff6b1a,stroke-width:1px,color:#333
classDef search fill:#ff8c42,stroke:#ff6b1a,stroke-width:2px,color:#fff
classDef source fill:#3a4a5c,stroke:#2c3a47,stroke-width:2px,color:#fff
classDef check fill:#ffeb3b,stroke:#fbc02d,stroke-width:2px,color:#333
classDef good fill:#4caf50,stroke:#388e3c,stroke-width:2px,color:#fff
classDef bad fill:#f44336,stroke:#d32f2f,stroke-width:2px,color:#fff
classDef strategy fill:#9c27b0,stroke:#7b1fa2,stroke-width:2px,color:#fff
classDef synthesis fill:#ff8c42,stroke:#ff6b1a,stroke-width:3px,color:#fff
classDef answer fill:#3a4a5c,stroke:#2c3a47,stroke-width:3px,color:#fff
classDef firecrawl fill:#ff6b1a,stroke:#ff4500,stroke-width:3px,color:#fff
classDef label fill:none,stroke:none,color:#666,font-weight:bold
- Break Down - Complex queries split into focused sub-questions
- Search - Multiple searches via Firecrawl API for comprehensive coverage
- Extract - Markdown content extracted from web sources
- Validate - Check if sources actually answer the questions (0.7+ confidence)
- Retry - Alternative search terms for unanswered questions (max 2 attempts)
- Synthesize - GPT-4o combines findings into cited answer
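The Validate and Retry steps above can be sketched as a small loop. This is an illustration under stated assumptions, not the repository's implementation: `SubQuestion`, `SearchFn`, `unanswered`, and `researchLoop` are hypothetical names, the search step is injected as a plain function so the sketch runs without network access, and "max 2 attempts" is read here as two search attempts in total per question.

```typescript
// Hypothetical types and names, for illustration only.
interface SubQuestion {
  question: string;
  confidence: number; // 0-1: how well the sources found so far answer it
}

// Injected search step: returns the confidence reached for a question on a given attempt.
type SearchFn = (question: string, attempt: number) => number;

const MIN_ANSWER_CONFIDENCE = 0.7; // "0.7+ confidence" from the Validate step
const MAX_SEARCH_ATTEMPTS = 2;     // "max 2 attempts" from the Retry step

// Questions whose best confidence is still below the threshold.
function unanswered(questions: SubQuestion[]): SubQuestion[] {
  return questions.filter((q) => q.confidence < MIN_ANSWER_CONFIDENCE);
}

// Searches every question once, then re-searches only the unanswered ones
// until all are answered or the attempt budget is used up.
function researchLoop(questions: string[], search: SearchFn): SubQuestion[] {
  let state: SubQuestion[] = questions.map((question) => ({ question, confidence: 0 }));
  for (let attempt = 1; attempt <= MAX_SEARCH_ATTEMPTS; attempt++) {
    if (unanswered(state).length === 0) break;
    state = state.map((q) =>
      q.confidence >= MIN_ANSWER_CONFIDENCE
        ? q // already answered: do not search again
        : { question: q.question, confidence: Math.max(q.confidence, search(q.question, attempt)) }
    );
  }
  return state;
}
```

Questions that remain below the threshold after the last attempt are returned as they are, so the synthesis step can report them as unanswered.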
- Smart Search - Breaks complex queries into multiple focused searches
- Answer Validation - Verifies sources contain actual answers (0.7+ confidence)
- Auto-Retry - Alternative search terms for unanswered questions
- Real-time Progress - Live updates as searches complete
- Full Citations - Every fact linked to its source
- Context Memory - Follow-up questions maintain conversation context
Customize search behavior by modifying lib/config.ts:
export const SEARCH_CONFIG = {
// Search Settings
MAX_SEARCH_QUERIES: 12, // Maximum number of search queries to generate
MAX_SOURCES_PER_SEARCH: 4, // Maximum sources to return per search query
MAX_SOURCES_TO_SCRAPE: 3, // Maximum sources to scrape for additional content
// Content Processing
MIN_CONTENT_LENGTH: 100, // Minimum content length to consider valid
SUMMARY_CHAR_LIMIT: 100, // Character limit for source summaries
// Retry Logic
MAX_RETRIES: 2, // Maximum retry attempts for failed operations
MAX_SEARCH_ATTEMPTS: 2, // Maximum attempts to find answers via search
MIN_ANSWER_CONFIDENCE: 0.7, // Minimum confidence (0-1) that a question was answered
// Timeouts
SCRAPE_TIMEOUT: 15000, // Timeout for scraping operations (ms)
} as const;
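As a rough illustration of what the content-processing limits control, the sketch below filters and trims a list of results. The values are copied from the config above, but how the application actually applies them is an assumption; `Source`, `SummarizedSource`, and `prepareSources` are hypothetical names.

```typescript
// Values copied from SEARCH_CONFIG; the way they are applied here is an assumption.
const LIMITS = {
  MAX_SOURCES_PER_SEARCH: 4,
  MIN_CONTENT_LENGTH: 100,
  SUMMARY_CHAR_LIMIT: 100,
} as const;

// Hypothetical shapes for a raw and a processed search result.
interface Source {
  url: string;
  content: string;
}

interface SummarizedSource {
  url: string;
  summary: string;
}

function prepareSources(sources: Source[]): SummarizedSource[] {
  return sources
    .filter((s) => s.content.length >= LIMITS.MIN_CONTENT_LENGTH) // drop near-empty pages
    .slice(0, LIMITS.MAX_SOURCES_PER_SEARCH)                      // cap sources per query
    .map((s) => ({
      url: s.url,
      summary: s.content.slice(0, LIMITS.SUMMARY_CHAR_LIMIT),     // truncate to the summary limit
    }));
}
```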
Firesearch leverages Firecrawl's powerful /search endpoint:
- Purpose: Finds relevant URLs AND extracts markdown content in one call
- Usage: Each decomposed query is sent to find 6-8 relevant sources with content
- Response: Returns URLs with titles, snippets, AND full markdown content
- Key Feature: The scrapeOptions parameter enables content extraction during search
- Example:
  POST /search
  {
    "query": "iPhone 16 specs pricing",
    "limit": 8,
    "scrapeOptions": { "formats": ["markdown"] }
  }
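The example request above can be assembled in TypeScript roughly as follows. This is a sketch rather than the project's own client code: `buildSearchRequest` and `SearchRequest` are hypothetical names, and the base URL `https://api.firecrawl.dev/v1/search` and the Bearer-token header are assumptions that should be checked against the current Firecrawl API reference.

```typescript
// Hypothetical helper and type; not taken from this repository.
interface SearchRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: string;
  };
}

function buildSearchRequest(query: string, apiKey: string, limit = 8): SearchRequest {
  return {
    url: "https://api.firecrawl.dev/v1/search", // assumed base URL and API version
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`, // assumed auth scheme
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        query,
        limit,
        scrapeOptions: { formats: ["markdown"] }, // ask for markdown content with each result
      }),
    },
  };
}
```

The returned object can be handed to `fetch(req.url, req.init)`. Keeping request construction separate from the network call makes the payload easy to inspect and test.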
When initial results are insufficient, the system automatically tries:
- Broaden Keywords: Removes specific terms for wider results
- Narrow Focus: Adds specific terms to target missing aspects
- Synonyms: Uses alternative terms and phrases
- Rephrase: Completely reformulates the query
- Decompose: Breaks complex queries into sub-questions
- Academic: Adds scholarly terms for research-oriented results
- Practical: Focuses on tutorials and how-to guides
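One way to picture the strategy list is as a fixed set of options that the system works through for an unanswered question. The sketch below is illustrative only: how Firesearch chooses a strategy or rewrites a query is not described here, so `Strategy`, `STRATEGY_ORDER`, `nextStrategy`, and `rewrite` are hypothetical, and the rewrite rules are toy string operations standing in for whatever the application really does.

```typescript
// Hypothetical names; the seven strategies mirror the list above.
type Strategy =
  | "broaden" | "narrow" | "synonyms" | "rephrase"
  | "decompose" | "academic" | "practical";

const STRATEGY_ORDER: Strategy[] = [
  "broaden", "narrow", "synonyms", "rephrase", "decompose", "academic", "practical",
];

// Picks the first strategy that has not been tried yet, or undefined when none are left.
function nextStrategy(tried: Strategy[]): Strategy | undefined {
  for (const strategy of STRATEGY_ORDER) {
    if (tried.indexOf(strategy) === -1) return strategy;
  }
  return undefined;
}

// Toy rewrite rules for the strategies that simple string edits can express.
function rewrite(query: string, strategy: Strategy, missingAspect: string): string {
  switch (strategy) {
    case "broaden":
      return query.split(" ").slice(0, 2).join(" "); // keep only the first two terms
    case "narrow":
      return `${query} ${missingAspect}`;             // add the aspect that was not answered
    case "academic":
      return `${query} research study`;
    case "practical":
      return `${query} tutorial how to`;
    default:
      return query; // synonyms, rephrase, and decompose need more than string edits
  }
}
```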
- "Who are the founders of Firecrawl?"
- "When did NVIDIA release the RTX 4080 Super?"
- "Compare the latest iPhone, Samsung Galaxy, and Google Pixel flagship features"
MIT License