Documentation - DDG Web Search

Getting Started

The DDG Web Search package searches the web through DuckDuckGo and fetches web page content. It includes built-in rate limiting and error handling, and can be used as a library, from the command line, or as an MCP server.

Installation

npm install @lucid-spark/ddg-web-search

Quick Start

import { WebSearcher, WebContentFetcher } from "@lucid-spark/ddg-web-search";

const searcher = new WebSearcher();
const fetcher = new WebContentFetcher();

// Search the web
const results = await searcher.search("TypeScript");

// Fetch web content
const content = await fetcher.fetch("https://example.com");

// Release the browser used by WebSearcher
await searcher.close();

API Reference

Complete reference for all classes, methods, and interfaces provided by the package.

WebSearcher

The main class for searching the web using DuckDuckGo with browser automation. Features built-in rate limiting, captcha handling, and error handling.

Constructor

new WebSearcher(headless?: boolean)

Creates a new WebSearcher instance with browser automation capabilities.

headless (optional): Whether to run browser in headless mode (default: true)

Methods

search(query: string): Promise<SearchResult[]>

Searches the web using DuckDuckGo with browser automation and returns formatted results.

Parameters

query (string): The search query string

Returns

Promise resolving to an array of SearchResult objects.

Features

JavaScript rendering support for dynamic content
Captcha detection and handling
Anti-bot detection countermeasures
Conservative rate limiting (1 request per 2 seconds)

Example

const searcher = new WebSearcher();
const results = await searcher.search("Node.js tutorials");

results.forEach(result => {
  console.log(result.title);
  console.log(result.url);
  console.log(result.snippet);
});

// Important: cleanup browser resources
await searcher.close();

close(): Promise<void>

Closes the browser instance and frees its system resources. Call it once you have finished searching; otherwise the browser process stays alive and leaks memory.

Example

try {
  const results = await searcher.search("query");
  // process results
} finally {
  await searcher.close(); // Always cleanup
}

WebContentFetcher

Class for fetching and parsing web content with configurable rate limiting.

Constructor

new WebContentFetcher(rateLimit?: number, rateLimitInterval?: number)

Parameters

rateLimit (optional): Number of requests allowed per interval (default: 1)
rateLimitInterval (optional): Time interval in milliseconds (default: 1000)

Methods

fetch(url: string): Promise<FetchResult>

Fetches content from the specified URL with rate limiting and error handling.

Parameters

url (string): The URL to fetch content from

Returns

Promise resolving to a FetchResult object.

Example

// Default rate limiting (1 request per second)
const fetcher = new WebContentFetcher();

// Custom rate limiting (2 requests per 3 seconds)
const customFetcher = new WebContentFetcher(2, 3000);

const result = await fetcher.fetch("https://example.com");

if (result.success) {
  console.log("Content:", result.data?.content);
  console.log("Title:", result.data?.metadata?.title);
  console.log("Description:", result.data?.metadata?.description);
} else {
  console.error("Error:", result.error);
}

CLI Interface

Command-line interface for easy access to search and fetch functionality.

Installation

npm install -g @lucid-spark/ddg-web-search

Commands

ddg-web-search search <query>

Search the web for the given query using browser automation.

ddg-web-search search "TypeScript tutorials"

ddg-web-search fetch <url>

Fetch content from the specified URL with intelligent parsing.

ddg-web-search fetch https://example.com

ddg-web-search interactive

Start interactive mode for multiple commands with enhanced user experience.

ddg-web-search interactive

ddg-web-search mcp

Start MCP server with stdio transport for AI assistant integration.

ddg-web-search mcp

ddg-web-search mcp-http

Start MCP server with HTTP transport for web-based AI assistant integration.

ddg-web-search mcp-http

Interactive Mode Commands

search <query> or s <query> - Search the web
fetch <url> or f <url> - Fetch web content
mcp - Start MCP server (stdio transport)
mcp-http - Start MCP server (HTTP transport)
help or h - Show help
version or v - Show version
clear or cls - Clear screen
exit or quit or q - Exit
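
The alias table above can be resolved with a small lookup. The following is a minimal sketch of that idea, not the package's actual parser: `parseInteractiveInput`, `ParsedInput`, and the `ALIASES` map are hypothetical names introduced for illustration.

```typescript
// Hypothetical sketch: map interactive-mode input to a canonical command.
// The aliases mirror the list above; the parsing logic is an assumption.
type Command =
  | "search" | "fetch" | "mcp" | "mcp-http"
  | "help" | "version" | "clear" | "exit";

// A Map avoids accidental hits on inherited object keys such as "constructor".
const ALIASES = new Map<string, Command>([
  ["search", "search"], ["s", "search"],
  ["fetch", "fetch"], ["f", "fetch"],
  ["mcp", "mcp"],
  ["mcp-http", "mcp-http"],
  ["help", "help"], ["h", "help"],
  ["version", "version"], ["v", "version"],
  ["clear", "clear"], ["cls", "clear"],
  ["exit", "exit"], ["quit", "exit"], ["q", "exit"],
]);

interface ParsedInput {
  command: Command;
  argument: string; // everything after the first word, trimmed
}

function parseInteractiveInput(line: string): ParsedInput | null {
  const trimmed = line.trim();
  if (trimmed === "") return null;

  // Split on the first run of whitespace only, so queries keep their spaces.
  const spaceIndex = trimmed.search(/\s/);
  const word = spaceIndex === -1 ? trimmed : trimmed.slice(0, spaceIndex);
  const argument = spaceIndex === -1 ? "" : trimmed.slice(spaceIndex + 1).trim();

  const command = ALIASES.get(word.toLowerCase());
  return command === undefined ? null : { command, argument };
}
```

With this sketch, `s TypeScript tutorials` and `search TypeScript tutorials` both resolve to the `search` command with the same argument, and unknown input yields `null`.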

MCP Server

Model Context Protocol server for AI assistant integration with support for both stdio and HTTP transports.

Starting the Server

Stdio Transport (Default)

# Using global binary
ddg-web-search-mcp

# Using npx
npx @lucid-spark/ddg-web-search mcp

# Using built files
node dist/mcp.js

HTTP Transport

# Default host and port (localhost:3001)
node dist/mcp.js --transport http

# Custom host and port
node dist/mcp.js --transport http --host 0.0.0.0 --port 8080

Configuration

Add to your MCP client configuration (e.g., Claude Desktop):

{
  "mcpServers": {
    "ddg-web-search": {
      "command": "ddg-web-search-mcp",
      "args": [],
      "env": {}
    }
  }
}

Available Tools

search

Search the web using DuckDuckGo with browser automation.

Input: { "query": "search terms" }

Output: Formatted list of search results with titles, URLs, and snippets

Features: JavaScript rendering, captcha handling, anti-detection measures

fetch_web_content

Fetch and parse content from a web URL with intelligent scraping.

Input: { "url": "https://example.com" }

Output: Parsed content with metadata (truncated to 10,000 characters if needed)

Features: HTML-to-Markdown conversion, metadata extraction, content cleaning
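
The 10,000-character limit on the output can be pictured as a simple length check. This is a sketch under assumptions: `truncateContent` and `MAX_CONTENT_LENGTH` are hypothetical names, and whether the real server appends a truncation marker is not stated here.

```typescript
// Hypothetical sketch of the truncation step described above.
const MAX_CONTENT_LENGTH = 10000; // limit stated in the tool description

interface TruncatedContent {
  text: string;
  truncated: boolean; // true when the original exceeded the limit
}

function truncateContent(
  content: string,
  maxLength: number = MAX_CONTENT_LENGTH,
): TruncatedContent {
  if (content.length <= maxLength) {
    return { text: content, truncated: false };
  }
  // Note: length and slice count UTF-16 code units, not user-visible characters.
  return { text: content.slice(0, maxLength), truncated: true };
}
```

Callers that need the full page should use the library's `fetch` method directly rather than rely on the tool output.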

HTTP Endpoints

When using HTTP transport, the server provides:

GET / - Server information and available endpoints
GET /sse - Server-Sent Events connection for receiving responses
POST /message/{sessionId} - Send MCP requests to the server
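
To make the endpoints concrete, the sketch below builds the JSON-RPC 2.0 message that MCP uses for a tool call, plus the URL it would be posted to. The helper names `buildToolCall` and `messageUrl` are hypothetical, and the session ID `abc123` is a placeholder: how the server hands out session IDs is not documented here and is assumed to come from the SSE connection.

```typescript
// Sketch only: shapes a `tools/call` request for the HTTP transport.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

function messageUrl(baseUrl: string, sessionId: string): string {
  // Strip trailing slashes so the path separator is not doubled.
  const base = baseUrl.replace(/\/+$/, "");
  return `${base}/message/${encodeURIComponent(sessionId)}`;
}

// Example: call the `search` tool on the default host and port.
const toolCall = buildToolCall(1, "search", { query: "TypeScript tutorials" });
const postUrl = messageUrl("http://localhost:3001", "abc123");
// A client would POST JSON.stringify(toolCall) to postUrl
// and read the response from the GET /sse stream.
```

In practice an MCP client library handles this exchange, including the protocol's initialization handshake, so hand-built requests are mainly useful for debugging.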

Browser Automation

The WebSearcher uses Puppeteer for reliable browser automation, providing better captcha handling and JavaScript rendering support.

Key Features

JavaScript Rendering: Full support for dynamically loaded content
Captcha Handling: Intelligent detection and handling of captcha challenges
Anti-Detection: Uses real browser behavior to avoid bot detection
Resource Management: Proper cleanup of browser instances to prevent memory leaks

Browser Configuration

// Headless mode (default) - no browser window
const searcher = new WebSearcher();

// Non-headless mode for debugging or manual captcha solving
const debugSearcher = new WebSearcher(false);

Resource Management

Always call close() to prevent memory leaks:

const searcher = new WebSearcher();
try {
  const results = await searcher.search("query");
  // Process results
} finally {
  await searcher.close(); // Essential for cleanup
}

Captcha Handling

Headless Mode: Returns empty results if captcha is detected
Non-Headless Mode: Allows time for manual captcha solving
Detection: Automatically detects captcha challenges on the page
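
Because a captcha in headless mode produces an empty result list, one possible pattern is to retry with a visible browser. The sketch below assumes only the `search` and `close` methods documented above; `searchWithCaptchaFallback` and `SearcherLike` are hypothetical names, not part of the package.

```typescript
// Hypothetical sketch: try headless first, then retry in a visible browser.
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
  icon?: string;
}

// Minimal shape of the searcher this sketch relies on.
interface SearcherLike {
  search(query: string): Promise<SearchResult[]>;
  close(): Promise<void>;
}

async function searchWithCaptchaFallback(
  createSearcher: (headless: boolean) => SearcherLike,
  query: string,
): Promise<SearchResult[]> {
  for (const headless of [true, false]) {
    const searcher = createSearcher(headless);
    try {
      const results = await searcher.search(query);
      if (results.length > 0) return results;
    } finally {
      // Close each browser before moving on, even when search() throws.
      await searcher.close();
    }
  }
  return [];
}
```

With the package this would be called as `searchWithCaptchaFallback((headless) => new WebSearcher(headless), "query")`. An empty list can also mean the query simply has no results, so the fallback may occasionally open a browser window unnecessarily.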

Performance Considerations

Browser Overhead: Higher resource usage than HTTP requests
Initialization: Browser launch takes 1-2 seconds
Memory Usage: Monitor for memory leaks in long-running applications
Browser Reuse: Browser instance is reused across searches for efficiency
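
Since the browser is reused across searches, a batch of queries can share one instance and pay the launch cost once. A minimal sketch, assuming only the documented `search` and `close` methods; `searchAll` and `SearcherLike` are hypothetical names.

```typescript
// Hypothetical sketch: run several queries on one searcher, then close it once.
interface SearcherLike<R> {
  search(query: string): Promise<R[]>;
  close(): Promise<void>;
}

async function searchAll<R>(
  searcher: SearcherLike<R>,
  queries: string[],
): Promise<Map<string, R[]>> {
  const resultsByQuery = new Map<string, R[]>();
  try {
    for (const query of queries) {
      // Sequential on purpose: the rate limiter would serialize the calls anyway.
      resultsByQuery.set(query, await searcher.search(query));
    }
  } finally {
    // One close for the whole batch, even if a search throws.
    await searcher.close();
  }
  return resultsByQuery;
}
```

With the package, `searchAll(new WebSearcher(), ["Node.js", "TypeScript"])` would launch a single browser for both queries.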

Utilities

Utility classes used internally and available for advanced usage.

RateLimiter

Token bucket rate limiter for controlling request frequency.

import { RateLimiter } from "@lucid-spark/ddg-web-search";

// 5 requests per 10 seconds
const rateLimiter = new RateLimiter(5, 10000);

// Wait for permission to make a request
await rateLimiter.acquire();

// Get current status
const status = rateLimiter.getStatus();
console.log(status.requests, status.limit, status.interval);
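
For readers unfamiliar with the technique, here is a generic token bucket sketch. It is not the package's `RateLimiter` implementation, whose internals are not shown here; `TokenBucketSketch`, `tryAcquire`, `msUntilNextToken`, and the injectable clock are assumptions made for illustration and testability.

```typescript
// Generic token bucket sketch: the bucket holds up to `limit` tokens and
// refills continuously at `limit` tokens per `intervalMs`.
class TokenBucketSketch {
  private readonly limit: number;
  private readonly intervalMs: number;
  private readonly now: () => number;
  private tokens: number;
  private lastRefill: number;

  constructor(limit: number, intervalMs: number, now: () => number = () => Date.now()) {
    this.limit = limit;
    this.intervalMs = intervalMs;
    this.now = now;
    this.tokens = limit; // start with a full bucket
    this.lastRefill = now();
  }

  // Add tokens for the time elapsed since the last refill, capped at `limit`.
  private refill(): void {
    const current = this.now();
    const elapsed = current - this.lastRefill;
    this.tokens = Math.min(this.limit, this.tokens + (elapsed * this.limit) / this.intervalMs);
    this.lastRefill = current;
  }

  // Take one token if available; never waits.
  tryAcquire(): boolean {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }

  // Milliseconds until at least one token is available (0 if one is ready).
  msUntilNextToken(): number {
    this.refill();
    if (this.tokens >= 1) return 0;
    return Math.ceil(((1 - this.tokens) * this.intervalMs) / this.limit);
  }

  // Wait until a token can be taken.
  async acquire(): Promise<void> {
    while (!this.tryAcquire()) {
      const waitMs = this.msUntilNextToken();
      await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
    }
  }
}
```

With a limit of 5 per 10,000 ms, five requests pass immediately from a full bucket and the sixth waits about 2,000 ms for the next token.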

HttpClient

Singleton HTTP client with centralized error handling and logging.

import { HttpClient } from "@lucid-spark/ddg-web-search";

const client = HttpClient.getInstance();

// GET request
const data = await client.get("https://api.example.com");

// POST request
const result = await client.post("https://api.example.com", { key: "value" });

TypeScript Types

Complete type definitions for all interfaces and types used in the package.

SearchResult

interface SearchResult {
  title: string;    // The title of the search result
  url: string;      // The URL of the search result
  snippet: string;  // A brief description or snippet
  icon?: string;    // Optional icon URL for the result
}
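
As an illustration of consuming this interface, the sketch below renders results as a numbered text list. `formatResults` is a hypothetical helper; the exact layout the CLI or MCP server produces is not specified here.

```typescript
// Hypothetical sketch: render SearchResult objects as a numbered list.
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
  icon?: string;
}

function formatResults(results: SearchResult[]): string {
  if (results.length === 0) return "No results found.";
  return results
    .map((result, index) => `${index + 1}. ${result.title}\n   ${result.url}\n   ${result.snippet}`)
    .join("\n\n");
}
```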

WebContent

interface WebContent {
  content: string;      // Main content of the page (HTML converted to Markdown)
  metadata?: {          // Optional metadata extracted from the page
    title?: string;     // Page title
    description?: string;  // Page description
    url?: string;       // Page URL
    author?: string;    // Page author
    publishDate?: string;  // Publication date
  };
}

FetchResult

interface FetchResult {
  success: boolean;     // Whether the fetch was successful
  data?: WebContent;    // Parsed web content (if available)
  error?: string;       // Error message (if failed)
}
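
Because both `data` and `error` are optional, callers have to check `success` before reading either. One way to centralize that check is a small unwrapping helper; `contentOrThrow` is a hypothetical name, and the interfaces are repeated only to keep the sketch self-contained.

```typescript
// Hypothetical sketch: turn a FetchResult into WebContent or a thrown Error.
interface WebContent {
  content: string;
  metadata?: {
    title?: string;
    description?: string;
    url?: string;
    author?: string;
    publishDate?: string;
  };
}

interface FetchResult {
  success: boolean;
  data?: WebContent;
  error?: string;
}

function contentOrThrow(result: FetchResult): WebContent {
  if (!result.success) {
    throw new Error(result.error ?? "Fetch failed without an error message");
  }
  if (result.data === undefined) {
    throw new Error("Fetch succeeded but returned no data");
  }
  return result.data;
}
```

After `const page = contentOrThrow(await fetcher.fetch(url))`, `page.content` can be read without optional chaining.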

FetchOptions

interface FetchOptions {
  timeout?: number;                    // Request timeout in milliseconds
  headers?: Record<string, string>;    // Custom headers
}

Error Handling

The package provides comprehensive error handling for various failure scenarios.

Search Errors

Empty or invalid queries return an empty array
Network errors are caught and logged, and an empty array is returned
Invalid API responses are handled gracefully

Fetch Errors

Invalid URLs return error results with descriptive messages
Network timeouts and connection errors are caught
HTTP error status codes are handled appropriately
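
Invalid URLs come back as error results, but a caller may prefer to reject obviously bad input before spending a rate-limited request. A minimal sketch using the standard `URL` class; `isFetchableUrl` is a hypothetical helper, and the fetcher's own validation rules may differ.

```typescript
// Hypothetical pre-check: accept only absolute http(s) URLs.
function isFetchableUrl(candidate: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(candidate); // throws on relative or malformed input
  } catch {
    return false;
  }
  return parsed.protocol === "http:" || parsed.protocol === "https:";
}
```

For example, `isFetchableUrl("example.com")` is false because the string has no scheme, while `isFetchableUrl("https://example.com")` is true.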

Example Error Handling

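import { WebSearcher, WebContentFetcher } from "@lucid-spark/ddg-web-search";

const searcher = new WebSearcher();
const fetcher = new WebContentFetcher();

try {
  // search() reports failures as an empty array rather than throwing
  const results = await searcher.search("query");

  if (results.length === 0) {
    console.log("No results found");
  }

  // fetch() reports failures through the success flag
  const fetchResult = await fetcher.fetch("https://example.com");

  if (!fetchResult.success) {
    console.error("Fetch failed:", fetchResult.error);
  }
} catch (error) {
  // Anything that still throws is unexpected
  console.error("Unexpected error:", error);
} finally {
  await searcher.close(); // Always release the browser
}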

Rate Limiting

Built-in rate limiting prevents overwhelming external services and ensures respectful API usage.

Default Rate Limits

WebSearcher: 1 request per 2 seconds (conservative for browser automation)
WebContentFetcher: 1 request per second (configurable)

Custom Rate Limiting

// 3 requests per 5 seconds
const fetcher = new WebContentFetcher(3, 5000);

// Rate limiting is enforced automatically
for (let i = 0; i < 5; i++) {
  const result = await fetcher.fetch(`https://httpbin.org/get?id=${i}`);
  console.log(`Request ${i + 1} completed`);
}

Rate Limiter API

import { RateLimiter } from "@lucid-spark/ddg-web-search";

const rateLimiter = new RateLimiter(2, 1000); // 2 requests per second

// Check status
const status = rateLimiter.getStatus();
console.log(`${status.requests}/${status.limit} requests used`);

// Acquire permission (waits if necessary)
await rateLimiter.acquire();
console.log("Permission granted for request");

Testing

The package includes comprehensive test suites and supports easy mocking for unit tests.

Running Tests

# Run all tests
npm test

# Run tests with coverage
npm run test -- --coverage

# Run specific test file
npm test -- WebSearcher.test.ts

Mocking for Tests

import { WebSearcher } from '@lucid-spark/ddg-web-search';
import axios from 'axios';

jest.mock('axios');
const mockedAxios = axios as jest.Mocked<typeof axios>;

describe('WebSearcher', () => {
  it('should return search results', async () => {
    mockedAxios.get.mockResolvedValue({
      data: { RelatedTopics: [/* mock data */] }
    });
    
    const searcher = new WebSearcher();
    const results = await searcher.search('test');
    
    expect(results).toHaveLength(1);
  });
});

Migration Guide

Guide for migrating from older versions or similar packages.

Breaking Changes

This section will be updated as new versions are released with any breaking changes and migration instructions.

Upgrading

# Update to latest version
npm update @lucid-spark/ddg-web-search

# Check for security updates
npm audit

# Update global installation
npm install -g @lucid-spark/ddg-web-search@latest