
HTML Scraper

A small Python API, built with Flask and SeleniumBase, that returns the rendered HTML content of any web page through a single scraping endpoint (plus a health check).

Stack

  • Python 3.12, with uv for dependency management
  • Flask as the web framework
  • SeleniumBase (undetected Chrome) for page rendering
  • Gunicorn as the production WSGI server
  • Docker for containerization

Setup

Local development

# Install dependencies
uv sync

# Copy and edit environment variables
cp .env.example .env

# Run the server
uv run python run.py

Docker

# Build
docker build -t html-scraper .

# Run
docker run -p 4001:4001 --env-file .env html-scraper
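The stack lists Gunicorn as the production server, but this README doesn't show its configuration. Below is a minimal `gunicorn.conf.py` sketch under stated assumptions: it binds to the port published by the `docker run` command above, and the `PORT` variable, the worker count, and the timeout value are all hypothetical choices, not settings taken from this project.

```python
# gunicorn.conf.py: a hypothetical configuration file (Gunicorn executes it as Python).
import os

# Assumption: listen on all interfaces on the port published by `docker run -p 4001:4001`.
# PORT is a hypothetical environment variable, not one documented by this README.
bind = f"0.0.0.0:{os.environ.get('PORT', '4001')}"

# Each scrape may drive a full Chrome instance, which is memory-hungry,
# so keep the number of worker processes small (illustrative value).
workers = 2

# Gunicorn's default worker timeout is 30 seconds; a slow page load in a real
# browser can exceed that and get the worker killed mid-request, so raise it.
timeout = 120
```

It would be started with `gunicorn -c gunicorn.conf.py <module>:<app>`, where `<module>:<app>` stands for whichever module in this repository exposes the Flask app object.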

API

Health check

GET /api/health

Response:

{"status": "ok"}
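A minimal health-check client sketch using only the Python standard library. It assumes the service is reachable at `http://localhost:4001` (the port published in the Docker example); `BASE_URL`, `parse_health` and `is_healthy` are illustrative names, not part of this project.

```python
import json
import urllib.request

# Assumption: the API listens on port 4001, as in the docker run example.
BASE_URL = "http://localhost:4001"


def parse_health(body: bytes) -> bool:
    """Return True if a /api/health response body is {"status": "ok"}."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def is_healthy(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Call GET /api/health and report whether the service answered "ok"."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout) as resp:
            return resp.status == 200 and parse_health(resp.read())
    except OSError:
        # Connection refused, timeouts and HTTP errors (URLError and HTTPError subclass OSError).
        return False
```

This is handy as a readiness probe, for example polling `is_healthy()` after `docker run` before sending scrape requests.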

Scrape HTML

POST /api/scrape
Content-Type: application/json

{
  "url": "https://example.com"
}

Response:

{
  "success": true,
  "html": "<!DOCTYPE html>..."
}
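A minimal standard-library client sketch for the scrape endpoint, under the same assumption that the API is at `http://localhost:4001`. The helper names `build_scrape_request`, `parse_scrape_response` and `scrape` are hypothetical. Only the success response is documented above, so the failure branch simply raises with whatever payload the service returned.

```python
import json
import urllib.request

# Assumption: same host and port as the docker run example.
BASE_URL = "http://localhost:4001"


def build_scrape_request(target_url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /api/scrape request with the JSON body {"url": target_url}."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_scrape_response(body: bytes) -> str:
    """Return the "html" field, or raise if the payload does not report success."""
    payload = json.loads(body)
    if not payload.get("success"):
        # The error shape is undocumented, so surface the raw payload.
        raise RuntimeError(f"scrape failed: {payload!r}")
    return payload["html"]


def scrape(target_url: str, base_url: str = BASE_URL, timeout: float = 60.0) -> str:
    """Fetch the rendered HTML of target_url through the scraper API."""
    request = build_scrape_request(target_url, base_url)
    # Rendering in a real browser is slow, so allow a generous client-side timeout.
    # Note: urlopen raises HTTPError for non-2xx responses before the body is parsed.
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parse_scrape_response(resp.read())
```

Usage: `html = scrape("https://example.com")` returns the page source as a string, ready to hand to an HTML parser.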