HTML Scraper
A small Python API, built with Flask and SeleniumBase, that returns the rendered HTML of any page through a single scrape route. A health check route is also provided.
Stack
- Python 3.12 with uv for dependency management
- Flask as web framework
- SeleniumBase (undetected Chrome) for page rendering
- Gunicorn as production WSGI server
- Docker for containerization
Setup
Local development
# Install dependencies
uv sync
# Copy the example environment file, then edit .env as needed
cp .env.example .env
# Run the server
uv run python run.py
Docker
# Build
docker build -t html-scraper .
# Run
docker run -p 4001:4001 --env-file .env html-scraper
API
Health check
GET /api/health
Response:
{"status": "ok"}
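As a rough illustration, the health check can be polled from Python using only the standard library. This is a sketch, not part of the project: the base URL `http://localhost:4001` is an assumption taken from the port published in the Docker command, and the helper names `is_healthy` and `check_health` are hypothetical.

```python
import json
import urllib.request

# Assumption: the server is reachable on localhost:4001,
# the port published by the Docker command in this README.
BASE_URL = "http://localhost:4001"


def is_healthy(body: bytes) -> bool:
    """Return True if the body matches the documented payload {"status": "ok"}."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_health(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """GET /api/health and report whether the service answered as documented."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read())
    except OSError:  # connection refused, timeout, or HTTP error
        return False


if __name__ == "__main__":
    print("healthy" if check_health() else "unreachable or unhealthy")
```

Keeping the payload check in its own function makes it possible to verify the parsing logic without a running server.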
Scrape HTML
POST /api/scrape
Content-Type: application/json
{
"url": "https://example.com"
}
Response:
{
"success": true,
"html": "<!DOCTYPE html>..."
}
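A minimal client sketch for the scrape route, using only the Python standard library. Several details are assumptions rather than documented behaviour: the base URL `http://localhost:4001` (from the Docker port mapping), the 60-second timeout, and the idea that a failed scrape returns `"success": false`. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: the server is reachable on localhost:4001.
BASE_URL = "http://localhost:4001"


def build_scrape_request(target_url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /api/scrape request with the documented JSON body."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_html(payload: dict) -> str:
    """Return the "html" field, raising if "success" is not truthy.

    Assumption: failures are reported with "success": false; the README
    only documents the success response.
    """
    if not payload.get("success"):
        raise RuntimeError(f"scrape failed: {payload}")
    return payload["html"]


def scrape(target_url: str, base_url: str = BASE_URL, timeout: float = 60.0) -> str:
    """Send the request and return the page HTML as a string."""
    request = build_scrape_request(target_url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_html(json.load(resp))


if __name__ == "__main__":
    # Print the first 200 characters of the rendered page.
    print(scrape("https://example.com")[:200])
```

A generous timeout is used here because the page is rendered in a browser before the response is returned; adjust it to suit the pages being scraped.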