Holo795/html-scraper

ɧσℓσ 448b486672 feat: add explicit Chrome installation and pre-download chromedriver in Docker build

2026-02-13 16:03:49 +01:00

feat: initialize HTML scraper API with Flask and SeleniumBase

2026-02-13 16:03:35 +01:00

.env.example

feat: initialize HTML scraper API with Flask and SeleniumBase

2026-02-13 16:03:35 +01:00

.gitignore

feat: initialize HTML scraper API with Flask and SeleniumBase

2026-02-13 16:03:35 +01:00

docker-compose.yml

feat: initialize HTML scraper API with Flask and SeleniumBase

2026-02-13 16:03:35 +01:00

Dockerfile

feat: add explicit Chrome installation and pre-download chromedriver in Docker build

2026-02-13 16:03:49 +01:00

pyproject.toml

feat: initialize HTML scraper API with Flask and SeleniumBase

2026-02-13 16:03:35 +01:00

README.md

feat: initialize HTML scraper API with Flask and SeleniumBase

2026-02-13 16:03:35 +01:00

run.py

feat: initialize HTML scraper API with Flask and SeleniumBase

2026-02-13 16:03:35 +01:00

uv.lock

feat: initialize HTML scraper API with Flask and SeleniumBase

2026-02-13 16:03:35 +01:00

README.md

HTML Scraper

A small Python API, built with Flask and SeleniumBase, that exposes one scraping route returning the rendered HTML of a given URL, plus a health-check route.

Stack

Python 3.12 with uv for dependency management
Flask as the web framework
SeleniumBase (undetected Chrome) for page rendering
Gunicorn as the production WSGI server
Docker for containerization

Setup

Local development

# Install dependencies
uv sync

# Copy and edit environment variables
cp .env.example .env

# Run the server
uv run python run.py

Docker

# Build
docker build -t html-scraper .

# Run (expects a .env file in the current directory; create it with: cp .env.example .env)
docker run -p 4001:4001 --env-file .env html-scraper

API

Health check

GET /api/health

Response:

{"status": "ok"}
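
A deployment script can poll this endpoint before sending scrape requests. The following is a minimal sketch using only the Python standard library. The base URL `http://localhost:4001` is assumed from the Docker port mapping, and the helper names (`is_healthy`, `fetch_health`, `wait_until_healthy`) are hypothetical, not part of this project.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:4001"  # assumption: port taken from the Docker example


def is_healthy(body: bytes) -> bool:
    """Return True only if body is the documented {"status": "ok"} payload."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def fetch_health(base_url: str = BASE_URL, timeout: float = 5.0) -> bytes:
    """GET /api/health and return the raw response body."""
    with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout) as resp:
        return resp.read()


def wait_until_healthy(
    fetch: Callable[[], bytes] = fetch_health,
    attempts: int = 10,
    delay: float = 1.0,
) -> bool:
    """Call fetch up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        try:
            if is_healthy(fetch()):
                return True
        except (urllib.error.URLError, OSError):
            pass  # server not reachable yet; retry
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_healthy()` with the defaults retries for roughly ten seconds; passing a custom `fetch` callable keeps the retry logic testable without a running server.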

Scrape HTML

POST /api/scrape
Content-Type: application/json

{
  "url": "https://example.com"
}

Response:

{
  "success": true,
  "html": "<!DOCTYPE html>..."
}
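
The request and response above could be handled from Python roughly as follows. This is a sketch, not the project's own client: it uses only the standard library, assumes the service listens on `http://localhost:4001` (the port from the Docker example), and the function names are hypothetical. The README does not document an error response, so the sketch treats any payload without a truthy `success` and an `html` field as a failure.

```python
import json
import urllib.request

BASE_URL = "http://localhost:4001"  # assumption: port taken from the Docker example


def build_scrape_request(target_url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /api/scrape request with a JSON body of {"url": ...}."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_html(response_body: bytes) -> str:
    """Return the "html" field, or raise if the payload does not report success."""
    payload = json.loads(response_body)
    # Assumption: failures are signalled by a falsy or missing "success" field.
    if not isinstance(payload, dict) or not payload.get("success") or "html" not in payload:
        raise RuntimeError(f"scrape failed: {payload!r}")
    return payload["html"]


def scrape(target_url: str, base_url: str = BASE_URL, timeout: float = 60.0) -> str:
    """Send the request and return the page HTML.

    urlopen raises urllib.error.HTTPError for 4xx/5xx status codes.
    """
    request = build_scrape_request(target_url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_html(resp.read())
```

With the server running, `html = scrape("https://example.com")` returns the page source as a string. The generous default timeout reflects that the page is rendered in a browser before the response is sent; adjust it to suit the pages being scraped.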