feat: initialize HTML scraper API with Flask and SeleniumBase

2026-02-13 16:03:35 +01:00
commit 9659382d62
16 changed files with 1707 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,68 @@
+# HTML Scraper
+
+A small Python API, built with Flask and SeleniumBase, that returns the rendered HTML of a given URL through a single scrape route.
+
+## Stack
+
+- **Python 3.12** with **uv** for dependency management
+- **Flask** as the web framework
+- **SeleniumBase** (undetected Chrome) for page rendering
+- **Gunicorn** as the production WSGI server
+- **Docker** for containerization
+
+## Setup
+
+### Local development
+
+```bash
+# Install dependencies
+uv sync
+
+# Copy and edit environment variables
+cp .env.example .env
+
+# Run the server
+uv run python run.py
+```
+
+### Docker
+
+```bash
+# Build
+docker build -t html-scraper .
+
+# Run
+docker run -p 4001:4001 --env-file .env html-scraper
+```
+
+## API
+
+### Health check
+
+```http
+GET /api/health
+```
+
+Response:
+```json
+{"status": "ok"}
+```
+
+### Scrape HTML
+
+```http
+POST /api/scrape
+Content-Type: application/json
+
+{
+  "url": "https://example.com"
+}
+```
+
+Response:
+```json
+{
+  "success": true,
+  "html": "<!DOCTYPE html>..."
+}
+```
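
Note on the README hunk above: a minimal client sketch for the `POST /api/scrape` route it documents, using only the Python standard library. The base URL `http://localhost:4001` is an assumption taken from the `docker run -p 4001:4001` example, and the helper names `build_scrape_request` and `scrape_html` are hypothetical. The README does not describe error responses, so treating a missing or false `success` field as a failure is also an assumption.

```python
import json
import urllib.request

# Assumption: the API is reachable on the port published in the Docker example.
BASE_URL = "http://localhost:4001"


def build_scrape_request(target_url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the JSON POST request for /api/scrape as shown in the README."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape_html(target_url: str, base_url: str = BASE_URL, timeout: float = 60.0) -> str:
    """Send the request and return the "html" field of the JSON response."""
    request = build_scrape_request(target_url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    # Assumption: a response without "success": true signals a failed scrape.
    if not payload.get("success"):
        raise RuntimeError(f"scrape failed: {payload}")
    return payload["html"]
```

With the server running, `scrape_html("https://example.com")` should return the page markup as a string. Building the request in a separate function keeps the payload shape checkable without a live server.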