
# HTML Scraper

A small Python HTTP API, built with Flask and SeleniumBase, that exposes a single scrape route returning the rendered HTML of a given URL.

## Stack

- **Python 3.12** with **uv** for dependency management
- **Flask** as the web framework
- **SeleniumBase** (undetected Chrome) for page rendering
- **Gunicorn** as the production WSGI server
- **Docker** for containerization

## Setup

### Local development

```bash
# Install dependencies
uv sync

# Copy the example environment file, then edit .env as needed
cp .env.example .env

# Run the server
uv run python run.py
```

### Docker

```bash
# Build
docker build -t html-scraper .

# Run
docker run -p 4001:4001 --env-file .env html-scraper
```

## API

### Health check

```http
GET /api/health
```

Response:
```json
{"status": "ok"}
```
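
A minimal client-side sketch for polling this endpoint from Python, using only the standard library. The base URL `http://localhost:4001` is an assumption taken from the Docker port mapping above, and the helper names `health_url`, `is_healthy`, and `check_health` are hypothetical, not part of this project.

```python
import json
import urllib.error
import urllib.request

# Assumption: the service is reachable on port 4001, as in the Docker example.
BASE_URL = "http://localhost:4001"


def health_url(base_url: str = BASE_URL) -> str:
    """Build the health-check URL for a given base URL."""
    return base_url.rstrip("/") + "/api/health"


def is_healthy(payload: object) -> bool:
    """Return True only if the payload matches the documented {"status": "ok"}."""
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_health(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Call GET /api/health and report whether the service answered "ok"."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return is_healthy(json.loads(resp.read().decode("utf-8")))
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or a non-JSON body.
        return False
```

Calling `check_health()` returns `True` only when the server is reachable and replies with the documented body, which makes it usable as a readiness probe in scripts.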

### Scrape HTML

```http
POST /api/scrape
Content-Type: application/json

{
  "url": "https://example.com"
}
```

Response:
```json
{
  "success": true,
  "html": "<!DOCTYPE html>..."
}
```
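
The request and response above could be driven from Python roughly as follows. This is a sketch using only the standard library: the base URL `http://localhost:4001` is assumed from the Docker port mapping, the helper names are hypothetical, and because the shape of an error response is not documented here, the code only assumes that `success` is falsy when a scrape fails.

```python
import json
import urllib.request

# Assumption: the service is reachable on port 4001, as in the Docker example.
BASE_URL = "http://localhost:4001"


def build_scrape_request(target_url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /api/scrape request with the documented JSON body."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_html(payload: dict) -> str:
    """Return the "html" field, or raise if "success" is not truthy."""
    if not payload.get("success"):
        # Assumption: failures are signalled by a falsy "success" value.
        raise RuntimeError(f"scrape failed: {payload!r}")
    return payload["html"]


def scrape(target_url: str, base_url: str = BASE_URL, timeout: float = 60.0) -> str:
    """Send the scrape request and return the page HTML as a string."""
    request = build_scrape_request(target_url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_html(json.loads(resp.read().decode("utf-8")))
```

With the server running, `scrape("https://example.com")` would return the page's HTML as a string. The generous default timeout is a guess that accounts for browser-based rendering taking longer than a plain HTTP fetch.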