# HTML Scraper
A simple Python API, built with Flask and SeleniumBase, that returns the rendered HTML of a web page through a single scrape route, plus a health check.
## Stack
- **Python 3.12** with **uv** for dependency management
- **Flask** as web framework
- **SeleniumBase** (undetected Chrome) for page rendering
- **Gunicorn** as production WSGI server
- **Docker** for containerization
## Setup
### Local development
```bash
# Install dependencies
uv sync
# Copy and edit environment variables
cp .env.example .env
# Run the server
uv run python run.py
```
### Docker
```bash
# Build
docker build -t html-scraper .
# Run
docker run -p 4001:4001 --env-file .env html-scraper
```
## API
### Health check
```
GET /api/health
```
Response:
```json
{"status": "ok"}
```
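
A minimal sketch of probing this endpoint from Python with only the standard library. The base URL assumes the server is reachable on `localhost:4001`, as in the Docker port mapping above; `HEALTH_URL`, `is_healthy`, and `check_health` are illustrative names, not part of this project.

```python
import json
import urllib.request

# Assumption: the server is reachable on localhost:4001 (Docker port mapping).
HEALTH_URL = "http://localhost:4001/api/health"


def is_healthy(response_body: bytes) -> bool:
    """Return True when the body matches the documented {"status": "ok"} payload."""
    try:
        payload = json.loads(response_body)
    except ValueError:
        # Covers invalid JSON and undecodable bytes.
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_health(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Call GET /api/health; any network or HTTP error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy(resp.read())
    except OSError:
        # urllib.error.URLError and HTTPError are both OSError subclasses.
        return False
```

Calling `check_health()` against a running server should return `True`. Treating every error as "unhealthy" is a design choice that suits readiness probes; a client that needs to tell failures apart would inspect the exception instead.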
### Scrape HTML
```
POST /api/scrape
Content-Type: application/json

{
  "url": "https://example.com"
}
```
Response:
```json
{
  "success": true,
  "html": "<!DOCTYPE html>..."
}
```
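
A minimal client sketch for the scrape route, using only the Python standard library. It assumes the server is reachable on `localhost:4001` (the Docker port mapping above). The shape of error responses is not documented here, so the sketch only checks the `success` flag. `BASE_URL`, `build_scrape_request`, `extract_html`, and `scrape` are illustrative names, not part of this project.

```python
import json
import urllib.request

# Assumption: the server is reachable on localhost:4001 (Docker port mapping).
BASE_URL = "http://localhost:4001"


def build_scrape_request(target_url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /api/scrape request for the given page URL."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_html(response_body: bytes) -> str:
    """Return the "html" field of a successful response, or raise."""
    payload = json.loads(response_body)
    if not payload.get("success"):
        # The error payload format is an unknown here, so it is surfaced as-is.
        raise RuntimeError(f"scrape failed: {payload}")
    return payload["html"]


def scrape(target_url: str, timeout: float = 60.0) -> str:
    """Send the request and return the rendered HTML (needs a running server)."""
    request = build_scrape_request(target_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_html(resp.read())
```

With the server running, `scrape("https://example.com")` should return the page HTML as a string. The generous default timeout is a guess, on the reasoning that rendering a page in a browser can take noticeably longer than a plain HTTP fetch.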