feat: initialize HTML scraper API with Flask and SeleniumBase

2026-02-13 16:03:35 +01:00
commit 9659382d62
16 changed files with 1707 additions and 0 deletions

README.md

@@ -0,0 +1,68 @@
# HTML Scraper
A simple Python API, built with Flask and SeleniumBase, that exposes a single scrape route returning the rendered HTML of a given URL, plus a health check.
## Stack
- **Python 3.12** with **uv** for dependency management
- **Flask** as the web framework
- **SeleniumBase** (undetected Chrome) for page rendering
- **Gunicorn** as the production WSGI server
- **Docker** for containerization
## Setup
### Local development
```bash
# Install dependencies
uv sync
# Copy and edit environment variables
cp .env.example .env
# Run the server
uv run python run.py
```
### Docker
```bash
# Build
docker build -t html-scraper .
# Run
docker run -p 4001:4001 --env-file .env html-scraper
```
## API
### Health check
```
GET /api/health
```
Response:
```json
{"status": "ok"}
```
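A client can poll this route before sending scrape requests. The following is a minimal sketch using only the Python standard library; the base URL `http://localhost:4001` is an assumption taken from the Docker example, and `parse_health` and `is_healthy` are hypothetical helper names, not part of this project.

```python
import json
import urllib.error
import urllib.request


def parse_health(body: bytes) -> bool:
    """Return True only if the body is the JSON object {"status": "ok"}."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Covers invalid JSON and undecodable bytes.
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def is_healthy(base_url: str = "http://localhost:4001", timeout: float = 5.0) -> bool:
    """GET /api/health and report whether the service answered "ok".

    The default base_url is an assumption based on the Docker port mapping.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout) as response:
            return parse_health(response.read())
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: treat as unhealthy.
        return False
```

Calling `is_healthy()` returns `False` rather than raising when the server is unreachable, which keeps a readiness loop simple.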
### Scrape HTML
```
POST /api/scrape
Content-Type: application/json

{
  "url": "https://example.com"
}
```
Response:
```json
{
  "success": true,
  "html": "<!DOCTYPE html>..."
}
```
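The request and response above could be exercised from Python roughly as follows. This is a sketch under stated assumptions: the service is reachable at `http://localhost:4001` (the port from the Docker example), `build_scrape_request` and `scrape_html` are hypothetical helper names, and the shape of an error response is not documented here, so anything without `"success": true` is simply raised as an error.

```python
import json
import urllib.request

# Assumption: the service listens on localhost:4001, as in the Docker example.
BASE_URL = "http://localhost:4001"


def build_scrape_request(target_url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /api/scrape request with a JSON body of {"url": ...}."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape_html(target_url: str, base_url: str = BASE_URL, timeout: float = 60.0) -> str:
    """Send the request and return the "html" field of the response.

    Raises RuntimeError if the response does not report success; urlopen
    itself raises urllib.error.HTTPError on non-2xx status codes.
    """
    request = build_scrape_request(target_url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    if not payload.get("success"):
        raise RuntimeError(f"scrape failed: {payload}")
    return payload["html"]
```

With the server running, `scrape_html("https://example.com")` returns the page HTML as a string. The generous default timeout is a guess, chosen because rendering a page in a browser can take far longer than a plain HTTP fetch.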