feat: initialize HTML scraper API with Flask and SeleniumBase
README.md
# HTML Scraper

A small Python API that returns the HTML content of any page, built with Flask and SeleniumBase. It exposes a single scraping route plus a health check.

## Stack

- **Python 3.12** with **uv** for dependency management
- **Flask** as the web framework
- **SeleniumBase** (undetected Chrome) for page rendering
- **Gunicorn** as the production WSGI server
- **Docker** for containerization
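
To illustrate how these pieces could fit together, here is a minimal, framework-free sketch of the scrape handler's logic. It is not the project's actual code: `handle_scrape`, the injected `fetch_html` callable, the status codes, and the error shape are all hypothetical. In the real service, Flask would presumably supply the JSON payload and SeleniumBase would do the fetching; only the success shape (`success`, `html`) comes from this README.

```python
from typing import Callable, Tuple
from urllib.parse import urlparse


def handle_scrape(payload: dict, fetch_html: Callable[[str], str]) -> Tuple[dict, int]:
    """Validate a scrape request and return (JSON body, HTTP status)."""
    url = payload.get("url") if isinstance(payload, dict) else None
    parsed = urlparse(url) if isinstance(url, str) else None
    if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
        # Assumed error shape: the README only documents the success case.
        return {"success": False, "error": "a valid http(s) 'url' is required"}, 400
    try:
        html = fetch_html(url)
    except Exception as exc:
        # Assumed behavior: report a failed fetch as an upstream error.
        return {"success": False, "error": str(exc)}, 502
    return {"success": True, "html": html}, 200


def fake_fetch(url: str) -> str:
    """Stand-in for the browser-based fetcher, so the sketch runs anywhere."""
    return f"<!DOCTYPE html><title>{url}</title>"


body, status = handle_scrape({"url": "https://example.com"}, fake_fetch)
```

Passing the fetcher in as a parameter keeps the validation logic testable without launching Chrome.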

## Setup

### Local development

```bash
# Install dependencies
uv sync

# Copy and edit environment variables
cp .env.example .env

# Run the server
uv run python run.py
```

### Docker

```bash
# Build
docker build -t html-scraper .

# Run
docker run -p 4001:4001 --env-file .env html-scraper
```

## API

### Health check

```
GET /api/health
```

Response:
```json
{"status": "ok"}
```
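
As a rough illustration, the health check could be polled from Python with only the standard library. The base URL `http://localhost:4001` is an assumption taken from the port mapping in the Docker example, and the helper names (`health_url`, `is_healthy`, `check_health`) are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: the service listens on the port used in the Docker example.
BASE_URL = "http://localhost:4001"


def health_url(base_url: str = BASE_URL) -> str:
    """Build the health check URL from a base URL."""
    return base_url.rstrip("/") + "/api/health"


def is_healthy(body: bytes) -> bool:
    """Return True when the body is the documented {"status": "ok"} payload."""
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False


def check_health(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Call GET /api/health and report whether the service looks healthy."""
    try:
        with urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read())
    except OSError:
        # Connection refused, timeout, or an HTTP error status.
        return False
```

Calling `check_health()` in a retry loop is one way to wait for the container to come up before sending scrape requests.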

### Scrape HTML

```
POST /api/scrape
Content-Type: application/json

{
  "url": "https://example.com"
}
```

Response:
```json
{
  "success": true,
  "html": "<!DOCTYPE html>..."
}
```
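
The request and response above can be exercised from Python roughly as follows. This is a sketch using only the standard library. The base URL is assumed from the Docker port mapping, the function names are hypothetical, and because the README documents only the success response, treating a falsy `success` as an error is an assumption.

```python
import json
from urllib.request import Request, urlopen

# Assumption: the service listens on the port used in the Docker example.
BASE_URL = "http://localhost:4001"


def build_scrape_request(target_url: str, base_url: str = BASE_URL) -> Request:
    """Build the POST /api/scrape request with a JSON body."""
    payload = json.dumps({"url": target_url}).encode("utf-8")
    return Request(
        base_url.rstrip("/") + "/api/scrape",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_html(body: bytes) -> str:
    """Pull the `html` field out of a response body."""
    data = json.loads(body)
    if not data.get("success"):
        # Assumed failure handling: the error format is not documented.
        raise RuntimeError(f"scrape failed: {data}")
    return data["html"]


def scrape(target_url: str, base_url: str = BASE_URL, timeout: float = 60.0) -> str:
    """Send the scrape request and return the page HTML."""
    with urlopen(build_scrape_request(target_url, base_url), timeout=timeout) as resp:
        return extract_html(resp.read())
```

With the server running, `scrape("https://example.com")` would return the page's HTML as a string. The generous default timeout reflects that rendering a page in a browser can take several seconds.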