edge-tts/web/README.md

# Edge TTS Web UI

A Progressive Web App (PWA) for converting text to speech using Microsoft Edge's online TTS service.

## Features

- 🎙️ **Text to Speech**: Convert any text to natural-sounding speech
- 🌍 **Multiple Languages**: Support for 100+ voices in various languages
- 🎛️ **Voice Customization**: Adjust speed, volume, and pitch
- 📱 **PWA Support**: Install as an app on any device
- 💾 **Offline Support**: Service worker caching for offline usage
- 📝 **History**: Keep track of recent generations
- ⬇️ **Download**: Save generated audio as MP3 files

## Installation

### Prerequisites

- Python 3.8 or higher
- pip (Python package manager)

### Setup

1. Navigate to the web directory:
```bash
cd web
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

## Usage

### Start the Server

```bash
python server.py
```

Or with custom options:
```bash
python server.py --host 0.0.0.0 --port 8000
```

Options:
- `--host`: Host to bind to (default: 0.0.0.0)
- `--port`: Port to bind to (default: 8000)
- `--reload`: Enable auto-reload for development

### Access the Web UI

Open your browser and navigate to:
```
http://localhost:8000
```

### Install as PWA

1. Open the web UI in a modern browser (Chrome, Edge, Safari, Firefox)
2. Look for the install prompt or click "Install App" button
3. The app will be added to your home screen/app drawer

## API Endpoints

The server provides the following REST API endpoints:

### GET /api/health
Health check endpoint

**Response:**
```json
{
  "status": "healthy",
  "service": "edge-tts-api"
}
```

### GET /api/voices
Get list of all available voices

**Response:**
```json
[
  {
    "Name": "en-US-EmmaMultilingualNeural",
    "ShortName": "en-US-EmmaMultilingualNeural",
    "Gender": "Female",
    "Locale": "en-US",
    "LocaleName": "English (United States)",
    ...
  }
]
```

### POST /api/synthesize
Synthesize speech from text

**Request Body:**
```json
{
  "text": "Hello, world!",
  "voice": "en-US-EmmaMultilingualNeural",
  "rate": "+0%",
  "volume": "+0%",
  "pitch": "+0Hz"
}
```

**Response:**
Returns MP3 audio file

**Parameters:**
- `text` (required): Text to convert (max 5000 characters)
- `voice` (optional): Voice name (default: "en-US-EmmaMultilingualNeural")
- `rate` (optional): Speech rate from -100% to +100% (default: "+0%")
- `volume` (optional): Volume from -100% to +100% (default: "+0%")
- `pitch` (optional): Pitch from -500Hz to +500Hz (default: "+0Hz")

### POST /api/synthesize-with-subtitles
Synthesize speech with subtitle generation

**Request Body:**
Same as /api/synthesize

**Response:**
```json
{
  "audio": "base64_encoded_audio_data",
  "subtitles": "SRT formatted subtitles",
  "format": "mp3"
}
```

## File Structure

```
web/
├── index.html          # Main HTML page
├── styles.css          # Styles and theme
├── app.js             # Client-side JavaScript
├── manifest.json      # PWA manifest
├── sw.js              # Service worker
├── server.py          # FastAPI backend server
├── requirements.txt   # Python dependencies
├── icon-192.png       # App icon (192x192)
├── icon-512.png       # App icon (512x512)
└── README.md          # This file
```

## Development

### Running in Development Mode

```bash
python server.py --reload
```

This enables auto-reload when you modify the code.

### Testing

Test the API endpoints using curl:

```bash
# Get voices
curl http://localhost:8000/api/voices

# Synthesize speech
curl -X POST http://localhost:8000/api/synthesize \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello world","voice":"en-US-EmmaMultilingualNeural"}' \
  --output speech.mp3
```

### Customization

#### Update Icons

Replace `icon-192.png` and `icon-512.png` with your own icons.

For best results, create:
- 192x192 PNG for mobile devices
- 512x512 PNG for high-resolution displays

#### Update Theme Color

Edit the `--primary-color` variable in [styles.css](styles.css):

```css
:root {
    --primary-color: #2563eb; /* Change this color */
}
```

Also update `theme_color` in [manifest.json](manifest.json).

## Browser Support

### PWA Features
- ✅ Chrome/Edge (Desktop & Mobile)
- ✅ Safari (iOS 11.3+)
- ✅ Firefox (Desktop & Android)
- ✅ Samsung Internet

### Service Worker
- ✅ All modern browsers
- ❌ IE11 (not supported)

## Troubleshooting

### Port Already in Use

If port 8000 is already in use:
```bash
python server.py --port 8080
```

### Icons Not Showing

Make sure `icon-192.png` and `icon-512.png` exist in the web directory.

### Voices Not Loading

Check the server logs for errors. The server needs internet connection to fetch voices from Microsoft's API.

### CORS Issues

The server is configured to allow all origins for development. For production, update the CORS settings in [server.py](server.py):

```python
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://yourdomain.com"],  # Update this
    ...
)
```

## Deployment

### Production Considerations

1. **Use a production ASGI server**: Uvicorn with multiple workers
   ```bash
   uvicorn server:app --host 0.0.0.0 --port 8000 --workers 4
   ```

2. **Use a reverse proxy**: nginx or Apache for SSL/TLS

3. **Set environment variables**:
   ```bash
   export EDGE_TTS_HOST=0.0.0.0
   export EDGE_TTS_PORT=8000
   ```

4. **Update CORS settings**: Restrict to your domain

5. **Enable HTTPS**: Required for PWA installation

### Docker Deployment

Create a `Dockerfile`:

```dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8000

CMD ["python", "server.py", "--host", "0.0.0.0", "--port", "8000"]
```

Build and run:
```bash
docker build -t edge-tts-web .
docker run -p 8000:8000 edge-tts-web
```

## License

This web UI is built on top of [edge-tts](https://github.com/rany2/edge-tts).

## Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

## Credits

- **edge-tts**: The underlying TTS library by [@rany2](https://github.com/rany2)
- **Microsoft Edge TTS**: The text-to-speech service