# Edge TTS Web UI
A Progressive Web App (PWA) for converting text to speech using Microsoft Edge's online TTS service.
## Features
- 🎙️ **Text to Speech**: Convert any text to natural-sounding speech
- 🌍 **Multiple Languages**: Support for 100+ voices in various languages
- 🎛️ **Voice Customization**: Adjust speed, volume, and pitch
- 📱 **PWA Support**: Install as an app on any device
- 💾 **Offline Support**: Service worker caching keeps the app itself available offline (generating new audio still requires a network connection)
- 📝 **History**: Keep track of recent generations
- ⬇️ **Download**: Save generated audio as MP3 files
## Installation
### Prerequisites
- Python 3.8 or higher
- pip (Python package manager)
### Setup
1. Navigate to the web directory:
   ```bash
   cd web
   ```
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
## Usage
### Start the Server
```bash
python server.py
```
Or with custom options:
```bash
python server.py --host 0.0.0.0 --port 8000
```
Options:
- `--host`: Host to bind to (default: 0.0.0.0)
- `--port`: Port to bind to (default: 8000)
- `--reload`: Enable auto-reload for development
### Access the Web UI
Open your browser and navigate to:
```
http://localhost:8000
```
### Install as PWA
1. Open the web UI in a modern browser (Chrome, Edge, Safari, Firefox)
2. Look for the install prompt or click the "Install App" button
3. The app will be added to your home screen/app drawer
## API Endpoints
The server provides the following REST API endpoints:
### GET /api/health
Health check endpoint
**Response:**
```json
{
  "status": "healthy",
  "service": "edge-tts-api"
}
```
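A script that starts the server and then calls the API may want to wait until this endpoint reports `healthy`. The following is a minimal sketch using only the standard library. The helper names (`fetch_health`, `is_healthy`, `wait_until_healthy`) and the retry settings are illustrative and not part of the server; the URL assumes the default host and port described above.

```python
import json
import time
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:8000/api/health"  # assumed default address


def fetch_health(url: str = HEALTH_URL) -> bytes:
    """Fetch the raw body of the health endpoint."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read()


def is_healthy(body: bytes) -> bool:
    """Return True if the body is JSON with "status" equal to "healthy"."""
    try:
        data = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(data, dict) and data.get("status") == "healthy"


def wait_until_healthy(fetch: Callable[[], bytes] = fetch_health,
                       attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll the health endpoint until it reports healthy or attempts run out."""
    for attempt in range(attempts):
        try:
            if is_healthy(fetch()):
                return True
        except OSError:  # URLError and connection errors subclass OSError
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_healthy()` with no arguments polls the default URL up to ten times, one second apart.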
### GET /api/voices
Get list of all available voices
**Response:**
```json
[
  {
    "Name": "en-US-EmmaMultilingualNeural",
    "ShortName": "en-US-EmmaMultilingualNeural",
    "Gender": "Female",
    "Locale": "en-US",
    "LocaleName": "English (United States)",
    ...
  }
]
```
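Because the list contains voices for many locales, a client will usually narrow it down before showing it to a user. The sketch below filters a decoded response by `Locale` and `Gender`, relying only on the fields shown above. The function name `filter_voices` is hypothetical, and the sample entries are illustrative rather than real server output.

```python
import json
from typing import Dict, List, Optional


def filter_voices(voices: List[Dict], locale: Optional[str] = None,
                  gender: Optional[str] = None) -> List[str]:
    """Return the sorted ShortName of every voice matching the filters."""
    matches = []
    for voice in voices:
        if locale is not None and voice.get("Locale") != locale:
            continue
        if gender is not None and voice.get("Gender") != gender:
            continue
        matches.append(voice["ShortName"])
    return sorted(matches)


# Sample payload in the shape shown above (values are illustrative).
sample = json.loads("""
[
  {"ShortName": "en-US-EmmaMultilingualNeural", "Gender": "Female", "Locale": "en-US"},
  {"ShortName": "en-GB-RyanNeural", "Gender": "Male", "Locale": "en-GB"}
]
""")

print(filter_voices(sample, locale="en-US"))
```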
### POST /api/synthesize
Synthesize speech from text
**Request Body:**
```json
{
  "text": "Hello, world!",
  "voice": "en-US-EmmaMultilingualNeural",
  "rate": "+0%",
  "volume": "+0%",
  "pitch": "+0Hz"
}
```
**Response:**
Returns the generated audio as an MP3 file
**Parameters:**
- `text` (required): Text to convert (max 5000 characters)
- `voice` (optional): Voice name (default: "en-US-EmmaMultilingualNeural")
- `rate` (optional): Speech rate from -100% to +100% (default: "+0%")
- `volume` (optional): Volume from -100% to +100% (default: "+0%")
- `pitch` (optional): Pitch from -500Hz to +500Hz (default: "+0Hz")
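Checking these limits on the client avoids a round trip for requests the server would reject. The following is a sketch of such a check, using the limits and defaults listed above. It assumes that `rate`, `volume`, and `pitch` always carry an explicit sign, as in the examples; the helper name `build_payload` is hypothetical, and the server's own validation may differ.

```python
import re
from typing import Dict, Pattern

MAX_TEXT_LENGTH = 5000  # limit documented above
DEFAULT_VOICE = "en-US-EmmaMultilingualNeural"

# Assumption: values are always signed, e.g. "+0%", "-25%", "+50Hz".
_PERCENT = re.compile(r"[+-](\d+)%")
_HERTZ = re.compile(r"[+-](\d+)Hz")


def _check(value: str, pattern: Pattern, limit: int, name: str) -> str:
    """Validate a signed value against its format and magnitude limit."""
    match = pattern.fullmatch(value)
    if match is None or int(match.group(1)) > limit:
        raise ValueError(f"invalid {name}: {value!r}")
    return value


def build_payload(text: str, voice: str = DEFAULT_VOICE, rate: str = "+0%",
                  volume: str = "+0%", pitch: str = "+0Hz") -> Dict[str, str]:
    """Build a request body for /api/synthesize, raising ValueError if invalid."""
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text exceeds {MAX_TEXT_LENGTH} characters")
    return {
        "text": text,
        "voice": voice,
        "rate": _check(rate, _PERCENT, 100, "rate"),
        "volume": _check(volume, _PERCENT, 100, "volume"),
        "pitch": _check(pitch, _HERTZ, 500, "pitch"),
    }
```

For example, `build_payload("Hello", rate="-25%")` returns a dictionary ready to be serialized with `json.dumps`, while `build_payload("Hello", pitch="+900Hz")` raises `ValueError`.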
### POST /api/synthesize-with-subtitles
Synthesize speech with subtitle generation
**Request Body:**
Same as `/api/synthesize`
**Response:**
```json
{
  "audio": "base64_encoded_audio_data",
  "subtitles": "SRT formatted subtitles",
  "format": "mp3"
}
```
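Since the audio arrives base64-encoded inside JSON, a client has to decode it before saving. The sketch below writes the audio and the subtitles to sibling files, using only the three fields shown above. The function name `save_result` and the demo payload are hypothetical; the demo bytes are not real MP3 data.

```python
import base64
import json
import tempfile
from pathlib import Path
from typing import Tuple


def save_result(response_body: str, stem: str, out_dir: Path) -> Tuple[Path, Path]:
    """Decode the response and write <stem>.<format> and <stem>.srt to out_dir."""
    data = json.loads(response_body)
    audio_path = out_dir / f"{stem}.{data['format']}"
    srt_path = out_dir / f"{stem}.srt"
    audio_path.write_bytes(base64.b64decode(data["audio"]))
    srt_path.write_text(data["subtitles"], encoding="utf-8")
    return audio_path, srt_path


# Demo with a fabricated response body in the documented shape.
fake_body = json.dumps({
    "audio": base64.b64encode(b"not-real-mp3-bytes").decode("ascii"),
    "subtitles": "1\n00:00:00,000 --> 00:00:01,000\nHello\n",
    "format": "mp3",
})
with tempfile.TemporaryDirectory() as tmp:
    audio_path, srt_path = save_result(fake_body, "speech", Path(tmp))
    decoded = audio_path.read_bytes()
    print(audio_path.name, srt_path.name)
```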
## File Structure
```
web/
├── index.html # Main HTML page
├── styles.css # Styles and theme
├── app.js # Client-side JavaScript
├── manifest.json # PWA manifest
├── sw.js # Service worker
├── server.py # FastAPI backend server
├── requirements.txt # Python dependencies
├── icon-192.png # App icon (192x192)
├── icon-512.png # App icon (512x512)
└── README.md # This file
```
## Development
### Running in Development Mode
```bash
python server.py --reload
```
This enables auto-reload when you modify the code.
### Testing
Test the API endpoints using curl:
```bash
# Get voices
curl http://localhost:8000/api/voices
# Synthesize speech
curl -X POST http://localhost:8000/api/synthesize \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello world","voice":"en-US-EmmaMultilingualNeural"}' \
  --output speech.mp3
```
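The same synthesis request can be made from Python without extra dependencies. This is a rough equivalent of the curl command above, built on the standard library's `urllib.request`. The function names are illustrative, and the base URL assumes the default host and port.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed default address


def make_synthesize_request(text: str,
                            voice: str = "en-US-EmmaMultilingualNeural",
                            base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/synthesize."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/synthesize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, path: str) -> int:
    """Send the request, save the returned audio, and return its size in bytes."""
    request = make_synthesize_request(text)
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

With the server running, `synthesize_to_file("Hello world", "speech.mp3")` should produce the same file as the curl command.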
### Customization
#### Update Icons
Replace `icon-192.png` and `icon-512.png` with your own icons.
For best results, create:
- 192x192 PNG for mobile devices
- 512x512 PNG for high-resolution displays
#### Update Theme Color
Edit the `--primary-color` variable in [styles.css](styles.css):
```css
:root {
  --primary-color: #2563eb; /* Change this color */
}
```
Also update `theme_color` in [manifest.json](manifest.json).
## Browser Support
### PWA Features
- ✅ Chrome/Edge (Desktop & Mobile)
- ✅ Safari (iOS 11.3+)
- ✅ Firefox (Desktop & Android)
- ✅ Samsung Internet
### Service Worker
- ✅ All modern browsers
- ❌ IE11 (not supported)
## Troubleshooting
### Port Already in Use
If port 8000 is already in use:
```bash
python server.py --port 8080
```
### Icons Not Showing
Make sure `icon-192.png` and `icon-512.png` exist in the web directory.
### Voices Not Loading
Check the server logs for errors. The server needs an internet connection to fetch voices from Microsoft's API.
### CORS Issues
The server is configured to allow all origins for development. For production, update the CORS settings in [server.py](server.py):
```python
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://yourdomain.com"],  # Update this
    ...
)
```
## Deployment
### Production Considerations
1. **Use a production ASGI server**: Uvicorn with multiple workers
   ```bash
   uvicorn server:app --host 0.0.0.0 --port 8000 --workers 4
   ```
2. **Use a reverse proxy**: nginx or Apache for SSL/TLS
3. **Set environment variables**:
   ```bash
   export EDGE_TTS_HOST=0.0.0.0
   export EDGE_TTS_PORT=8000
   ```
4. **Update CORS settings**: Restrict to your domain
5. **Enable HTTPS**: Required for PWA installation
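This README does not show how `server.py` consumes `EDGE_TTS_HOST` and `EDGE_TTS_PORT`. As one possible approach, a launcher script could read them with the documented defaults as fallbacks. The sketch below is an assumption about how that might look, not the project's actual code; `load_settings` is a hypothetical helper.

```python
import os
from typing import Mapping, Tuple

DEFAULT_HOST = "0.0.0.0"  # defaults documented under "Start the Server"
DEFAULT_PORT = 8000


def load_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Read host and port from the environment, falling back to the defaults."""
    host = env.get("EDGE_TTS_HOST", DEFAULT_HOST)
    raw_port = env.get("EDGE_TTS_PORT", str(DEFAULT_PORT))
    try:
        port = int(raw_port)
    except ValueError:
        raise ValueError(
            f"EDGE_TTS_PORT must be an integer, got {raw_port!r}"
        ) from None
    if not 1 <= port <= 65535:
        raise ValueError(f"EDGE_TTS_PORT out of range: {port}")
    return host, port
```

The returned pair could then be passed on to the server, for example as `uvicorn.run("server:app", host=host, port=port)`.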
### Docker Deployment
Create a `Dockerfile`:
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["python", "server.py", "--host", "0.0.0.0", "--port", "8000"]
```
Build and run:
```bash
docker build -t edge-tts-web .
docker run -p 8000:8000 edge-tts-web
```
## License
This web UI is built on top of [edge-tts](https://github.com/rany2/edge-tts).
## Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
## Credits
- **edge-tts**: The underlying TTS library by [@rany2](https://github.com/rany2)
- **Microsoft Edge TTS**: The text-to-speech service