🎙️ TranscriptAI Documentation

Complete guide for AI-powered audio transcription

Free & Open Source | CLI & Web Interface | OpenAI Whisper

🌟 Overview

TranscriptAI is a powerful, completely free audio transcription tool that leverages OpenAI's state-of-the-art Whisper AI model for highly accurate speech-to-text conversion. Whether you need to transcribe meetings, interviews, podcasts, or any audio content, TranscriptAI provides two convenient interfaces:

  • Web Interface: Browser-based tool for quick transcriptions with drag-and-drop functionality
  • CLI Tool: Command-line interface for developers and power users requiring batch processing and automation

TranscriptAI supports 99+ languages with automatic detection and translation to English. It accepts MP3, WAV, M4A, FLAC, OGG, and WebM files in both interfaces, plus AAC in the CLI only.

Web Interface

Browser-based transcription with drag & drop

CLI Tool

Command-line interface for batch processing

AI-Powered

OpenAI Whisper for highest accuracy

Privacy-First

Your API key stays in your browser and is sent only to OpenAI

🚀 Quick Start

🌐 Web Interface (Easiest)

  1. Visit the web app
  2. Upload your audio file
  3. Choose demo mode or add your OpenAI API key
  4. Get instant transcription results

💻 CLI Installation

# Clone the repository
git clone https://github.com/ombharatiya/transcript-ai.git
cd transcript-ai/cli

# Run setup script
python setup.py

# Start transcribing
source venv/bin/activate
python src/audio_transcriber.py input/audio.mp3

🌐 Web Interface Guide

Two Modes Available

Demo Mode

Perfect for testing

  • No API key required
  • Uses sample text responses
  • Test all UI features
  • Validate audio file formats

Real AI Mode

Actual transcription

  • Requires OpenAI API key
  • Real Whisper AI transcription
  • Support for all languages
  • Translation capabilities

Using Your OpenAI API Key

Security Guarantee

  • ✅ Stored only in browser memory
  • ✅ Never sent to our servers
  • ✅ Direct communication with OpenAI
  • ✅ Cleared when you close the tab

Get your API key: OpenAI Platform (platform.openai.com/api-keys)

💻 CLI Tool Guide

Installation

# Automatic setup (recommended)
cd transcript-ai/cli
python setup.py

# Manual setup
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Basic Usage

# Single file
python src/audio_transcriber.py input/audio.mp3

# Batch processing
python src/audio_transcriber.py input/*.wav --batch

# Different AI model
python src/audio_transcriber.py audio.mp3 --model large

# Specify language
python src/audio_transcriber.py audio.mp3 --language en

# Translate to English
python src/audio_transcriber.py foreign.mp3 --task translate

Advanced Options

# Custom output directory
python src/audio_transcriber.py audio.mp3 --output-dir results/

# Skip JSON output
python src/audio_transcriber.py audio.mp3 --no-json

# File information only
python src/audio_transcriber.py audio.mp3 --info

# Adjust temperature
python src/audio_transcriber.py audio.mp3 --temperature 0.2
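
For automation, the flags shown above can be assembled from a script. The following is a minimal sketch, not part of TranscriptAI: `build_command` and `run` are hypothetical helpers that only pass along the options documented in this guide, and they assume the script is started from the `cli` directory with the virtual environment active.

```python
import subprocess
import sys
from typing import List, Optional

SCRIPT = "src/audio_transcriber.py"  # path used throughout this guide


def build_command(
    audio_files: List[str],
    model: Optional[str] = None,
    language: Optional[str] = None,
    translate: bool = False,
    output_dir: Optional[str] = None,
    temperature: Optional[float] = None,
    batch: bool = False,
    no_json: bool = False,
) -> List[str]:
    """Build an argv list from the documented flags (hypothetical helper)."""
    cmd = [sys.executable, SCRIPT, *audio_files]
    if batch:
        cmd.append("--batch")
    if model:
        cmd += ["--model", model]
    if language:
        cmd += ["--language", language]
    if translate:
        cmd += ["--task", "translate"]
    if output_dir:
        cmd += ["--output-dir", output_dir]
    if temperature is not None:
        cmd += ["--temperature", str(temperature)]
    if no_json:
        cmd.append("--no-json")
    return cmd


def run(cmd: List[str]) -> None:
    """Run the CLI and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Example: translate one file to English with the large model.
    # Only prints the command; call run(cmd) to execute it.
    cmd = build_command(["foreign.mp3"], model="large", translate=True)
    print(" ".join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.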

✨ Features

Audio Format Support

Web Interface: MP3, WAV, M4A, FLAC, OGG, WebM

CLI Tool: All above + AAC (converted via FFmpeg)

AI Models

  • Whisper-1: Web interface (OpenAI API)
  • Tiny/Base/Small/Medium/Large: CLI tool options

Languages

99+ languages supported with automatic detection

Built-in translation to English

🎵 Supported Audio Formats

Format  Extension  Web Interface  CLI Tool  Notes
MP3     .mp3       Yes            Yes       Most common format
WAV     .wav       Yes            Yes       High quality, large files
M4A     .m4a       Yes            Yes       Apple's format
FLAC    .flac      Yes            Yes       Lossless compression
OGG     .ogg       Yes            Yes       Open source format
WebM    .webm      Yes            Yes       Web-optimized
AAC     .aac       No             Yes       CLI only (FFmpeg converts)
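
The supported formats listed above can be checked before a file is submitted. This is a minimal sketch under stated assumptions: `is_supported` is a hypothetical helper, not a TranscriptAI function, and it looks only at the file extension, not at the actual audio content.

```python
from pathlib import Path

# Extensions taken from the format list in this guide.
WEB_FORMATS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".webm"}
CLI_FORMATS = WEB_FORMATS | {".aac"}  # AAC is CLI only (converted via FFmpeg)


def is_supported(filename: str, interface: str = "cli") -> bool:
    """Return True if the extension is listed for the given interface."""
    if interface not in ("web", "cli"):
        raise ValueError("interface must be 'web' or 'cli'")
    allowed = WEB_FORMATS if interface == "web" else CLI_FORMATS
    # Compare case-insensitively so "talk.MP3" is treated like "talk.mp3".
    return Path(filename).suffix.lower() in allowed


if __name__ == "__main__":
    for name in ["talk.MP3", "memo.aac", "notes.txt"]:
        print(name, "web:", is_supported(name, "web"), "cli:", is_supported(name, "cli"))
```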

🔧 Troubleshooting

Common Issues

Web Interface

  • Invalid API key: Check that your OpenAI key starts with 'sk-'
  • File format error: Use a supported format (AAC is not supported in the web interface)
  • Rate limit: Wait a few minutes between requests
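
For scripted use against the OpenAI API, the key check and the rate-limit advice above can be sketched as follows. Everything here is an assumption for illustration: `looks_like_openai_key`, `RateLimited`, and `call_with_backoff` are hypothetical names, and exponential backoff is a common substitute for waiting manually, not something TranscriptAI is documented to do.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def looks_like_openai_key(key: str) -> bool:
    """Cheap format check only: keys are expected to start with 'sk-'."""
    return key.strip().startswith("sk-")


class RateLimited(Exception):
    """Hypothetical exception; raise it from your own request code on HTTP 429."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), doubling the wait after each rate-limit error."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter exists so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.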

CLI Tool

  • FFmpeg not found: Run the setup script or install manually
  • Out of memory: Use a smaller model (tiny or base)
  • Permission denied: Check file permissions
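
Two of the CLI issues above (missing FFmpeg and unreadable files) can be detected before a transcription run starts. A minimal sketch, assuming FFmpeg is expected on the `PATH`; `preflight` is a hypothetical helper and not part of the CLI tool.

```python
import os
import shutil
from typing import List


def preflight(audio_path: str) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("FFmpeg not found on PATH: run the setup script or install it manually")
    if not os.path.isfile(audio_path):
        problems.append(f"File does not exist: {audio_path}")
    elif not os.access(audio_path, os.R_OK):
        problems.append(f"Permission denied: cannot read {audio_path}")
    return problems


if __name__ == "__main__":
    for problem in preflight("input/audio.mp3"):
        print("WARNING:", problem)
```

Memory limits are harder to predict up front, so the sketch leaves that case to the model choice described in the list.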

🎯 Use Cases

Business

  • Meeting transcriptions
  • Interview documentation
  • Customer support analysis

Education

  • Lecture notes
  • Research interviews
  • Language learning

Content Creation

  • Podcast show notes
  • Video subtitles
  • Social media captions

Accessibility

  • Hearing accessibility
  • Voice disabilities
  • Multi-language support

🤝 Contributing

TranscriptAI is open source and welcomes contributions!

Ways to Contribute

  • Report bugs: GitHub Issues
  • Suggest features: Open a feature request
  • Improve documentation: Submit PRs for docs
  • Add translations: Help with internationalization

Development Setup

# Fork the repository on GitHub, then clone your fork
git clone https://github.com/yourusername/transcript-ai.git

# Set up CLI development
cd transcript-ai/cli
python setup.py

# Set up web development (from the cli directory)
cd ../web
npm install  # if using build tools
python -m http.server 8000  # serve locally

❓ Frequently Asked Questions

What is TranscriptAI and how does it work?

TranscriptAI is a free, open-source audio transcription tool that uses OpenAI's Whisper AI model to convert speech to text. It works by processing audio files through advanced machine learning algorithms that can recognize speech patterns in 99+ languages with high accuracy.

How accurate is TranscriptAI's transcription?

TranscriptAI uses OpenAI's Whisper model, which can reach accuracy of 95% or higher on clear audio in supported languages. Actual accuracy depends on audio quality, speaker clarity, background noise, and language or accent.

What audio formats does TranscriptAI support?

The web interface supports MP3, WAV, M4A, FLAC, OGG, and WebM formats. The CLI tool supports all of these plus AAC (converted via FFmpeg). The maximum file size for the web interface is 25 MB.
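
A size check before upload avoids a failed request. This is a minimal sketch, assuming the web limit is measured as 25 × 1024 × 1024 bytes (the documentation does not say which definition of MB applies); `fits_web_limit` and `file_fits_web_limit` are hypothetical names, not part of TranscriptAI.

```python
import os

# Assumption: the web limit is interpreted as 25 MiB.
WEB_MAX_BYTES = 25 * 1024 * 1024


def fits_web_limit(size_bytes: int) -> bool:
    """Return True if a file of this size is within the web upload limit."""
    return size_bytes <= WEB_MAX_BYTES


def file_fits_web_limit(path: str) -> bool:
    """Check the on-disk size of a file against the web upload limit."""
    return fits_web_limit(os.path.getsize(path))
```

Files over the limit can still be processed with the CLI tool, which runs locally.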

Is TranscriptAI really free? Are there any hidden costs?

TranscriptAI itself is free and open source. For real AI transcription in the web interface, you need your own OpenAI API key, and OpenAI bills that usage to your account. The demo mode and the CLI tool (with local models) are entirely free.

How do I install and use the CLI tool?

Clone the repository from GitHub, navigate to the cli directory, and run 'python setup.py' for automatic setup. Then activate the virtual environment with 'source venv/bin/activate' and run 'python src/audio_transcriber.py [audio_file]' to transcribe. The setup script installs all dependencies, including FFmpeg.

What languages are supported for transcription?

TranscriptAI supports 99+ languages including English, Spanish, French, German, Italian, Portuguese, Russian, Japanese, Korean, Chinese, Arabic, Hindi, and many more. It features automatic language detection and can translate any language to English.

Is my data secure? Do you store my audio files?

Your data is completely secure. We never store your audio files or API keys. The web interface processes files client-side and sends them directly to OpenAI (when using real mode). The CLI tool processes everything locally on your machine.

Can I use TranscriptAI for commercial purposes?

Yes! TranscriptAI is released under the MIT License, allowing commercial use. However, when using OpenAI's API, you must comply with OpenAI's terms of service for commercial usage.

How do I get an OpenAI API key?

Visit platform.openai.com/api-keys, create an account, and generate a new API key. You'll need to add billing information to your OpenAI account to use the API, but you only pay for actual usage.

What's the difference between demo mode and real AI mode?

Demo mode shows sample transcription text to test the interface without requiring an API key. Real AI mode uses your OpenAI API key to perform actual transcription using the Whisper model, providing accurate results for your audio files.