In the legal industry, where precision and efficiency are crucial, transcription services are essential. Learn how our AI-powered solution enhanced the transcription process for a leading legal tech startup, providing improved accuracy and productivity.
HIGH ACCURACY: Our solution significantly improves transcription accuracy across diverse accents and achieved a 33% error reduction compared with the largest competitor on the market.
TECHNICAL JARGON RECOGNITION: The system accurately recognizes specialized legal vocabulary, ensuring high-quality, context-specific transcriptions.
ENHANCED FEATURES AND INCREASED PRODUCTIVITY: With voice-based formatting command recognition, legal professionals can create entire documents using dictation without extra steps. These features significantly boost productivity and cost-effectiveness by reducing manual corrections and allowing professionals to focus on critical tasks.
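The highlights do not say how the 33% figure was measured. Speech-recognition accuracy is most often reported as word error rate (WER): the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. Assuming that metric, a reduction relative to a competitor would be (competitor WER - our WER) / competitor WER. The sketch below computes both quantities. The function names are illustrative and are not taken from the project.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of iteration i, prev[j] is the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_error_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer (0.33 means 33% fewer errors)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

For example, `word_error_rate("a b c d", "a x c d")` is 0.25 because one of the four reference words is substituted.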
Our client is an innovative startup from the Rhineland, dedicated to developing digital software solutions for lawyers, by lawyers.
Handling Technical and Industry-Specific Vocabulary in German
Legal conferences and documents often include technical terms and jargon specific to the legal industry. Our client needed a dependable, effective transcription service that would let users produce dictated documents faster and give them finer-grained voice control over formatting.
Real-Time Transcription Solution
The solution needed to be both fast and efficient, while ensuring the confidentiality and security of all recorded conferences and transcriptions.
Legal Text Specificity
GPT models are not specifically trained on legal text and often lack the necessary legal knowledge and expertise. In addition, the transcribed text contains voice commands as ordinary written words, and those words can also carry meaning inside a regular sentence. For example, "Absatz" is the dictation command for a new paragraph, but in a different context it refers to a company's sales figures.
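To see why simple keyword rules are not enough, consider a hypothetical rule-based formatter that replaces every occurrence of a command word, whatever the context. The command table and the function name `naive_format` are illustrative assumptions and are not part of the client's system.

```python
import re

# Hypothetical command table: spoken German command -> formatting output.
COMMANDS = {"Absatz": "\n\n", "neue Zeile": "\n"}


def naive_format(transcript: str) -> str:
    """Replace every command phrase with its formatting, ignoring context (illustrative only)."""
    out = transcript
    for spoken, markup in COMMANDS.items():
        # Swallow the whitespace around the command so the break replaces it cleanly.
        out = re.sub(rf"\s*\b{re.escape(spoken)}\b\s*", markup, out)
    return out
```

`naive_format("Sehr geehrte Damen und Herren Absatz wir bestätigen den Eingang.")` correctly produces a paragraph break. However, `naive_format("Der Absatz des Unternehmens stieg um zehn Prozent.")` returns `"Der\n\ndes Unternehmens stieg um zehn Prozent."` and splits an ordinary sentence in two. Telling the command apart from the noun requires a model that takes the surrounding sentence into account.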
Duplication and Hallucinations Caused by Whisper
At times, the transcripts contained duplicated passages in which every other sentence was an exact copy of the one before it.
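The case study does not describe how these duplicates were handled. One simple post-processing safeguard, shown here as an assumption rather than as the project's actual fix, drops any sentence that repeats the sentence immediately before it. The function name is hypothetical.

```python
import re


def drop_repeated_sentences(text: str) -> str:
    """Remove sentences that repeat the immediately preceding sentence."""
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept: list[str] = []
    for sentence in sentences:
        # Compare case-insensitively so "Ja." and "ja." count as the same sentence.
        if kept and sentence.strip().casefold() == kept[-1].strip().casefold():
            continue
        kept.append(sentence)
    return " ".join(kept)
```

The splitter is deliberately simple. German abbreviations such as "z. B." or "Abs." would be treated as sentence ends, and joining with single spaces discards line breaks. For these reasons, a filter like this fits best on the raw Whisper output, before any formatting is applied.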
Step 1: Deciding on the Technology
At the time of the project, OpenAI’s Whisper was the best publicly available speech-to-text model. However, it does not format text on its own, and a full-fledged dictation solution needs formatting. The pretrained text-to-text transformer model T5 was therefore chosen and iteratively fine-tuned to translate dictated formatting commands into actual text formatting.
Step 2: Transcription with Whisper
The Whisper model automatically transcribes the audio files. The resulting text contains the spoken formatting commands as ordinary words, interleaved with the dictated sentences. This transcript is then passed through the fine-tuned T5 language model.
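The two-stage flow from Steps 1 and 2 can be sketched as a transcriber followed by a formatter. The sketch makes several assumptions: it uses the open-source `openai-whisper` package and Hugging Face Transformers, and the Whisper size `"large-v2"`, the checkpoint path and all function names are hypothetical. The heavy libraries are imported only inside the factory functions, so you can exercise the pipeline logic with stand-in callables.

```python
from typing import Callable

Transcriber = Callable[[str], str]  # audio file path -> raw transcript
Formatter = Callable[[str], str]    # raw transcript -> formatted text


def whisper_transcriber(model_name: str = "large-v2") -> Transcriber:
    """Build a transcriber backed by openai-whisper (model size is an assumption)."""
    import whisper  # pip install openai-whisper

    model = whisper.load_model(model_name)

    def transcribe(audio_path: str) -> str:
        # language="de" skips automatic language detection for German dictation.
        return model.transcribe(audio_path, language="de")["text"]

    return transcribe


def t5_formatter(checkpoint: str) -> Formatter:
    """Build a formatter from a fine-tuned T5 checkpoint (path is hypothetical)."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

    def format_text(transcript: str) -> str:
        inputs = tokenizer(transcript, return_tensors="pt", truncation=True)
        output_ids = model.generate(**inputs, max_new_tokens=512)
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)

    return format_text


def dictate(audio_path: str, transcribe: Transcriber, format_text: Formatter) -> str:
    """Stage 1: speech to raw text. Stage 2: command words to formatting."""
    raw_text = transcribe(audio_path)
    return format_text(raw_text)
```

In a real setup, you would call `dictate("brief.wav", whisper_transcriber(), t5_formatter("path/to/fine-tuned-t5"))`. Keeping the two stages as separate callables allows either model to be swapped or re-fine-tuned without touching the other.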
Step 3: Development
Converting transcribed text into formatted text requires fine-tuning the T5 model to recognize dictated formatting commands. The training process included several key steps:
High-quality examples were selected where a person dictated formatted text. These examples included both the audio files of the dictations and the corresponding written documents.
These data were used to evaluate the model's performance and verify that it worked correctly. The results were then rigorously validated in the client's day-to-day work environment over several weeks.
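The training data described above pairs dictation audio with the corresponding written documents. The sketch below rests on two assumptions. First, it assumes the model's input is the Whisper transcript of each dictation, which matches the inference pipeline. Second, it assumes the target is the written document with its formatting spelled out as plain-text markers, because T5's standard tokenizer does not preserve newline characters. The marker strings `[P]` and `[BR]` and all function names are hypothetical. The training step follows the usual Hugging Face sequence-to-sequence pattern of passing `labels` to the model.

```python
PARAGRAPH_MARKER = " [P] "  # hypothetical marker for a paragraph break
LINE_MARKER = " [BR] "      # hypothetical marker for a line break


def encode_formatting(document: str) -> str:
    """Serialize paragraph and line breaks in a written document as text markers."""
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    encoded = [LINE_MARKER.join(line.strip() for line in p.split("\n")) for p in paragraphs]
    return PARAGRAPH_MARKER.join(encoded)


def make_pairs(transcripts: list[str], documents: list[str]) -> list[tuple[str, str]]:
    """Pair each dictation transcript (model input) with its encoded document (target)."""
    if len(transcripts) != len(documents):
        raise ValueError("each transcript needs exactly one reference document")
    return [(t.strip(), encode_formatting(d)) for t, d in zip(transcripts, documents)]


def training_step(model, tokenizer, optimizer, pairs: list[tuple[str, str]]) -> float:
    """Run one supervised seq2seq update on a batch of (source, target) pairs."""
    model.train()
    sources, targets = zip(*pairs)
    inputs = tokenizer(list(sources), return_tensors="pt", padding=True, truncation=True)
    labels = tokenizer(list(targets), return_tensors="pt", padding=True, truncation=True).input_ids
    # Padding positions are set to -100 so the cross-entropy loss ignores them.
    labels[labels == tokenizer.pad_token_id] = -100
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

In practice, `model` would come from `AutoModelForSeq2SeqLM.from_pretrained(...)` and `optimizer` from `torch.optim.AdamW(model.parameters(), lr=...)`. The base checkpoint and learning rate are left open because the case study does not state them.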
Step 4: Hosting
The hosted solution runs in a secure German cloud environment provided by kraud.cloud. Users access it through a user-friendly custom web app.
Let's discuss how your data, combined with machine learning technologies, can boost your company's performance.