GPT-Based Transcription Service

In the legal industry, where precision and efficiency are crucial, transcription services are essential. Learn how our AI-powered solution enhanced the transcription process for a leading legal tech startup, providing improved accuracy and productivity.

KEY BENEFITS

HIGH ACCURACY: Our solution significantly improves transcription accuracy for speakers with diverse accents and achieved a 33% error reduction compared with the largest competitor on the market.

TECHNICAL JARGON RECOGNITION: The system accurately recognizes specialized legal vocabulary, ensuring high-quality, context-specific transcriptions.

ENHANCED FEATURES AND INCREASED PRODUCTIVITY: With voice-based formatting command recognition, legal professionals can create entire documents using dictation without extra steps. These features significantly boost productivity and cost-effectiveness by reducing manual corrections and allowing professionals to focus on critical tasks.

THE CUSTOMER

Our client is an innovative startup from the Rhineland, dedicated to developing digital software solutions for lawyers, by lawyers.

THE CHALLENGE

Handling Technical and Industry-Specific Vocabulary in German
Legal conferences and documents often include technical terms and jargon specific to the legal industry. Our client needed a dependable, effective transcription service that would let users create dictated documents faster and with finer-grained voice control over formatting.

Real-Time Transcription Solution
The solution needed to be both fast and efficient, while ensuring the confidentiality and security of all recorded conferences and transcriptions.

Legal Text Specificity
GPT models are not specifically trained on legal text and often lack the necessary legal knowledge and expertise. In addition, the transcribed text contains voice commands written out as words, and these words can also carry meaning within an ordinary sentence. For example, the word “Absatz” is the command for a new paragraph, but in a different context it can also refer to a company's sales figures.
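To illustrate why a plain keyword substitution is not enough, the following Python sketch applies a hypothetical rule-based post-processor (not the client's approach) that replaces every occurrence of “Absatz” with a paragraph break. The function name `naive_format` and the `COMMANDS` table are illustrative assumptions.

```python
import re

# Hypothetical, naive post-processor: every standalone command word is replaced
# by the formatting it stands for, regardless of context.
COMMANDS = {"Absatz": "\n\n"}


def naive_format(transcript: str) -> str:
    """Replace each command word (plus surrounding whitespace) with its formatting."""
    for word, replacement in COMMANDS.items():
        transcript = re.sub(rf"\s*\b{re.escape(word)}\b\s*", replacement, transcript)
    return transcript


# Intended command: the speaker wants a paragraph break here.
print(repr(naive_format("Die Klage ist zulässig. Absatz Sie ist auch begründet.")))
# 'Die Klage ist zulässig.\n\nSie ist auch begründet.'

# Ordinary noun: "Absatz" means the company's sales, not a paragraph break.
print(repr(naive_format("Der Absatz des Unternehmens ist gestiegen.")))
# 'Der\n\ndes Unternehmens ist gestiegen.'
```

The rule fires correctly in the first sentence and corrupts the second, because it cannot tell the command from the noun. Telling them apart requires sentence context, which is the kind of information a learned text-to-text model such as the fine-tuned T5 can take into account.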

Duplication and Hallucinations Caused by Whisper
At times, Whisper's output contained duplicated passages in which every second sentence was an exact copy of the previous one.
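The case study does not describe how the duplicated sentences were handled. One simple mitigation, shown here as an assumption rather than the method actually used, is to post-process Whisper's raw transcript and drop any sentence that exactly repeats the one before it. The function name `drop_repeated_sentences` and the regex-based sentence splitter are illustrative.

```python
import re


def drop_repeated_sentences(text: str) -> str:
    """Remove sentences that are exact copies of the sentence directly before them.

    Sentences are split with a simple regex on ., ! or ? followed by whitespace.
    A production system would need a splitter that knows legal abbreviations
    such as "Abs." or "gem.".
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept: list[str] = []
    for sentence in sentences:
        # Compare whitespace-normalized forms so spacing differences are ignored.
        if kept and " ".join(sentence.split()) == " ".join(kept[-1].split()):
            continue
        kept.append(sentence)
    return " ".join(kept)


raw = ("Der Vertrag ist nichtig. Der Vertrag ist nichtig. "
       "Die Frist ist abgelaufen. Die Frist ist abgelaufen.")
print(drop_repeated_sentences(raw))
# Der Vertrag ist nichtig. Die Frist ist abgelaufen.
```

Running this on the raw transcript, before any formatting is applied, keeps the rejoining step simple because there are no paragraph breaks yet. Only consecutive exact copies are removed, so a sentence that legitimately recurs later is kept; hallucinated content that is not a verbatim repeat is not addressed by this check.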

THE SOLUTION

Step 1: Deciding on the Technology

OpenAI’s Whisper is currently the best speech-to-text model available to the public. However, it does not handle text formatting on its own, which a full-fledged dictation solution requires. Therefore, the pretrained text-to-text transformer model T5 was chosen and iteratively fine-tuned to translate dictated formatting commands into actual text formatting.
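T5 frames every task as mapping an input string to an output string, so fine-tuning it for formatting requires pairs of (transcript-like input, formatted target). The case study does not say how these pairs were produced. The following Python sketch assumes they are derived from curated formatted documents by writing each formatting element out as its spoken command. Only “Absatz” comes from the case study; the “neue Zeile” command, the `format: ` task prefix, and the function names are hypothetical. The `command_counts` helper reflects the requirement in Step 3 that every command occur often enough in the training data.

```python
from collections import Counter

# Hypothetical mapping from formatting to the spoken command that produces it.
# "Absatz" (new paragraph) is from the case study; "neue Zeile" (new line) is an
# assumed example. Longer patterns come first so "\n\n" is not read as two "\n".
SPOKEN_COMMANDS = [("\n\n", "Absatz"), ("\n", "neue Zeile")]

# Hypothetical task prefix; T5 is conventionally given a short prefix naming the task.
TASK_PREFIX = "format: "


def to_training_pair(formatted: str) -> tuple[str, str]:
    """Build one (source, target) pair from a formatted document.

    The source imitates a raw transcript: each formatting element is replaced by
    its spoken command and whitespace is collapsed. The target is the document.
    """
    source = formatted
    for layout, command in SPOKEN_COMMANDS:
        source = source.replace(layout, f" {command} ")
    source = " ".join(source.split())
    return TASK_PREFIX + source, formatted


def command_counts(documents: list[str]) -> Counter:
    """Count how often each spoken command would appear across the corpus."""
    # Start every command at 0 so commands that never occur are still reported.
    counts: Counter = Counter({command: 0 for _, command in SPOKEN_COMMANDS})
    for doc in documents:
        remaining = doc
        for layout, command in SPOKEN_COMMANDS:
            counts[command] += remaining.count(layout)
            # Remove matched layout so "\n\n" is not counted again as two "\n".
            remaining = remaining.replace(layout, " ")
    return counts


letter = "Sehr geehrte Damen und Herren,\nanbei die Klage.\n\nMit freundlichen Grüßen"
source, target = to_training_pair(letter)
print(source)
# format: Sehr geehrte Damen und Herren, neue Zeile anbei die Klage. Absatz Mit freundlichen Grüßen
print(command_counts([letter]))
```

Commands with low counts would be candidates for additional training examples before the next fine-tuning round.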

Step 2: Transcription with Whisper

The Whisper model automatically transcribes the audio files. The resulting text contains the voice commands written out as words, interleaved with the dictated sentences. This transcribed text is then passed through the fine-tuned T5 language model.
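The two-stage flow described in Steps 1 and 2 can be sketched as two interchangeable functions. This Python sketch is an assumption about structure, not the client's code: `run_dictation` and the stand-in stages are hypothetical, and the comments only indicate how real Whisper and T5 stages might be plugged in.

```python
from typing import Callable

# Stage signatures (hypothetical). A real transcriber might wrap the
# openai-whisper package, e.g. model = whisper.load_model("large") and
# model.transcribe(path)["text"]; a real formatter might wrap a fine-tuned
# Hugging Face T5 checkpoint (tokenize, model.generate, decode).
Transcriber = Callable[[str], str]  # audio file path -> raw transcript
Formatter = Callable[[str], str]    # transcript with spoken commands -> formatted text


def run_dictation(audio_path: str, transcribe: Transcriber, format_text: Formatter) -> str:
    """Stage 1: speech to text. Stage 2: spoken commands to formatting."""
    raw_transcript = transcribe(audio_path).strip()
    return format_text(raw_transcript)


# Stand-in stages so the pipeline can be exercised without any models.
def fake_transcribe(path: str) -> str:
    return " Die Klage ist zulässig. Absatz Sie ist begründet."


def fake_format(text: str) -> str:
    return text.replace(" Absatz ", "\n\n")


print(run_dictation("diktat.wav", fake_transcribe, fake_format))
```

Keeping the stages behind plain function types means a retrained formatting model can be swapped in during iterative fine-tuning without touching the transcription stage.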

Step 3: Development

Converting transcribed text into formatted text requires fine-tuning the T5 model to recognize dictated formatting commands. The training process included several key steps:

  1. Model Training: A collection of domain-relevant formatted texts was curated to fine-tune T5 for recognizing formatting commands within regular text. For effective learning, each dictation command had to occur frequently enough in the text for the model to recognize and translate it into formatted text. Through this, the model learned the structure and nuances of formatted text and legal jargon.
  1. Evaluation and Benchmarking: After training, the model's performance was evaluated to identify any remaining issues. This involved benchmarking the model against a reference set to ensure it correctly recognized and transcribed the dictated formatting. We tested several Key Performance Indicators and ultimately chose the Character Error Rate (CER) as our primary metric. CER counts the character insertions, deletions, and substitutions needed to turn the model's output into the reference text, relative to the length of the reference, which gives a clear and precise measure of how accurately the dictated text was recognized and transcribed.

     For this, high-quality examples of people dictating formatted text were selected. Each example consisted of the audio file of the dictation and the corresponding written document.

     These examples were used to test the model's performance and confirm that it worked correctly. The results were then rigorously validated over several weeks in the client's day-to-day work environment.

  1. Adjustment and Fine-Tuning: Based on the evaluation results, the training data was adjusted by providing additional examples and refining existing ones. This iterative process continued until the model achieved the desired accuracy in recognizing and transcribing dictations.
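The Character Error Rate used in the evaluation step is CER = (S + D + I) / N, where S, D, and I are the character substitutions, deletions, and insertions in the minimal edit sequence and N is the number of characters in the reference. The following Python sketch computes it with the standard dynamic-programming Levenshtein algorithm; it is an illustration, not the evaluation tooling used in the project, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def character_error_rate(hypothesis: str, reference: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(hypothesis, reference) / len(reference)


reference = "Die Klage ist zulässig.\n\nSie ist begründet."
hypothesis = "Die Klage ist zulässig. Absatz Sie ist begründet."  # command left unconverted
print(round(character_error_rate(hypothesis, reference), 3))
# 0.186
```

Because newline characters count like any other character, a missing paragraph break or a command word left in the output raises the CER just as a misheard word does, so the metric captures formatting errors as well as transcription errors.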

Step 4: Hosting

The solution is hosted in a secure German cloud environment provided by kraud.cloud. Users access it through a user-friendly custom web app.

Johannes Hollmann

CEO/Founder

Planning an AI project?

Let's discuss how your data, combined with machine learning technologies, can boost your company's performance.
