---

Audio Analysis Plug-in

Built on a powerful and extensible natural language processing framework developed at Cambridge University, Autonomy Virage’s SoftSound audio plug-ins combine four audio processing components to automatically generate keyword, speaker identification, transcript alignment, and audio classification indexes from the audio signal.

By intelligently “listening” to the video’s audio track, the software identifies spoken words, speaker names, and audio types, reducing reliance on the expensive, labor-intensive manual annotation traditionally used to log video and audio. These plug-ins provide significant value to any organization that creates or distributes large volumes of content by dramatically reducing the time spent searching for segments.

  • State-of-the-art accuracy
  • Support for phonetic, word-spotting, and conceptual indexing and search
  • Highly scalable real-time recognition
  • Unlimited vocabulary size
  • Industry-leading range of supported languages
  • Audio processing capabilities
  • Speaker / audio segmentation
  • Speaker / audio classification

These capabilities enable the identification of:

  • Topics being discussed
  • Genders of the speakers
  • Emotional tone of speech
  • Amount and location of speech versus non-speech (e.g., background noise or silence)
  • Linguistic origin of speakers
  • Music


SoftSound

Autonomy Virage supports the full range of audio search methods, including phoneme-based and word-spotting techniques. It also offers conceptual searching, which uses the mathematical techniques built into its patented IDOL technology to identify the conceptual relationships between words that determine their meaning. Autonomy Virage’s interaction analysis applies meaning-based speech recognition to form a hypothesis of the concepts in speech, from which it can analyze the topics being discussed, the identity and gender of speakers, the emotions expressed in a conversation, and the presence of excessive silence or cross-talk.

IDOL also lets users search audio, email, and chat data from numerous sources using multilingual natural language queries. By combining phonetic and conceptual methods, the technology overcomes the limitations of pure keyword and phoneme matching, and it accounts for the variable, complex, and fundamentally human aspects of language that basic phonetic approaches cannot accommodate.

Key Benefits:

  • Retrieves information according to its meaning
  • Combines phonetic and conceptual approaches to offer unique high-level functions
  • Allows audio, video, email, and chat to be indexed and searched
  • Provides unrivalled accuracy by understanding words in context
  • Patented search technology drastically reduces CPU and memory requirements
  • Automatic customization via additional text material
  • Fixed-latency real-time mode for media monitoring operations


Audio Segmentation

The powerful audio segmentation plug-in identifies distinct sounds in the audio signal and records exactly where they occur. Users can easily program it to recognize specific audio types, such as applause, laughter, program jingles, or tones.
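The plug-in’s own segmentation method is not described here. As a rough illustration of how sound events can be located in time, the following sketch uses a generic short-time energy threshold to mark where non-silent audio occurs. The function name `segment_by_energy`, the 400-sample frame size, and the 0.01 threshold are illustrative assumptions; recognizing specific sound types such as applause or jingles would additionally require a trained classifier.

```python
from typing import List, Optional, Tuple


def segment_by_energy(
    samples: List[float],
    sample_rate: int,
    frame_size: int = 400,    # assumed frame length in samples (illustrative)
    threshold: float = 0.01,  # assumed mean-square energy threshold (illustrative)
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans whose frames exceed an energy threshold.

    Generic short-time-energy segmentation; not the SoftSound algorithm.
    """
    segments: List[Tuple[float, float]] = []
    start: Optional[int] = None  # first sample index of the current active span
    for i in range(0, len(samples), frame_size):
        frame = samples[i:i + frame_size]
        # Mean squared amplitude of this frame.
        energy = sum(x * x for x in frame) / len(frame)
        if energy >= threshold and start is None:
            start = i  # sound begins
        elif energy < threshold and start is not None:
            segments.append((start / sample_rate, i / sample_rate))  # sound ends
            start = None
    if start is not None:
        # Close a span that runs to the end of the signal.
        segments.append((start / sample_rate, len(samples) / sample_rate))
    return segments
```

For example, a 4,000-sample burst of a ±0.5 square wave placed between two 4,000-sample silences at 8 kHz yields a single segment from 0.5 s to 1.0 s.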


Speaker Recognition

The speaker identification plug-in recognizes voices from a user-defined library, regardless of the words or even the language spoken. By simply providing a short speech sample, users can easily add new speakers to the library. Speaker identification makes it possible to associate blocks of text with a particular speaker, enhancing rich media navigation and retrieval.
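Matching a voice against an enrolled library is commonly done by comparing fixed-length voice embeddings. This sketch assumes such embeddings are produced elsewhere (the feature extractor is not shown) and that a cosine similarity of 0.8 is a reasonable acceptance threshold; `SpeakerLibrary`, `enroll`, and `identify` are hypothetical names, not part of the SoftSound plug-in.

```python
import math
from typing import Dict, List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SpeakerLibrary:
    """Hypothetical user-defined speaker library keyed by speaker name."""

    def __init__(self, min_score: float = 0.8) -> None:
        self.min_score = min_score  # assumed acceptance threshold
        self.voiceprints: Dict[str, List[float]] = {}

    def enroll(self, name: str, embedding: List[float]) -> None:
        # One embedding per speaker, derived from a short speech sample.
        self.voiceprints[name] = embedding

    def identify(self, embedding: List[float]) -> Optional[str]:
        # Return the best-matching enrolled speaker, or None if no score
        # reaches the threshold.
        best_name: Optional[str] = None
        best_score = self.min_score
        for name, ref in self.voiceprints.items():
            score = cosine(embedding, ref)
            if score >= best_score:
                best_name, best_score = name, score
        return best_name
```

Returning `None` below the threshold leaves unknown voices unlabeled instead of forcing them onto the nearest enrolled speaker.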


Speech-to-Text

The SoftSound Audio Analysis Plug-in detects speech and performs real-time speech-to-text transcription with approximately 90% accuracy. The resulting text track is then synchronized with the video track, allowing accurate search within the video.
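Synchronizing the text track with the video means each recognized word carries a time offset, so a keyword hit can be turned directly into a seek position. This sketch assumes a transcript of `(text, start)` pairs in seconds and 25 fps non-drop-frame timecode; `Word`, `find_keyword`, and `to_timecode` are hypothetical helpers, not the plug-in’s actual output format.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    text: str
    start: float  # seconds from the start of the media (assumed format)


def find_keyword(transcript: List[Word], keyword: str) -> List[float]:
    """Return the start time of every case-insensitive match of `keyword`."""
    kw = keyword.lower()
    # Strip trailing punctuation so "Budget." still matches "budget".
    return [w.start for w in transcript if w.text.lower().strip(".,!?") == kw]


def to_timecode(seconds: float, fps: int = 25) -> str:
    """Convert seconds to non-drop-frame HH:MM:SS:FF at an assumed frame rate."""
    total_frames = round(seconds * fps)
    frames = total_frames % fps
    total_seconds = total_frames // fps
    h, rem = divmod(total_seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}:{frames:02d}"
```

For example, a hit at 65.4 s maps to timecode 00:01:05:10 at 25 fps, which a player can use to jump straight to the matching segment.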