Case study

VoiceCrunch AI

Speech Analytics Platform

Speech-to-text · NLP · Sentiment · Dashboards

Key outcomes

90% less time spent reviewing calls
40+ smart criteria across sentiment, silence and talk dynamics
Flexible monthly and pay-as-you-go (PAYG) pricing for call analysis

The Problem

Quality assurance teams in call centres spend an enormous amount of time listening to recordings. A supervisor might review 20–30 calls per day—still a tiny fraction of total volume. Most compliance risks, coaching opportunities, and customer experience signals go undetected because there is simply not enough human time to find them.

Keyword spotting tools exist, but they are rigid: exact match only, no understanding of context, silence, or conversational dynamics.

VoiceCrunch needed to be smarter—a platform that understands calls the way a senior QA analyst would.

What Was Built

VoiceCrunch AI is a speech analytics SaaS that ingests recorded calls and makes them searchable across 40+ criteria, including:

  • Keyword and phrase search — including fuzzy match and phonetic variants
  • Sentiment trajectory — how sentiment shifts across a call, not just a single score
  • Silence detection — long pauses that indicate confusion, hold procedures, or scripting gaps
  • Overtalk analysis — moments where both parties speak simultaneously
  • Volume and pacing — shouting, fast speech, scripted vs natural delivery
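
Two of the signals above, silence and overtalk, can be illustrated with a minimal Python sketch. It assumes the transcription step produces speaker-labelled segments with start and end times. The `Segment` structure, the function names, and the 8-second default are hypothetical, chosen for illustration rather than taken from the VoiceCrunch codebase. The overtalk calculation also assumes that segments from the same speaker do not overlap each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One diarised stretch of speech (hypothetical structure)."""
    speaker: str   # "agent" or "customer"
    start: float   # seconds from the start of the call
    end: float


def find_silences(segments, min_gap=8.0):
    """Return (start, end) gaps of at least min_gap seconds where nobody speaks."""
    gaps = []
    latest_end = None
    for seg in sorted(segments, key=lambda s: s.start):
        if latest_end is not None and seg.start - latest_end >= min_gap:
            gaps.append((latest_end, seg.start))
        latest_end = seg.end if latest_end is None else max(latest_end, seg.end)
    return gaps


def overtalk_seconds(segments):
    """Total time during which agent and customer speak simultaneously."""
    agent = [s for s in segments if s.speaker == "agent"]
    customer = [s for s in segments if s.speaker == "customer"]
    total = 0.0
    for a in agent:
        for c in customer:
            # Length of the intersection of the two intervals, or 0 if disjoint.
            total += max(0.0, min(a.end, c.end) - max(a.start, c.start))
    return total


call = [
    Segment("agent", 0.0, 5.0),
    Segment("customer", 4.0, 9.0),
    Segment("agent", 20.0, 25.0),
]
print(find_silences(call))     # [(9.0, 20.0)]
print(overtalk_seconds(call))  # 1.0
```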

These signals are combined into a searchable index. A compliance officer can query "calls with negative sentiment after a pricing discussion AND silence longer than 8 seconds" and get results in seconds rather than hours.

Transcription and Translation

The platform runs automatic transcription on every call using a speech-to-text pipeline optimised for call centre audio (noisy environments, multiple speakers, telephony compression artifacts).

Key moments flagged by the analytics layer are also translated into target languages—useful for multinational operations where QA is centralised but calls happen in multiple languages.

Alerting and Dashboards

The platform sends email alerts for configurable trigger conditions—a high-risk keyword appearing, sentiment dropping below a threshold, overtalk exceeding a percentage of call time. These alerts route to the right team: compliance gets compliance signals, sales management gets sales signals.
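
One way to picture configurable triggers with team routing is a list of rules, each pairing a condition with a recipient team. The sketch below is an assumption about how such rules could be modelled, not the platform's actual schema: `CallSignals`, `AlertRule`, the keyword list, the thresholds, and the team assignments are all invented for illustration, and the email delivery step is left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CallSignals:
    """Per-call signal summary (hypothetical fields)."""
    call_id: str
    keywords: frozenset     # flagged keywords found in the transcript
    min_sentiment: float    # lowest windowed sentiment, assumed range -1..1
    overtalk_ratio: float   # overtalk time divided by call duration


@dataclass(frozen=True)
class AlertRule:
    name: str
    team: str                                  # which team receives the alert
    condition: Callable[[CallSignals], bool]   # trigger condition


# Illustrative rules only; keywords and thresholds are made up.
RULES = [
    AlertRule("high-risk keyword", "compliance",
              lambda c: bool(c.keywords & {"chargeback", "lawsuit"})),
    AlertRule("sentiment below threshold", "sales",
              lambda c: c.min_sentiment < -0.5),
    AlertRule("excessive overtalk", "sales",
              lambda c: c.overtalk_ratio > 0.15),
]


def route_alerts(call, rules=RULES):
    """Group the names of triggered rules by the team that should receive them."""
    routed = {}
    for rule in rules:
        if rule.condition(call):
            routed.setdefault(rule.team, []).append(rule.name)
    return routed


call = CallSignals("c-001", frozenset({"lawsuit"}), -0.7, 0.05)
print(route_alerts(call))
# {'compliance': ['high-risk keyword'], 'sales': ['sentiment below threshold']}
```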

Dashboards aggregate signal data by agent, team, and time period—turning individual call analysis into coaching and trend data.

Ingestion Flexibility

One of the harder engineering challenges: call recordings come from everywhere. Legacy Avaya systems, Zoom, Genesys, custom call recorders—each with different audio formats, naming conventions, and metadata structures.

The ingestion layer was built as a modular pipeline: each connector normalises its source into a canonical audio object before hitting the analysis stack. Adding a new source is a new connector, not a change to core logic.
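
The connector pattern described above might look roughly like the following Python sketch. Every name here is hypothetical: `CanonicalAudio`, the two connector classes, and the raw payload keys (`meeting_id`, `filename`, and so on) are invented for illustration and do not reflect the real Zoom, Avaya, or Genesys export formats.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class CanonicalAudio:
    """The single shape the analysis stack accepts (hypothetical fields)."""
    call_id: str
    sample_rate_hz: int
    channels: int
    pcm: bytes    # audio payload, held in memory only
    source: str


class Connector(Protocol):
    source: str

    def normalise(self, raw: dict) -> CanonicalAudio: ...


class ZoomConnector:
    source = "zoom"

    def normalise(self, raw: dict) -> CanonicalAudio:
        # Assumed payload keys; a real connector would parse the vendor export.
        return CanonicalAudio(raw["meeting_id"], raw["rate"],
                              raw["channels"], raw["audio"], self.source)


class LegacyRecorderConnector:
    source = "legacy"

    def normalise(self, raw: dict) -> CanonicalAudio:
        # Assumed naming convention: "<call_id>_<timestamp>.wav", mono 8 kHz.
        call_id = raw["filename"].split("_")[0]
        return CanonicalAudio(call_id, 8000, 1, raw["data"], self.source)


# Adding a source means registering one more connector here;
# the ingest function and everything downstream stay unchanged.
CONNECTORS = {c.source: c for c in (ZoomConnector(), LegacyRecorderConnector())}


def ingest(source: str, raw: dict) -> CanonicalAudio:
    """Normalise a raw recording from the named source into CanonicalAudio."""
    return CONNECTORS[source].normalise(raw)


audio = ingest("legacy", {"filename": "C123_20240101.wav", "data": b"\x00\x00"})
print(audio.call_id, audio.sample_rate_hz)  # C123 8000
```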

Processing is transient: audio is analysed in memory and results are written to the database. No raw audio is stored beyond the retention policy.

Key Engineering Decisions

Sentiment trajectory over sentiment score. A single score for a 20-minute call is nearly meaningless. Tracking sentiment in 30-second windows across the call reveals the actual story: where things went wrong, whether the agent recovered, how the call ended.
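
A minimal sketch of the windowing idea, assuming each utterance already carries a start time and a sentiment score in the range -1 to 1. The function name, the input shape, and the use of a simple mean per window are assumptions made for illustration; the source does not say how scores are aggregated within a window.

```python
from collections import defaultdict


def sentiment_trajectory(utterances, window_s=30.0):
    """Average sentiment per fixed window.

    utterances: iterable of (start_seconds, score) pairs.
    Returns a list of (window_start_seconds, mean_score), ordered by time.
    Windows with no utterances are omitted.
    """
    buckets = defaultdict(list)
    for start, score in utterances:
        buckets[int(start // window_s)].append(score)
    return [(idx * window_s, sum(scores) / len(scores))
            for idx, scores in sorted(buckets.items())]


utterances = [(2.0, 0.5), (20.0, 0.5), (35.0, -1.0), (70.0, 0.0)]
print(sentiment_trajectory(utterances))
# [(0.0, 0.5), (30.0, -1.0), (60.0, 0.0)]

overall = sum(score for _, score in utterances) / len(utterances)
print(overall)  # 0.0
```

In this toy call the overall mean is neutral, while the trajectory shows a sharp drop in the second window followed by a partial recovery.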

40+ criteria by design. Each criterion is small and composable. Users combine them in queries. This is more powerful than a set of fixed reports and more maintainable than a monolithic scoring model.
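
Composability of this kind can be sketched with small predicate functions and two combinators. The example below is an illustration under assumptions, not the platform's query engine: `CallRecord`, its fields, and the helper names are hypothetical, and "after a pricing discussion" is simplified to "any sentiment window starting at or after the first pricing mention".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CallRecord:
    """Indexed signals for one call (hypothetical fields)."""
    call_id: str
    pricing_mentioned_at: Optional[float]   # seconds; None if never discussed
    sentiment_windows: tuple                # ((window_start, score), ...)
    longest_silence_s: float


def all_of(*criteria):
    """AND: the call must satisfy every criterion."""
    return lambda call: all(c(call) for c in criteria)


def any_of(*criteria):
    """OR: the call must satisfy at least one criterion."""
    return lambda call: any(c(call) for c in criteria)


def silence_longer_than(seconds):
    return lambda call: call.longest_silence_s > seconds


def negative_sentiment_after_pricing(threshold=0.0):
    def check(call):
        t = call.pricing_mentioned_at
        if t is None:
            return False
        return any(start >= t and score < threshold
                   for start, score in call.sentiment_windows)
    return check


def search(calls, criterion):
    """Return the IDs of calls matching the combined criterion."""
    return [c.call_id for c in calls if criterion(c)]


calls = [
    CallRecord("c1", 60.0, ((30.0, 0.2), (60.0, -0.4), (90.0, -0.1)), 12.0),
    CallRecord("c2", 60.0, ((30.0, -0.6), (60.0, 0.3)), 15.0),
    CallRecord("c3", None, ((0.0, -0.9),), 20.0),
    CallRecord("c4", 10.0, ((30.0, -0.5),), 3.0),
]
query = all_of(negative_sentiment_after_pricing(), silence_longer_than(8.0))
print(search(calls, query))  # ['c1']
```

Each criterion stays a few lines long, and a new one can be added without touching the combinators or the existing criteria.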

PAYG + monthly pricing. Different customer sizes have different usage patterns. High-volume centres want flat-rate monthly pricing. Smaller teams and one-off use cases pay per analysis. Both are supported from the same infrastructure.

Scope

  • Search recorded calls for words, phrases, silences, overtalk, sentiment and talk-style changes.
  • Automatic transcription and translation of key moments into target languages.
  • Email alerts and dashboards for compliance, CX, sales and risk signals.
  • Silence, overtalk, volume and pacing analysis for coaching and QA.
  • Flexible ingestion from existing call recorders, Zoom and legacy systems with secure, transient processing.

Waqas Raza

AI-Native Full-Stack Engineer. Top Rated on Upwork · $180K+ earned · 93% job success. I build production AI agents, LLM systems, Web3 platforms, and full-stack applications.
