Forward Share Ventures

How a Product Engineering Team Scaled AI Speech Understanding to 10M+ Utterances

A case from the Forward Share Ventures expert operator network: how a product engineering team scaled an AI speech understanding pipeline from early prototype to 10M+ utterances in production, with measurable accuracy improvements.

Founder-Vetted · STAR-Verified Outcomes · Matched in 48 Hours · Operator-First · Cancel Anytime

A product engineering expert operator from the Forward Share Ventures network helped a voice-AI company scale their speech understanding pipeline from an early prototype processing thousands of utterances per day to a production system handling 10M+ utterances monthly. Over a fourteen-week engagement, word error rate on financial-services vocabulary fell by 31% (relative) and per-utterance infrastructure cost fell by 60%.

The situation

The company was a Series A voice-AI platform targeting customer service automation for mid-market enterprises. Their core technology was a speech understanding pipeline: real-time transcription, intent classification, and entity extraction for customer service calls. The pipeline had been built by the founding engineering team during the pre-Series A phase and was functional at low volume – handling fifty to a hundred concurrent calls for early design partner customers.

At Series A close, the company signed two enterprise customers with production deployment requirements: a financial services firm projecting two million utterances per month and a healthcare contact center projecting four million utterances per month at peak. The existing pipeline had not been tested at either scale. The founding engineering team had deep expertise in ML model development but limited experience in distributed systems architecture and production ML infrastructure – the specific capability required to scale the pipeline from thousands of utterances to millions.

The problem became visible in week two of enterprise onboarding: the pipeline was producing word error rates of 18–22% on the enterprise customers' domain-specific vocabulary (financial and healthcare terminology), versus 8–10% on the general-vocabulary test data the model had been evaluated on. Latency at enterprise volume was averaging 3.2 seconds per utterance, versus the 1.5-second SLA in the customer contracts. The engineering team was spending sixty to seventy percent of their time on production incident response rather than on the model improvements that would fix the underlying accuracy problem.
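The case does not say how the team measured word error rate (WER). The conventional definition is the word-level edit distance between a reference transcript and the model's output (substitutions + deletions + insertions), divided by the number of words in the reference. The sketch below implements that standard definition, assuming simple whitespace tokenization and no text normalization. The example sentence is hypothetical and is not drawn from either customer's data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / reference word count.

    Assumes whitespace tokenization and no normalization (case, punctuation).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the minimum edits to turn ref[:i-1] into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# Hypothetical domain-term error: "metformin" heard as "met for men"
reference = "please refill my metformin prescription"
hypothesis = "please refill my met for men prescription"
print(f"WER: {word_error_rate(reference, hypothesis):.0%}")
```

A single misrecognized domain term costs one substitution plus two insertions here, giving a WER of 3/5 on a five-word utterance. This illustrates why general-vocabulary models can score much worse on specialized terminology than on general test data.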

The CEO engaged Forward Share Ventures to identify a product engineering expert operator with specific experience in production ML infrastructure and voice AI systems at enterprise scale. The match was made based on three verified STAR cases: one scaling a conversational AI pipeline from prototype to ten million queries per month, one domain adaptation project for a healthcare NLP system, and one ML infrastructure redesign that reduced per-inference cost by fifty-three percent.

The action

The expert operator began with a two-week architecture audit covering the full pipeline: the speech-to-text layer, the intent classification model, the entity extraction system, and the infrastructure layer connecting them. The audit identified three root causes for the performance gap: the speech-to-text model was a general-vocabulary model with no domain adaptation for financial or healthcare terminology, the inference infrastructure was a single-region deployment with no horizontal scaling capability, and the data pipeline feeding production feedback back to the model training process was broken – meaning the model had not been updated with any of the production data from the six months since the company's last model release.

The expert operator produced a fourteen-week rebuild plan with four parallel workstreams. The first workstream was domain adaptation: fine-tuning the speech recognition model on a curated dataset of financial and healthcare terminology, using a combination of the company's existing production data and augmented synthetic data. The second workstream was infrastructure redesign: moving from a single-region monolithic inference deployment to a distributed inference architecture with auto-scaling, prioritizing the latency reduction required by the customer SLAs. The third workstream was data pipeline repair: rebuilding the feedback loop from production to training so that every week's production data was available for the next model update cycle. The fourth workstream was team enablement: pair engineering with two members of the founding team on the infrastructure redesign, so the internal team would own the architecture rather than depending on the expert operator for ongoing maintenance.

The domain adaptation fine-tuning ran in weeks three through six, with the first domain-adapted model deployed to a shadow traffic environment in week seven. The infrastructure redesign ran in weeks two through eight, with the new distributed inference layer deployed to production in week nine. The data pipeline was repaired in week four and began accumulating production data for the subsequent model update cycle. The expert operator ran weekly architecture reviews with the founding team to transfer the design reasoning for each infrastructure decision, producing a documented architecture decision record that the team could reference without the expert operator present.

The result

By week twelve of the engagement, the domain-adapted model on the new infrastructure was processing production traffic for both enterprise customers. Word error rate on financial vocabulary fell from 20% to 13.8%, a 31% relative reduction. Word error rate on healthcare vocabulary fell from 22% to 14.1%, a 36% relative reduction. Average utterance latency dropped from 3.2 seconds to 0.94 seconds, well within the 1.5-second SLA. The distributed infrastructure was handling the combined enterprise volume of six million utterances per month, with auto-scaling absorbing peak loads without manual intervention. Per-utterance infrastructure cost declined 60% from the original monolithic architecture due to improved resource utilization. Both enterprise customers confirmed continued deployment; the financial services firm expanded their contract at the end of the engagement period.
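The headline percentages are relative reductions, (before − after) / before, rather than percentage-point differences. The short calculation below reproduces them from the before/after figures reported above. The function and variable names are illustrative only.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional improvement relative to the starting value."""
    return (before - after) / before


# Before/after figures as reported in the case
reported = {
    "financial WER (%)": (20.0, 13.8),
    "healthcare WER (%)": (22.0, 14.1),
    "average latency (s)": (3.2, 0.94),
}

for metric, (before, after) in reported.items():
    print(f"{metric}: {before} -> {after} "
          f"({relative_reduction(before, after):.0%} relative reduction)")

SLA_SECONDS = 1.5  # latency SLA from the customer contracts
print(f"latency headroom under SLA: {SLA_SECONDS - 0.94:.2f} s")
```

The financial-vocabulary result reproduces the 31% figure. The healthcare result works out to roughly 36%, and the latency drop to roughly 71%, leaving about 0.56 seconds of headroom under the SLA. Reporting WER changes as relative reductions keeps results on different baselines (20% vs 22%) comparable.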

The team enablement workstream produced a measurable transfer: by week fourteen, two members of the founding engineering team were operating the distributed infrastructure independently and had completed the first post-engagement model update cycle without expert operator involvement. The expert operator exited the engagement with documented architecture decision records for each major design choice, a runbook for the distributed inference system, and a data pipeline monitoring dashboard the team operated from the first week of independent operation.

About this case

What is the STAR Portfolio™?

The STAR Portfolio™ is Forward Share Ventures' system for collecting, verifying, and organizing documented outcome cases from expert operators in the network. Every expert operator submits a minimum of three STAR cases – Situation, Action, Result – as part of the vetting process. Cases are verified through reference conversations with the executives and founders who can confirm the outcomes described. The STAR Portfolio™ is the primary matching tool: when a client situation is diagnosed, the matching process identifies the expert operators whose verified STAR cases are most comparable to the client's specific situation, stage, and constraint profile. Cases in the portfolio represent verified outcomes, not self-reported experience.

How does Forward Share Ventures verify case outcomes?

Verification occurs through structured reference conversations with the CEOs, founders, or executives who worked directly with the expert operator during the engagement. The verification conversation focuses on three questions: Was the situation described in the STAR case accurate? Did the operator take the specific actions described? Do the results match what the reference observed, with specific numbers? References are outcome verifiers rather than character witnesses – the conversation is focused on the facts of the STAR case rather than on general endorsement of the operator. Cases where verification reveals a material discrepancy between the documented case and the reference's account are reviewed and, where necessary, corrected or excluded from the portfolio.

How do I work with expert operators from the Forward Share Ventures network?

The engagement process starts with a diagnostic session: a twenty-minute conversation with a Forward Share Ventures team member focused on your specific functional gap, current stage, and constraints. The diagnostic produces a recommended expert operator profile and identifies the two to four network members whose verified STAR cases are most comparable to your situation. You review the STAR cases and references for each match, meet with one or two candidates, and decide whether to proceed. The expert operator is typically in seat within ten business days of that decision. Engagements are scoped to a specific deliverable (a sales playbook, a financial model, a people infrastructure build) rather than to an open-ended retainer, with a defined transition plan for when the engagement ends.
