VoiceInterview Module Design and Implementation
This note records how I implemented the VoiceInterview module in the interview-guide project. The core goal is to make voice interviews deliver a complete experience of real-time interaction, resumable sessions, and traceable evaluation.
Module Capability Overview
- Real-time voice interaction: built on WebSocket + Qwen3 Voice Model (shared API key for ASR/TTS/LLM).
- Streaming experience optimization: sentence-level concurrent TTS, generation/synthesis/playback in parallel, first-packet latency around 200ms.
- Server-side VAD: automatic segmentation with real-time subtitles (including intermediate results).
- Echo protection: supports manual submission to avoid AI playback being captured as user input.
- Session continuity: supports pause/resume and multi-turn context memory, with auto-pause on timeout.
- Observability metrics: Micrometer metrics for TTS/ASR latency, session duration, etc.
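To make the sentence-level concurrent TTS idea concrete, here is a minimal sketch and not the project's actual pipeline. It assumes the LLM text is already available as a string, and the class name SentenceTtsPipeline, the naive splitter, and the tts function parameter are all hypothetical. Sentences are synthesized in parallel, while the audio chunks are collected in the original sentence order so playback stays coherent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical sketch: names and structure are illustrative assumptions.
final class SentenceTtsPipeline {

    /** Naive splitter: cuts after ASCII or CJK sentence-ending punctuation. */
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String part : text.split("(?<=[.!?。！？])\\s*")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                sentences.add(trimmed);
            }
        }
        return sentences;
    }

    /** Synthesizes all sentences concurrently and returns audio in sentence order. */
    static List<byte[]> synthesizeInOrder(String text,
                                          Function<String, byte[]> tts,
                                          int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            // Submit every sentence up front so synthesis overlaps.
            List<CompletableFuture<byte[]>> futures = new ArrayList<>();
            for (String sentence : splitSentences(text)) {
                futures.add(CompletableFuture.supplyAsync(() -> tts.apply(sentence), pool));
            }
            // Join in submission order: chunk N is never returned before chunk N-1.
            List<byte[]> ordered = new ArrayList<>();
            for (CompletableFuture<byte[]> future : futures) {
                ordered.add(future.join());
            }
            return ordered;
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real streaming setup the first chunk would be handed to playback as soon as it is ready, rather than after all sentences are joined; the list return type here only keeps the sketch short.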
State Transitions
Key API Design
POST /api/voice-interview/sessions Create Voice Interview Session
Controller entry:
VoiceInterviewController.createSession(@Valid @RequestBody CreateSessionRequest request)
Core call chain:
voiceInterviewService.createSession(request);
Implementation highlights:
- Fallback skillId (use default skill when missing).
- Fallback llmProvider (use default provider when empty).
- Build VoiceInterviewSessionEntity (phase switches, difficulty, resume ID, JD text, planned duration, etc.).
- Default userId = "default".
- Set initial phase (the first enabled one in intro/tech/project/hr).
- Persist to voice_interview_sessions and cache in Redis (with TTL).
- Return SessionResponseDTO (session ID, status, phase, config, etc.).
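The fallback and initial-phase rules above can be sketched roughly as follows. This is an illustration under assumptions: the class SessionDefaults, the default values "default-skill" and "default-provider", and the behavior when no phase is enabled are all hypothetical and not taken from the project.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the defaulting rules; not the project's actual code.
final class SessionDefaults {
    static final String DEFAULT_SKILL_ID = "default-skill";        // assumed value
    static final String DEFAULT_LLM_PROVIDER = "default-provider"; // assumed value
    static final String DEFAULT_USER_ID = "default";               // from the note above

    /** Returns the fallback when the value is null or blank. */
    static String orDefault(String value, String fallback) {
        return (value == null || value.trim().isEmpty()) ? fallback : value;
    }

    /** Picks the first enabled phase in intro -> tech -> project -> hr order. */
    static String initialPhase(boolean intro, boolean tech, boolean project, boolean hr) {
        // LinkedHashMap keeps insertion order, which encodes the phase order.
        Map<String, Boolean> switches = new LinkedHashMap<>();
        switches.put("intro", intro);
        switches.put("tech", tech);
        switches.put("project", project);
        switches.put("hr", hr);
        for (Map.Entry<String, Boolean> entry : switches.entrySet()) {
            if (entry.getValue()) {
                return entry.getKey();
            }
        }
        // Assumption: rejecting a request with no enabled phase.
        throw new IllegalArgumentException("At least one phase must be enabled");
    }
}
```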
GET /api/voice-interview/sessions/{sessionId} Get Session Detail by ID
Controller call:
voiceInterviewService.getSessionDTO(sessionId);
Implementation highlights:
- Read Redis first, then DB fallback.
- Build SessionResponseDTO when found.
- Return unified error when not found: Session not found: {sessionId}.
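The "Redis first, then DB fallback" read path is a cache-aside lookup. A minimal sketch, with an in-memory map standing in for Redis and a function standing in for the repository, might look like this. The class SessionLookup is hypothetical, and repopulating the cache after a DB hit, as well as the exception type, are assumptions; the note only states the read order and the error message.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache-aside sketch; the map stands in for Redis.
final class SessionLookup<V> {
    private final Map<String, V> cache = new ConcurrentHashMap<>();
    private final Function<String, Optional<V>> database; // stands in for the repository

    SessionLookup(Function<String, Optional<V>> database) {
        this.database = database;
    }

    V get(String sessionId) {
        // 1. Try the cache first.
        V cached = cache.get(sessionId);
        if (cached != null) {
            return cached;
        }
        // 2. Fall back to the database; fail with the unified message if absent.
        V loaded = database.apply(sessionId)
                .orElseThrow(() -> new IllegalStateException("Session not found: " + sessionId));
        // 3. Assumption: write the value back so the next read hits the cache.
        cache.put(sessionId, loaded);
        return loaded;
    }

    boolean isCached(String sessionId) {
        return cache.containsKey(sessionId);
    }
}
```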
POST /api/voice-interview/sessions/{sessionId}/end End Session and Trigger Async Evaluation
Controller call:
voiceInterviewService.endSession(sessionId.toString());
End + evaluation logic:
session.setEndTime(now);
session.setCurrentPhase(COMPLETED);
session.setStatus(COMPLETED);
session.setEvaluateStatus(PENDING);
sessionRepository.save(session);
voiceEvaluateStreamProducer.sendEvaluateTask(sessionId);
redisService.streamAdd(streamKey(), buildMessage(payload), AsyncTaskStreamConstants.STREAM_MAX_LEN);
Notes:
- API returns Result.success() immediately without waiting for evaluation completion.
- Frontend polls GET /api/voice-interview/sessions/{sessionId}/evaluation for progress.
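The enqueue call passes AsyncTaskStreamConstants.STREAM_MAX_LEN to redisService.streamAdd. Assuming that argument caps the stream length the way Redis XADD with MAXLEN trims the oldest entries, the effect can be illustrated with an in-memory stand-in. The class EvaluateStreamSketch and the "sessionId" payload field are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a length-capped Redis Stream.
final class EvaluateStreamSketch {
    private final Deque<Map<String, String>> stream = new ArrayDeque<>();
    private final int maxLen;

    EvaluateStreamSketch(int maxLen) {
        this.maxLen = maxLen;
    }

    /** Builds the task payload; the field name is an assumption. */
    static Map<String, String> buildMessage(String sessionId) {
        Map<String, String> message = new HashMap<>();
        message.put("sessionId", sessionId);
        return message;
    }

    /** Appends a message and trims the oldest entries beyond maxLen. */
    void streamAdd(Map<String, String> message) {
        stream.addLast(message);
        while (stream.size() > maxLen) {
            stream.removeFirst();
        }
    }

    /** Returns and removes the oldest message, or null if the stream is empty. */
    Map<String, String> pollOldest() {
        return stream.pollFirst();
    }

    int size() {
        return stream.size();
    }
}
```

One consequence of a capped stream is that tasks older than the cap can be dropped before a consumer reads them, which is one reason a manual re-trigger endpoint is useful.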
PUT /api/voice-interview/sessions/{sessionId}/pause Pause Session
Core call:
voiceInterviewService.pauseSession(sessionId.toString(), reason);
Implementation highlights:
- Only IN_PROGRESS sessions can be paused.
- Set status to PAUSED, record reason, update updatedAt.
- Persist DB and sync Redis cache.
PUT /api/voice-interview/sessions/{sessionId}/resume Resume Session
Core call:
voiceInterviewService.resumeSession(sessionId.toString());
Implementation highlights:
- Only PAUSED sessions can be resumed.
- After resume, status becomes IN_PROGRESS without resetting phase/progress.
- Persist DB, sync Redis, and return latest SessionResponseDTO.
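The pause and resume guards amount to a small state machine. The sketch below shows only the in-memory transition rules; persistence and Redis sync are left out, and the class SessionStateSketch and the exception type are hypothetical.

```java
// Hypothetical sketch of the pause/resume guards; not the project's entity.
final class SessionStateSketch {
    enum Status { IN_PROGRESS, PAUSED, COMPLETED }

    private Status status;
    private final String currentPhase; // never touched by pause/resume
    private String pauseReason;

    SessionStateSketch(Status status, String currentPhase) {
        this.status = status;
        this.currentPhase = currentPhase;
    }

    /** Only IN_PROGRESS sessions can be paused. */
    void pause(String reason) {
        if (status != Status.IN_PROGRESS) {
            throw new IllegalStateException("Cannot pause session in status " + status);
        }
        status = Status.PAUSED;
        pauseReason = reason;
    }

    /** Only PAUSED sessions can be resumed; the phase is left as it was. */
    void resume() {
        if (status != Status.PAUSED) {
            throw new IllegalStateException("Cannot resume session in status " + status);
        }
        status = Status.IN_PROGRESS;
    }

    Status status() { return status; }
    String currentPhase() { return currentPhase; }
    String pauseReason() { return pauseReason; }
}
```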
GET /api/voice-interview/sessions Get Session List (Filter by userId/status)
Call chain:
voiceInterviewService.getAllSessions(userId, status);
sessionRepository.findByUserIdAndStatusOrderByUpdatedAtDesc(userId, statusEnum);
Return:
Result<List<SessionMetaDTO>>
DELETE /api/voice-interview/sessions/{sessionId} Delete Voice Interview Session
Call chain:
voiceInterviewService.deleteSession(sessionId);
Implementation highlights:
- Validate session existence.
- Delete session and related data (messages/evaluation, depending on repository implementation).
- Clear Redis cache.
GET /api/voice-interview/sessions/{sessionId}/messages Get Conversation History
Call chain:
voiceInterviewService.getConversationHistoryDTO(sessionId);
Return:
Result<List<VoiceInterviewMessageDTO>>
GET /api/voice-interview/sessions/{sessionId}/evaluation Get Async Evaluation Status and Result
Implementation highlights:
- Validate session first (throw VOICE_SESSION_NOT_FOUND if missing).
- Read evaluateStatus and evaluateError.
- If status is COMPLETED, load evaluation details:
evaluationService.getEvaluation(sessionId);
- Return VoiceEvaluationStatusDTO (includes status and result when completed).
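As a rough illustration of the rule that evaluation details are loaded only when the status is COMPLETED, the sketch below builds a simplified status object. StatusView is a hypothetical stand-in for VoiceEvaluationStatusDTO, the result is reduced to a string, and the evaluationLoader supplier stands in for evaluationService.getEvaluation(sessionId).

```java
import java.util.function.Supplier;

// Hypothetical sketch; StatusView is a simplified stand-in for the real DTO.
final class EvaluationStatusSketch {

    static final class StatusView {
        final String evaluateStatus;
        final String evaluateError;
        final String result; // null unless evaluation has completed

        StatusView(String evaluateStatus, String evaluateError, String result) {
            this.evaluateStatus = evaluateStatus;
            this.evaluateError = evaluateError;
            this.result = result;
        }
    }

    /** Calls the loader only when the status is COMPLETED. */
    static StatusView build(String evaluateStatus,
                            String evaluateError,
                            Supplier<String> evaluationLoader) {
        String result = "COMPLETED".equals(evaluateStatus) ? evaluationLoader.get() : null;
        return new StatusView(evaluateStatus, evaluateError, result);
    }
}
```

Passing the loader as a supplier keeps the polling path cheap: while the task is still pending, no evaluation query runs.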
POST /api/voice-interview/sessions/{sessionId}/evaluation Manually Trigger Async Evaluation
Processing logic:
voiceInterviewService.getSession(sessionId);
evaluationService.getEvaluation(sessionId);
voiceInterviewService.triggerEvaluation(sessionId);
Rules:
- If already COMPLETED: return existing evaluation result directly.
- If PENDING/PROCESSING: return current status without duplicate triggering.
- For other triggerable states: enqueue evaluation task and return PENDING, then frontend continues polling.
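The three rules above make manual triggering idempotent. A minimal sketch of that decision logic follows; the class ManualEvaluationTrigger is hypothetical, the list stands in for the task stream, and FAILED is an assumed example of an "other triggerable state" that the note does not name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the trigger rules; not the project's service code.
final class ManualEvaluationTrigger {
    // FAILED is an assumption used to illustrate a triggerable state.
    enum EvaluateStatus { PENDING, PROCESSING, COMPLETED, FAILED }

    private final List<String> enqueuedTasks = new ArrayList<>(); // stands in for the stream

    /** Returns the status to report back to the caller after applying the rules. */
    EvaluateStatus trigger(String sessionId, EvaluateStatus current) {
        if (current == EvaluateStatus.COMPLETED) {
            return current; // existing result is returned, nothing enqueued
        }
        if (current == EvaluateStatus.PENDING || current == EvaluateStatus.PROCESSING) {
            return current; // already queued or running, avoid a duplicate task
        }
        enqueuedTasks.add(sessionId); // any other state (including null): enqueue
        return EvaluateStatus.PENDING;
    }

    int enqueuedCount() {
        return enqueuedTasks.size();
    }
}
```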
Summary
The key value of the VoiceInterview module is not just making voice interaction work, but making the entire real-time pipeline and session lifecycle robustly connected. For me, only when the full chain (create, pause, resume, end, evaluate) works reliably can voice interviews become a truly evolvable product capability.