AI Resume Analysis: Voice Interview Module

VoiceInterview Module Design and Implementation

This note records how I implemented the VoiceInterview module in the interview-guide project. The goal is a voice interview that works end to end: real-time interaction, resumable sessions, and traceable evaluation.

Module Capability Overview

Real-time voice interaction: built on WebSocket + Qwen3 Voice Model (shared API key for ASR/TTS/LLM).
Streaming experience optimization: sentence-level concurrent TTS, with generation, synthesis, and playback running in parallel; first-packet latency is around 200 ms.
Server-side VAD: automatic segmentation with real-time subtitles (including intermediate results).
Echo protection: manual submission is supported so that AI playback is not captured as user input.
Session continuity: supports pause/resume and multi-turn context memory, with auto-pause on timeout.
Observability metrics: Micrometer metrics for TTS/ASR latency, session duration, etc.
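
To make the sentence-level concurrent TTS idea concrete, here is a minimal sketch, not the module's actual code. It assumes the reply text is already available as one string (the real pipeline streams it from the LLM), and the names `SentenceTtsPipeline`, `splitSentences`, and `synthesizeInOrder` are hypothetical. The `tts` function stands in for whatever synthesis call the module uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Function;

/** Sketch: synthesize sentences concurrently, but hand audio to playback in order. */
class SentenceTtsPipeline {

    /** Splits text after '.', '!' or '?' (a deliberately simple rule for illustration). */
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String part : text.split("(?<=[.!?])")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                sentences.add(trimmed);
            }
        }
        return sentences;
    }

    /**
     * Starts synthesis for every sentence right away, then collects the results
     * in sentence order so playback never reorders audio chunks.
     */
    static List<byte[]> synthesizeInOrder(String text,
                                          Function<String, byte[]> tts,
                                          ExecutorService pool) {
        List<CompletableFuture<byte[]>> futures = new ArrayList<>();
        for (String sentence : splitSentences(text)) {
            futures.add(CompletableFuture.supplyAsync(() -> tts.apply(sentence), pool));
        }
        List<byte[]> audio = new ArrayList<>();
        for (CompletableFuture<byte[]> future : futures) {
            audio.add(future.join()); // blocks only until the next chunk in order is ready
        }
        return audio;
    }
}
```

The point of the design is that the first sentence can start playing while later sentences are still being synthesized, which is what keeps first-packet latency low.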

State Transitions

flowchart TD
A["Create Session
POST /api/voice-interview/sessions"] --> B["IN_PROGRESS"]

B --> C{"Session Events"}
C -- "Pause / Timeout" --> D["PAUSED"]
D -- "Resume" --> B

C -- "End Interview" --> E["COMPLETED"]
E --> F["evaluateStatus = PENDING"]
F --> G["evaluateStatus = PROCESSING"]

G --> H{"Evaluation Result"}
H -- "Success" --> I["EVALUATED
evaluateStatus = COMPLETED"]
H -- "Failure" --> J["evaluateStatus = FAILED"]

B --> K["DELETE /api/voice-interview/sessions/{id}"]
D --> K
E --> K
I --> K
J --> K
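
The session-status part of the flowchart can be sketched as a small transition table. This is my reading of the diagram rather than the project's actual enum: the name `SessionStatus` and the method names are assumptions, deletion is left out because it is allowed from every state, and `evaluateStatus` is tracked separately.

```java
import java.util.EnumSet;
import java.util.Set;

/** Session statuses from the flowchart, with the transitions it draws. */
enum SessionStatus {
    IN_PROGRESS, PAUSED, COMPLETED, EVALUATED;

    /** Statuses reachable from this one in a single step. */
    Set<SessionStatus> next() {
        switch (this) {
            case IN_PROGRESS:
                return EnumSet.of(PAUSED, COMPLETED); // pause/timeout, or end interview
            case PAUSED:
                return EnumSet.of(IN_PROGRESS);       // resume
            case COMPLETED:
                return EnumSet.of(EVALUATED);         // evaluation succeeded
            default:
                return EnumSet.noneOf(SessionStatus.class);
        }
    }

    /** True when the flowchart has an edge from this status to the target. */
    boolean canTransitionTo(SessionStatus target) {
        return next().contains(target);
    }
}
```

A guard such as `canTransitionTo` gives the pause and resume endpoints one place to reject requests that do not match the current state.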

Key API Design

`POST /api/voice-interview/sessions` Create Voice Interview Session

Controller entry:

VoiceInterviewController.createSession(@Valid @RequestBody CreateSessionRequest request)

Core call chain:

voiceInterviewService.createSession(request);

Implementation highlights:

Fallback skillId (use default skill when missing).
Fallback llmProvider (use default provider when empty).
Build VoiceInterviewSessionEntity (phase switches, difficulty, resume ID, JD text, planned duration, etc.).
Default userId = "default".
Set initial phase (the first enabled one in intro/tech/project/hr).
Persist to voice_interview_sessions and cache in Redis (with TTL).
Return SessionResponseDTO (session ID, status, phase, config, etc.).
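
The defaulting and initial-phase rules above can be sketched as follows. The class name, the two default values, and the phase names as strings are placeholders I made up for illustration. Throwing when no phase is enabled is also an assumption, since the rule above does not say what happens in that case.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of the defaulting rules; names and default values are illustrative only. */
class CreateSessionDefaults {
    static final String DEFAULT_SKILL_ID = "default-skill";        // hypothetical value
    static final String DEFAULT_LLM_PROVIDER = "default-provider"; // hypothetical value

    /** Returns the fallback when the value is null or blank. */
    static String orDefault(String value, String fallback) {
        return (value == null || value.trim().isEmpty()) ? fallback : value;
    }

    /** Picks the first enabled phase in the fixed order intro, tech, project, hr. */
    static String initialPhase(boolean intro, boolean tech, boolean project, boolean hr) {
        Map<String, Boolean> switches = new LinkedHashMap<>(); // keeps insertion order
        switches.put("INTRO", intro);
        switches.put("TECH", tech);
        switches.put("PROJECT", project);
        switches.put("HR", hr);
        for (Map.Entry<String, Boolean> entry : switches.entrySet()) {
            if (entry.getValue()) {
                return entry.getKey();
            }
        }
        // Assumption: a session with every phase disabled is rejected.
        throw new IllegalArgumentException("At least one interview phase must be enabled");
    }
}
```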

`GET /api/voice-interview/sessions/{sessionId}` Get Session Detail by ID

Controller call:

voiceInterviewService.getSessionDTO(sessionId);

Implementation highlights:

Read from Redis first and fall back to the DB on a cache miss.
Build SessionResponseDTO when found.
Return a unified error when not found: `Session not found: {sessionId}`.
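
The read path is a standard cache-first lookup. The sketch below uses an in-memory map in place of Redis and a function in place of the repository, so everything here (`SessionLookup`, the exception type) is hypothetical. Refilling the cache after a DB hit is my assumption as well; the highlights above do not state it.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Cache-first lookup with database fallback; a map stands in for Redis. */
class SessionLookup<T> {
    private final Map<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, Optional<T>> database;

    SessionLookup(Function<String, Optional<T>> database) {
        this.database = database;
    }

    T get(String sessionId) {
        T cached = cache.get(sessionId);
        if (cached != null) {
            return cached; // cache hit, the database is not touched
        }
        T loaded = database.apply(sessionId)
                .orElseThrow(() -> new IllegalStateException("Session not found: " + sessionId));
        cache.put(sessionId, loaded); // assumption: refill so the next read hits the cache
        return loaded;
    }
}
```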

`POST /api/voice-interview/sessions/{sessionId}/end` End Session and Trigger Async Evaluation

Controller call:

voiceInterviewService.endSession(sessionId.toString());

End + evaluation logic:

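// Mark the session as finished and its evaluation as not yet started.
session.setEndTime(now);
session.setCurrentPhase(COMPLETED);
session.setStatus(COMPLETED);
session.setEvaluateStatus(PENDING);
sessionRepository.save(session);
// Queue the evaluation task; the request thread does not wait for the result.
voiceEvaluateStreamProducer.sendEvaluateTask(sessionId);
// The task is appended to a Redis stream capped at STREAM_MAX_LEN entries.
redisService.streamAdd(streamKey(), buildMessage(payload), AsyncTaskStreamConstants.STREAM_MAX_LEN);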

Notes:

The API returns Result.success() immediately without waiting for the evaluation to finish.
The frontend polls GET /api/voice-interview/sessions/{sessionId}/evaluation for progress.
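
The end-then-evaluate flow can be illustrated with an in-memory stand-in, where a plain queue plays the role of the Redis stream. This is only a sketch of the status lifecycle (PENDING, PROCESSING, then COMPLETED or FAILED); the class and method names are hypothetical and the real consumer is not shown in this note.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.Consumer;

/** In-memory stand-in for the async evaluation flow. */
class AsyncEvaluationSketch {
    final Map<String, String> evaluateStatus = new HashMap<>();
    final Queue<String> tasks = new ArrayDeque<>();

    /** Called when the interview ends: record PENDING, enqueue, and return at once. */
    void endSession(String sessionId) {
        evaluateStatus.put(sessionId, "PENDING");
        tasks.add(sessionId);
    }

    /** Called by a consumer: takes one task and runs the evaluator on it. */
    boolean processNext(Consumer<String> evaluator) {
        String sessionId = tasks.poll();
        if (sessionId == null) {
            return false; // nothing queued
        }
        evaluateStatus.put(sessionId, "PROCESSING");
        try {
            evaluator.accept(sessionId);
            evaluateStatus.put(sessionId, "COMPLETED");
        } catch (RuntimeException e) {
            evaluateStatus.put(sessionId, "FAILED");
        }
        return true;
    }
}
```

Because the status is stored before the task is queued, a polling client always sees at least PENDING once the end call has returned.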

`PUT /api/voice-interview/sessions/{sessionId}/pause` Pause Session

Core call:

voiceInterviewService.pauseSession(sessionId.toString(), reason);

Implementation highlights:

Only IN_PROGRESS sessions can be paused.
Set status to PAUSED, record the reason, and update updatedAt.
Persist to the DB and sync the Redis cache.

`PUT /api/voice-interview/sessions/{sessionId}/resume` Resume Session

Core call:

voiceInterviewService.resumeSession(sessionId.toString());

Implementation highlights:

Only PAUSED sessions can be resumed.
After resume, status returns to IN_PROGRESS without resetting phase or progress.
Persist to the DB, sync Redis, and return the latest SessionResponseDTO.

`GET /api/voice-interview/sessions` Get Session List (Filter by userId/status)

Call chain:

voiceInterviewService.getAllSessions(userId, status);
sessionRepository.findByUserIdAndStatusOrderByUpdatedAtDesc(userId, statusEnum);

Return:

Result<List<SessionMetaDTO>>

`DELETE /api/voice-interview/sessions/{sessionId}` Delete Voice Interview Session

Call chain:

voiceInterviewService.deleteSession(sessionId);

Implementation highlights:

Validate session existence.
Delete session and related data (messages/evaluation, depending on repository implementation).
Clear Redis cache.

`GET /api/voice-interview/sessions/{sessionId}/messages` Get Conversation History

Call chain:

voiceInterviewService.getConversationHistoryDTO(sessionId);

Return:

Result<List<VoiceInterviewMessageDTO>>

`GET /api/voice-interview/sessions/{sessionId}/evaluation` Get Async Evaluation Status and Result

Implementation highlights:

Validate session first (throw VOICE_SESSION_NOT_FOUND if missing).
Read evaluateStatus and evaluateError.
If status is COMPLETED, load evaluation details:

evaluationService.getEvaluation(sessionId);

Return VoiceEvaluationStatusDTO (includes status and result when completed).

`POST /api/voice-interview/sessions/{sessionId}/evaluation` Manually Trigger Async Evaluation

Processing logic:

voiceInterviewService.getSession(sessionId);
evaluationService.getEvaluation(sessionId);
voiceInterviewService.triggerEvaluation(sessionId);

Rules:

If evaluateStatus is already COMPLETED: return the existing evaluation result directly.
If it is PENDING or PROCESSING: return the current status without triggering a duplicate task.
For other triggerable states: enqueue an evaluation task and return PENDING; the frontend then continues polling.
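
These three rules amount to a small decision function. The sketch below is one way to write it, with hypothetical names (`EvaluationTrigger`, `Action`). It assumes that FAILED and a missing status (never evaluated) are the "other triggerable states", which the rules above do not spell out.

```java
/** Evaluation states named in the flowchart and the rules above. */
enum EvaluateStatus { PENDING, PROCESSING, COMPLETED, FAILED }

/** Decides what the manual-trigger endpoint does for the current state. */
class EvaluationTrigger {
    enum Action { RETURN_EXISTING_RESULT, RETURN_CURRENT_STATUS, ENQUEUE_AND_RETURN_PENDING }

    static Action decide(EvaluateStatus current) {
        if (current == EvaluateStatus.COMPLETED) {
            return Action.RETURN_EXISTING_RESULT;
        }
        if (current == EvaluateStatus.PENDING || current == EvaluateStatus.PROCESSING) {
            return Action.RETURN_CURRENT_STATUS; // a task is already queued or running
        }
        // Assumption: FAILED or null (never evaluated) may be triggered again.
        return Action.ENQUEUE_AND_RETURN_PENDING;
    }
}
```

Keeping the decision separate from the enqueue call makes the endpoint idempotent for repeated clicks: only the last branch has a side effect.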

Summary

The value of the VoiceInterview module lies not just in making voice interaction work, but in connecting the real-time pipeline and the session lifecycle robustly. For me, voice interviews only become a product capability that can keep evolving once the full chain (create, pause, resume, end, evaluate) works reliably.
