AvaCapoSDK~TalkingAvatar

High-level controller that loads a 3D character, manages scene/audio, lip-sync, poses, gestures, emojis, moods, and the speech queue for a talking avatar.

The class wires together the SceneManager, AudioManager, pose/gesture/emoji animations, morph (blendshape) targets, and the lipsync pipeline.

Constructor

new TalkingAvatar(node, opts?)

Parameters:
  • node (HTMLElement): Container element for the renderer. Usually resolved via the parent SDK object.
  • opts (TalkingAvatar.TalkingAvatarOptions, optional, default: null): Avatar configuration (scene, audio, TTS, etc.).

Properties:
  • node (HTMLElement): Container DOM node for the avatar renderer.
  • opts (TalkingAvatar.TalkingAvatarOptions): Current avatar configuration options.
  • audio (AudioManager): Instance of AudioManager used by the avatar.
  • sceneManager (SceneManager): Instance of SceneManager used by the avatar.
  • ttsOpts (TalkingAvatar.TTSOptions): Text-to-speech options used by the avatar.
  • characterUrl (Object): Character URL object.

Example
const avatar = new AvaCapoSDK.TalkingAvatar(null, {
  characterUrl: 'link_to_the_model', 
  lipsyncLang: 'en',
  cameraView: 'full',
});

Members

(static, constant) TTS_DEFAULTS : TalkingAvatar.TTSOptions

Default values used to initialize TalkingAvatar.TTSOptions.

Methods

(async) animateVisemes(input, opts?, onSubtitles?) → {Promise.<void>}

Animate visemes without audio (forces animation-only mode).

Parameters:
  • input (Object): Same shape as in speakWithVisemes; audio is ignored.
  • opts (Object, optional, default: {}): Builder overrides.
  • onSubtitles (function, optional, default: null): Subtitles callback.

Returns: Promise.<void>

(async) animateWords(input, opts?, onSubtitles?) → {Promise.<void>}

Animate from words without audio (forces animation-only mode). Derives visemes from words via lipsync bridge and plays animation only.

Parameters:
  • input (Object): Same shape as in speakWithWords; audio is ignored.
  • opts (Object, optional, default: {}): Builder overrides; supports { lipsyncLang?:string } among others.
  • onSubtitles (function, optional, default: null): Subtitles callback.

Returns: Promise.<void>

(async) dispose() → {Promise.<void>}

Dispose the avatar and free resources.

Returns: Promise.<void>

getCameraView() → {string}

Get current camera view. One of 'full', 'mid', 'torso', 'head'.

Returns: (string) View name.

getCameraViewNames() → {Array.<string>}

Get available camera view names: 'full', 'mid', 'torso', 'head'.

Returns: (Array.<string>) Supported view names.

(async) loadAsync(onProgress?) → {Promise.<void>}

Load the 3D avatar model asynchronously.

Parameters:
  • onProgress (progressfn, optional, default: null): Progress callback.

Returns: Promise.<void>
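A progress callback can be wired to loadAsync following the progressfn signature documented below. The formatProgress helper here is a hypothetical illustration, not part of the SDK:

```javascript
// Illustrative progress formatter; progressfn receives (url, event).
// formatProgress is a hypothetical helper, not an SDK export.
function formatProgress(url, event) {
  if (!event || !event.lengthComputable) return `${url}: loading...`;
  const pct = Math.round((event.loaded / event.total) * 100);
  return `${url}: ${pct}%`;
}

// Usage with the avatar (not executed here):
// await avatar.loadAsync((url, event) => console.log(formatProgress(url, event)));
```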

(async) say(r, opts?, onSubtitles?) → {Promise.<void>}

Smart router that chooses an appropriate playback path based on provided fields:

  • audio + visemes → speakWithVisemes
  • audio + words → speakWithWords
  • audio only → speakAudio
  • words only → animateWords
  • visemes only → animateVisemes
  • text/ssml → speakText (external TTS path)

Parameters:
  • r (Object): Mixed input; see the dedicated methods for exact shapes.
  • opts (Object, optional, default: {}): Optional overrides (e.g., lipsyncLang, trim options).
  • onSubtitles (function, optional, default: null): Subtitles callback.

Returns: Promise.<void>
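The dispatch order above can be sketched as a plain function over truthy fields. This is an assumed model of the routing for illustration, not the SDK's actual implementation:

```javascript
// Sketch of say()'s routing, per the documented precedence:
// audio+visemes, then audio+words, audio only, words only, visemes only, text/ssml.
function route(r) {
  if (r.audio && r.visemes) return 'speakWithVisemes';
  if (r.audio && r.words) return 'speakWithWords';
  if (r.audio) return 'speakAudio';
  if (r.words) return 'animateWords';
  if (r.visemes) return 'animateVisemes';
  if (r.text || r.ssml) return 'speakText';
  return null; // nothing playable
}
```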

setCameraView(view, opts)

Set camera view preset for the avatar. This will position the camera to frame the avatar based on the chosen preset.

Parameters:
  • view (string): One of 'full', 'mid', 'torso', 'head'.
  • opts (SceneManager.SceneManagerOptions, optional, default: null): Optional camera overrides.
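A simple guard before switching presets; the preset list matches what getCameraViewNames() returns, and isValidView is a hypothetical helper:

```javascript
// The four documented camera presets.
const CAMERA_VIEWS = ['full', 'mid', 'torso', 'head'];

// Hypothetical validation helper (not part of the SDK).
function isValidView(view) {
  return CAMERA_VIEWS.includes(view);
}

// if (isValidView('torso')) avatar.setCameraView('torso');
```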

setLipsyncLanguage(lang)

Set the lipsync language for the avatar.

Parameters:
  • lang (string): Language for the lipsync engine.

(async) speakAudio(input, opts?, onSubtitles?) → {Promise.<void>}

Play audio with optional timeline-only animation/markers (no viseme derivation). Suitable for audio-only or audio + markers cases.

Parameters:
  • input (Object): Playback input. Properties:
      • audio (AudioBuffer | ArrayBuffer | Array.<ArrayBuffer>): Decoded or encoded audio.
      • sampleRate (number, optional): Sample rate when audio is raw PCM.
      • markers (Object, optional): Marker timeline: { labels:string[], times:number[], timeUnit?:string }.
      • anim (Object, optional): Full animation object to add.
      • mode (string, optional): Build mode: "auto" | "audio" | "anim".
      • totalDurationMs (number, optional): Duration cap used when no audio is present.
  • opts (Object, optional, default: {}): Builder overrides (trimStartMs, trimEndMs, etc.).
  • onSubtitles (function, optional): Callback invoked by the renderer for subtitle updates.

Returns: Promise.<void>
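A payload with a marker timeline might be assembled as follows. buildMarkers is a hypothetical helper that enforces the index alignment of labels and times; the audio buffer is a placeholder:

```javascript
// Hypothetical builder for a marker timeline (not part of the SDK);
// labels and times must stay index-aligned.
function buildMarkers(labels, times) {
  if (labels.length !== times.length) {
    throw new Error('markers: labels/times must have equal length');
  }
  return { labels, times, timeUnit: 'ms' };
}

const audioInput = {
  audio: new ArrayBuffer(16000 * 2), // raw PCM bytes (placeholder buffer)
  sampleRate: 16000,                 // needed when audio is raw PCM
  markers: buildMarkers(['wave', 'nod'], [500, 1800]),
};
// await avatar.speakAudio(audioInput, { trimStartMs: 0, trimEndMs: 0 });
```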

(async) speakEmoji(em)

Add emoji to speech queue.

Parameters:
  • em (string): Emoji.

(async) speakPause(t)

Add a pause to the speech queue.

Parameters:
  • t (number): Duration in milliseconds.
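Emoji and pause items land on the same speech queue as text and play back in enqueue order. A minimal model of that interleaving (illustrative only, not the SDK's internal representation):

```javascript
// Illustrative queue model: text, emoji, and pause items play in enqueue order.
const speechQueue = [];
const enqueueText = (s) => speechQueue.push({ type: 'text', value: s });
const enqueueEmoji = (em) => speechQueue.push({ type: 'emoji', value: em });
const enqueuePause = (ms) => speechQueue.push({ type: 'break', ms });

enqueueText('Nice to meet you.');
enqueueEmoji('😊');     // as in avatar.speakEmoji('😊')
enqueuePause(750);      // ms, as in avatar.speakPause(750)
```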

(async) speakText(s, opts?, onSubtitles?, excludes?) → {Promise.<void>}

Speak text using the adapter-driven TTS pipeline. Behavior:

  • The input text is tokenized into sentences, words, emojis, and breaks.
  • Non-speech chunks (breaks, emojis) are enqueued as-is.
  • Speech chunks are passed to the active TTS adapter for synthesis.
  • Word timings and visemes are taken from the adapter when available; otherwise they are derived automatically from the synthesized audio.
  • Supports cancellation via AbortController: calling abortActive() will stop the current synthesis request.
  • Playback begins automatically once items are enqueued.

Parameters:
  • s (string): Text to synthesize.
  • opts (Object, optional, default: null): Synthesis options (same shape as ttsOpts). Properties:
      • batchMode (string, optional, default: 'sentence'): 'sentence' (each sentence is synthesized separately) or 'all' (entire text in one request).
      • timeUnit (string, optional, default: 'ms'): Time unit for timings ('ms' or 's').
      • ttsVoice (string, optional): Voice identifier to use for synthesis.
      • ttsRate (number, optional, default: 1): Speaking rate multiplier.
      • ttsPitch (number, optional, default: 0): Pitch adjustment.
      • ttsVolume (number, optional, default: 0): Volume adjustment in dB.
      • avatarMute (boolean, optional, default: false): If true, generate only animation/subtitles, no audio.
      • avatarMood (string, optional): Optional mood tag to attach to playback items.
      • adapterConfig (Object, optional): Extra configuration object passed directly to the TTS adapter.
  • onSubtitles (function, optional, default: null): Callback invoked when subtitle clips are enqueued.
  • excludes (Array.<Array.<number>>, optional, default: null): Array of [start, end] index ranges to skip during speech.

Returns: Promise.<void>. Resolves when synthesis and enqueuing complete, or rejects if synthesis fails or the operation is aborted.
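A call might look like the sketch below, using the documented option names. The excludes range assumes character-index semantics; check the exclude semantics in your integration before relying on this:

```javascript
// Sketch of speakText options; values are illustrative.
const speakTextOpts = {
  batchMode: 'sentence', // synthesize each sentence in a separate request
  timeUnit: 'ms',
  ttsRate: 1.0,          // speaking rate multiplier
  ttsPitch: 0,
  ttsVolume: 0,          // dB
  avatarMute: false,     // true would generate animation/subtitles only
};

const text = 'Nice to meet you. [wave] How are you?';
const excludes = [[18, 23]]; // indices of '[wave]' (assuming character indices)
// await avatar.speakText(text, speakTextOpts, null, excludes);
```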

(async) speakWithVisemes(input, opts?, onSubtitles?) → {Promise.<void>}

Play audio aligned to an explicit viseme timeline. Interprets viseme starts as START times; durations are explicit. If input.words is present and onSubtitles is provided, word tokens are also emitted as subtitle clips.

Parameters:
  • input (Object): Playback input. Properties:
      • visemes (Object): Viseme timeline: { labels:string[], starts:number[], durations:number[], timeUnit?:string }.
      • audio (AudioBuffer | ArrayBuffer | Array.<ArrayBuffer>, optional): Optional audio to play.
      • sampleRate (number, optional): Sample rate when audio is raw PCM.
      • markers (Object, optional): Marker timeline: { labels:string[], times:number[], timeUnit?:string }.
      • anim (Object, optional): Full animation object to add.
      • mode (string, optional): Build mode: "auto" | "audio" | "anim".
      • totalDurationMs (number, optional): Duration cap used when no audio is present.
  • opts (Object, optional, default: {}): Builder overrides.
  • onSubtitles (function, optional): Subtitles callback.

Returns: Promise.<void>
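A viseme payload might be assembled as follows. buildVisemeInput is a hypothetical helper that checks the index alignment of the three timeline arrays; the viseme labels are placeholders:

```javascript
// Hypothetical payload builder for speakWithVisemes (not part of the SDK).
function buildVisemeInput(labels, starts, durations, audio) {
  if (labels.length !== starts.length || starts.length !== durations.length) {
    throw new Error('visemes: labels/starts/durations must have equal length');
  }
  return {
    visemes: { labels, starts, durations, timeUnit: 'ms' },
    // With no audio, request animation-only playback.
    ...(audio ? { audio } : { mode: 'anim' }),
  };
}

// Starts are START times in ms; durations are explicit.
const visemeInput = buildVisemeInput(['PP', 'aa', 'E'], [0, 120, 300], [120, 180, 150], null);
// await avatar.speakWithVisemes(visemeInput, {}, (node) => { /* subtitles */ });
```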

(async) speakWithWords(input, opts?, onSubtitles?) → {Promise.<void>}

Play audio aligned to word-level timings. Derives visemes from words via lipsync bridge. Can also run in animation-only mode when input.mode === "anim" or audio is absent. Word times are interpreted as START times; durations are explicit.

Parameters:
  • input (Object): Playback input. Properties:
      • words (Object): Words timeline: { tokens:string[], starts:number[], durations:number[], timeUnit?:string }.
      • audio (AudioBuffer | ArrayBuffer | Array.<ArrayBuffer>, optional): Optional audio to play.
      • sampleRate (number, optional): Sample rate when audio is raw PCM.
      • markers (Object, optional): Marker timeline: { labels:string[], times:number[], timeUnit?:string }.
      • anim (Object, optional): Full animation object to add.
      • mode (string, optional): Build mode: "auto" | "audio" | "anim".
      • totalDurationMs (number, optional): Duration cap used when no audio is present.
  • opts (Object, optional, default: {}): Builder overrides; supports { lipsyncLang?:string } among others.
  • onSubtitles (function, optional): Subtitles callback; when provided, word tokens are also emitted as subtitle clips.

Returns: Promise.<void>
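A word-level timeline might look like the sketch below; the tokens and timings are placeholders, and the three arrays must stay index-aligned:

```javascript
// Hypothetical word timeline; starts are START times in ms, durations explicit.
const words = {
  tokens: ['Hello', 'world'],
  starts: [0, 450],
  durations: [400, 500],
  timeUnit: 'ms',
};
const aligned =
  words.tokens.length === words.starts.length &&
  words.starts.length === words.durations.length;

const wordInput = { words, mode: 'anim' }; // no audio supplied, so animation-only
// await avatar.speakWithWords(wordInput, { lipsyncLang: 'en' });
```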

stopSpeaking()

Stop speaking completely: stop current source, clear playlist and the full speechQueue. This is a hard reset for TTS/playback.

Type Definitions

TTSOptions

Options for the text-to-speech provider used by the avatar. These are merged over TTS_DEFAULTS and may be passed via TalkingAvatarOptions#ttsOpts.

Type:
  • Object
Properties:
  • endpoint (string, optional, nullable, default: null): Base URL of the TTS service.
  • apiKey (string, optional, nullable, default: null): API key used for the TTS service (if applicable).
  • trim (Object, optional, default: {start:0, end:400}): Client-side trim in milliseconds.
  • lang (string, optional, default: 'en'): BCP-47 language tag for synthesis (e.g., 'en-US' or 'en').
  • voice (string, optional, nullable, default: null): Voice model/name identifier.
  • rate (number, optional, default: 1): Playback/synthesis rate multiplier.
  • pitch (number, optional, default: 0): Pitch shift (service-dependent scale).
  • volume (number, optional, default: 0): Output gain (service-dependent scale).
  • jwtGet (function, optional): Optional async getter for a JWT auth token.
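Since user options are merged over TTS_DEFAULTS, the effective configuration can be modeled with a shallow merge. The defaults literal below mirrors the table above for illustration; the real constant lives in the SDK:

```javascript
// Assumed literal mirroring the documented TTS defaults (illustrative only).
const TTS_DEFAULTS = {
  endpoint: null,
  apiKey: null,
  trim: { start: 0, end: 400 }, // client-side trim, ms
  lang: 'en',
  voice: null,
  rate: 1,
  pitch: 0,
  volume: 0,
};

// User options override defaults field by field (shallow merge shown):
const mergedTts = { ...TTS_DEFAULTS, lang: 'en-US', rate: 1.1 };
```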

TalkingAvatarOptions

Global options for TalkingAvatar. These map 1:1 to this.opts with safe defaults applied in the constructor.

Type:
  • Object
Properties:
  • characterUrl (string | Object): Absolute or relative URL to the avatar GLB/FBX model. Required.
  • lipsyncLang (string, optional, default: 'en'): Primary lipsync language code used for TTS alignment and streaming.
  • modelRoot (string, optional, default: 'Armature'): Name of the skeleton root in the avatar file.
  • cameraView ('full' | 'mid' | 'torso' | 'head', optional, default: 'full'): Initial camera framing preset.
  • dracoEnabled (boolean, optional, default: false): Enable Draco-compressed geometry decoding.
  • dracoDecoderPath (string, optional, default: 'https://www.gstatic.com/draco/v1/decoders/'): Path to the Draco decoders.
  • customUpdate (function, optional, default: null): Per-frame hook called after the internal update; receives dt in ms.
  • sceneManager (SceneManager, optional, default: null): Optional external SceneManager instance to reuse an existing scene.
  • sceneOpts (SceneManager.SceneManagerOptions, optional, default: {}): Scene override options merged with SCENE_DEFAULTS.
  • audio (AudioManager.AudioManagerOptions, optional, default: {}): Audio override options merged with AUDIO_DEFAULTS.
  • ttsOpts (TalkingAvatar.TTSOptions, optional, default: {}): Text-to-speech provider options.

markerfn()

Callback when the speech queue processes this marker item.

progressfn(url, event)

Loading progress callback.

Parameters:
  • url (string): URL of the resource being loaded.
  • event (Object): Progress event data.

subtitlesfn(node)

Callback when new subtitles have been written to the DOM node.

Parameters:
  • node (HTMLElement): Target DOM node.