AvaCapoSDK~TalkingAvatar

High-level controller that loads a 3D character, manages scene/audio, lip-sync, poses, gestures, emojis, moods, and the speech queue for a talking avatar.

The class wires together the SceneManager, AudioManager, pose/gesture/emoji animations, morph (blendshape) targets, and the lipsync pipeline.

Constructor

new TalkingAvatar(node, opts?)

Parameters:
  • node (HTMLElement): Container element for the renderer. Usually resolved via the parent SDK object.
  • opts (TalkingAvatar.TalkingAvatarOptions, optional, default: null): Avatar configuration (scene, audio, TTS, etc.).

Properties:
  • node (HTMLElement): Container DOM node for the avatar renderer.
  • opts (TalkingAvatar.TalkingAvatarOptions): Current avatar configuration options.
  • audio (AudioManager): Instance of AudioManager used by the avatar.
  • sceneManager (SceneManager): Instance of SceneManager used by the avatar.
  • ttsOpts (TalkingAvatar.TTSOptions): Text-to-speech options used by the avatar.
  • characterUrl (Object): Character URL object.

Example
const avatar = new AvaCapoSDK.TalkingAvatar(null, {
  characterUrl: 'link_to_the_model', 
  lipsyncLang: 'en',
  cameraView: 'full',
});

Members

(static, constant) TTS_DEFAULTS : TalkingAvatar.TTSOptions

Default values used to initialize TalkingAvatar.TTSOptions.

Methods

(async) animateVisemes(input, opts?, onSubtitles?) → {Promise.<void>}

Animate visemes without audio (forces animation-only mode).

Parameters:
  • input (Object): Same shape as in speakWithVisemes; audio is ignored.
  • opts (Object, optional, default: {}): Builder overrides.
  • onSubtitles (function, optional, default: null): Subtitles callback.

Returns: Promise.<void>

(async) animateWords(input, opts?, onSubtitles?) → {Promise.<void>}

Animate from words without audio (forces animation-only mode). Derives visemes from words via lipsync bridge and plays animation only.

Parameters:
  • input (Object): Same shape as in speakWithWords; audio is ignored.
  • opts (Object, optional, default: {}): Builder overrides; supports { lipsyncLang?:string } among others.
  • onSubtitles (function, optional, default: null): Subtitles callback.

Returns: Promise.<void>

(async) dispose() → {Promise.<void>}

Dispose the avatar and free resources.

Returns: Promise.<void>

getCameraView() → {string}

Get current camera view. One of 'full', 'mid', 'torso', 'head'.

Returns: (string) View name.

getCameraViewNames() → {Array.<string>}

Get available camera view names: 'full', 'mid', 'torso', 'head'.

Returns: (Array.<string>) Supported view names.

(async) loadAsync(onProgress?) → {Promise.<void>}

Load the 3D avatar model asynchronously.

Parameters:
  • onProgress (progressfn, optional, default: null): Progress callback.

Returns: Promise.<void>
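A progress callback can be wired to loadAsync following the progressfn signature documented below. The formatProgress helper here is a hypothetical illustration, not part of the SDK:

```javascript
// Illustrative progress formatter; progressfn receives (url, event).
// formatProgress is a hypothetical helper, not an SDK export.
function formatProgress(url, event) {
  if (!event || !event.lengthComputable) return `${url}: loading...`;
  const pct = Math.round((event.loaded / event.total) * 100);
  return `${url}: ${pct}%`;
}

// Usage with the avatar (not executed here):
// await avatar.loadAsync((url, event) => console.log(formatProgress(url, event)));
```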

(async) say(r, opts?, onSubtitles?) → {Promise.<void>}

Smart router that chooses an appropriate playback path based on provided fields:

  • audio + visemes → speakWithVisemes
  • audio + words → speakWithWords
  • audio only → speakAudio
  • words only → animateWords
  • visemes only → animateVisemes
  • text/ssml → speakText (external TTS path)

Parameters:
  • r (Object): Mixed input; see the dedicated methods for exact shapes.
  • opts (Object, optional, default: {}): Optional overrides (e.g., lipsyncLang, trim options).
  • onSubtitles (function, optional, default: null): Subtitles callback.

Returns: Promise.<void>
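The dispatch order above can be sketched as a plain function over truthy fields. This is an assumed model of the routing for illustration, not the SDK's actual implementation:

```javascript
// Sketch of say()'s routing, per the documented precedence:
// audio+visemes, then audio+words, audio only, words only, visemes only, text/ssml.
function route(r) {
  if (r.audio && r.visemes) return 'speakWithVisemes';
  if (r.audio && r.words) return 'speakWithWords';
  if (r.audio) return 'speakAudio';
  if (r.words) return 'animateWords';
  if (r.visemes) return 'animateVisemes';
  if (r.text || r.ssml) return 'speakText';
  return null; // nothing playable
}
```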

setCameraView(view, opts)

Set camera view preset for the avatar. This will position the camera to frame the avatar based on the chosen preset.

Parameters:
  • view (string): One of 'full', 'mid', 'torso', 'head'.
  • opts (SceneManager.SceneManagerOptions, optional, default: null): Optional camera overrides.
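A simple guard before switching presets; the preset list matches what getCameraViewNames() returns, and isValidView is a hypothetical helper:

```javascript
// The four documented camera presets.
const CAMERA_VIEWS = ['full', 'mid', 'torso', 'head'];

// Hypothetical validation helper (not part of the SDK).
function isValidView(view) {
  return CAMERA_VIEWS.includes(view);
}

// if (isValidView('torso')) avatar.setCameraView('torso');
```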

setLipsyncLanguage(lang)

Set the lipsync language for the avatar.

Parameters:
  • lang (string): Language for the lipsync engine.

(async) speakAudio(input, opts?, onSubtitles?) → {Promise.<void>}

Play audio with optional timeline-only animation/markers (no viseme derivation). Suitable for audio-only or audio + markers cases.

Parameters:
  • input (Object): Playback input. Properties:
      • audio (AudioBuffer | ArrayBuffer | Array.<ArrayBuffer>): Decoded or encoded audio.
      • sampleRate (number, optional): Sample rate when audio is raw PCM.
      • markers (Object, optional): Marker timeline: { labels:string[], times:number[], timeUnit?:string }.
      • anim (Object, optional): Full animation object to add.
      • mode (string, optional): Build mode: "auto" | "audio" | "anim".
      • totalDurationMs (number, optional): Duration cap used when no audio is present.
  • opts (Object, optional, default: {}): Builder overrides (trimStartMs, trimEndMs, etc.).
  • onSubtitles (function, optional): Callback invoked by the renderer for subtitle updates.

Returns: Promise.<void>
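A payload with a marker timeline might be assembled as follows. buildMarkers is a hypothetical helper that enforces the index alignment of labels and times; the audio buffer is a placeholder:

```javascript
// Hypothetical builder for a marker timeline (not part of the SDK);
// labels and times must stay index-aligned.
function buildMarkers(labels, times) {
  if (labels.length !== times.length) {
    throw new Error('markers: labels/times must have equal length');
  }
  return { labels, times, timeUnit: 'ms' };
}

const audioInput = {
  audio: new ArrayBuffer(16000 * 2), // raw PCM bytes (placeholder buffer)
  sampleRate: 16000,                 // needed when audio is raw PCM
  markers: buildMarkers(['wave', 'nod'], [500, 1800]),
};
// await avatar.speakAudio(audioInput, { trimStartMs: 0, trimEndMs: 0 });
```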

(async) speakEmoji(em)

Add emoji to speech queue.

Parameters:
  • em (string): Emoji.

(async) speakPause(t)

Add a pause to the speech queue.

Parameters:
  • t (number): Duration in milliseconds.
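Emoji and pause items land on the same speech queue as text and play back in enqueue order. A minimal model of that interleaving (illustrative only, not the SDK's internal representation):

```javascript
// Illustrative queue model: text, emoji, and pause items play in enqueue order.
const speechQueue = [];
const enqueueText = (s) => speechQueue.push({ type: 'text', value: s });
const enqueueEmoji = (em) => speechQueue.push({ type: 'emoji', value: em });
const enqueuePause = (ms) => speechQueue.push({ type: 'break', ms });

enqueueText('Nice to meet you.');
enqueueEmoji('😊');     // as in avatar.speakEmoji('😊')
enqueuePause(750);      // ms, as in avatar.speakPause(750)
```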

(async) speakText(s, opts?, onSubtitles?, excludes?) → {Promise.<void>}

Speak text using the adapter-driven TTS pipeline. Behavior:

  • The input text is tokenized into sentences, words, emojis, and breaks.
  • Non-speech chunks (breaks, emojis) are enqueued as-is.
  • Speech chunks are passed to the active TTS adapter for synthesis.
  • Word timings and visemes are taken from the adapter when available; otherwise they are derived automatically from the synthesized audio.
  • Supports cancellation via AbortController: calling abortActive() will stop the current synthesis request.
  • Playback begins automatically once items are enqueued.

Parameters:
  • s (string): Text to synthesize.
  • opts (Object, optional, default: null): Synthesis options (same shape as ttsOpts). Properties:
      • batchMode (string, optional, default: 'sentence'): 'sentence' (each sentence is synthesized separately) or 'all' (entire text in one request).
      • timeUnit (string, optional, default: 'ms'): Time unit for timings ('ms' or 's').
      • ttsVoice (string, optional): Voice identifier to use for synthesis.
      • ttsRate (number, optional, default: 1): Speaking rate multiplier.
      • ttsPitch (number, optional, default: 0): Pitch adjustment.
      • ttsVolume (number, optional, default: 0): Volume adjustment in dB.
      • avatarMute (boolean, optional, default: false): If true, generate only animation/subtitles, no audio.
      • avatarMood (string, optional): Optional mood tag to attach to playback items.
      • adapterConfig (Object, optional): Extra configuration object passed directly to the TTS adapter.
  • onSubtitles (function, optional, default: null): Callback invoked when subtitle clips are enqueued.
  • excludes (Array.<Array.<number>>, optional, default: null): Array of [start, end] index ranges to skip during speech.

Returns: Promise.<void>. Resolves when synthesis and enqueuing complete, or rejects if synthesis fails or the operation is aborted.
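A call might look like the sketch below, using the documented option names. The excludes range assumes character-index semantics; check the exclude semantics in your integration before relying on this:

```javascript
// Sketch of speakText options; values are illustrative.
const speakTextOpts = {
  batchMode: 'sentence', // synthesize each sentence in a separate request
  timeUnit: 'ms',
  ttsRate: 1.0,          // speaking rate multiplier
  ttsPitch: 0,
  ttsVolume: 0,          // dB
  avatarMute: false,     // true would generate animation/subtitles only
};

const text = 'Nice to meet you. [wave] How are you?';
const excludes = [[18, 23]]; // indices of '[wave]' (assuming character indices)
// await avatar.speakText(text, speakTextOpts, null, excludes);
```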

(async) speakWithVisemes(input, opts?, onSubtitles?) → {Promise.<void>}

Play audio aligned to an explicit viseme timeline. Interprets viseme starts as START times; durations are explicit. If input.words is present and onSubtitles is provided, word tokens are also emitted as subtitle clips.

Parameters:
  • input (Object): Playback input. Properties:
      • visemes (Object): Viseme timeline: { labels:string[], starts:number[], durations:number[], timeUnit?:string }.
      • audio (AudioBuffer | ArrayBuffer | Array.<ArrayBuffer>, optional): Optional audio to play.
      • sampleRate (number, optional): Sample rate when audio is raw PCM.
      • markers (Object, optional): Marker timeline: { labels:string[], times:number[], timeUnit?:string }.
      • anim (Object, optional): Full animation object to add.
      • mode (string, optional): Build mode: "auto" | "audio" | "anim".
      • totalDurationMs (number, optional): Duration cap used when no audio is present.
  • opts (Object, optional, default: {}): Builder overrides.
  • onSubtitles (function, optional): Subtitles callback.

Returns: Promise.<void>
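A viseme payload might be assembled as follows. buildVisemeInput is a hypothetical helper that checks the index alignment of the three timeline arrays; the viseme labels are placeholders:

```javascript
// Hypothetical payload builder for speakWithVisemes (not part of the SDK).
function buildVisemeInput(labels, starts, durations, audio) {
  if (labels.length !== starts.length || starts.length !== durations.length) {
    throw new Error('visemes: labels/starts/durations must have equal length');
  }
  return {
    visemes: { labels, starts, durations, timeUnit: 'ms' },
    // With no audio, request animation-only playback.
    ...(audio ? { audio } : { mode: 'anim' }),
  };
}

// Starts are START times in ms; durations are explicit.
const visemeInput = buildVisemeInput(['PP', 'aa', 'E'], [0, 120, 300], [120, 180, 150], null);
// await avatar.speakWithVisemes(visemeInput, {}, (node) => { /* subtitles */ });
```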

(async) speakWithWords(input, opts?, onSubtitles?) → {Promise.<void>}

Play audio aligned to word-level timings. Derives visemes from words via lipsync bridge. Can also run in animation-only mode when input.mode === "anim" or audio is absent. Word times are interpreted as START times; durations are explicit.

Parameters:
  • input (Object): Playback input. Properties:
      • words (Object): Words timeline: { tokens:string[], starts:number[], durations:number[], timeUnit?:string }.
      • audio (AudioBuffer | ArrayBuffer | Array.<ArrayBuffer>, optional): Optional audio to play.
      • sampleRate (number, optional): Sample rate when audio is raw PCM.
      • markers (Object, optional): Marker timeline: { labels:string[], times:number[], timeUnit?:string }.
      • anim (Object, optional): Full animation object to add.
      • mode (string, optional): Build mode: "auto" | "audio" | "anim".
      • totalDurationMs (number, optional): Duration cap used when no audio is present.
  • opts (Object, optional, default: {}): Builder overrides; supports { lipsyncLang?:string } among others.
  • onSubtitles (function, optional): Subtitles callback; when provided, word tokens are also emitted as subtitle clips.

Returns: Promise.<void>
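A word-level timeline might look like the sketch below; the tokens and timings are placeholders, and the three arrays must stay index-aligned:

```javascript
// Hypothetical word timeline; starts are START times in ms, durations explicit.
const words = {
  tokens: ['Hello', 'world'],
  starts: [0, 450],
  durations: [400, 500],
  timeUnit: 'ms',
};
const aligned =
  words.tokens.length === words.starts.length &&
  words.starts.length === words.durations.length;

const wordInput = { words, mode: 'anim' }; // no audio supplied, so animation-only
// await avatar.speakWithWords(wordInput, { lipsyncLang: 'en' });
```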

stopSpeaking()

Stop speaking completely: stop current source, clear playlist and the full speechQueue. This is a hard reset for TTS/playback.

Type Definitions

TTSOptions

Options for the text-to-speech provider used by the avatar. These are merged over TTS_DEFAULTS and may be passed via TalkingAvatarOptions#ttsOpts.

Type:
  • Object
Properties:
  • endpoint (string, optional, nullable, default: null): Base URL of the TTS service.
  • apiKey (string, optional, nullable, default: null): API key used for the TTS service (if applicable).
  • trim (Object, optional, default: {start:0, end:400}): Client-side trim in milliseconds.
  • lang (string, optional, default: 'en'): BCP-47 language tag for synthesis (e.g., 'en-US' or 'en').
  • voice (string, optional, nullable, default: null): Voice model/name identifier.
  • rate (number, optional, default: 1): Playback/synthesis rate multiplier.
  • pitch (number, optional, default: 0): Pitch shift (service-dependent scale).
  • volume (number, optional, default: 0): Output gain (service-dependent scale).
  • jwtGet (function, optional): Optional async getter for a JWT auth token.
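Since user options are merged over TTS_DEFAULTS, the effective configuration can be modeled with a shallow merge. The defaults literal below mirrors the table above for illustration; the real constant lives in the SDK:

```javascript
// Assumed literal mirroring the documented TTS defaults (illustrative only).
const TTS_DEFAULTS = {
  endpoint: null,
  apiKey: null,
  trim: { start: 0, end: 400 }, // client-side trim, ms
  lang: 'en',
  voice: null,
  rate: 1,
  pitch: 0,
  volume: 0,
};

// User options override defaults field by field (shallow merge shown):
const mergedTts = { ...TTS_DEFAULTS, lang: 'en-US', rate: 1.1 };
```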

TalkingAvatarOptions

Global options for TalkingAvatar. These map 1:1 to this.opts with safe defaults applied in the constructor.

Type:
  • Object
Properties:
  • characterUrl (string | Object): Absolute or relative URL to the avatar GLB/FBX model. Required.
  • lipsyncLang (string, optional, default: 'en'): Primary lipsync language code used for TTS alignment and streaming.
  • modelRoot (string, optional, default: 'Armature'): Name of the skeleton root in the avatar file.
  • cameraView ('full' | 'mid' | 'torso' | 'head', optional, default: 'full'): Initial camera framing preset.
  • dracoEnabled (boolean, optional, default: false): Enable Draco-compressed geometry decoding.
  • dracoDecoderPath (string, optional, default: 'https://www.gstatic.com/draco/v1/decoders/'): Path to the Draco decoders.
  • customUpdate (function, optional, default: null): Per-frame hook called after the internal update; receives dt in ms.
  • sceneManager (SceneManager, optional, default: null): Optional external SceneManager instance to reuse an existing scene.
  • sceneOpts (SceneManager.SceneManagerOptions, optional, default: {}): Scene override options merged with SCENE_DEFAULTS.
  • audio (AudioManager.AudioManagerOptions, optional, default: {}): Audio override options merged with AUDIO_DEFAULTS.
  • ttsOpts (TalkingAvatar.TTSOptions, optional, default: {}): Text-to-speech provider options.

markerfn()

Callback when the speech queue processes this marker item.

progressfn(url, event)

Loading progress callback.

Parameters:
  • url (string): URL of the resource being loaded.
  • event (Object): Progress event data.

subtitlesfn(node)

Callback when new subtitles have been written to the DOM node.

Parameters:
  • node (HTMLElement): Target DOM node.