Constructor
new TalkingAvatar(node, opts)
| Name | Type | Attributes | Default | Description |
|---|---|---|---|---|
| node | HTMLElement | | | Container element for the renderer. Usually resolved via a parent SDK object. |
| opts | TalkingAvatar.TalkingAvatarOptions | &lt;optional&gt; | null | Avatar configuration (scene, audio, TTS, etc.). |
Properties:
| Name | Type | Description |
|---|---|---|
| node | HTMLElement | Container DOM node for the avatar renderer. |
| opts | TalkingAvatar.TalkingAvatarOptions | Current avatar configuration options. |
| audio | AudioManager | Instance of AudioManager used by the avatar. |
| sceneManager | SceneManager | Instance of SceneManager used by the avatar. |
| ttsOpts | TalkingAvatar.TTSOptions | Text-to-speech options used by the avatar. |
| characterUrl | Object | Character URL object. |
const avatar = new AvaCapoSDK.TalkingAvatar(null, {
  characterUrl: 'link_to_the_model',
  lipsyncLang: 'en',
  cameraView: 'full'
});
Members
(static, constant) TTS_DEFAULTS : TalkingAvatar.TTSOptions
Default values used to initialize TalkingAvatar.TTSOptions.
Methods
(async) animateVisemes(input, opts, onSubtitles) → {Promise.&lt;void&gt;}
Animate visemes without audio (forces animation-only mode).
| Name | Type | Attributes | Default | Description |
|---|---|---|---|---|
| input | Object | | | Same shape as in speakWithVisemes. |
| opts | Object | &lt;optional&gt; | {} | Builder overrides. |
| onSubtitles | function | &lt;optional&gt; | null | Subtitles callback. |
- Type:
- Promise.<void>
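As a usage sketch only: the payload field names below (visemes, vtimes, vdurations) are assumptions about the viseme-timeline shape, not confirmed by this reference.

```javascript
// Hypothetical viseme payload: viseme IDs, START times, and explicit
// durations, all in milliseconds. Field names are assumed.
const visemeInput = {
  visemes: ['PP', 'aa', 'SS'],   // viseme IDs
  vtimes: [0, 120, 300],         // START times in milliseconds
  vdurations: [120, 180, 150]    // explicit durations in milliseconds
};

// Animation-only playback, no audio:
// await avatar.animateVisemes(visemeInput, {}, null);
```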
(async) animateWords(input, opts, onSubtitles) → {Promise.&lt;void&gt;}
Animate from words without audio (forces animation-only mode). Derives visemes from words via the lipsync bridge and plays the animation only.
| Name | Type | Attributes | Default | Description |
|---|---|---|---|---|
| input | Object | | | Same shape as in speakWithWords. |
| opts | Object | &lt;optional&gt; | {} | Builder overrides; supports { lipsyncLang?:string } among others. |
| onSubtitles | function | &lt;optional&gt; | null | Subtitles callback. |
- Type:
- Promise.<void>
(async) dispose() → {Promise.<void>}
Dispose the avatar and free resources.
- Type:
- Promise.<void>
getCameraView() → {string}
Get current camera view. One of 'full', 'mid', 'torso', 'head'.
View name.
- Type:
- string
getCameraViewNames() → {Array.<string>}
Get available camera view names: 'full', 'mid', 'torso', 'head'.
Supported view names.
- Type:
- Array.<string>
(async) loadAsync(onProgress) → {Promise.&lt;void&gt;}
Load the 3D avatar model asynchronously.
| Name | Type | Attributes | Default | Description |
|---|---|---|---|---|
| onProgress | progressfn | &lt;optional&gt; | null | Progress callback. |
- Type:
- Promise.<void>
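A minimal progress callback matching the progressfn(url, event) signature documented at the end of this reference; the formatting helper is illustrative, not part of the SDK.

```javascript
// Format a human-readable progress line from a (url, event) pair,
// following the progressfn(url, event) callback signature.
function formatProgress(url, event) {
  if (!event || !event.lengthComputable) return `Loading ${url}...`;
  const pct = Math.round((event.loaded / event.total) * 100);
  return `Loading ${url}: ${pct}%`;
}

const onProgress = (url, event) => console.log(formatProgress(url, event));
// await avatar.loadAsync(onProgress);
```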
(async) say(r, opts, onSubtitles) → {Promise.&lt;void&gt;}
Smart router that chooses an appropriate playback path based on provided fields:
- audio + visemes → speakWithVisemes
- audio + words → speakWithWords
- audio only → speakAudio
- words only → animateWords
- visemes only → animateVisemes
- text/ssml → speakText (external TTS path)
| Name | Type | Attributes | Default | Description |
|---|---|---|---|---|
| r | Object | | | Mixed input; see the dedicated methods for exact shapes. |
| opts | Object | &lt;optional&gt; | {} | Optional overrides (e.g., lipsyncLang, trim options). |
| onSubtitles | function | &lt;optional&gt; | null | Subtitles callback. |
- Type:
- Promise.<void>
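The routing rules above can be sketched as a plain function. This mirrors the documented dispatch order only; it is not the SDK's actual implementation.

```javascript
// Pick a playback path from the fields present on the input object r,
// following the priority order listed in the say() documentation.
function pickRoute(r) {
  if (r.audio && r.visemes) return 'speakWithVisemes';
  if (r.audio && r.words) return 'speakWithWords';
  if (r.audio) return 'speakAudio';
  if (r.words) return 'animateWords';
  if (r.visemes) return 'animateVisemes';
  if (r.text || r.ssml) return 'speakText';
  return null;
}

// e.g. avatar.say({ audio, words }) takes the speakWithWords path.
```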
setCameraView(view, opts)
Set the camera view preset for the avatar. This positions the camera to frame the avatar based on the chosen preset.
| Name | Type | Default | Description |
|---|---|---|---|
| view | string | | One of 'full', 'mid', 'torso', 'head'. |
| opts | SceneManager. | null | Optional camera overrides. |
setLipsyncLanguage(lang)
Set the lipsync language for the avatar.
| Name | Type | Description |
|---|---|---|
| lang | string | Language for the lipsync engine. |
(async) speakAudio(input, opts, onSubtitles) → {Promise.&lt;void&gt;}
Play audio with optional timeline-only animation/markers (no viseme derivation). Suitable for audio-only or audio + markers cases.
| Name | Type | Attributes | Default | Description |
|---|---|---|---|---|
| input | Object | | | Audio input, optionally with markers/timeline. |
| opts | Object | &lt;optional&gt; | {} | Builder overrides (trimStartMs, trimEndMs, etc.). |
| onSubtitles | function | &lt;optional&gt; | | Callback invoked by the renderer for subtitle updates. |
- Type:
- Promise.<void>
(async) speakEmoji(em)
Add an emoji to the speech queue.
| Name | Type | Description |
|---|---|---|
| em | string | Emoji. |
(async) speakPause(t)
Add a pause to the speech queue.
| Name | Type | Description |
|---|---|---|
| t | number | Duration in milliseconds. |
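Because speakText, speakPause, and speakEmoji all append to the same speech queue, a sequence can be composed as data and replayed in order. The helper below is illustrative only, not SDK code.

```javascript
// Illustrative: a scripted sequence of queue operations, replayed
// against an avatar-like object in order.
const script = [
  { op: 'text', value: 'Hello!' },
  { op: 'pause', value: 500 },  // milliseconds
  { op: 'emoji', value: '😀' }
];

async function playScript(avatar, items) {
  for (const item of items) {
    if (item.op === 'text') await avatar.speakText(item.value);
    else if (item.op === 'pause') await avatar.speakPause(item.value);
    else if (item.op === 'emoji') await avatar.speakEmoji(item.value);
  }
}
```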
(async) speakText(s, opts, onSubtitles, excludes) → {Promise.&lt;void&gt;}
Speak text using the adapter-driven TTS pipeline. Behavior:
- The input text is tokenized into sentences, words, emojis, and breaks.
- Non-speech chunks (breaks, emojis) are enqueued as-is.
- Speech chunks are passed to the active TTS adapter for synthesis.
- Word timings and visemes are taken from the adapter when available; otherwise they are derived automatically from the synthesized audio.
- Supports cancellation via AbortController: calling abortActive() will stop the current synthesis request.
- Playback begins automatically once items are enqueued.
| Name | Type | Attributes | Default | Description |
|---|---|---|---|---|
| s | string | | | Text to synthesize. |
| opts | Object | &lt;optional&gt; | null | Synthesis options (same shape as TalkingAvatar.TTSOptions). |
| onSubtitles | function | &lt;optional&gt; | null | Callback invoked when subtitle clips are enqueued. |
| excludes | Array.&lt;Array.&lt;number&gt;&gt; | &lt;optional&gt; | null | Array of [start, end] index pairs excluded from speech. |
Resolves when synthesis and enqueuing complete, or rejects if synthesis fails or the operation is aborted.
- Type:
- Promise.<void>
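A usage sketch. The option names follow the TTSOptions typedef documented later in this reference; the interpretation of excludes as [start, end] index pairs is an assumption based on its parameter type.

```javascript
// Synthesis overrides (same fields as the TTSOptions typedef).
const ttsOverrides = { lang: 'en-US', rate: 1.1 };

// Assumed semantics: exclude the character range [0, 6] (e.g. a
// leading stage direction) from synthesis.
const excludes = [[0, 6]];

const onSubtitles = (node) => console.log('subtitles:', node.textContent);

// await avatar.speakText('[sigh] Let me think.', ttsOverrides, onSubtitles, excludes);
```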
(async) speakWithVisemes(input, opts, onSubtitles) → {Promise.&lt;void&gt;}
Play audio aligned to an explicit viseme timeline. Interprets viseme starts as START times; durations are explicit. If input.words is present and onSubtitles is provided, word tokens are also emitted as subtitle clips.
| Name | Type | Attributes | Default | Description |
|---|---|---|---|---|
| input | Object | | | Audio plus an explicit viseme timeline; may also include words. |
| opts | Object | &lt;optional&gt; | {} | Builder overrides. |
| onSubtitles | function | &lt;optional&gt; | | Subtitles callback. |
- Type:
- Promise.<void>
(async) speakWithWords(input, opts, onSubtitles) → {Promise.&lt;void&gt;}
Play audio aligned to word-level timings. Derives visemes from words via lipsync bridge. Can also run in animation-only mode when input.mode === "anim" or audio is absent. Word times are interpreted as START times; durations are explicit.
| Name | Type | Attributes | Default | Description |
|---|---|---|---|---|
| input | Object | | | Audio plus word-level timings. |
| opts | Object | &lt;optional&gt; | {} | Builder overrides; supports { lipsyncLang?:string } among others. |
| onSubtitles | function | &lt;optional&gt; | | Subtitles callback; when provided, word tokens are also emitted as subtitle clips. |
- Type:
- Promise.<void>
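A sketch of a word-timeline payload. The field names (audio, words, wtimes, wdurations) are assumptions derived from the description above (START times plus explicit durations, in milliseconds), not confirmed by this reference.

```javascript
// Hypothetical word-timeline payload; field names are assumed.
const wordInput = {
  audio: null,               // AudioBuffer in real use; absent audio → animation-only mode
  words: ['Hello', 'world'],
  wtimes: [0, 450],          // START times in milliseconds
  wdurations: [400, 500]     // explicit durations in milliseconds
};

// await avatar.speakWithWords(wordInput, { lipsyncLang: 'en' }, null);
```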
stopSpeaking()
Stop speaking completely: stops the current source and clears the playlist and the full speechQueue. This is a hard reset for TTS/playback.
Type Definitions
TTSOptions
Options for the text-to-speech provider used by the avatar. These are merged over TTS_DEFAULTS and may be passed via TalkingAvatarOptions#ttsOpts.
- Object
| Name | Type | Attributes | Default | Description |
|---|---|---|---|---|
| endpoint | string | &lt;optional&gt; &lt;nullable&gt; | null | Base URL of the TTS service. |
| apiKey | string | &lt;optional&gt; &lt;nullable&gt; | null | API key for the TTS service (if applicable). |
| trim | Object | &lt;optional&gt; | {start:0,end:400} | Client-side trim in milliseconds. |
| lang | string | &lt;optional&gt; | 'en' | BCP-47 language tag for synthesis (e.g., 'en-US' or 'en'). |
| voice | string | &lt;optional&gt; &lt;nullable&gt; | null | Voice model/name identifier. |
| rate | number | &lt;optional&gt; | 1 | Playback/synthesis rate multiplier. |
| pitch | number | &lt;optional&gt; | 0 | Pitch shift (service-dependent scale). |
| volume | number | &lt;optional&gt; | 0 | Output gain (service-dependent scale). |
| jwtGet | function | &lt;optional&gt; | | Optional async getter for a JWT auth token. |
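A sketch of a full configuration object using the documented fields; the endpoint URL is a placeholder.

```javascript
// Placeholder TTS configuration; merged over TTS_DEFAULTS by the SDK.
const ttsOpts = {
  endpoint: 'https://tts.example.com',  // placeholder URL
  apiKey: null,
  trim: { start: 0, end: 400 },         // milliseconds
  lang: 'en-US',
  voice: null,
  rate: 1,
  pitch: 0,
  volume: 0,
  jwtGet: null                          // or: async () => '<token>'
};
```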
TalkingAvatarOptions
Global options for TalkingAvatar. These map 1:1 to this.opts with safe defaults applied in the constructor.
- Object
| Name | Type | Attributes | Default | Description |
|---|---|---|---|---|
| characterUrl | string | | | Absolute or relative URL to the avatar GLB/FBX model. Required. |
| lipsyncLang | string | &lt;optional&gt; | 'en' | Primary lipsync language code used for TTS alignment and streaming. |
| modelRoot | string | &lt;optional&gt; | 'Armature' | Name of the skeleton root in the avatar file. |
| cameraView | 'full' \| 'mid' \| 'torso' \| 'head' | &lt;optional&gt; | 'full' | Initial camera framing preset. |
| dracoEnabled | boolean | &lt;optional&gt; | false | Enable Draco-compressed geometry decoding. |
| dracoDecoderPath | string | &lt;optional&gt; | 'https://www.gstatic.com/draco/v1/decoders/' | Path to the Draco decoders. |
| customUpdate | function | &lt;optional&gt; | null | Per-frame hook called after the internal update. |
| sceneManager | SceneManager | &lt;optional&gt; | null | Optional external SceneManager instance to reuse an existing scene. |
| sceneOpts | SceneManager. | &lt;optional&gt; | {} | Scene override options merged with SCENE_DEFAULTS. |
| audio | AudioManager. | &lt;optional&gt; | {} | Audio override options merged with AUDIO_DEFAULTS. |
| ttsOpts | TalkingAvatar.TTSOptions | &lt;optional&gt; | {} | Text-to-speech provider options. |
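Pulling the options together (values other than the documented defaults are placeholders):

```javascript
// Placeholder avatar configuration; characterUrl is the only required
// field, all others fall back to the documented defaults.
const avatarOpts = {
  characterUrl: 'link_to_the_model',  // required; GLB/FBX model URL
  lipsyncLang: 'en',
  modelRoot: 'Armature',
  cameraView: 'torso',
  dracoEnabled: false,
  ttsOpts: { lang: 'en' }
};

// const avatar = new AvaCapoSDK.TalkingAvatar(null, avatarOpts);
```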
markerfn()
Callback when the speech queue processes this marker item.
progressfn(url, event)
Loading progress.
| Name | Type | Description |
|---|---|---|
| url | string | URL of the resource being loaded. |
| event | Object | Progress event data. |
subtitlesfn(node)
Callback when new subtitles have been written to the DOM node.
| Name | Type | Description |
|---|---|---|
| node | HTMLElement | Target DOM node. |