# @enfinitos/sdk-audio

EnfinitOS reference SDK for the AUDIO substrate — smart-speaker, podcast, and audio-streaming surfaces. Built around the **2026 voice-assistant LLM-extension pattern**: voice platforms increasingly treat third-party content as plug-in extensions to LLM-driven assistants (Alexa+ uses Claude; Google Assistant with Gemini; Apple Intelligence with the on-device LLM).
## Architecture
```
                ┌─────────────────────────────────────────┐
                │    @enfinitos/sdk-renderer-core (TS)    │
                │     resolve / event-ingest / health     │
                │        AudioAttentionConstraint         │
                └─────────────────────────────────────────┘
                                     ▲
                                     │
                  ┌──────────────────┴──────────────────┐
                  │ audio ts-core (AttentionTracker +   │
                  │ AudioClient + listening-session     │
                  │ resolver)                           │
                  └──────────────────┬──────────────────┘
                                     │
     ┌───────────────┬───────────────┴─────────────────┐
     │               │               │                 │
┌──────────┐    ┌──────────┐    ┌──────────┐      ┌──────────┐
│  Smart   │    │ Podcast  │    │ Streaming│      │   App    │
│ Speaker  │    │ partners │    │ partners │      │ Intents  │
│ (Alexa+/ │    │(Megaph., │    │(Spotify, │      │ (Apple)  │
│ Google/  │    │ Acast,   │    │ Pandora) │      │ Siri+LLM │
│ Apple AI)│    │ Spreaker)│    │          │      │          │
└──────────┘    └──────────┘    └──────────┘      └──────────┘
```
## Getting started
### TypeScript core
```typescript
import { AudioClient } from "@enfinitos/sdk-audio";

const audio = new AudioClient({
  apiBaseUrl: "https://api.enfinitos.com",
  authToken: jwt,
  deviceId: speakerHashedId,
});
await audio.start();

// Begin a listening session — required by the
// AudioAttentionConstraint contract.
const session = audio.beginListeningSession({
  attentionMode: "active-listening",
  channelKind: "podcast-pre-roll",
});

const asset = await audio.resolveNext({ surfaceId: "pre-roll" });
if (asset) {
  await audio.beginPlayback(asset);
  // After playback the wrapper calls reportPlayEnded with dwell.
}

audio.endListeningSession(session.id);
```
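The AudioAttentionConstraint contract (resolving is only legal inside a listening session) can be sketched as a small lifecycle guard. This is an illustrative sketch, not the SDK implementation — the class name and error messages are assumptions, though the method names mirror the SDK surface:

```typescript
// Minimal sketch of the AudioAttentionConstraint lifecycle guard:
// resolveNext is only legal between beginListeningSession and
// endListeningSession. Illustrative only, not the SDK internals.
class ListeningSessionGuard {
  private activeSessionId: string | null = null;
  private counter = 0;

  beginListeningSession(): { id: string } {
    if (this.activeSessionId !== null) {
      throw new Error("a listening session is already active");
    }
    this.activeSessionId = `session-${++this.counter}`;
    return { id: this.activeSessionId };
  }

  // Called by resolveNext before hitting POST /runtime/resolve.
  assertCanResolve(): void {
    if (this.activeSessionId === null) {
      throw new Error(
        "AudioAttentionConstraint: resolveNext requires an active listening session",
      );
    }
  }

  endListeningSession(id: string): void {
    if (this.activeSessionId !== id) {
      throw new Error(`unknown session ${id}`);
    }
    this.activeSessionId = null;
  }
}
```

The guard makes the failure mode explicit: calling `resolveNext` before `beginListeningSession` (or after `endListeningSession`) fails fast instead of emitting an unattributed resolve.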
### Alexa+ (voice-assistant LLM extension)
```typescript
import { handler } from "@enfinitos/sdk-audio/smart-speaker/alexa-audio";

export const lambdaHandler = handler({
  apiBaseUrl: "https://api.enfinitos.com",
  authToken: process.env.ENFINITOS_TOKEN!,
  alexaSkillId: process.env.ALEXA_SKILL_ID!,
  llmExtension: {
    mode: "alexa-plus",
    claudeModel: "claude-3-5-sonnet-20240620",
  },
});
```
### Apple App Intents (replacement for SiriKit)
```swift
import EnfinitOSAppIntents

struct PlayEnfinitOSPodcast: AppIntent {
    static var title: LocalizedStringResource = "Play EnfinitOS Podcast"

    func perform() async throws -> some IntentResult {
        let client = EnfinitOSAppIntentsClient.shared
        let asset = try await client.resolveNext(surfaceId: "podcast-pre-roll")
        try await client.beginPlayback(asset)
        return .result()
    }
}
```
## Alexa+ (LLM-extension pattern)
- Alexa+ is Amazon's LLM-powered Alexa, GA-ed in early 2025 with Claude (Anthropic) as the primary reasoning model. The classic-Alexa template-based skill surface still works; the new path is an LLM extension: the operator registers a set of capabilities plus a system-prompt-style description, and Alexa+ decides when to invoke them.
- ASK SDK v3 (Node.js, Python) is the supported SDK for the classic flow. The LLM-extension path uses the Alexa Skill API REST endpoints plus the Alexa+ "Skill-as-Action" registration.
- The AudioPlayer interface is unchanged across the LLM and classic-template paths; the SDK targets `AudioPlayer.Play`, `AudioPlayer.Stop`, and `AudioPlayer.PlaybackStarted`/`PlaybackFinished`.
- The permissions-card flow for user data (location, address) routes through the same OAuth-linked context.
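For the AudioPlayer path, the bridge ultimately emits an `AudioPlayer.Play` directive. The directive shape below follows Alexa's public AudioPlayer interface; the builder function and its token scheme are illustrative assumptions, not SDK exports:

```typescript
// Illustrative builder for an Alexa AudioPlayer.Play directive.
// The directive shape follows Alexa's public AudioPlayer interface;
// buildPlayDirective itself is a hypothetical helper, not an SDK export.
interface PlayDirective {
  type: "AudioPlayer.Play";
  playBehavior: "REPLACE_ALL" | "ENQUEUE" | "REPLACE_ENQUEUED";
  audioItem: {
    stream: {
      url: string;                  // Alexa requires HTTPS stream URLs
      token: string;                // opaque id echoed back in playback events
      offsetInMilliseconds: number;
    };
  };
}

function buildPlayDirective(assetUrl: string, assetId: string): PlayDirective {
  if (!assetUrl.startsWith("https://")) {
    throw new Error("Alexa AudioPlayer streams must be served over HTTPS");
  }
  return {
    type: "AudioPlayer.Play",
    playBehavior: "REPLACE_ALL",
    audioItem: {
      stream: { url: assetUrl, token: assetId, offsetInMilliseconds: 0 },
    },
  };
}
```

Using the asset id as the stream `token` is one convenient convention: Alexa echoes the token back in `PlaybackStarted`/`PlaybackFinished` requests, which lets the bridge correlate those events with the resolved asset when reporting to event-ingest.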
## Google Assistant with Gemini (Conversational Actions / Actions Builder)
- The Actions on Google v3 SDK is end-of-life as of mid-2024. We do NOT use it.
- New surface: Conversational Actions via Dialogflow CX or the Actions Builder API directly (@google/actions-style REST). Google Assistant invokes the action through Gemini's function-calling surface.
- Audio playback uses the `Media` response object inside the conversational webhook. No WebSocket; it's a turn-based REST pattern.
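A webhook turn that starts playback would return a `Media` prompt roughly like the following sketch. The JSON shape follows the documented Actions Builder webhook format; the helper function and asset fields are illustrative assumptions:

```typescript
// Illustrative builder for an Actions Builder webhook response that
// plays audio via a Media prompt. buildMediaResponse is a hypothetical
// helper, not an SDK export.
interface MediaWebhookResponse {
  prompt: {
    firstSimple: { speech: string };
    content: {
      media: {
        mediaType: "AUDIO";
        mediaObjects: Array<{ name: string; description: string; url: string }>;
      };
    };
  };
}

function buildMediaResponse(name: string, url: string): MediaWebhookResponse {
  return {
    prompt: {
      firstSimple: { speech: `Playing ${name}.` },
      content: {
        media: {
          mediaType: "AUDIO",
          mediaObjects: [{ name, description: "EnfinitOS audio asset", url }],
        },
      },
    },
  };
}
```

Because the pattern is turn-based REST, dwell cannot be streamed live; the bridge would report it on the next webhook turn (media-status progress), which is why the TS core treats dwell reporting as a post-playback call.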
## Apple App Intents (replacement for SiriKit)
- The App Intents framework (iOS 16+, hardened in iOS 17) replaces the SiriKit Intents Extension, which is deprecated for new code. We do NOT use `INExtension` / `INIntentHandler`.
- The SDK uses the `AppIntent` protocol with Swift Concurrency (`async throws`).
- Apple Intelligence (announced 2024, GA 2026) is the on-device LLM; App Intents are the integration surface for Apple Intelligence's "act on apps" capability.
## Podcast partners
| Partner | Notes |
|---|---|
| Megaphone | Spotify-owned podcast platform (formerly Panoply Media). REST API documented; ad insertion via DAI. |
| Acast | Open podcast platform. REST + RSS hybrid. Their Open API v1.0 is current. |
| Spreaker | iHeartMedia-owned. REST API; supports server-side ad insertion via DAI. |
## Audio-streaming partners
| Partner | Notes |
|---|---|
| Spotify Audio Ads SDK | Spotify's first-party ads SDK for connected speakers and the ad-supported tier. Authenticated via Spotify Partner credentials. |
| Pandora | SiriusXM-owned. Audio + display ads via Pandora's Ad-Tech Inventory API. |
## Endpoint surface
| SDK call | Platform endpoint | Status |
|---|---|---|
| `beginListeningSession` | `POST /audio-attention/session` | existing |
| `resolveNext` | `POST /runtime/resolve` | existing |
| `reportPlayStarted`/`Ended`/`Click` | `POST /runtime/event-ingest` | existing |
| Alexa+ LLM-extension register | Alexa Skill API REST | needs future API work |
| Megaphone / Acast / Spreaker | Each partner's REST/DAI endpoint | existing (partner-side) |
| Spotify Audio Ads | Spotify's first-party Ads endpoint | existing (partner-side) |
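The event-ingest calls in the table all post a playback event to `POST /runtime/event-ingest`. A sketch of the payload, with the caveat that the field names here are illustrative assumptions rather than the documented contract:

```typescript
// Hypothetical event payload for POST /runtime/event-ingest.
// Field names are illustrative assumptions, not the documented contract.
type PlaybackEventKind = "play-started" | "play-ended" | "click";

interface PlaybackEvent {
  kind: PlaybackEventKind;
  sessionId: string;   // from beginListeningSession
  assetId: string;     // from resolveNext
  dwellMs?: number;    // only meaningful for play-ended
  occurredAt: string;  // ISO-8601 timestamp
}

function buildPlaybackEvent(
  kind: PlaybackEventKind,
  sessionId: string,
  assetId: string,
  dwellMs?: number,
): PlaybackEvent {
  // play-ended carries the dwell measurement; enforce it up front
  // so a malformed event never reaches the wire.
  if (kind === "play-ended" && (dwellMs === undefined || dwellMs < 0)) {
    throw new Error("play-ended events require a non-negative dwellMs");
  }
  return { kind, sessionId, assetId, dwellMs, occurredAt: new Date().toISOString() };
}
```

Validating at build time keeps the per-platform bridges (Alexa, webhook, App Intents) honest: each one constructs the same event shape regardless of how the playback signal arrived.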
## Test plan
- TS core + smart-speaker / podcast / streaming bridges: Vitest.
- Apple App Intents: XCTest.