
SuiteTalk: Building a Multi-Tenant AI Voice Concierge for Hotels with Asterisk PBX and Gemini Live

A deep dive into SuiteTalk — a multi-tenant AI voice concierge for hotels that routes physical room phone calls via Asterisk PBX to Google Gemini's Multimodal Live API. Learn how we handle raw 8kHz-to-24kHz audio bridging, dynamic system prompt assembly, non-blocking tool calls, and real-time dashboard updates.

What is SuiteTalk?

SuiteTalk is a modern, on-premise and hybrid-SaaS AI voice assistant platform designed specifically for the hospitality industry. By bridging legacy analog/IP telephony infrastructure with state-of-the-art AI, it transforms how guests interact with hotel services.

Instead of navigating complex button menus or waiting in long telephone hold queues, a guest simply picks up their room's telephone, dials 0, and speaks naturally. An AI concierge answers instantly in the guest's language, ready to process in-room orders, schedule wake-up calls, book spa treatments, submit maintenance reports, or answer hotel FAQs.

All actions, request statuses, and conversation logs are pushed in real-time to a modern, multi-tenant staff command center, ensuring hotel personnel can immediately fulfill requests.

Core Capabilities of the Voice AI

Equipped with dynamic tool bindings, the voice concierge can perform several tasks autonomously:

  • Food & Dining Orders: Retrieve in-room dining menus and place or modify meal orders.
  • Housekeeping Services: Order room supplies (towels, toiletries, extra pillows) or request cleaning.
  • Maintenance Logging: Report technical faults or maintenance issues directly to engineering.
  • Wake-up Calls: Schedule personalized wake-up calls with custom notes.
  • Spa & Wellness Bookings: Book massage appointments, gym slots, or pool access.
  • Transportation Arrangements: Arrange airport transfers, taxi pickups, or shuttles.
  • Visitor Management: Authorize or deny visitor entry at reception on behalf of the guest.
  • Parcel Delivery Inquiries: Check for arrived packages and request delivery to the room.
  • Stay Extensions: Request to push check-out dates forward.
  • Lost & Found: File reports for lost items or check for recovered items.
  • Knowledge Base Search: Query policies, hours, or local recommendations via similarity search.

GitHub: github.com/tusharrayamajhi/SuiteTalk


The Problem SuiteTalk Solves

In the hospitality sector, room-service phones are an operational bottleneck:

  1. Queue Congestion: During peak check-out or dining hours, guests face frustrating delays waiting for front-desk operators.
  2. Operational Friction: Front-desk staff spend up to 40% of their day answering repetitive FAQs (e.g., "What is the WiFi password?", "When does the pool close?") instead of managing high-value check-ins or escalations.
  3. Integration Gaps: Legacy hotel PBX telephone systems (often using analog wiring or local SIP trunks) are entirely disconnected from modern SaaS applications and AI endpoints.
  4. Data Fragmentation: Guest requests are rarely logged uniformly. A food order might go to a kitchen slip, a towel request to a notebook, and a maintenance issue to a sticky note.

SuiteTalk bridges this gap. It acts as an intelligent VoIP-to-AI middleware that listens on analog telephone lines, parses speech in real-time using Google Gemini's Multimodal Live API, logs requests to a centralized database, and displays them on a unified staff dashboard.


High-Level Architecture

The SuiteTalk platform consists of three core components running in tenant-isolated environments.

Data moves through the platform in three phases: audio streams loop through the telephony bridge (Phase 1), tools execute against the local database (Phase 2), and WebSockets push live events to the staff dashboard (Phase 3).

  1. Telephony Bridge (Asterisk PBX): Built on Asterisk, using chan_pjsip and the Stasis gateway application. Calls are mapped based on the extension (e.g. extension 1_101 represents Hotel 1, Room 101), establishing an AudioSocket connection to stream raw PCM audio.
  2. Backend Engine (Node.js + TypeScript): The intelligence layer. It controls Asterisk via AMI (Asterisk Manager Interface) and ARI (Asterisk REST Interface), manages the AudioSocket TCP server, downsamples/upsamples raw audio streams, communicates with the Gemini Live WebSocket API, and runs local tool handlers.
  3. Staff Dashboard (Next.js + Tailwind CSS): A multi-tenant administrative hub. Features live audio/text conversation transcripts, service request lists, room directories, and AI behavior configuration panels.
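
The AudioSocket link between the telephony bridge and the backend is a plain TCP stream of small framed messages. The following is a minimal sketch of a frame parser, assuming the commonly documented AudioSocket layout of a 1-byte type, a 2-byte big-endian payload length, and then the payload. The `parseFrames` helper, the `AudioSocketFrame` type, and the constant names are illustrative and are not taken from the SuiteTalk source:

```typescript
// Frame types, assumed from the public AudioSocket protocol description.
const FRAME_HANGUP = 0x00;
const FRAME_UUID = 0x01;
const FRAME_AUDIO = 0x10; // 16-bit signed linear PCM, 8 kHz, mono

interface AudioSocketFrame {
  type: number;
  payload: Buffer;
}

/**
 * Split a TCP byte stream into complete frames.
 * Returns the parsed frames plus any trailing bytes that belong to a
 * frame that has not fully arrived yet.
 */
function parseFrames(data: Buffer): { frames: AudioSocketFrame[]; rest: Buffer } {
  const frames: AudioSocketFrame[] = [];
  let offset = 0;

  // A frame header is 3 bytes: type (1) + payload length (2, big-endian).
  while (data.length - offset >= 3) {
    const type = data.readUInt8(offset);
    const length = data.readUInt16BE(offset + 1);
    if (data.length - offset - 3 < length) break; // wait for more bytes
    frames.push({ type, payload: data.subarray(offset + 3, offset + 3 + length) });
    offset += 3 + length;
  }
  return { frames, rest: data.subarray(offset) };
}
```

Because TCP does not preserve message boundaries, a caller would typically prepend `rest` to the next chunk read from the socket before parsing again.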

Under the Hood: Telephony & Audio Resampling

Bridging Asterisk and Google Gemini requires resolving a major disparity in audio encoding standards.

The Sample Rate Dilemma

  • Telephony Standard: Telecom systems run on 8kHz mono audio. G.711 carries it as 8-bit companded samples on the wire, and Asterisk hands it to AudioSocket as 16-bit signed linear PCM. This sample rate is optimized for conserving bandwidth while keeping human speech intelligible.
  • Gemini Live API Standard: Google Gemini's Multimodal Live API expects mono 16-bit PCM input (16kHz is its native input rate) and streams back 24kHz PCM responses.

If raw 8kHz telephony audio is passed to Gemini but treated as 16kHz, it plays at double speed and an octave higher, like chipmunks on fast-forward. If Gemini's 24kHz output is written to the 8kHz telephone line unchanged, it plays at one-third speed and sounds extremely slow and deep.
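
The arithmetic behind the mismatch can be sketched in a few lines. The 20 ms frame duration below is an assumption (a common telephony packet size), and both helper names are illustrative:

```typescript
/** Bytes in one frame of mono 16-bit PCM at the given rate and duration. */
function frameBytes(sampleRateHz: number, frameMs: number): number {
  const bytesPerSample = 2; // 16-bit samples
  return ((sampleRateHz * frameMs) / 1000) * bytesPerSample;
}

/** Playback speed factor when audio sampled at one rate is played at another. */
function speedFactor(sampledAtHz: number, playedAtHz: number): number {
  return playedAtHz / sampledAtHz;
}

console.log(frameBytes(8000, 20));  // 320 bytes per 20 ms on the phone side
console.log(frameBytes(16000, 20)); // 640 bytes per 20 ms sent to Gemini
console.log(frameBytes(24000, 20)); // 960 bytes per 20 ms received from Gemini

console.log(speedFactor(8000, 16000)); // 2: twice as fast, higher pitch
console.log(speedFactor(24000, 8000)); // about 0.33: one-third speed, lower pitch
```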

Real-Time PCM Resampling Algorithms

To fix this, we implemented lightweight resampling utilities directly in Node.js using buffer manipulation. They convert each audio chunk as it arrives and add no buffering delay of their own.

1. Telephony to AI: Upsampling (8kHz to 16kHz)

To raise the sample rate of the incoming guest audio to a level Gemini understands, we duplicate samples. Every single sample from the 8kHz stream is repeated once, doubling the sample count and outputting 16kHz audio:

typescript
/**
 * Upsample 8kHz mono 16-bit PCM to 16kHz mono 16-bit PCM by sample duplication.
 */
function upscale8to16(buffer: Buffer): Buffer {
  const samples = Math.floor(buffer.length / 2); // ignore a trailing odd byte
  const outBuffer = Buffer.alloc(samples * 4);   // two output samples per input sample
  let outIndex = 0;
  
  for (let i = 0; i < samples; i++) {
    const val = buffer.readInt16LE(i * 2);
    outBuffer.writeInt16LE(val, outIndex);     // Original sample
    outBuffer.writeInt16LE(val, outIndex + 2); // Duplicated sample
    outIndex += 4;
  }
  return outBuffer;
}
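
Sample duplication is cheap, but it produces a stair-stepped waveform. One possible refinement, shown here only as a sketch and not as the SuiteTalk implementation, is linear interpolation: each inserted sample is the midpoint between its two neighbours. The function name `upsample8to16Linear` is hypothetical:

```typescript
/**
 * Upsample 8 kHz mono 16-bit PCM to 16 kHz by linear interpolation.
 * The final input sample has no right-hand neighbour, so it is repeated.
 */
function upsample8to16Linear(input: Buffer): Buffer {
  const samples = Math.floor(input.length / 2); // ignore a trailing odd byte
  const out = Buffer.alloc(samples * 4);

  for (let i = 0; i < samples; i++) {
    const current = input.readInt16LE(i * 2);
    const next = i + 1 < samples ? input.readInt16LE((i + 1) * 2) : current;
    out.writeInt16LE(current, i * 4);                             // original sample
    out.writeInt16LE(Math.round((current + next) / 2), i * 4 + 2); // midpoint sample
  }
  return out;
}
```

For an input of `[0, 100, 200]` this yields `[0, 50, 100, 150, 200, 200]`. Because each chunk is processed independently, the last midpoint of one chunk does not see the first sample of the next; whether that matters in practice depends on the chunk size.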

2. AI to Telephony: Downsampling (24kHz to 8kHz)

When Gemini responds, it streams back 24kHz audio. To pipe this into the Asterisk AudioSocket, we apply decimation: we keep one sample out of every three and drop the other two, which divides the sample rate by exactly 3. No low-pass filter runs before the samples are dropped, so any content above 4kHz in Gemini's output can alias into the telephone band.

typescript
/**
 * Downsample 24kHz mono 16-bit PCM to 8kHz mono 16-bit PCM by decimation (factor of 3).
 */
function downsample24to8(buffer: Buffer): Buffer {
  const samples = Math.floor(buffer.length / 2); // ignore a trailing odd byte
  const outSamples = Math.floor(samples / 3);
  const outBuffer = Buffer.alloc(outSamples * 2);

  // Keep the first sample of every group of three (6 bytes per group).
  for (let i = 0; i < outSamples; i++) {
    outBuffer.writeInt16LE(buffer.readInt16LE(i * 6), i * 2);
  }
  return outBuffer;
}
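
Plain decimation has two practical weak spots: it applies no filtering, and a chunk whose sample count is not a multiple of three loses its leftover samples. The sketch below is one possible alternative, not the project's code. It averages each group of three samples (a crude low-pass that reduces, but does not eliminate, aliasing) and carries leftover samples into the next chunk. The class name `Decimator24to8` is hypothetical:

```typescript
/**
 * Decimate 24 kHz mono 16-bit PCM to 8 kHz by averaging each group of
 * three samples. Samples that do not fill a complete group are kept
 * and prepended to the next chunk.
 */
class Decimator24to8 {
  private carry: number[] = [];

  process(chunk: Buffer): Buffer {
    const samples: number[] = [...this.carry];
    // Read whole 16-bit samples; a trailing odd byte is dropped.
    for (let i = 0; i + 1 < chunk.length; i += 2) {
      samples.push(chunk.readInt16LE(i));
    }

    const groups = Math.floor(samples.length / 3);
    const out = Buffer.alloc(groups * 2);
    for (let g = 0; g < groups; g++) {
      const sum = samples[g * 3] + samples[g * 3 + 1] + samples[g * 3 + 2];
      out.writeInt16LE(Math.round(sum / 3), g * 2);
    }

    this.carry = samples.slice(groups * 3);
    return out;
  }
}
```

One instance would be created per call so that the carried samples from one guest's audio never leak into another session.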

Multi-Tenant AI Session Isolation

Because SuiteTalk runs as a SaaS, multiple hotels share the same backend. Each hotel has unique policies, room service menus, staff members, and branding.

When a call arrives, Asterisk sends the caller ID context through StasisStart. The backend parses the tenant details:

typescript
this.ariClient.on('StasisStart', async (event: any, channel: any) => {
  const callerNumber = channel.caller.number || '101'; // e.g. "1_101" (Hotel 1, Room 101)
  
  let hotelId = 1;
  let roomNumber = callerNumber;
  
  if (callerNumber.includes('_')) {
    const parts = callerNumber.split('_');
    hotelId = parseInt(parts[0], 10) || 1; // falls back to hotel 1 if the prefix is not numeric
    roomNumber = parts[1];
  }
  
  // Initialize AI session with dynamic prompt & tools configuration
  await this.startAiSession(channel, roomNumber, hotelId);
});
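
In a multi-tenant system, silently falling back to hotel 1 can route a misconfigured extension to the wrong property. A stricter variant, sketched here as one option rather than the project's actual behavior, returns `null` for anything that does not match the `<hotelId>_<room>` pattern so the caller can reject or log the call. The `parseExtension` function and `CallerIdentity` type are hypothetical names:

```typescript
interface CallerIdentity {
  hotelId: number;
  roomNumber: string;
}

/**
 * Parse an extension such as "1_101" into its tenant and room parts.
 * Returns null when the tenant prefix is missing or not numeric.
 */
function parseExtension(callerNumber: string): CallerIdentity | null {
  const match = /^(\d+)_([A-Za-z0-9]+)$/.exec(callerNumber.trim());
  if (!match) return null;
  return { hotelId: Number.parseInt(match[1], 10), roomNumber: match[2] };
}

console.log(parseExtension("1_101")); // { hotelId: 1, roomNumber: '101' }
console.log(parseExtension("101"));   // null
```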

Dynamic Prompt Assembly

Using the parsed hotelId, we query the PostgreSQL database via Prisma for that specific property's configurations (e.g., hotel name, brand voice, specific rules). We dynamically stitch together the systemInstruction before initiating the Gemini Live connection:

typescript
export async function buildSystemInstruction(hotelId: number = 1): Promise<string> {
  const settings = await prisma.setting.findMany({ where: { hotelId } });
  // Collapse the key/value rows into a single lookup object.
  const s: Record<string, any> = Object.fromEntries(settings.map(({ key, value }) => [key, value]));
  
  const name = s.agent_name || "the SuiteTalk Concierge";
  const hotel = s.hotel_name || "the hotel";
  const tone = s.ai_tone || "warm, professional, and concise";
  
  let prompt = `You are ${name}, the AI voice concierge for ${hotel}, answering guests on their room phone. Speak in this tone: ${tone}.`;
  
  if (s.ai_instructions) {
    prompt += `\nHotel-specific guidance: ${s.ai_instructions}`;
  }
  
  prompt += `\n\n${BASE_RULES}`;
  return prompt;
}
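
The string assembly can be separated from the database query, which makes it easy to check without a running PostgreSQL instance. The sketch below assumes the settings have already been collapsed into a key/value object. `assemblePrompt`, `HotelSettings`, and `EXAMPLE_BASE_RULES` are hypothetical names, and the rule text is a placeholder rather than the real `BASE_RULES`:

```typescript
type HotelSettings = Record<string, string | undefined>;

// Hypothetical stand-in; the real BASE_RULES constant lives in the SuiteTalk source.
const EXAMPLE_BASE_RULES = "Keep answers short and confirm every request before logging it.";

/** Build the system instruction from already-loaded tenant settings. */
function assemblePrompt(s: HotelSettings, baseRules: string = EXAMPLE_BASE_RULES): string {
  const name = s.agent_name || "the SuiteTalk Concierge";
  const hotel = s.hotel_name || "the hotel";
  const tone = s.ai_tone || "warm, professional, and concise";

  const parts = [
    `You are ${name}, the AI voice concierge for ${hotel}, answering guests on their room phone. Speak in this tone: ${tone}.`,
  ];
  if (s.ai_instructions) {
    parts.push(`Hotel-specific guidance: ${s.ai_instructions}`);
  }
  return `${parts.join("\n")}\n\n${baseRules}`;
}

const examplePrompt = assemblePrompt({ hotel_name: "The Grand Suite", ai_tone: "friendly and brief" });
console.log(examplePrompt);
```

Any setting that is missing falls back to the same defaults as in the function above, so a newly onboarded hotel still gets a usable prompt.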

Conversational Fluidity: Non-Blocking Function Calling

A major bottleneck in standard LLM agent workflows is blocking tool execution. In text-based chat, it's acceptable for the interface to show a loading spinner for 4 seconds while a database query runs. In a live telephone conversation, 4 seconds of dead silence feels like a broken call.

Gemini's Multimodal Live API addresses this with NON_BLOCKING function declarations.

Designing the Conversation Flow

We configured every tool (e.g., dining_request, housekeeping_service, concierge_search) with behavior: "NON_BLOCKING". When Gemini decides to execute a tool:

  1. It immediately yields control back to the audio synthesis engine, continuing to speak to the guest.
  2. It sends a structured tool call (the function name and its arguments) to the Node.js backend.
  3. The guest hears the agent say something natural: "Certainly! I will get those towels sent to your room immediately."
  4. Meanwhile, the Node.js backend runs the database insert/update and communicates with the hotel PMS.
  5. When the tool completes, the backend pushes the results back into the active Gemini session. Gemini reads the tool response in the background and works it into the conversation without cutting off the guest.

typescript
export const liveTools = [{
  functionDeclarations: ALL_TOOL_DECLARATIONS.map((d: any) => ({ 
    ...d, 
    behavior: "NON_BLOCKING" // Non-blocking flag for live voice interaction
  }))
}];
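
On the backend side, the matching pattern is to start the tool handler without awaiting it inside the audio loop and to send the result once it settles. The sketch below illustrates that idea under stated assumptions: the `ToolCall` and `ToolResponse` shapes, the `dispatchToolCall` function, and the `sendResponse` callback are hypothetical and stand in for whatever the Gemini client in use actually exposes:

```typescript
interface ToolCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

interface ToolResponse {
  id: string;
  name: string;
  response: Record<string, unknown>;
}

type ToolHandler = (args: Record<string, unknown>) => Promise<Record<string, unknown>>;

/**
 * Start a tool handler and return immediately with a promise.
 * The caller does not need to await it, so audio keeps flowing while
 * the handler runs. The result, or an error description, is delivered
 * through sendResponse when the handler settles.
 */
function dispatchToolCall(
  call: ToolCall,
  handlers: Record<string, ToolHandler>,
  sendResponse: (response: ToolResponse) => void,
): Promise<void> {
  const work = Promise.resolve().then(() => {
    const handler = handlers[call.name];
    if (!handler) throw new Error(`Unknown tool: ${call.name}`);
    return handler(call.args);
  });

  return work.then(
    (result) => sendResponse({ id: call.id, name: call.name, response: result }),
    (err: unknown) =>
      sendResponse({
        id: call.id,
        name: call.name,
        response: { error: err instanceof Error ? err.message : String(err) },
      }),
  );
}
```

Reporting failures as a response, rather than throwing, gives the model something to tell the guest (for example, that the request could not be logged) instead of leaving the tool call unanswered.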

This keeps the average call duration low, prevents awkward conversational pauses, and ensures the guest is never left wondering if the line went dead.


Semantic FAQ Search with pgvector

To answer questions like "What is the check-out policy?" or "Where is the pool located?", we implemented Retrieval-Augmented Generation (RAG) using PostgreSQL's pgvector extension.

  1. Knowledge Seeding: During tenant onboarding, hotel managers add FAQ entries through the dashboard. The backend sends these texts to Google's text embeddings API (text-embedding-004) and stores the resulting embedding vectors in the database.
  2. Retrieval Tool: When a guest asks a question, Gemini triggers concierge_search(query).
  3. Cosine Similarity Search: The backend embeds the guest query and runs a similarity search in PostgreSQL:

typescript
export async function getKnowledgeBase(query: string, hotelId: number) {
  const queryEmbedding = await generateEmbedding(query);
  
  // Vector search query using Prisma
  const matches: any[] = await prisma.$queryRaw`
    SELECT id, question, answer, 
           1 - (embedding <=> ${queryEmbedding}::vector) as similarity
    FROM "KnowledgeBase"
    WHERE "hotelId" = ${hotelId}
    AND 1 - (embedding <=> ${queryEmbedding}::vector) > 0.65
    ORDER BY similarity DESC
    LIMIT 3;
  `;
  
  return matches.map(m => m.answer).join("\n\n");
}
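
In pgvector, `<=>` is the cosine distance operator, so `1 - (embedding <=> query)` is the cosine similarity. The following sketch computes the same quantity in plain TypeScript to show what the 0.65 threshold is compared against. The `cosineSimilarity` helper is illustrative and is not part of the SuiteTalk source:

```typescript
/** Cosine similarity between two equal-length vectors, in the range [-1, 1]. */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat a zero vector as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1: same direction
console.log(cosineSimilarity([1, 0], [0, 1])); // 0: unrelated
console.log(cosineSimilarity([1, 1], [1, 0]) > 0.65); // true: about 0.707, passes the threshold
```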

The matching answers are returned directly to Gemini, which synthesizes the answer back to the guest using high-fidelity natural speech.


Step-by-Step Setup Guide

Follow these steps to configure a local development environment for SuiteTalk.

Prerequisites

  • Node.js (v20 or newer)
  • PostgreSQL with the pgvector extension enabled
  • Asterisk PBX (v18+) with the app_audiosocket.so and res_audiosocket.so modules loaded

1. Database Configuration

Create a PostgreSQL database and add your credentials and Gemini API Key to backend/.env:

env
DATABASE_URL="postgresql://username:password@localhost:5432/suitetalk?schema=public"
GEMINI_API_KEY="AIzaSyYourGeminiApiKeyHere"

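Install the backend dependencies, then push the Prisma schema to the database (prisma db push syncs the schema directly and does not create migration files):

bash
cd backend
npm install
npx prisma db push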

Run the seeding scripts to load default hotel configurations, rooms, mock PMS items, and staff logins:

bash
npm run build
npx tsx src/stage_seeding.ts
npx tsx src/seed_new_hotel.ts

2. Launch the Backend Server

Install dependencies, compile the TypeScript source, and start the engine:

bash
cd backend
npm install
npm run build
npm run start

The server starts up and begins listening for:

  • REST API calls on port 3001
  • Asterisk AudioSocket TCP connections on port 9092

3. Launch the Next.js Dashboard

Navigate to the dashboard directory, install dependencies, and run the development server:

bash
cd ../dashboard
npm install
npm run dev

Open http://localhost:3000 in your browser.


Simulation & Live Dashboard Monitoring

You can test the SuiteTalk voice flow locally using either the built-in web simulator or a standard SIP softphone client.

Method 1: Built-in Web Simulator

  1. Open the Next.js Dashboard and log in using a pre-seeded tenant owner account:
    • Hotel 1 (The Grand Suite): owner@grandsuite.com (password: admin123)
    • Hotel 2 (Royal Palms Resort): owner@royalpalms.com (password: palms123)
  2. In the sidebar, navigate to the Guest Simulator page.
  3. Select a room (e.g., Room 101) and click Simulate Call.
  4. Allow microphone permissions in your browser. You can speak into your mic or enter text prompts.
  5. In another browser tab, view the Active Requests queue. You'll watch your voice commands instantly translate into database entries, updating the live staff UI.

Method 2: Testing with MicroSIP Softphone

To test it with a simulated telephone extension instead of the browser mic:

  1. Download and install MicroSIP (or any standard SIP softphone client).
  2. Configure a new account in MicroSIP with the following settings:
    • SIP Server: 127.0.0.1:5060 (or your Asterisk server's host IP address).
    • Username / Extension: Use a preconfigured room extension (e.g. 101 or 1_101).
    • Password: The PJSIP password defined in your local Asterisk configurations.
  3. Once registered, dial 0 in MicroSIP to initiate the call to the Stasis bridge.
  4. Speak into your microphone. The Node.js backend resampler takes the raw 8kHz stream, pipes it to Gemini at 16kHz, and downsamples Gemini's synthesized 24kHz voice output to 8kHz before returning it to your headset speaker.
  5. Open the staff dashboard and check the Active Requests queue to verify your voice commands are logged.

Future Enhancements

  • Legacy PMS Integrations: Direct plugins for Opera PMS and RoomKeyPMS to auto-post charges to guest folios.
  • SIP Trunk Failover: Automated SIP trunk redirection to cell networks if local Internet is lost.
  • Speaker Diarization: Distinguishing between different guests in a room (e.g. parent vs. child) to prevent unauthorized room charges.

Tushar Rayamajhi | AI Engineer & Backend Developer