Unified Chat Service

experiment
til
rag
ai
A global ‘chat with your docs’ service.
Published

February 14, 2026

Architectural sketch of a chat service on a notepad

What It Is

A unified chat service — a frontend component library plus central API for in-product AI chat. Consumer apps all plug into the same backend, which handles RAG via Neon Postgres + pgvector and chat completions via GCP Vertex AI.

External Services

Global Chat Service

Consumer Apps

RAG + vector search

completions + embeddings

Pomodoro

Unit Converter

Tip Calculator

POST /api/chat

Neon Postgres + pgvector

Vertex AI Gemini

High-level architecture

Consumers call POST /api/chat with a product_id; the service fetches relevant docs, streams responses, and auto-registers events.

Architecture

Services: Neon Postgres (pgvector) for vector search; GCP Vertex AI (text-embedding-005, Gemini 2.5 Flash) for embeddings and completions; NextAuth with GitHub for doc management at /library. Hosted on Vercel.

RAG flow: Consumer sends POST /api/chat with product_id, user, and messages. The service extracts search terms from the last message, embeds the query, runs cosine similarity search on document_chunks, builds a system prompt with the top 5 chunks, then streams the completion.

Data model: Projects (one per consumer app) → documents (uploaded via /library) → document_chunks (768-dim embeddings). CORS restricted to ALLOWED_ORIGINS.

Try the Apps

The consumer apps below demonstrate distinct styling and RAG tuned to each product’s documentation.

Chat Service Demo

Open in new window ↗

Unit Converter

Open in new window ↗

Minimal Pomodoro

Open in new window ↗

Tip Calculator

Open in new window ↗


Central service + RAG on Neon + Vertex AI. Consumers stay small; the heavy lifting happens once.