UK AI News Crawler

Tags: til, web-scraping, ai, rag, chat
An automated UK AI news aggregator with RAG chat
Published: March 7, 2026

[Image: A friendly robot reading an AI newspaper at a cozy desk]

What It Is

A side project that crawls, summarises and classifies UK-focused AI/ML news articles, and lets you ask questions about them via a RAG chat interface.

What It Does

  • Weekly crawl: A GitHub Actions cron job fires every Monday, searching DuckDuckGo for UK AI keywords and scraping the results.
  • AI summarisation: Each article is summarised and sentiment-classified using Vertex AI (Gemini 2.5 Flash).
  • Vector search: Articles are embedded with text-embedding-005 (768 dims) and stored in Neon PostgreSQL with pgvector.
  • RAG chat: Ask a question and the five most relevant articles are retrieved; Gemini then generates an answer grounded in those sources.
  • Admin via GitHub OAuth: The repo owner can delete articles and trigger reclassification from the UI.
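
The retrieve-then-answer idea behind the RAG chat can be sketched in plain Python. This is a minimal illustration, not the project's code: in the real system the ranking happens inside Postgres via pgvector, and the `Article` fields, the `top_k` and `build_prompt` helpers, and the prompt wording are all assumptions made for the example. Tiny 3-dimensional vectors stand in for the 768-dimensional embeddings.

```python
import math
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical record shape; the real schema is not shown in this post.
    title: str
    summary: str
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], articles: list[Article], k: int = 5) -> list[Article]:
    """Return the k articles whose embeddings are most similar to the query."""
    ranked = sorted(
        articles,
        key=lambda art: cosine_similarity(query_vec, art.embedding),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, sources: list[Article]) -> str:
    """Assemble a prompt asking the model to answer only from the numbered sources."""
    numbered = "\n".join(
        f"[{i}] {art.title}: {art.summary}" for i, art in enumerate(sources, start=1)
    )
    return (
        "Answer the question using only the sources below, citing them by number.\n\n"
        f"Sources:\n{numbered}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    # Made-up toy data, purely for illustration.
    corpus = [
        Article("Article A", "Summary of A.", [1.0, 0.0, 0.0]),
        Article("Article B", "Summary of B.", [0.0, 1.0, 0.0]),
        Article("Article C", "Summary of C.", [0.7, 0.7, 0.0]),
    ]
    hits = top_k([1.0, 0.1, 0.0], corpus, k=2)
    print(build_prompt("What happened this week?", hits))
```

Numbering the sources in the prompt is one simple way to keep the generated answer traceable to specific articles; the string returned by `build_prompt` would then be sent to the model.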

Architecture at a Glance

Ingestion: GitHub Actions (weekly cron) → DuckDuckGo Search → Scrape & Filter → Vertex AI Summarise & Embed → Neon Postgres + pgvector
Chat: User Question → Vector Similarity Search → Gemini RAG Answer → Streamed Response
Browsing: Neon Postgres + pgvector → Article Listings

Figure: High-level data flow
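
The "Vector Similarity Search" step maps naturally onto a pgvector query. The sketch below builds one in Python. The `articles` table and its `title`, `url`, `summary`, and `embedding` columns are hypothetical, since the post does not show the schema. `<=>` is pgvector's cosine-distance operator, and the `%s` placeholders follow the psycopg parameter style, which is also an assumption about the driver in use.

```python
EMBEDDING_DIMS = 768  # embedding size stated in the post for text-embedding-005
TOP_K = 5             # number of articles retrieved per question

# Hypothetical table and column names. 1 - cosine distance = cosine similarity.
SIMILARITY_SQL = """
SELECT title, url, summary, 1 - (embedding <=> %s::vector) AS similarity
FROM articles
ORDER BY embedding <=> %s::vector
LIMIT %s
"""


def to_vector_literal(embedding):
    """Format a list of floats as a pgvector text literal such as '[0.1,0.2]'."""
    if len(embedding) != EMBEDDING_DIMS:
        raise ValueError(
            f"expected {EMBEDDING_DIMS} dimensions, got {len(embedding)}"
        )
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def similarity_query(embedding, k=TOP_K):
    """Return (sql, params) for fetching the k nearest articles by cosine distance."""
    literal = to_vector_literal(embedding)
    # The literal appears twice: once for the similarity column, once for ordering.
    return SIMILARITY_SQL, (literal, literal, k)
```

With a psycopg cursor, the result would typically be passed along as `cur.execute(sql, params)`. Checking the dimension count before querying catches a mismatched embedding model early, rather than surfacing it as a database error.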

Tech Stack

Layer     Choice
Frontend  Next.js 14, React 18, Tailwind, shadcn/ui
Backend   FastAPI (Python), Mangum ASGI adapter
Database  Neon PostgreSQL + pgvector
AI        Vertex AI: Gemini 2.5 Flash, text-embedding-005
Auth      NextAuth.js + GitHub OAuth
Hosting   Vercel (two projects: frontend and backend)
CI/CD     GitHub Actions (weekly crawl) + Vercel auto-deploy