UK AI News Crawler
Tags: til, web-scraping, ai, rag, chat
An automated UK AI news aggregator with RAG chat

What It Is
A side project that crawls, summarises and classifies UK-focused AI/ML news articles, and lets you ask questions about them via a RAG chat interface.
What It Does
- Weekly crawl: A GitHub Actions cron job fires every Monday, searching DuckDuckGo for UK AI keywords and scraping the results.
- AI summarisation: Each article is summarised and sentiment-classified using Vertex AI (Gemini 2.5 Flash).
- Vector search: Articles are embedded with text-embedding-005 (768 dimensions) and stored in Neon PostgreSQL with pgvector.
- RAG chat: Ask a question and the 5 most relevant articles are retrieved; Gemini then generates an answer grounded in those sources.
- Admin via GitHub OAuth: The repo owner can delete articles and trigger reclassification from the UI.
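The retrieval half of the RAG chat can be sketched in a few lines. This is a minimal in-memory illustration, not the project's code. It assumes each article already carries its 768-dimensional embedding. Articles are ranked by cosine similarity to the question's embedding, the top 5 are kept, and their summaries are packed into a prompt that asks the model to answer only from those sources. The `Article` type, `top_k_articles` and `build_prompt` are hypothetical names. Because the project stores embeddings in pgvector, the real ranking most likely runs inside PostgreSQL rather than in Python.

```python
import math
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical record shape; the project's actual schema is not shown.
    title: str
    url: str
    summary: str
    embedding: list[float]  # e.g. a 768-dim vector from text-embedding-005


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_articles(query_embedding: list[float], articles: list[Article], k: int = 5) -> list[Article]:
    """Return up to k articles most similar to the query, best match first."""
    ranked = sorted(
        articles,
        key=lambda art: cosine_similarity(query_embedding, art.embedding),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, sources: list[Article]) -> str:
    """Build a prompt that restricts the model to the numbered sources."""
    context = "\n\n".join(
        f"[{i}] {art.title} ({art.url})\n{art.summary}"
        for i, art in enumerate(sources, start=1)
    )
    return (
        "Answer the question using only the sources below, citing them by number. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
```

Numbering the sources in the prompt gives the model an easy way to cite them. The UI can then link each answer back to the original articles.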
Live Demo
Architecture at a Glance
Tech Stack
| Layer | Choice |
|---|---|
| Frontend | Next.js 14, React 18, Tailwind, shadcn/ui |
| Backend | FastAPI (Python), Mangum ASGI adapter |
| Database | Neon PostgreSQL + pgvector |
| AI | Vertex AI: Gemini 2.5 Flash, text-embedding-005 |
| Auth | NextAuth.js + GitHub OAuth |
| Hosting | Vercel (two projects: frontend & backend) |
| CI/CD | GitHub Actions (weekly crawl) + Vercel auto-deploy |
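On the database side, pgvector lets PostgreSQL perform the top-5 ranking itself. The sketch below is illustrative rather than the project's DDL. The `articles` table, its column names, the HNSW index (which needs pgvector 0.5.0 or later) and the helpers `to_vector_literal` and `top_k_params` are all assumptions. `<=>` is pgvector's cosine-distance operator, so ordering by it ascending puts the most similar rows first. The query embedding is passed as pgvector's text form `[x1,x2,...]` and cast to `vector`.

```python
# Hypothetical schema: table/column names and the HNSW index are assumptions.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS articles (
    id         BIGSERIAL PRIMARY KEY,
    url        TEXT UNIQUE NOT NULL,
    title      TEXT NOT NULL,
    summary    TEXT,
    sentiment  TEXT,
    embedding  vector(768)
);

CREATE INDEX IF NOT EXISTS articles_embedding_idx
    ON articles USING hnsw (embedding vector_cosine_ops);
"""

# <=> is cosine distance (smaller = more similar), so ascending order is best-first.
# Uses psycopg-style named placeholders (%(name)s).
TOP_K_SQL = """
SELECT id, title, url, summary,
       1 - (embedding <=> %(query)s::vector) AS similarity
FROM articles
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s;
"""

EMBEDDING_DIM = 768  # text-embedding-005 output size, per the project description


def to_vector_literal(values: list[float], dim: int = EMBEDDING_DIM) -> str:
    """Format an embedding as pgvector text input, e.g. '[0.1,0.2,0.3]'."""
    if len(values) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(values)}")
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def top_k_params(query_embedding: list[float], k: int = 5) -> dict:
    """Named parameters to pass alongside TOP_K_SQL when executing it."""
    return {"query": to_vector_literal(query_embedding), "k": k}
```

With a driver such as psycopg, this would be used roughly as `cur.execute(TOP_K_SQL, top_k_params(question_embedding))`. Keeping the ranking in SQL means only 5 rows cross the network, which suits a serverless backend on Vercel talking to Neon.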