Symbolic Capital

Media Aggregator Pipeline

Tags: n8n, yt-dlp, Whisper

An n8n workflow that aggregates content from Reddit, RSS feeds, and TikTok into an automatically generated newsletter with AI-written summaries.

The Problem

Staying current across multiple platforms is time-consuming. Reddit threads, TikTok videos, RSS feeds, and newsletters each have to be monitored, curated, and synthesized by hand.

Commercial aggregators exist, but the ones I found were too expensive, too limited, or lacked support for the specific combination of sources I needed.

The Solution

A fully automated pipeline that:

  • Monitors specified Reddit subreddits for high-quality discussions
  • Pulls content from RSS feeds and checks for keyword matches
  • Downloads TikTok videos using yt-dlp
  • Transcribes video content with Whisper
  • Summarizes everything using AI models
  • Compiles a weekly digest newsletter
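The first two steps come down to a filter over incoming items. Here is a minimal Python sketch of that idea, not the actual n8n node configuration: the `Item` shape, the `min_score` threshold, and the function names are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Item:
    """Hypothetical normalized item; field names are illustrative."""
    title: str
    body: str
    score: int = 0  # e.g. Reddit upvotes; stays 0 for RSS entries


def matches_keywords(item: Item, keywords: Iterable[str]) -> bool:
    """True if any keyword appears as a whole word, case-insensitively.

    Note: the word-boundary check suits keywords that start and end with
    a letter or digit ("n8n", "yt-dlp"); something like "c++" would need
    a different pattern.
    """
    text = f"{item.title} {item.body}"
    return any(
        re.search(rf"\b{re.escape(keyword)}\b", text, re.IGNORECASE)
        for keyword in keywords
    )


def select_items(items: Iterable[Item], keywords: List[str],
                 min_score: int = 0) -> List[Item]:
    """Keep items that clear the score threshold and match a keyword."""
    return [
        item for item in items
        if item.score >= min_score and matches_keywords(item, keywords)
    ]
```

For Reddit, `min_score` would be set to whatever counts as "high-quality" for a given subreddit; for RSS entries it can stay at 0 so that only the keyword check applies.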

Architecture

Built entirely in n8n, the workflow runs on a self-hosted VPS. Each source type has its own sub-workflow:

  • Reddit Monitor - Uses Reddit's API to fetch posts, filters them by score and keywords, and extracts top comments
  • RSS Aggregator - Polls feeds on a schedule, deduplicates entries, and stores them in SQLite
  • Video Processor - Downloads videos with yt-dlp, transcribes the audio with Whisper, and extracts key quotes
  • AI Summarizer - Sends content to a local LLM for summarization and keeps the output format consistent
  • Newsletter Builder - Compiles the summaries into an HTML email and sends it via SMTP
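The RSS Aggregator's deduplicate-and-store step can be sketched roughly as follows, using Python's built-in `sqlite3` module. The table name `seen_entries`, the column layout, and the choice of GUID-then-link as the identity key are assumptions; the real workflow may key entries differently.

```python
import hashlib
import sqlite3
from typing import Dict, List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the SQLite store; the schema is illustrative."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_entries ("
        "entry_key TEXT PRIMARY KEY, title TEXT, url TEXT)"
    )
    return conn


def entry_key(entry: Dict[str, str]) -> str:
    """Stable key for an entry: hash of its GUID, else link, else title."""
    raw = entry.get("guid") or entry.get("link") or entry.get("title", "")
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def store_new_entries(conn: sqlite3.Connection,
                      entries: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Insert unseen entries and return only those that were new."""
    fresh = []
    for entry in entries:
        cursor = conn.execute(
            "INSERT OR IGNORE INTO seen_entries (entry_key, title, url) "
            "VALUES (?, ?, ?)",
            (entry_key(entry), entry.get("title"), entry.get("link")),
        )
        if cursor.rowcount == 1:  # 0 means the primary key already existed
            fresh.append(entry)
    conn.commit()
    return fresh
```

Letting the primary key reject duplicates keeps the check and the write in a single statement, so an entry is not stored twice even when two polls overlap.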

Technical Details

The most complex part was video processing. yt-dlp downloads videos, ffmpeg extracts audio, and Whisper handles transcription. All of this runs in Docker containers orchestrated by n8n.
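As a rough illustration of that three-stage chain, the Python sketch below builds one command per tool and runs them in order. It assumes the `yt-dlp`, `ffmpeg`, and openai-whisper `whisper` executables are on the PATH inside the container. The file names, the `base` model, and the specific flags are assumptions, not the workflow's actual settings.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def build_commands(url: str, workdir: str, model: str = "base") -> List[List[str]]:
    """Return the download, audio-extraction, and transcription commands."""
    work = Path(workdir)
    video = work / "video.mp4"
    audio = work / "audio.wav"
    return [
        # Download; remux into mp4 if needed so the next step has a fixed path.
        ["yt-dlp", "--remux-video", "mp4", "-o", str(work / "video.%(ext)s"), url],
        # Drop the video stream and write 16 kHz mono WAV.
        ["ffmpeg", "-y", "-i", str(video), "-vn", "-ac", "1", "-ar", "16000",
         str(audio)],
        # Transcribe; the CLI writes <audio basename>.txt into --output_dir.
        ["whisper", str(audio), "--model", model,
         "--output_format", "txt", "--output_dir", str(work)],
    ]


def process_video(url: str, workdir: str,
                  run: Callable = subprocess.run) -> Path:
    """Run each stage in order, stopping at the first failure.

    Returns the path where the transcript is expected to be written.
    """
    for command in build_commands(url, workdir):
        run(command, check=True)  # raises CalledProcessError on non-zero exit
    return Path(workdir) / "audio.txt"
```

Passing `run` as a parameter makes the chain easy to dry-run: substituting a function that only records the commands shows exactly what would execute without downloading anything.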

For summarization, I'm using a local Llama model running on the same VPS. This keeps costs near zero and avoids third-party API dependencies.
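One way to keep summaries consistently formatted across sources is to send every item through the same prompt template. The Python sketch below shows that idea only; the template wording, the `max_chars` cutoff, and the `generate` callable (a stand-in for whatever interface the local Llama model is served through) are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical template; a fixed instruction block keeps output uniform.
PROMPT_TEMPLATE = (
    "Summarize the following {source} item for a weekly newsletter.\n"
    "Respond with exactly three bullet points, each under 25 words.\n\n"
    "Title: {title}\n"
    "Content:\n{content}\n"
)


def build_prompt(item: Dict[str, str], max_chars: int = 6000) -> str:
    """Fill the shared template, truncating long content to fit the model."""
    return PROMPT_TEMPLATE.format(
        source=item["source"],
        title=item["title"],
        content=item["content"][:max_chars],
    )


def summarize_all(items: List[Dict[str, str]],
                  generate: Callable[[str], str]) -> List[Dict[str, str]]:
    """Attach a 'summary' field to each item using the supplied model call.

    `generate` takes a prompt string and returns the model's text; how it
    reaches the local model is left to the caller.
    """
    return [
        {**item, "summary": generate(build_prompt(item)).strip()}
        for item in items
    ]
```

Because the model call is injected, the same code works with any local serving setup, and it can be exercised with a stub function before the model is wired in.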

Cost Breakdown

  • VPS hosting: $12/month
  • Domain & email: $3/month
  • Total API costs: $0 (all self-hosted)

Equivalent functionality assembled from services like Zapier, OpenAI, and an external transcription provider would cost $200+/month.
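The arithmetic behind that comparison, as a small Python sketch. The line items come from the breakdown above, and the $200 figure is the lower bound quoted for the hosted alternative, so the savings computed here are a minimum.

```python
# Monthly costs in USD, taken from the breakdown above.
SELF_HOSTED = {
    "vps_hosting": 12,
    "domain_and_email": 3,
    "api_costs": 0,
}
HOSTED_ALTERNATIVE_MIN = 200  # quoted lower bound for the hosted stack


def monthly_total() -> int:
    """Total self-hosted cost per month."""
    return sum(SELF_HOSTED.values())


def minimum_savings(months: int) -> int:
    """Lower bound on savings versus the hosted alternative."""
    return (HOSTED_ALTERNATIVE_MIN - monthly_total()) * months


print(monthly_total())     # 15
print(minimum_savings(6))  # 1110
```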

Results

The pipeline has been running reliably for 6+ months. Each weekly newsletter includes 20-30 curated items with AI-generated summaries and takes about 2 hours of fully automated processing to produce.

The key insight: treat content aggregation as infrastructure, not a service. Build once, customize freely, run forever.
