
AI Web Scraper

Build an AI-powered web scraper: fetch → analyze → store.


This tutorial uses pgflow, a Postgres-native workflow engine that manages DAG dependencies, state transitions, and execution flow directly in your database. It works with Edge Worker, a lightweight runner that executes your workflow tasks, handles retries, and reports results back to pgflow - all running within Supabase Edge Functions. Together, they let you build reliable, observable workflows without extra infrastructure.

You’ll create a practical AI web scraper workflow that:

  1. Grabs content from any webpage with built-in error handling (see the sketch below)
  2. Uses GPT-4o to generate summaries and extract relevant tags
  3. Runs multiple AI operations in parallel (cutting processing time in half)
  4. Stores everything neatly in your Postgres database
  5. Auto-retries when things go wrong (because APIs sometimes fail)
(Diagram: AI Web Scraper workflow)
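
To make step 1 concrete, here's a rough sketch of what the scraping task could look like. The function name matches the `scrapeWebsite.ts` file created below, but the fetch call details, error handling, and HTML-to-text cleanup are assumptions for illustration, not the tutorial's exact code:

```typescript
// supabase/functions/_tasks/scrapeWebsite.ts (illustrative sketch)

/**
 * Fetches a webpage and returns a plain-text version of its content.
 * Throwing on a non-2xx response lets the workflow engine record the
 * failure and retry the step.
 */
export async function scrapeWebsite(url: string) {
  const response = await fetch(url);

  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: ${response.status} ${response.statusText}`);
  }

  const html = await response.text();

  // Very rough HTML-to-text cleanup for illustration only;
  // a real task would likely use a proper HTML parser.
  const content = html
    .replace(/<script[\s\S]*?<\/script>/gi, '')
    .replace(/<style[\s\S]*?<\/style>/gi, '')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();

  return { content };
}
```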

Here’s the file structure we’ll create (one of the task files is sketched after the listing):

  • supabase/
    • functions/
      • _tasks/
        • scrapeWebsite.ts
        • summarize.ts
        • extractTags.ts
        • saveToDb.ts
      • _flows/
        • analyze_website.ts
      • analyze_website_worker/
        • index.ts
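
As an example of one of the task files listed above, `summarize.ts` might call the OpenAI API roughly like this. The `npm:openai` import specifier, model name, prompt, and environment variable name are assumptions made for this sketch; adapt them to your own setup and to the tutorial's actual code:

```typescript
// supabase/functions/_tasks/summarize.ts (illustrative sketch)
import OpenAI from 'npm:openai';

// Assumes OPENAI_API_KEY is configured as a Supabase Edge Function secret
const openai = new OpenAI({ apiKey: Deno.env.get('OPENAI_API_KEY')! });

/**
 * Asks GPT-4o for a short summary of the scraped page content.
 */
export async function summarize(content: string): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'system', content: 'Summarize the following webpage content in 2-3 sentences.' },
      { role: 'user', content },
    ],
  });

  return completion.choices[0].message.content ?? '';
}
```

An `extractTags.ts` task would follow the same pattern, asking the model for a list of tags instead of a summary.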

This tutorial was tested with these specific tool versions:

Tool           Tested version
Supabase CLI   2.22.12
pgflow CLI     0.2.5
Deno           1.45.2
In this tutorial, you will:

  1. Write task functions to fetch and process web content
  2. Generate structured data from AI using type-safe schemas
  3. Create parallel DAG workflows with the TypeScript DSL (sketched below)
  4. Compile flows to SQL and apply migrations
  5. Execute workflows using the Edge Worker
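
For step 3, the flow definition in `_flows/analyze_website.ts` is what wires the tasks into a DAG so that summarization and tag extraction run in parallel once scraping finishes. The snippet below only sketches that shape: the import specifier, the `Flow` constructor options, the step/`dependsOn` syntax, and the handler input layout are assumptions about pgflow's DSL, so treat the pgflow documentation as the source of truth:

```typescript
// supabase/functions/_flows/analyze_website.ts (illustrative sketch of the DAG shape)
import { Flow } from 'npm:@pgflow/dsl'; // import specifier assumed
import { scrapeWebsite } from '../_tasks/scrapeWebsite.ts';
import { summarize } from '../_tasks/summarize.ts';
import { extractTags } from '../_tasks/extractTags.ts';
import { saveToDb } from '../_tasks/saveToDb.ts';

export default new Flow<{ url: string }>({ slug: 'analyze_website' })
  // Root step: receives the flow input (the URL to analyze)
  .step({ slug: 'website' }, (input) => scrapeWebsite(input.run.url))
  // These two steps depend only on 'website', so they can run in parallel
  .step({ slug: 'summary', dependsOn: ['website'] }, (input) =>
    summarize(input.website.content))
  .step({ slug: 'tags', dependsOn: ['website'] }, (input) =>
    extractTags(input.website.content))
  // Final step waits for both parallel branches before writing to Postgres
  .step({ slug: 'saveToDb', dependsOn: ['summary', 'tags'] }, (input) =>
    saveToDb({ summary: input.summary, tags: input.tags }));
```

Because `summary` and `tags` share a single dependency, the engine can start them at the same time, which is where the parallel speedup mentioned earlier comes from. Steps 4 and 5 then compile this definition to SQL, apply it as a migration, and start the Edge Worker from `analyze_website_worker/index.ts`; the exact commands are covered in the pgflow docs.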

Join the Community

Have questions or need help? pgflow is just getting started - join us on Discord to ask questions, share feedback, or discuss partnership opportunities.