
AI Web Scraper

Build an AI-powered web scraper: fetch → analyze → store.


This tutorial uses pgflow, a Postgres-native workflow engine that manages DAG dependencies, state transitions, and execution flow directly in your database. It works with Edge Worker, a lightweight runner that executes your workflow tasks, handles retries, and reports results back to pgflow - all running within Supabase Edge Functions. Together, they let you build reliable, observable workflows without extra infrastructure.

You’ll create a practical AI web scraper workflow that:

  1. Grabs content from any webpage with built-in error handling (see the sketch below)
  2. Uses GPT-4o to generate summaries and extract relevant tags
  3. Runs multiple AI operations in parallel (cutting processing time in half)
  4. Stores everything neatly in your Postgres database
  5. Auto-retries when things go wrong (because APIs sometimes fail)
(Diagram: AI Web Scraper workflow)
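
To make step 1 concrete, here's a rough sketch of what the scraping task could look like. The function name matches the `scrapeWebsite.ts` file created below, but the fetch call details, error handling, and HTML-to-text cleanup are assumptions for illustration, not the tutorial's exact code:

```typescript
// supabase/functions/_tasks/scrapeWebsite.ts (illustrative sketch)

/**
 * Fetches a webpage and returns a plain-text version of its content.
 * Throwing on a non-2xx response lets the workflow engine record the
 * failure and retry the step.
 */
export async function scrapeWebsite(url: string) {
  const response = await fetch(url);

  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: ${response.status} ${response.statusText}`);
  }

  const html = await response.text();

  // Very rough HTML-to-text cleanup for illustration only;
  // a real task would likely use a proper HTML parser.
  const content = html
    .replace(/<script[\s\S]*?<\/script>/gi, '')
    .replace(/<style[\s\S]*?<\/style>/gi, '')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();

  return { content };
}
```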

Here’s the file structure we’ll create (one of the task files is sketched after the listing):

  • supabase/
    • functions/
      • _tasks/
        • scrapeWebsite.ts
        • summarize.ts
        • extractTags.ts
        • saveToDb.ts
      • _flows/
        • analyze_website.ts
      • analyze_website_worker/
        • index.ts
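
As an example of one of the task files listed above, `summarize.ts` might call the OpenAI API roughly like this. The `npm:openai` import specifier, model name, prompt, and environment variable name are assumptions made for this sketch; adapt them to your own setup and to the tutorial's actual code:

```typescript
// supabase/functions/_tasks/summarize.ts (illustrative sketch)
import OpenAI from 'npm:openai';

// Assumes OPENAI_API_KEY is configured as a Supabase Edge Function secret
const openai = new OpenAI({ apiKey: Deno.env.get('OPENAI_API_KEY')! });

/**
 * Asks GPT-4o for a short summary of the scraped page content.
 */
export async function summarize(content: string): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'system', content: 'Summarize the following webpage content in 2-3 sentences.' },
      { role: 'user', content },
    ],
  });

  return completion.choices[0].message.content ?? '';
}
```

An `extractTags.ts` task would follow the same pattern, asking the model for a list of tags instead of a summary.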

This tutorial was tested with these specific tool versions:

Tool           Tested version
Supabase CLI   2.22.12
pgflow CLI     0.2.5
Deno           1.45.2
In this tutorial, you will:

  1. Write task functions to fetch and process web content
  2. Generate structured data from AI using type-safe schemas
  3. Create parallel DAG workflows with the TypeScript DSL (sketched below)
  4. Compile flows to SQL and apply migrations
  5. Execute workflows using the Edge Worker
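
For step 3, the flow definition in `_flows/analyze_website.ts` is what wires the tasks into a DAG so that summarization and tag extraction run in parallel once scraping finishes. The snippet below only sketches that shape: the import specifier, the `Flow` constructor options, the step/`dependsOn` syntax, and the handler input layout are assumptions about pgflow's DSL, so treat the pgflow documentation as the source of truth:

```typescript
// supabase/functions/_flows/analyze_website.ts (illustrative sketch of the DAG shape)
import { Flow } from 'npm:@pgflow/dsl'; // import specifier assumed
import { scrapeWebsite } from '../_tasks/scrapeWebsite.ts';
import { summarize } from '../_tasks/summarize.ts';
import { extractTags } from '../_tasks/extractTags.ts';
import { saveToDb } from '../_tasks/saveToDb.ts';

export default new Flow<{ url: string }>({ slug: 'analyze_website' })
  // Root step: receives the flow input (the URL to analyze)
  .step({ slug: 'website' }, (input) => scrapeWebsite(input.run.url))
  // These two steps depend only on 'website', so they can run in parallel
  .step({ slug: 'summary', dependsOn: ['website'] }, (input) =>
    summarize(input.website.content))
  .step({ slug: 'tags', dependsOn: ['website'] }, (input) =>
    extractTags(input.website.content))
  // Final step waits for both parallel branches before writing to Postgres
  .step({ slug: 'saveToDb', dependsOn: ['summary', 'tags'] }, (input) =>
    saveToDb({ summary: input.summary, tags: input.tags }));
```

Because `summary` and `tags` share a single dependency, the engine can start them at the same time, which is where the parallel speedup mentioned earlier comes from. Steps 4 and 5 then compile this definition to SQL, apply it as a migration, and start the Edge Worker from `analyze_website_worker/index.ts`; the exact commands are covered in the pgflow docs.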

Join the Community

Have questions or need help? pgflow is just getting started - join us on Discord to ask questions, share feedback, or discuss partnership opportunities.