Town Lake: Cloudflare’s Unified Data Analytics Platform and Skipper AI Data Agent
Build a unified data platform with Trino, R2, and Cloudflare Access to provide single SQL access across all data, with governance via Lifeguard and Skimmer.
Deploy Town Lake in your environment, configure Trino to connect to your data sources, set up Lifeguard policies, and run Skimmer to detect PII.
Summary
Cloudflare has launched Town Lake, a unified data analytics platform that provides a single SQL interface to all internal data sources, and Skipper, an AI data agent that lets users ask questions in plain English. Town Lake is built as a data lakehouse: Apache Trino is the query engine, R2 is the object store with Iceberg providing schema evolution, and DataHub is the metadata catalog.
The platform includes Lifeguard, an access‑control service that stores policies in D1 and dynamically pulls user memberships, and Skimmer, a PII detection scanner that classifies columns and feeds its findings into DataHub and Lifeguard’s allowlist. Transformer is the ELT engine that compiles YAML‑defined DAGs into Trino jobs. Ingestion orchestrates data extraction from Postgres, ClickHouse, and other sources, converts the data to Parquet, and loads it into R2 as Iceberg tables.
All components are built on Cloudflare’s own products (R2 for storage, Workers for compute, Access for authentication, and Workflows for orchestration), so the platform runs on the same services that Cloudflare sells to customers.
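To make the Transformer step concrete, here is a minimal sketch of how a DAG of SQL models could be ordered into jobs. It is an illustration under stated assumptions, not Cloudflare's implementation: the `DAG` dictionary stands in for a parsed YAML file, and the table names, the `lake` schema, and the `compile_dag` helper are all hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG, shaped like what a parsed YAML definition might contain.
# Every model name and SQL statement here is invented for illustration.
DAG = {
    "stg_accounts": {
        "depends_on": [],
        "sql": "SELECT id, plan FROM postgres.public.accounts",
    },
    "stg_requests": {
        "depends_on": [],
        "sql": "SELECT account_id, count(*) AS n "
               "FROM clickhouse.logs.requests GROUP BY account_id",
    },
    "usage_by_plan": {
        "depends_on": ["stg_accounts", "stg_requests"],
        "sql": "SELECT a.plan, sum(r.n) AS requests "
               "FROM lake.stg_accounts a JOIN lake.stg_requests r "
               "ON a.id = r.account_id GROUP BY a.plan",
    },
}


def compile_dag(dag):
    """Return one SQL statement per model, ordered so dependencies run first."""
    # TopologicalSorter expects a mapping of node -> predecessors.
    sorter = TopologicalSorter(
        {name: spec["depends_on"] for name, spec in dag.items()}
    )
    return [
        f"CREATE OR REPLACE TABLE lake.{name} AS {dag[name]['sql']}"
        for name in sorter.static_order()
    ]


jobs = compile_dag(DAG)
```

Each string in `jobs` could then be submitted to the query engine in order. A cyclic dependency would raise `graphlib.CycleError` before any job runs, which is one reason to resolve ordering at compile time.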
Town Lake’s governance model is “default‑closed”: tables are inaccessible until Skimmer classifies them and a reviewer approves the specific columns. This approach automates PII detection and enforces a strict audit trail, making the platform suitable for sensitive data such as billing and security investigations. Skipper sits on top of Town Lake, allowing anyone with the right permissions to query data without writing SQL, thereby democratizing data access across the organization.
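The default‑closed rule can be sketched as a small state check: a column is readable only after it has been both classified and approved. This is a hedged illustration, not Lifeguard's actual data model. The `Allowlist` and `ColumnState` classes and the `billing.invoices` table are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnState:
    classified: bool = False  # has the scanner examined this column?
    approved: bool = False    # has a reviewer signed off on it?


@dataclass
class Allowlist:
    # Maps (table, column) -> ColumnState. Unknown columns have no entry.
    columns: dict = field(default_factory=dict)

    def mark_classified(self, table, column):
        self.columns.setdefault((table, column), ColumnState()).classified = True

    def approve(self, table, column):
        state = self.columns.get((table, column))
        if state is None or not state.classified:
            raise ValueError("cannot approve an unclassified column")
        state.approved = True

    def can_read(self, table, column):
        # Default-closed: anything unknown, unclassified, or unapproved is denied.
        state = self.columns.get((table, column))
        return state is not None and state.classified and state.approved


allowlist = Allowlist()
before = allowlist.can_read("billing.invoices", "amount")
allowlist.mark_classified("billing.invoices", "amount")
after_scan = allowlist.can_read("billing.invoices", "amount")
allowlist.approve("billing.invoices", "amount")
after_review = allowlist.can_read("billing.invoices", "amount")
```

The point of the design is that the absence of a record means "deny", so a newly ingested table cannot be exposed by accident.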
The launch of Town Lake and Skipper marks a significant step for Cloudflare’s internal data infrastructure, providing fresh, unsampled data for critical queries while maintaining fast, downsampled access for exploratory work.
Key changes
- Town Lake offers a single SQL interface across Postgres, ClickHouse, and Iceberg tables
- Trino pushes filters to ClickHouse and joins with Postgres and R2 in one query
- R2 Data Catalog stores cold/warm data as Iceberg tables with schema evolution
- DataHub catalogs all tables, columns, lineage, and glossary terms
- Lifeguard enforces access rules stored in D1 and renders policies to Trino
- Skimmer continuously scans for PII and feeds findings to DataHub and Lifeguard
- Transformer compiles YAML‑defined DAGs into Trino jobs for ELT
- Ingestion orchestrates extraction, transformation to Parquet, and loading into R2
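As an illustration of the kind of column classification Skimmer performs, the sketch below labels a column from sampled string values using simple regular expressions. The source does not describe Skimmer's detection method, so the patterns, the 0.8 threshold, and the `classify_column` function are assumptions made for this example only.

```python
import re

# Deliberately simple, illustrative patterns. A production scanner would
# need far more robust detection than these two regular expressions.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "ipv4": re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$"),
}


def classify_column(samples, threshold=0.8):
    """Return a PII label if enough non-empty string samples match, else None."""
    values = [s for s in samples if s]
    if not values:
        return None
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        if hits / len(values) >= threshold:
            return label
    return None


email_label = classify_column(["a@example.com", "b@example.org"])
ip_label = classify_column(["10.0.0.1", "192.168.1.5"])
plain_label = classify_column(["pro", "free", "enterprise"])
```

A label returned here would be the kind of finding that gets recorded in the catalog and held for reviewer approval before the column becomes queryable.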