How to Query Parquet Files with SQL (No Python Needed)

· Parquet Explorer
parquet · sql · duckdb · tutorial

You have a .parquet file. You want to filter, aggregate, or join it — but you do not want to spin up a Python environment, install pandas, or configure a database. Good news: you can query Parquet files with plain SQL, right now, with zero setup. And with the right tool, you get a lot more than just a query editor.

This tutorial covers two approaches: Parquet Explorer (browser-based, full-featured) and DuckDB CLI (terminal-based, scriptable). Both use DuckDB under the hood and support full SQL syntax.

Option 1: Parquet Explorer (Browser)

Parquet Explorer is a complete Parquet platform that runs DuckDB-WASM directly in your browser. There is no server, no upload, and no account. Your data stays on your machine.

Getting Started

  1. Open parquetexplorer.com.
  2. Drag and drop your .parquet file onto the page, or click to browse.
  3. The file loads instantly. You will see the schema (with nested types displayed in a tree view), row count, and a data preview in a virtualized table that handles millions of rows smoothly.
  4. Switch to the SQL editor and start writing queries.

That is it. No installation, no dependencies, no configuration files.

Beyond Just Querying

What makes Parquet Explorer more than a query tool is the workflow it wraps around the SQL engine:

  • Query suggestions: Based on the schema and data in your loaded table, the editor offers contextual query suggestions — so you spend less time remembering column names and more time exploring.
  • Query history: Every query you run is saved in your session history, making it easy to revisit and refine earlier analyses.
  • Data profiler: Before you even write SQL, the profiler gives you per-column statistics, histograms, and semantic type detection. It automatically identifies columns containing emails, URLs, UUIDs, IP addresses, and phone numbers, and assigns a data quality score.
  • Schema and metadata inspector: Browse the full type tree (including STRUCT, LIST, and MAP nesting), row group details, compression codecs, and column-level stats like min/max values, null counts, and distinct counts.
  • Export results: Download query output as CSV, JSON, or Parquet (with Snappy compression).

Why This Works

Parquet Explorer embeds DuckDB-WASM — a full analytical database compiled to WebAssembly. When you drop a file, DuckDB reads it locally in your browser tab using the File System Access API. The query engine, optimizer, and storage layer all run client-side:

  • Files up to several gigabytes work fine (limited by your browser’s memory).
  • Query performance is surprisingly fast — DuckDB is one of the fastest analytical engines available.
  • Your data never touches a network. It is fully private.

Option 2: DuckDB CLI (Terminal)

If you prefer working in the terminal, the DuckDB CLI is a lightweight, single-binary tool that speaks SQL natively with Parquet files.

Installation

# macOS
brew install duckdb

# Linux (x86_64)
curl -LO https://github.com/duckdb/duckdb/releases/latest/download/duckdb_cli-linux-amd64.zip
unzip duckdb_cli-linux-amd64.zip

# Windows — download from https://duckdb.org/docs/installation

Basic Usage

# Start an interactive session
duckdb

# Or query directly from the command line
duckdb -c "SELECT * FROM 'sales.parquet' LIMIT 10"

DuckDB treats Parquet files as tables automatically. Just reference the file path in your FROM clause.
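You can also point DuckDB at more than one file at a time. A quick sketch (the file names here are illustrative):

```sql
-- Query a single file by path
SELECT COUNT(*) FROM 'sales.parquet';

-- Query every Parquet file in a directory via a glob pattern
SELECT * FROM 'data/*.parquet';

-- Or call read_parquet() explicitly to combine a specific list of files
SELECT * FROM read_parquet(['sales_2024.parquet', 'sales_2025.parquet']);
```

Globbing is handy for partitioned datasets: DuckDB unions the matching files into one logical table.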

Practical SQL Examples

The following examples work in both Parquet Explorer and DuckDB CLI. We will use a hypothetical orders.parquet file with columns: order_id, customer_id, product, category, quantity, price, order_date, and country.

Preview Data

SELECT * FROM orders LIMIT 20;

Always start here to understand your columns and data types. In Parquet Explorer, the virtualized results table lets you scroll through even massive result sets without lag.

Filter Rows

SELECT *
FROM orders
WHERE country = 'Germany'
  AND order_date >= '2025-01-01';

DuckDB pushes these predicates into the Parquet reader, skipping row groups that do not match — making filtered queries extremely fast even on large files.
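You can check the plan yourself with EXPLAIN (or EXPLAIN ANALYZE to actually run the query and report timings):

```sql
EXPLAIN
SELECT *
FROM orders
WHERE country = 'Germany'
  AND order_date >= '2025-01-01';
```

The exact output varies by DuckDB version, but the filter typically appears attached to the Parquet scan operator rather than as a separate step — evidence that the predicate is being pushed down.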

Aggregate with GROUP BY

SELECT
    category,
    COUNT(*) AS num_orders,
    ROUND(SUM(price * quantity), 2) AS total_revenue,
    ROUND(AVG(price), 2) AS avg_price
FROM orders
GROUP BY category
ORDER BY total_revenue DESC;
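To filter on the aggregates themselves, add a HAVING clause after GROUP BY. For example, keeping only categories with at least 100 orders (the threshold is arbitrary):

```sql
SELECT
    category,
    COUNT(*) AS num_orders,
    ROUND(SUM(price * quantity), 2) AS total_revenue
FROM orders
GROUP BY category
HAVING COUNT(*) >= 100
ORDER BY total_revenue DESC;
```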

Date Functions

SELECT
    DATE_TRUNC('month', order_date) AS month,
    COUNT(*) AS orders,
    SUM(price * quantity) AS revenue
FROM orders
GROUP BY month
ORDER BY month;

DuckDB has extensive date/time functions: DATE_TRUNC, DATE_PART, DATE_DIFF, STRFTIME, and more.
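A few of these in action against the same table:

```sql
SELECT
    order_id,
    DATE_PART('year', order_date) AS order_year,          -- extract one component
    STRFTIME(order_date, '%Y-%m') AS year_month,          -- format as text
    DATE_DIFF('day', order_date, CURRENT_DATE) AS days_ago
FROM orders
LIMIT 5;
```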

Window Functions

SELECT
    order_id,
    customer_id,
    price,
    SUM(price) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) AS running_total
FROM orders;

Window functions work exactly as they do in PostgreSQL, making DuckDB feel familiar to anyone with SQL experience.
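Frame clauses are supported too. For instance, a trailing 7-row moving average of order value per customer:

```sql
SELECT
    order_id,
    customer_id,
    order_date,
    AVG(price * quantity) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7
FROM orders;
```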

Top-N per Group

SELECT *
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY category
            ORDER BY price * quantity DESC
        ) AS rn
    FROM orders
)
WHERE rn <= 3;

This returns the three highest-revenue orders per category.
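DuckDB also supports the QUALIFY clause, which filters on a window function directly and lets you skip the subquery entirely:

```sql
SELECT *
FROM orders
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY category
    ORDER BY price * quantity DESC
) <= 3;
```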

DESCRIBE and Schema Inspection

DESCRIBE SELECT * FROM orders;

This shows the column names and types. In Parquet Explorer, you also get this information visually in the schema panel — including nested type hierarchies rendered as an expandable tree.
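In DuckDB you can go deeper with the built-in Parquet inspection functions, which read the file's own metadata:

```sql
-- The physical schema as stored in the file
SELECT * FROM parquet_schema('orders.parquet');

-- Row-group and column-chunk details: sizes, codecs, min/max statistics
SELECT * FROM parquet_metadata('orders.parquet');
```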

Export Results

In DuckDB CLI, you can write query results to a new file:

COPY (
    SELECT * FROM orders WHERE country = 'Germany'
) TO 'german_orders.parquet' (FORMAT PARQUET);
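COPY handles CSV and JSON output as well; two common variants (output file names are illustrative):

```sql
-- CSV with a header row
COPY (SELECT * FROM orders) TO 'orders.csv' (FORMAT CSV, HEADER);

-- Newline-delimited JSON
COPY (SELECT * FROM orders) TO 'orders.json' (FORMAT JSON);
```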

In Parquet Explorer, you can export query results as CSV, JSON, or Parquet directly from the UI — no SQL COPY syntax needed.

Common Patterns and Tips

Use CTEs for Readability

WITH monthly AS (
    SELECT
        DATE_TRUNC('month', order_date) AS month,
        category,
        SUM(price * quantity) AS revenue
    FROM orders
    GROUP BY month, category
)
SELECT
    month,
    category,
    revenue,
    LAG(revenue) OVER (PARTITION BY category ORDER BY month) AS prev_month,
    ROUND(100.0 * (revenue - LAG(revenue) OVER (PARTITION BY category ORDER BY month))
        / LAG(revenue) OVER (PARTITION BY category ORDER BY month), 1) AS pct_change
FROM monthly
ORDER BY category, month;

Sampling Large Files

SELECT * FROM orders USING SAMPLE 1%;

DuckDB supports TABLESAMPLE and USING SAMPLE for quick exploration of large datasets.
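Two variants worth knowing, if my reading of the sampling syntax is right: you can ask for a fixed row count, and you can seed a percentage sample so it is repeatable:

```sql
-- A fixed number of rows instead of a percentage (reservoir sampling)
SELECT * FROM orders USING SAMPLE 10000;

-- Seed the sample so repeated runs return the same rows
SELECT * FROM orders USING SAMPLE 1% (system, 42);
```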

Check for Nulls

SELECT
    COUNT(*) AS total,
    COUNT(*) - COUNT(customer_id) AS null_customer_ids,
    COUNT(*) - COUNT(price) AS null_prices
FROM orders;

This is useful, but the data profiler in Parquet Explorer does this automatically for every column — along with histograms, distinct counts, and semantic type detection — without writing a single query.
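In the DuckDB CLI, the SUMMARIZE command gives a comparable one-shot profile — per-column min, max, approximate distinct count, null percentage, and more:

```sql
SUMMARIZE orders;

-- Or profile an arbitrary query's result
SUMMARIZE SELECT * FROM orders WHERE country = 'Germany';
```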

Performance Tips

  1. Only select the columns you need. SELECT col1, col2 is faster than SELECT * because Parquet skips unneeded columns.
  2. Filter early. Push WHERE clauses as close to the base table as possible so DuckDB can use Parquet statistics to skip row groups.
  3. Use Parquet over CSV. If you have a choice, always query Parquet files. DuckDB can query CSV too, but Parquet is 10-50x faster on analytical queries. If your source data is CSV, TSV, JSON, or JSONL, you can convert it to Parquet in seconds using Parquet Explorer’s format converter.
  4. In the browser, watch memory. Parquet Explorer runs entirely in your browser tab, so multi-gigabyte files can run up against the browser's memory limits. For desktop-scale exploration, it handles most files effortlessly.

When SQL Is Not Enough

For some tasks — machine learning, complex transformations, visualization — you will eventually want Python, R, or another language. But for data exploration, quality checks, quick aggregations, and ad-hoc analysis, SQL on Parquet files is hard to beat. It is fast, expressive, and requires no setup.

Start with parquetexplorer.com for instant, private access with built-in profiling, query suggestions, and format conversion. If you need scripting and automation, install the DuckDB CLI. Either way, you can go from file to insight in seconds — no Python needed.