How I Read Parquet in Rust and Go Without an OOM

Outcome focus: Reader can pick the streaming Parquet read path in Rust and Go, configure the compression-codec features explicitly, and avoid the eager-load anti-patterns that look fine in benchmarks and break in production.

Part 1 of 3. Part 2: Reading Parquet from Elixir and Mojo Without Pretending the Runtime Is Native. Part 3: Why I Reach for DuckDB When Reading Parquet from Swift or Zig.

An ingest worker had a 4 GB memory limit on Cloud Run and a 1.4 GB Parquet file to read from GCS. The Go code looked fine. It deployed. Mid-shift, the container OOM-killed.

Two batches landed in BigQuery. Four did not. Someone reconciled the gap the next morning.

The Go code used github.com/parquet-go/parquet-go, the active fork that took over from github.com/segmentio/parquet-go in 2024. The reader was the generic, ergonomic one:

records, _ := parquet.ReadFile[Row]("data.parquet")

That call slurps the entire file into a slice. For a 1.4 GB file with about 80 columns, the decoded in-memory representation comes to more than 11 GB, so the worker hit its 4 GB limit long before the read finished. Parquet is compressed and encoded on disk, so the file size does not predict the memory footprint after decompression and decoding.

The fix is the same shape in either language. Stop loading the file. Stream it.

The Go Fix

parquet.NewGenericReader reads one row group at a time, decompresses it, yields rows into a supplied buffer, then drops the row group before loading the next. Resident memory stays bounded by row group size and decompression overhead, not by file size. Parquet row groups are typically 50 MB to 1 GB on disk, so the working set is bounded.

package main
 
import (
	"io"
	"log"
	"os"
 
	"github.com/parquet-go/parquet-go"
)
 
type Row struct {
	AccountID string  `parquet:"account_id"`
	Timestamp int64   `parquet:"ts"`
	Amount    float64 `parquet:"amount"`
}
 
func main() {
	f, err := os.Open("data.parquet")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
 
	reader := parquet.NewGenericReader[Row](f)
	defer reader.Close()
 
	buffer := make([]Row, 1024)
	for {
		n, err := reader.Read(buffer)
		if err != nil && err != io.EOF {
			log.Fatal(err)
		}
		if n == 0 {
			break
		}
		for i := 0; i < n; i++ {
			handle(buffer[i])
		}
	}
}
 
func handle(row Row) {
	// dispatch to the BigQuery loader, the message bus, or whatever is downstream
}

For the same 1.4 GB file under this pattern, resident memory stayed under 200 MB.

Two operational details that matter in practice. First, the struct tags only deserialize the fields the struct names. If the file has 80 columns and the struct has three, the reader skips the rest. That is implicit projection pushdown through the type system, and it is the cheapest performance win in Go Parquet code. Second, the buffer length is the chunk size for the Read call. Smaller buffers reduce peak allocation; larger buffers reduce per-batch overhead. 1024 is a fine default for short rows and dropping to 256 helps for wide rows with many strings.

Figure: the same 1.4 GB Parquet file under the eager and streaming reader patterns. The streaming reader bounds memory by row group size, not by file size.

The lesson from the OOM was not "do not use the library." The library is good. The lesson was that the library has two reading patterns, and the more obvious one is the wrong one for streaming. parquet.ReadFile[T] is a fine call when the file is small enough to live in memory and the workload needs all rows at once for sorting or cross-row analysis. For ingest pipelines and anything sized in gigabytes, NewGenericReader is the default that prevents the worker from being the place where Parquet meets the OOM killer.

The Rust Column-Native Path

Rust does not have the same ergonomic-but-eager constructor at the top of its API, but it has a related anti-pattern in older tutorials. SerializedFileReader::get_row_iter iterates rows one at a time:

An older Rust pattern that still appears in copied code

// Anti-pattern for analytical workloads
use std::fs::File;
use parquet::file::reader::{FileReader, SerializedFileReader};
 
let file = File::open("data.parquet")?;
let reader = SerializedFileReader::new(file)?;
let iter = reader.get_row_iter(None)?;
for row in iter {
    let row = row?;
    // ...
}

This compiles and runs. It is also slow, because Parquet stores data column by column on disk and get_row_iter(None) applies no projection: it reassembles rows by reading values from every column for every record. For a file with 80 columns where the analyst cares about three, that is roughly 27 times the decoding and I/O the workload needs. Do not use this for analytical reads.

The column-native path is ParquetRecordBatchReaderBuilder. It reads directly into Arrow RecordBatch values, which preserve the columnar layout for downstream operators that compute on whole columns and can use SIMD instructions on contiguous memory.

use std::fs::File;
 
use arrow::record_batch::RecordBatch;
use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;
 
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open("data.parquet")?;
    let builder = ParquetRecordBatchReaderBuilder::try_new(file)?;
 
    let schema = builder.schema().clone();
    println!("schema: {schema:?}");
 
    let mut reader = builder.with_batch_size(8192).build()?;
 
    let mut total_rows = 0;
    while let Some(batch) = reader.next() {
        let batch: RecordBatch = batch?;
        total_rows += batch.num_rows();
        // process batch.column(i) per column or hand the batch to a downstream operator
    }
 
    println!("rows: {total_rows}");
    Ok(())
}

with_batch_size is the memory dial. Smaller batches mean more iterations and lower peak memory. The default is 1024 rows per batch. The builder also exposes with_projection, which only decodes the columns you ask for; with_row_groups, which skips entire row groups by index when you already know the metadata; and with_row_filter, which evaluates a predicate during the decode rather than after. Each of those methods lets the reader skip work, and skipping work is the only reason Parquet is faster than CSV at this scale.

If the workload is async or the file lives in object storage, ParquetRecordBatchStreamBuilder is the same shape over a tokio-friendly async reader. The non-async builder is the right starting point for local files and synchronous pipelines.

The Cargo Features That Decide What Is Actually In Your Binary

The parquet crate ships compression codecs as Cargo features. The crate's default feature set already enables the common ones, Snappy included, so a plain cargo add parquet reads most production files. The features start to matter once you set default-features = false to trim the build. If the enabled features then do not match the file's codec, the error arrives at decode time, not at link time, and the message names the missing codec, so a missing feature is fast to diagnose. The reason to think about features at all is binary size and what cleanly fails at compile time when a build environment is misconfigured.

Feature	Adds	Reason to enable
`arrow`	Arrow integration, the `RecordBatch` reader builder	Required for `ParquetRecordBatchReaderBuilder`.
`snap`	Snappy compression	Most common codec in production Parquet files.
`zstd`	Zstandard compression	Increasingly common; better ratio than Snappy.
`lz4`	LZ4 compression	Faster decompression than Snappy on some hardware.
`brotli`	Brotli compression	Rare in practice; only when you control the writer.
`async`	Async reader	Required for object-store reads on tokio.

A reasonable default for a service that reads production Parquet:

[dependencies]
parquet = { version = "58", default-features = false, features = ["arrow", "snap", "zstd"] }
arrow = "58"

I would add lz4 only when the workload's source files use it, and async when the reader composes with object_store or another async source. Verify the binary cost with cargo bloat --release if you care about cold-start size; the per-codec cost is in single-digit megabytes before stripping.

Polars When the Workload Is Analytical

The low-level reader is the right tool for stream-and-process pipelines. For filter-aggregate-transform workloads, Polars sits on top of the same Arrow layer with a query optimizer that rewrites filters and projections into the Parquet read. The win is not the syntax. The win is that the optimizer pushes work into the reader so the reader can skip row groups and column pages.

use polars::prelude::*;
 
fn main() -> PolarsResult<()> {
    let lf = LazyFrame::scan_parquet("data.parquet", ScanArgsParquet::default())?;
 
    let plan = lf
        .filter(col("amount").gt(lit(1000)))
        .select([col("account_id"), col("amount")])
        .sort(["amount"], Default::default());
 
    println!("{}", plan.clone().explain(true)?);
    let df = plan.collect()?;
    println!("{df}");
    Ok(())
}

The interesting line is explain(true). The output shows the optimizer pushing the amount > 1000 predicate into the Parquet scan and pushing the column projection into the row-group reader. The eager equivalent (read everything, filter in memory) reads roughly one to two orders of magnitude more bytes on the same 80-column file. For a one-off query that takes 200 ms either way, this does not matter. For a job that runs every five minutes against a growing dataset, it matters.

The two questions that decide whether Polars or the low-level Arrow reader is the right entry point are about downstream shape and ownership. If the result of the read goes through DataFrame-style transforms, Polars is the better surface. If the result feeds an Arrow-native operator, custom kernel, or a system that already speaks RecordBatch, the low-level builder avoids the Polars round-trip.

The Object-Store Boundary

A local Parquet file is the easy case. Production pipelines read Parquet from GCS, S3, Azure, or a custom blob store. The reader fetches the file footer first (the metadata, including row-group offsets), then issues range requests for the row groups it actually needs.

In Rust, the object_store crate provides the transport, and the parquet crate composes with it through the AsyncFileReader trait when the async feature is enabled. In Go, parquet.OpenFile takes an io.ReaderAt and a size, so the bytes can come from an *os.File, an HTTP range reader, or an S3 client wrapped to satisfy io.ReaderAt. The pattern to match in either language is the same: footer fetch first, page fetch on demand. The Apache Arrow Go ecosystem at github.com/apache/arrow-go provides a parallel set of tools when you want native Arrow RecordBatch interop in Go.

The mistake to avoid is forcing the file through a single Read(p []byte) call without range support. That path streams the entire file just to read the footer at the end, which on a multi-GB Parquet file is the same shape as the OOM that opened this post: download everything, then ask which 5 percent of it the query needs.

Decimal and Timestamp

The most common silent-data-corruption bug I have seen in Parquet pipelines is on timestamps. The format has historically supported three timestamp encodings:

INT96 is the legacy Hive timestamp. Nanosecond precision, no timezone. Several writers still emit it for compatibility.
TIMESTAMP_MICROS is the standard modern encoding for microsecond timestamps, and it is what most modern writers default to.
TIMESTAMP_NANOS is the standard for nanosecond timestamps. Younger and less universally supported by older readers.

Arrow's reader maps these into Arrow Timestamp types, but the timezone handling is not always automatic. Files written by older Hive or Spark stacks with INT96 read into Arrow as Timestamp(Nanosecond, None), which is correct on the type axis but wrong on the semantics if downstream code assumes UTC and the writer's local time was not UTC. Decimal types have a similar story. Decimal128 covers up to 38 digits and rejects writers that produced more, which is the right behavior, but a defensive rounding wrapper at the boundary is sometimes the right shape.

The five minutes of work that pays for itself many times over is to look at the schema before you trust the data:

Rust: builder.schema() returns the Arrow schema. Print it.
Go: parquet.OpenFile(r, size) returns a *parquet.File and an error; on the file, Schema().Fields() lists the fields with their types. Print it.

The number of bug reports that disappear once the team writes a tiny parquet-schema dump.parquet script is non-trivial.

Where the Rust Versus Go Choice Is Real

The choice between Rust's column-native API and Go's struct-tag API is real on two axes.

The first is Arrow interop. If the pipeline downstream consumes Arrow buffers directly (DataFusion, an ADBC driver, or another Arrow-native engine), Rust gives you those buffers without copy. Go can produce them through arrow-go at the cost of an extra translation, and the more idiomatic parquet-go/parquet-go path is row-shaped first.

The second is concurrency model. Rust's async readers compose with tokio and the object_store crate; Go's reader is synchronous, and each reader instance is meant to be owned by a single goroutine, which fits Go's preferred concurrency model of one reader per worker.

If neither of those applies, the language choice is a team choice, not a Parquet choice. Both ecosystems have streaming readers that hold memory bounded by row group size. Both support projection pushdown, one implicitly through the type system and one through an explicit builder method. The OOM that opened this post was not a Go-versus-Rust failure. It was a streaming-versus-eager failure that any "make it ergonomic" library will let you stumble into if the wrong constructor sits at the top of the API.

Close

If the Parquet reader sits inside a worker that has a memory limit, do not call the eager constructor. The streaming constructor is one extra line and saves a multi-hour reconciliation the first time the file gets bigger than the budget. Print the schema. Pin the compression features. Run explain(true) on the Polars query the first time, look at it, and make sure the optimizer is pushing the predicate down. Most Parquet pain in Rust and Go is preventable with five extra minutes per pipeline.