GeoSR-Bench

Dataset Description

GeoSR-Bench directly connects super-resolution (SR) with downstream Earth monitoring tasks, moving beyond conventional fidelity-based evaluation. It comprises spatially co-located, temporally aligned, and quality-controlled image pairs from about 36,000 locations across diverse land covers, spanning spatial resolutions from 500 m to 0.6 m. It is designed to evaluate whether improved image resolution from SR models translates into better downstream performance for tasks such as land cover segmentation, infrastructure mapping, and biophysical variable estimation. GeoSR-Bench includes two cross-platform super-resolution tasks:

MODIS → Landsat-8
Sentinel-2 → NAIP

For each task, the dataset is organized into two types of subsets:

Super-resolution-only datasets
These subsets include paired lower-resolution and higher-resolution remote sensing images without downstream task labels. They are designed for training SR models.
Downstream task datasets
These subsets include paired lower-resolution and higher-resolution images together with task-specific labels. They are designed for fine-tuning SR models and for evaluating whether super-resolved images improve downstream Earth monitoring tasks, such as land cover segmentation, infrastructure mapping, and biophysical variable estimation.

Each sample may contain:

A lower-resolution image
A higher-resolution reference image
A downstream task label, when available
Metadata, when available

GeoSR-Bench is intended to support research on task-aware super-resolution, cross-platform learning, and remote sensing foundation models.

Folder Structure

The dataset contains both SR-only datasets and downstream task datasets.

SR-only dataset

SRDatasetName/
├── data/
│   ├── SRDatasetName_part_0000.parquet
│   ├── SRDatasetName_part_0001.parquet
│   └── ...
└── index.csv

Each Parquet file contains paired low-resolution and high-resolution images together with metadata:

id
lr_name
hr_name
lr_image
hr_image
meta

Downstream task dataset

DownstreamDatasetName/
├── train/
│   ├── DownstreamDatasetName_train_part_0000.parquet
│   ├── DownstreamDatasetName_train_part_0001.parquet
│   ├── ...
│   └── DownstreamDatasetName_train_index.csv
├── val/
│   ├── DownstreamDatasetName_val_part_0000.parquet
│   ├── DownstreamDatasetName_val_part_0001.parquet
│   ├── ...
│   └── DownstreamDatasetName_val_index.csv
├── test/
│   ├── DownstreamDatasetName_test_part_0000.parquet
│   ├── DownstreamDatasetName_test_part_0001.parquet
│   ├── ...
│   └── DownstreamDatasetName_test_index.csv
└── index.csv

Each Parquet file contains paired images, labels, and metadata:

id
lr_name
hr_name
label_name
meta_name
lr_image
hr_image
label_image
meta
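
Given the two folder layouts above, the part files of a dataset or split can be collected with a simple glob. The following is a minimal sketch: the helper name `list_part_files` is hypothetical and not part of the dataset, and it assumes the `*_part_NNNN.parquet` naming shown in the trees.

```python
from pathlib import Path


def list_part_files(dataset_dir, split=None):
    """Return the sorted Parquet part files of one dataset folder.

    split=None    -> SR-only layout:     <dataset_dir>/data/*_part_*.parquet
    split="train" -> downstream layout:  <dataset_dir>/train/*_part_*.parquet
    """
    subdir = "data" if split is None else split
    # Zero-padded part numbers make lexicographic order match numeric order.
    return sorted((Path(dataset_dir) / subdir).glob("*_part_*.parquet"))


# Example call (paths follow the layout above):
# parts = list_part_files("Sentinel2_to_NAIP/Downstream_Datasets/RoadDetection", "train")
```

The index CSV files sit next to the part files but do not match the pattern, so they are left out of the result.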

Loading Images and Labels

Images and labels are stored as raw GeoTIFF bytes inside each Parquet file. They can be opened directly with rasterio through an in-memory buffer.

Load an SR-only sample

import io
import json
import pandas as pd
import rasterio

parquet_path = "Sentinel2_to_NAIP/SR_Dataset/data/sentinel2_to_naip_part_0000.parquet"

df = pd.read_parquet(parquet_path)
sample = df.iloc[0]

with rasterio.open(io.BytesIO(sample["lr_image"])) as src:
    lr = src.read()
    lr_profile = src.profile

with rasterio.open(io.BytesIO(sample["hr_image"])) as src:
    hr = src.read()
    hr_profile = src.profile

meta = json.loads(sample["meta"]) if sample["meta"] is not None else None

print(sample["id"])
print(lr.shape, hr.shape)
print(meta)

Load a downstream task sample

import io
import json
import pandas as pd
import rasterio

parquet_path = "Sentinel2_to_NAIP/Downstream_Datasets/RoadDetection/train/road_detection_train_part_0000.parquet"

df = pd.read_parquet(parquet_path)
sample = df.iloc[0]

with rasterio.open(io.BytesIO(sample["lr_image"])) as src:
    lr = src.read()
    lr_profile = src.profile

with rasterio.open(io.BytesIO(sample["hr_image"])) as src:
    hr = src.read()
    hr_profile = src.profile

with rasterio.open(io.BytesIO(sample["label_image"])) as src:
    label = src.read()
    label_profile = src.profile

meta = json.loads(sample["meta"]) if sample["meta"] is not None else None

print(sample["id"])
print(lr.shape, hr.shape, label.shape)
print(meta)

Convert a sample back to GeoTIFF files

from pathlib import Path
import pandas as pd

parquet_path = "Sentinel2_to_NAIP/Downstream_Datasets/RoadDetection/train/road_detection_train_part_0000.parquet"
out_dir = Path("recovered_sample")
out_dir.mkdir(parents=True, exist_ok=True)

df = pd.read_parquet(parquet_path)
sample = df.iloc[0]

(out_dir / sample["lr_name"]).write_bytes(sample["lr_image"])
(out_dir / sample["hr_name"]).write_bytes(sample["hr_image"])

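# Labels exist only in downstream task datasets and may be missing for a row.
if "label_image" in sample and sample["label_image"] is not None:
    (out_dir / sample["label_name"]).write_bytes(sample["label_image"])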
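
To recover many samples rather than one, the same pattern can be wrapped in a loop. This is a minimal sketch: the helper name `export_rows` is hypothetical, and it assumes each row is dict-like with the columns listed under Folder Structure and that file names are unique within the output folder.

```python
from pathlib import Path

# (file-name column, bytes column) pairs taken from the column lists above.
FILE_COLUMNS = [
    ("lr_name", "lr_image"),
    ("hr_name", "hr_image"),
    ("label_name", "label_image"),
]


def export_rows(rows, out_dir):
    """Write the GeoTIFF bytes of every row into out_dir and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for row in rows:
        for name_col, bytes_col in FILE_COLUMNS:
            data = row.get(bytes_col)
            if data is None:  # e.g. SR-only rows carry no label columns
                continue
            path = out_dir / row[name_col]
            path.write_bytes(data)
            written.append(path)
    return written


# Example call with a DataFrame loaded as above:
# export_rows(df.to_dict("records"), "recovered_samples")
```

The bytes are written unchanged, so the recovered files keep the georeferencing of the original GeoTIFFs.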

Intended Use

This dataset is intended for research on:

Remote sensing image super-resolution
Downstream task-aware image restoration
Land cover mapping
Infrastructure mapping
Biophysical variable estimation
Cross-platform Earth observation learning
Geo-foundation models

Citation

If you use this dataset, please cite:

@article{li2026beyond,
  title={Beyond Visual Fidelity: Benchmarking Super-Resolution Models for Large-Scale Remote Sensing Imagery via Downstream Task Integration},
  author={Li, Zhili and Chai, Kangyang and Wang, Zhihao and Jia, Xiaowei and Li, Yanhua and Mai, Gengchen and Skakun, Sergii and Manocha, Dinesh and Xie, Yiqun},
  journal={arXiv preprint arXiv:2605.00310},
  year={2026}
}
