GeoSR-Bench
Dataset Description
GeoSR-Bench directly connects super-resolution (SR) with downstream Earth monitoring tasks, moving beyond conventional fidelity-based evaluation. It comprises spatially co-located, temporally aligned, and quality-controlled image pairs from about 36,000 locations across diverse land covers, spanning spatial resolutions from 500 m to 0.6 m. It is designed to evaluate whether improved image resolution from SR models translates into better downstream performance for tasks such as land cover segmentation, infrastructure mapping, and biophysical variable estimation. GeoSR-Bench includes two cross-platform super-resolution tasks:
- MODIS → Landsat-8
- Sentinel-2 → NAIP
For each task, the dataset is organized into two types of subsets:
Super-resolution-only datasets
These subsets include paired lower-resolution and higher-resolution remote sensing images without downstream task labels. They are designed for training SR models.
Downstream task datasets
These subsets include paired lower-resolution and higher-resolution images together with task-specific labels. They are designed for fine-tuning SR models and for evaluating whether super-resolved images improve the downstream Earth monitoring tasks listed above.
Each sample may contain:
- A lower-resolution image
- A higher-resolution reference image
- A downstream task label, when available
- Metadata, when available
GeoSR-Bench is intended to support research on task-aware super-resolution, cross-platform learning, and remote sensing foundation models.
Folder Structure
The dataset contains both SR-only datasets and downstream task datasets.
SR-only dataset
SRDatasetName/
├── data/
│   ├── SRDatasetName_part_0000.parquet
│   ├── SRDatasetName_part_0001.parquet
│   └── ...
└── index.csv
Each Parquet file contains paired low-resolution and high-resolution images together with metadata:
id
lr_name
hr_name
lr_image
hr_image
meta
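The part files of an SR-only dataset can be enumerated before anything is loaded. The following is a minimal sketch, assuming the layout shown above (a `data/` folder holding `*_part_NNNN.parquet` files). The helper name `list_parquet_parts` is hypothetical and not part of any tooling shipped with the dataset.

```python
from pathlib import Path


def list_parquet_parts(dataset_root, subfolder="data"):
    """Return the Parquet part files under dataset_root/subfolder, sorted by name.

    Assumes part files follow the `<name>_part_NNNN.parquet` pattern from the
    folder tree; the zero-padded index makes a name sort match the part order.
    """
    folder = Path(dataset_root) / subfolder
    return sorted(folder.glob("*_part_*.parquet"))
```

Each returned path can be passed to `pd.read_parquet`. The `subfolder` argument is there so the same helper can point at a differently named folder.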
Downstream task dataset
DownstreamDatasetName/
├── train/
│   ├── DownstreamDatasetName_train_part_0000.parquet
│   ├── DownstreamDatasetName_train_part_0001.parquet
│   ├── ...
│   └── DownstreamDatasetName_train_index.csv
├── val/
│   ├── DownstreamDatasetName_val_part_0000.parquet
│   ├── DownstreamDatasetName_val_part_0001.parquet
│   ├── ...
│   └── DownstreamDatasetName_val_index.csv
├── test/
│   ├── DownstreamDatasetName_test_part_0000.parquet
│   ├── DownstreamDatasetName_test_part_0001.parquet
│   ├── ...
│   └── DownstreamDatasetName_test_index.csv
└── index.csv
Each Parquet file contains paired images, labels, and metadata:
id
lr_name
hr_name
label_name
meta_name
lr_image
hr_image
label_image
meta
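Because the two subset types differ only in their label-related columns, a quick column check can tell them apart and catch incomplete files. This is a small sketch built from the column lists in this card; the names `SR_COLUMNS`, `DOWNSTREAM_COLUMNS`, `missing_columns`, and `is_downstream` are hypothetical helpers, not part of the dataset.

```python
# Column names copied from the two lists in this card.
SR_COLUMNS = ("id", "lr_name", "hr_name", "lr_image", "hr_image", "meta")
DOWNSTREAM_COLUMNS = (
    "id", "lr_name", "hr_name", "label_name", "meta_name",
    "lr_image", "hr_image", "label_image", "meta",
)


def missing_columns(columns, expected):
    """Return the expected column names absent from `columns`, in listed order."""
    present = set(columns)
    return [name for name in expected if name not in present]


def is_downstream(columns):
    """Treat a table as a downstream-task table when it has a label_image column."""
    return "label_image" in set(columns)
```

With a pandas DataFrame `df`, calling `missing_columns(df.columns, DOWNSTREAM_COLUMNS)` returns an empty list when every expected column is present.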
Loading Images and Labels
Images and labels are stored as binary GeoTIFF bytes inside each Parquet file. They can be read directly with rasterio.
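Before handing a cell to rasterio, it can be useful to confirm that it actually holds TIFF bytes. The sketch below checks only the four-byte TIFF header (byte order plus version 42 for classic TIFF or 43 for BigTIFF); it does not validate the GeoTIFF georeferencing tags. The function name `looks_like_tiff` is a hypothetical helper.

```python
def looks_like_tiff(payload):
    """Return True if `payload` starts with a classic TIFF or BigTIFF header.

    Bytes 0-1 give the byte order ("II" little-endian, "MM" big-endian);
    bytes 2-3 hold the version number: 42 for classic TIFF, 43 for BigTIFF.
    """
    if not isinstance(payload, (bytes, bytearray, memoryview)):
        return False
    header = bytes(payload[:4])
    if len(header) < 4:
        return False
    if header[:2] == b"II":
        version = int.from_bytes(header[2:4], "little")
    elif header[:2] == b"MM":
        version = int.from_bytes(header[2:4], "big")
    else:
        return False
    return version in (42, 43)
```

For example, `looks_like_tiff(sample["lr_image"])` can serve as a cheap sanity check on a row before decoding it.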
Load an SR-only sample
import io
import json

import pandas as pd
import rasterio

# Forward slashes work on every platform, including Windows.
parquet_path = "Sentinel2_to_NAIP/SR_Dataset/data/sentinel2_to_naip_part_0000.parquet"
df = pd.read_parquet(parquet_path)
sample = df.iloc[0]

# Decode the low-resolution GeoTIFF bytes into a (bands, height, width) array.
with rasterio.open(io.BytesIO(sample["lr_image"])) as src:
    lr = src.read()
    lr_profile = src.profile

# Decode the high-resolution reference image the same way.
with rasterio.open(io.BytesIO(sample["hr_image"])) as src:
    hr = src.read()
    hr_profile = src.profile

# Metadata is stored as a JSON string and may be missing.
meta = json.loads(sample["meta"]) if sample["meta"] is not None else None

print(sample["id"])
print(lr.shape, hr.shape)
print(meta)
Load a downstream task sample
import io
import json

import pandas as pd
import rasterio

parquet_path = "Sentinel2_to_NAIP/Downstream_Datasets/RoadDetection/train/road_detection_train_part_0000.parquet"
df = pd.read_parquet(parquet_path)
sample = df.iloc[0]

# Decode the low-resolution GeoTIFF bytes into a (bands, height, width) array.
with rasterio.open(io.BytesIO(sample["lr_image"])) as src:
    lr = src.read()
    lr_profile = src.profile

# Decode the high-resolution reference image the same way.
with rasterio.open(io.BytesIO(sample["hr_image"])) as src:
    hr = src.read()
    hr_profile = src.profile

# The task label is stored as GeoTIFF bytes as well.
with rasterio.open(io.BytesIO(sample["label_image"])) as src:
    label = src.read()
    label_profile = src.profile

# Metadata is stored as a JSON string and may be missing.
meta = json.loads(sample["meta"]) if sample["meta"] is not None else None

print(sample["id"])
print(lr.shape, hr.shape, label.shape)
print(meta)
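The `meta` handling in the loading examples covers a JSON string or `None`. A slightly more defensive decoder is sketched below. It rests on assumptions not confirmed by this card: that a missing value may surface as NaN after a pandas round trip, and that the JSON may arrive as UTF-8 bytes. The name `parse_meta` is hypothetical.

```python
import json


def parse_meta(value):
    """Decode a `meta` cell into a Python object, or None when it is empty.

    Handles None, NaN, blank strings, UTF-8 bytes, JSON strings, and values
    that are already decoded (dict or list).
    """
    if value is None:
        return None
    # NaN is the only float that is not equal to itself.
    if isinstance(value, float) and value != value:
        return None
    if isinstance(value, (dict, list)):
        return value
    if isinstance(value, (bytes, bytearray)):
        value = bytes(value).decode("utf-8")
    if isinstance(value, str):
        if not value.strip():
            return None
        return json.loads(value)
    raise TypeError(f"Unsupported meta type: {type(value).__name__}")
```

It would be called as `meta = parse_meta(sample["meta"])` in place of the inline conditional.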
Convert a sample back to GeoTIFF files
from pathlib import Path

import pandas as pd

parquet_path = "Sentinel2_to_NAIP/Downstream_Datasets/RoadDetection/train/road_detection_train_part_0000.parquet"
out_dir = Path("recovered_sample")
out_dir.mkdir(parents=True, exist_ok=True)

df = pd.read_parquet(parquet_path)
sample = df.iloc[0]

# The stored bytes are complete GeoTIFF files, so they can be written out unchanged.
(out_dir / sample["lr_name"]).write_bytes(sample["lr_image"])
(out_dir / sample["hr_name"]).write_bytes(sample["hr_image"])

# SR-only subsets have no label column, so write the label only when it exists.
if "label_image" in sample:
    (out_dir / sample["label_name"]).write_bytes(sample["label_image"])
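The same recovery step can be wrapped in a function that works for both SR-only and downstream rows. This is a sketch under stated assumptions: the name/bytes column pairs come from the column lists in this card, while `FILE_FIELDS` and `write_sample_files` are hypothetical names, and skipping pairs with a missing name or payload is a design choice rather than documented behaviour.

```python
from pathlib import Path

# (name column, bytes column) pairs taken from the column lists in this card.
FILE_FIELDS = (
    ("lr_name", "lr_image"),
    ("hr_name", "hr_image"),
    ("label_name", "label_image"),
)


def write_sample_files(sample, out_dir):
    """Write each available image of `sample` into `out_dir`; return the paths written.

    `sample` can be any mapping-like row (a dict or a pandas Series). Pairs with
    a missing column, an unusable name, or a non-bytes payload are skipped.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name_key, bytes_key in FILE_FIELDS:
        if name_key not in sample or bytes_key not in sample:
            continue
        name, payload = sample[name_key], sample[bytes_key]
        if not isinstance(name, str) or not isinstance(payload, (bytes, bytearray)):
            continue
        # Keep only the final path component so a stored name stays inside out_dir.
        filename = Path(name).name
        if filename in ("", ".."):
            continue
        target = out_dir / filename
        target.write_bytes(payload)
        written.append(target)
    return written
```

Calling `write_sample_files(df.iloc[0], "recovered_sample")` on an SR-only row writes two files, and on a downstream row with a label it writes three.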
Intended Use
This dataset is intended for research on:
- Remote sensing image super-resolution
- Downstream task-aware image restoration
- Land cover mapping
- Infrastructure mapping
- Biophysical variable estimation
- Cross-platform Earth observation learning
- Geo-foundation models
Citation
If you use this dataset, please cite:
@article{li2026beyond,
  title={Beyond Visual Fidelity: Benchmarking Super-Resolution Models for Large-Scale Remote Sensing Imagery via Downstream Task Integration},
  author={Li, Zhili and Chai, Kangyang and Wang, Zhihao and Jia, Xiaowei and Li, Yanhua and Mai, Gengchen and Skakun, Sergii and Manocha, Dinesh and Xie, Yiqun},
  journal={arXiv preprint arXiv:2605.00310},
  year={2026}
}