Datasets:
| id (string, length 5) | image (width 1.92k px) | category (string, length 11–22) | question (string, length 1.11k–1.53k) | choices (list, length 4) | answer (string, 4 classes) |
|---|---|---|---|---|---|
q0002 | identify_distance_long | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. Close (0-20m)",
"B. Medium (20-50m)",
"C. Far (50-80m)",
"D. Very far (80m+)"
] | D | |
q0006 | identify_rightmost | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. <8>",
"B. <1>",
"C. <12>",
"D. <9>"
] | D | |
q0007 | relative_distance_long | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. Close (0-20m)",
"B. Medium (20-50m)",
"C. Far (50-80m)",
"D. Very far (80m+)"
] | C | |
q0008 | order_leftmost | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. <6>, <10>, <5>, <8>",
"B. <6>, <5>, <10>, <8>",
"C. <10>, <5>, <8>, <6>",
"D. <10>, <5>, <6>, <8>"
] | D | |
q0013 | pick_closer | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. <13>",
"B. both are equidistant",
"C. <10>",
"D. cannot be determined"
] | C | |
q0014 | order_closest | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. <1>, <5>, <13>, <3>",
"B. <3>, <13>, <5>, <1>",
"C. <1>, <3>, <5>, <13>",
"D. <1>, <13>, <5>, <3>"
] | C | |
q0015 | identify_distance_long | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. Close (0-20m)",
"B. Medium (20-50m)",
"C. Far (50-80m)",
"D. Very far (80m+)"
] | C | |
q0017 | identify_heading | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. forward — same direction as ego (12 o'clock)",
"B. leftward — perpendicular left (9 o'clock)",
"C. rightward — perpendicular right (3 o'clock)",
"D. backward — toward ego (6 o'clock)"
] | A | |
q0018 | pick_closer | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. <2>",
"B. both are equidistant",
"C. <6>",
"D. cannot be determined"
] | A | |
q0019 | relative_distance_long | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. Close (0-20m)",
"B. Medium (20-50m)",
"C. Far (50-80m)",
"D. Very far (80m+)"
] | B | |
q0020 | identify_distance_long | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. Close (0-20m)",
"B. Medium (20-50m)",
"C. Far (50-80m)",
"D. Very far (80m+)"
] | C | |
q0021 | identify_heading | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. forward — same direction as ego (12 o'clock)",
"B. rightward — perpendicular right (3 o'clock)",
"C. leftward — perpendicular left (9 o'clock)",
"D. backward — toward ego (6 o'clock)"
] | D | |
q0022 | relative_distance_long | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. Close (0-20m)",
"B. Medium (20-50m)",
"C. Far (50-80m)",
"D. Very far (80m+)"
] | C | |
q0024 | identify_type | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. car",
"B. pedestrian",
"C. suv",
"D. light_truck"
] | C | |
q0025 | identify_distance_long | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. Close (0-20m)",
"B. Medium (20-50m)",
"C. Far (50-80m)",
"D. Very far (80m+)"
] | B | |
q0028 | relative_heading | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. cannot be determined",
"B. perpendicular to each other",
"C. roughly opposite directions",
"D. roughly the same direction"
] | C | |
q0029 | identify_distance_long | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. Close (0-20m)",
"B. Medium (20-50m)",
"C. Far (50-80m)",
"D. Very far (80m+)"
] | B | |
q0031 | relative_heading | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. roughly the same direction",
"B. perpendicular to each other",
"C. cannot be determined",
"D. roughly opposite directions"
] | D | |
q0033 | identify_distance_long | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. Close (0-20m)",
"B. Medium (20-50m)",
"C. Far (50-80m)",
"D. Very far (80m+)"
] | B | |
q0034 | identify_heading | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. backward — toward ego (6 o'clock)",
"B. leftward — perpendicular left (9 o'clock)",
"C. forward — same direction as ego (12 o'clock)",
"D. rightward — perpendicular right (3 o'clock)"
] | B | |
q0038 | identify_heading | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. leftward — perpendicular left (9 o'clock)",
"B. forward — same direction as ego (12 o'clock)",
"C. rightward — perpendicular right (3 o'clock)",
"D. backward — toward ego (6 o'clock)"
] | A | |
q0040 | order_closest | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. <22>, <11>, <8>, <9>",
"B. <11>, <8>, <9>, <22>",
"C. <11>, <9>, <22>, <8>",
"D. <8>, <9>, <11>, <22>"
] | D | |
q0044 | pick_closer | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. both are equidistant",
"B. <2>",
"C. cannot be determined",
"D. <3>"
] | B | |
q0045 | relative_distance_long | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. Close (0-20m)",
"B. Medium (20-50m)",
"C. Far (50-80m)",
"D. Very far (80m+)"
] | D | |
q0048 | identify_heading | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. leftward — perpendicular left (9 o'clock)",
"B. backward — toward ego (6 o'clock)",
"C. rightward — perpendicular right (3 o'clock)",
"D. forward — same direction as ego (12 o'clock)"
] | D | |
q0049 | relative_distance | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. Very close (0-2m)",
"B. Close (2-10m)",
"C. Medium (10-30m)",
"D. Far (30m+)"
] | C | |
q0050 | relative_heading | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. roughly opposite directions",
"B. roughly the same direction",
"C. cannot be determined",
"D. perpendicular to each other"
] | A | |
q0052 | embodied_collision | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. Yes",
"B. No",
"C. cannot be determined",
"D. only if the object moves"
] | A | |
q0053 | identify_frontmost | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. <12>",
"B. <6>",
"C. <14>",
"D. <8>"
] | C | |
q0055 | relative_distance_long | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. Close (0-20m)",
"B. Medium (20-50m)",
"C. Far (50-80m)",
"D. Very far (80m+)"
] | D | |
q0056 | relative_heading | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. perpendicular to each other",
"B. roughly the same direction",
"C. cannot be determined",
"D. roughly opposite directions"
] | D | |
q0060 | identify_heading | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. forward — same direction as ego (12 o'clock)",
"B. backward — toward ego (6 o'clock)",
"C. leftward — perpendicular left (9 o'clock)",
"D. rightward — perpendicular right (3 o'clock)"
] | D | |
q0066 | relative_distance_long | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. Close (0-20m)",
"B. Medium (20-50m)",
"C. Far (50-80m)",
"D. Very far (80m+)"
] | C | |
q0069 | order_leftmost | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. <17>, <3>, <27>, <9>",
"B. <9>, <17>, <27>, <3>",
"C. <9>, <3>, <27>, <17>",
"D. <17>, <9>, <27>, <3>"
] | D | |
q0070 | identify_distance_long | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. Close (0-20m)",
"B. Medium (20-50m)",
"C. Far (50-80m)",
"D. Very far (80m+)"
] | D | |
q0072 | identify_heading | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. leftward — perpendicular left (9 o'clock)",
"B. forward — same direction as ego (12 o'clock)",
"C. backward — toward ego (6 o'clock)",
"D. rightward — perpendicular right (3 o'clock)"
] | B | |
q0073 | relative_distance_long | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. Close (0-20m)",
"B. Medium (20-50m)",
"C. Far (50-80m)",
"D. Very far (80m+)"
] | D | |
q0075 | identify_closest | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. <2>",
"B. <4>",
"C. <1>",
"D. <3>"
] | C | |
q0080 | relative_heading | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. cannot be determined",
"B. perpendicular to each other",
"C. roughly the same direction",
"D. roughly opposite directions"
] | D | |
q0081 | order_closest | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. <2>, <1>, <3>, <4>",
"B. <4>, <1>, <2>, <3>",
"C. <3>, <4>, <1>, <2>",
"D. <1>, <2>, <3>, <4>"
] | D | |
q0083 | relative_distance_long | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. Close (0-20m)",
"B. Medium (20-50m)",
"C. Far (50-80m)",
"D. Very far (80m+)"
] | A | |
q0084 | identify_heading | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. forward — same direction as ego (12 o'clock)",
"B. leftward — perpendicular left (9 o'clock)",
"C. backward — toward ego (6 o'clock)",
"D. rightward — perpendicular right (3 o'clock)"
] | D | |
q0086 | identify_distance_long | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. Close (0-20m)",
"B. Medium (20-50m)",
"C. Far (50-80m)",
"D. Very far (80m+)"
] | D | |
q0087 | identify_frontmost | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. <1>",
"B. <4>",
"C. <3>",
"D. <6>"
] | D | |
q0088 | identify_distance_long | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. Close (0-20m)",
"B. Medium (20-50m)",
"C. Far (50-80m)",
"D. Very far (80m+)"
] | B | |
q0094 | identify_distance_long | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. Close (0-20m)",
"B. Medium (20-50m)",
"C. Far (50-80m)",
"D. Very far (80m+)"
] | C | |
q0095 | identify_position | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. nearby",
"B. ahead-left",
"C. behind-right",
"D. ahead"
] | B | |
q0096 | relative_distance_long | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. Close (0-20m)",
"B. Medium (20-50m)",
"C. Far (50-80m)",
"D. Very far (80m+)"
] | C | |
q0098 | identify_distance_long | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. Close (0-20m)",
"B. Medium (20-50m)",
"C. Far (50-80m)",
"D. Very far (80m+)"
] | B | |
q0100 | identify_heading | You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ... | [
"A. backward — toward ego (6 o'clock)",
"B. rightward — perpendicular right (3 o'clock)",
"C. forward — same direction as ego (12 o'clock)",
"D. leftward — perpendicular left (9 o'clock)"
] | C |
Open Spatial Reasoning
A multiple-choice dataset for evaluating 3D spatial reasoning from single monocular driving images. Each image contains numbered bounding boxes that mark objects in the scene, and each question probes whether a model can reconstruct the real 3D scene rather than rely on flat-image shortcuts (e.g. "lower in the frame = closer", "bigger box = nearer").
Dataset Description
Frontier vision-language models often answer these questions correctly by luck while reasoning incorrectly, leaning on pixel-layout heuristics that break down on elevated roads, slopes, curves, and intersections. This dataset is designed to surface that failure mode by requiring metric 3D reasoning about distance, lateral position, ordering, and heading.
Each sample pairs a driving-scene image with a question, four answer choices, and the correct answer letter.
The images were collected by autonomous vehicles operated by PlusAI.
Data Fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique question identifier (e.g. q0002) |
| image | image | The driving image with numbered bounding boxes |
| category | string | The reasoning task type (see categories below) |
| question | string | The full question, including the reasoning protocol |
| choices | list[string] | Four answer options, prefixed A.–D. |
| answer | string | The correct answer letter (A, B, C, or D) |
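The fields above can be checked and used with a few small helpers. The following is a minimal sketch rather than official tooling: the function names (`load_samples`, `check_sample`, `answer_text`) are hypothetical, the repository id is taken from the citation URL, and the split name `train` is an assumption that may not match the published splits. The example record is hand-written to mirror the documented schema, with the image omitted.

```python
from typing import Any

REQUIRED_FIELDS = ("id", "image", "category", "question", "choices", "answer")


def load_samples(repo_id: str = "reasoncore/open-spatial-reasoning", split: str = "train"):
    """Load the dataset from the Hugging Face Hub (requires `pip install datasets`).

    The repo id comes from the citation URL; the split name is an assumption.
    """
    # Imported lazily so the helpers below also work without the library installed.
    from datasets import load_dataset
    return load_dataset(repo_id, split=split)


def check_sample(sample: dict[str, Any]) -> None:
    """Raise ValueError if a record does not match the documented fields."""
    missing = [f for f in REQUIRED_FIELDS if f not in sample]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if len(sample["choices"]) != 4:
        raise ValueError("expected exactly four choices")
    if sample["answer"] not in ("A", "B", "C", "D"):
        raise ValueError(f"unexpected answer letter: {sample['answer']!r}")


def answer_text(sample: dict[str, Any]) -> str:
    """Return the full choice string whose prefix matches the answer letter."""
    prefix = sample["answer"] + "."
    for choice in sample["choices"]:
        if choice.startswith(prefix):
            return choice
    raise ValueError("no choice carries the answer letter")


# Hand-written record mirroring the documented fields (image omitted, question elided).
example = {
    "id": "q0002",
    "image": None,
    "category": "identify_distance_long",
    "question": "...",
    "choices": ["A. Close (0-20m)", "B. Medium (20-50m)", "C. Far (50-80m)", "D. Very far (80m+)"],
    "answer": "D",
}
check_sample(example)
print(answer_text(example))  # D. Very far (80m+)
```

Matching on the `A.`–`D.` prefix instead of list position keeps the lookup correct even if the choices were ever stored in a different order.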
Question Categories
The dataset spans several spatial-reasoning task types, including:
| Category | What it tests |
|---|---|
| identify_distance_long | Estimate the absolute distance to an object (binned 0–20m / 20–50m / 50–80m / 80m+) |
| relative_distance_long | Estimate the 3D separation between two objects |
| pick_closer | Decide which of two objects is closer to the ego vehicle |
| identify_rightmost | Identify the object furthest to the right in true 3D space |
| order_leftmost | Order several objects left-to-right in 3D space |
| identify_position | Classify an object's position relative to ego (e.g. ahead-left, behind-right) |
| identify_heading | Determine an object's heading using clock directions (12 = forward, 3 = right) |
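Because the categories test different skills, it can help to report accuracy per category instead of a single overall number. The sketch below is one possible way to do that, not an official evaluation script: `accuracy_by_category` is a hypothetical helper, and the predictions are made up for illustration. Each record only needs the `id`, `category`, and `answer` fields.

```python
from collections import defaultdict


def accuracy_by_category(samples, predictions):
    """Return {category: fraction correct}.

    `predictions` maps a sample id to a predicted letter (A-D).
    Samples without a prediction count as wrong.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for sample in samples:
        total[sample["category"]] += 1
        if predictions.get(sample["id"]) == sample["answer"]:
            correct[sample["category"]] += 1
    return {category: correct[category] / total[category] for category in total}


# Three records reduced to the fields the scorer needs.
samples = [
    {"id": "q0002", "category": "identify_distance_long", "answer": "D"},
    {"id": "q0015", "category": "identify_distance_long", "answer": "C"},
    {"id": "q0013", "category": "pick_closer", "answer": "C"},
]
# Made-up model outputs, for illustration only.
predictions = {"q0002": "D", "q0015": "B", "q0013": "C"}

print(accuracy_by_category(samples, predictions))
# {'identify_distance_long': 0.5, 'pick_closer': 1.0}
```

With four options per question, uniform random guessing scores about 25%, which is a useful baseline when reading per-category numbers.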
Authors
Anurag Ganguli, Anshuman Lall, Abhishek Bhatia, Xiangyu Gao, Joe Yuan, Satish Vutukuru, Geoff Wolfe
Citation
If you use this dataset, please cite it:
@misc{driving_3d_spatial_reasoning,
title = {Open Spatial Reasoning},
author = {Anurag Ganguli and Anshuman Lall and Abhishek Bhatia and Xiangyu Gao and Joe Yuan and Satish Vutukuru and Geoff Wolfe},
year = {2026},
howpublished = {\url{https://huggingface.co/datasets/reasoncore/open-spatial-reasoning}}
}
License
Released under CC BY 4.0. Images were collected by autonomous vehicles operated by PlusAI.