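Preview columns: id (string, 5 characters); image (1.92k px wide); category (string, 11 to 22 characters); question (string, 1.11k to 1.53k characters); choices (list of 4); answer (string, 4 classes). Each preview row below lists id, category, question (truncated), choices, and answer.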
q0002
identify_distance_long
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. Close (0-20m)", "B. Medium (20-50m)", "C. Far (50-80m)", "D. Very far (80m+)" ]
D
q0006
identify_rightmost
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. <8>", "B. <1>", "C. <12>", "D. <9>" ]
D
q0007
relative_distance_long
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. Close (0-20m)", "B. Medium (20-50m)", "C. Far (50-80m)", "D. Very far (80m+)" ]
C
q0008
order_leftmost
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. <6>, <10>, <5>, <8>", "B. <6>, <5>, <10>, <8>", "C. <10>, <5>, <8>, <6>", "D. <10>, <5>, <6>, <8>" ]
D
q0013
pick_closer
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. <13>", "B. both are equidistant", "C. <10>", "D. cannot be determined" ]
C
q0014
order_closest
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. <1>, <5>, <13>, <3>", "B. <3>, <13>, <5>, <1>", "C. <1>, <3>, <5>, <13>", "D. <1>, <13>, <5>, <3>" ]
C
q0015
identify_distance_long
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. Close (0-20m)", "B. Medium (20-50m)", "C. Far (50-80m)", "D. Very far (80m+)" ]
C
q0017
identify_heading
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. forward — same direction as ego (12 o'clock)", "B. leftward — perpendicular left (9 o'clock)", "C. rightward — perpendicular right (3 o'clock)", "D. backward — toward ego (6 o'clock)" ]
A
q0018
pick_closer
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. <2>", "B. both are equidistant", "C. <6>", "D. cannot be determined" ]
A
q0019
relative_distance_long
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. Close (0-20m)", "B. Medium (20-50m)", "C. Far (50-80m)", "D. Very far (80m+)" ]
B
q0020
identify_distance_long
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. Close (0-20m)", "B. Medium (20-50m)", "C. Far (50-80m)", "D. Very far (80m+)" ]
C
q0021
identify_heading
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. forward — same direction as ego (12 o'clock)", "B. rightward — perpendicular right (3 o'clock)", "C. leftward — perpendicular left (9 o'clock)", "D. backward — toward ego (6 o'clock)" ]
D
q0022
relative_distance_long
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. Close (0-20m)", "B. Medium (20-50m)", "C. Far (50-80m)", "D. Very far (80m+)" ]
C
q0024
identify_type
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. car", "B. pedestrian", "C. suv", "D. light_truck" ]
C
q0025
identify_distance_long
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. Close (0-20m)", "B. Medium (20-50m)", "C. Far (50-80m)", "D. Very far (80m+)" ]
B
q0028
relative_heading
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. cannot be determined", "B. perpendicular to each other", "C. roughly opposite directions", "D. roughly the same direction" ]
C
q0029
identify_distance_long
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. Close (0-20m)", "B. Medium (20-50m)", "C. Far (50-80m)", "D. Very far (80m+)" ]
B
q0031
relative_heading
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. roughly the same direction", "B. perpendicular to each other", "C. cannot be determined", "D. roughly opposite directions" ]
D
q0033
identify_distance_long
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. Close (0-20m)", "B. Medium (20-50m)", "C. Far (50-80m)", "D. Very far (80m+)" ]
B
q0034
identify_heading
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. backward — toward ego (6 o'clock)", "B. leftward — perpendicular left (9 o'clock)", "C. forward — same direction as ego (12 o'clock)", "D. rightward — perpendicular right (3 o'clock)" ]
B
q0038
identify_heading
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. leftward — perpendicular left (9 o'clock)", "B. forward — same direction as ego (12 o'clock)", "C. rightward — perpendicular right (3 o'clock)", "D. backward — toward ego (6 o'clock)" ]
A
q0040
order_closest
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. <22>, <11>, <8>, <9>", "B. <11>, <8>, <9>, <22>", "C. <11>, <9>, <22>, <8>", "D. <8>, <9>, <11>, <22>" ]
D
q0044
pick_closer
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. both are equidistant", "B. <2>", "C. cannot be determined", "D. <3>" ]
B
q0045
relative_distance_long
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. Close (0-20m)", "B. Medium (20-50m)", "C. Far (50-80m)", "D. Very far (80m+)" ]
D
q0048
identify_heading
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. leftward — perpendicular left (9 o'clock)", "B. backward — toward ego (6 o'clock)", "C. rightward — perpendicular right (3 o'clock)", "D. forward — same direction as ego (12 o'clock)" ]
D
q0049
relative_distance
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. Very close (0-2m)", "B. Close (2-10m)", "C. Medium (10-30m)", "D. Far (30m+)" ]
C
q0050
relative_heading
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. roughly opposite directions", "B. roughly the same direction", "C. cannot be determined", "D. perpendicular to each other" ]
A
q0052
embodied_collision
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. Yes", "B. No", "C. cannot be determined", "D. only if the object moves" ]
A
q0053
identify_frontmost
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. <12>", "B. <6>", "C. <14>", "D. <8>" ]
C
q0055
relative_distance_long
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. Close (0-20m)", "B. Medium (20-50m)", "C. Far (50-80m)", "D. Very far (80m+)" ]
D
q0056
relative_heading
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. perpendicular to each other", "B. roughly the same direction", "C. cannot be determined", "D. roughly opposite directions" ]
D
q0060
identify_heading
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. forward — same direction as ego (12 o'clock)", "B. backward — toward ego (6 o'clock)", "C. leftward — perpendicular left (9 o'clock)", "D. rightward — perpendicular right (3 o'clock)" ]
D
q0066
relative_distance_long
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. Close (0-20m)", "B. Medium (20-50m)", "C. Far (50-80m)", "D. Very far (80m+)" ]
C
q0069
order_leftmost
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. <17>, <3>, <27>, <9>", "B. <9>, <17>, <27>, <3>", "C. <9>, <3>, <27>, <17>", "D. <17>, <9>, <27>, <3>" ]
D
q0070
identify_distance_long
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. Close (0-20m)", "B. Medium (20-50m)", "C. Far (50-80m)", "D. Very far (80m+)" ]
D
q0072
identify_heading
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. leftward — perpendicular left (9 o'clock)", "B. forward — same direction as ego (12 o'clock)", "C. backward — toward ego (6 o'clock)", "D. rightward — perpendicular right (3 o'clock)" ]
B
q0073
relative_distance_long
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. Close (0-20m)", "B. Medium (20-50m)", "C. Far (50-80m)", "D. Very far (80m+)" ]
D
q0075
identify_closest
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. <2>", "B. <4>", "C. <1>", "D. <3>" ]
C
q0080
relative_heading
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. cannot be determined", "B. perpendicular to each other", "C. roughly the same direction", "D. roughly opposite directions" ]
D
q0081
order_closest
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. <2>, <1>, <3>, <4>", "B. <4>, <1>, <2>, <3>", "C. <3>, <4>, <1>, <2>", "D. <1>, <2>, <3>, <4>" ]
D
q0083
relative_distance_long
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. Close (0-20m)", "B. Medium (20-50m)", "C. Far (50-80m)", "D. Very far (80m+)" ]
A
q0084
identify_heading
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. forward — same direction as ego (12 o'clock)", "B. leftward — perpendicular left (9 o'clock)", "C. backward — toward ego (6 o'clock)", "D. rightward — perpendicular right (3 o'clock)" ]
D
q0086
identify_distance_long
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. Close (0-20m)", "B. Medium (20-50m)", "C. Far (50-80m)", "D. Very far (80m+)" ]
D
q0087
identify_frontmost
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. <1>", "B. <4>", "C. <3>", "D. <6>" ]
D
q0088
identify_distance_long
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. Close (0-20m)", "B. Medium (20-50m)", "C. Far (50-80m)", "D. Very far (80m+)" ]
B
q0094
identify_distance_long
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. Close (0-20m)", "B. Medium (20-50m)", "C. Far (50-80m)", "D. Very far (80m+)" ]
C
q0095
identify_position
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. nearby", "B. ahead-left", "C. behind-right", "D. ahead" ]
B
q0096
relative_distance_long
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. Close (0-20m)", "B. Medium (20-50m)", "C. Far (50-80m)", "D. Very far (80m+)" ]
C
q0098
identify_distance_long
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. Close (0-20m)", "B. Medium (20-50m)", "C. Far (50-80m)", "D. Very far (80m+)" ]
B
q0100
identify_heading
You are answering a 3D spatial-reasoning question from a SINGLE monocular driving image with numbered bounding boxes. Frontier models routinely get these right by luck while reasoning incorrectly: they lean on flat-image shortcuts ('lower in the frame = closer', 'bigger box = nearer', 'left in the image = to my left') ...
[ "A. backward — toward ego (6 o'clock)", "B. rightward — perpendicular right (3 o'clock)", "C. forward — same direction as ego (12 o'clock)", "D. leftward — perpendicular left (9 o'clock)" ]
C

Open Spatial Reasoning

A multiple-choice question-answering dataset for evaluating 3D spatial reasoning from single driving images. Each image contains numbered bounding boxes marking objects in the scene, and each question probes a model's ability to reconstruct the real 3D scene rather than rely on flat-image shortcuts (e.g. "lower in the frame = closer", "bigger box = nearer").

Dataset Description

Frontier vision-language models often answer these questions correctly by luck while reasoning incorrectly, leaning on pixel-layout heuristics that break down on elevated roads, slopes, curves, and intersections. This dataset is designed to surface that failure mode by requiring metric 3D reasoning about distance, lateral position, ordering, and heading.

Each sample pairs a driving-scene image with a question, four answer choices, and the correct answer letter.
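Because every sample carries a category and a single correct letter, results can be broken down by task type. The following is a minimal sketch, not an official evaluation script: the function name `per_category_accuracy` and the `predictions` mapping (question id to predicted letter) are assumptions made for illustration, and the three rows are taken from this dataset with the image and question omitted.

```python
from collections import defaultdict


def per_category_accuracy(samples, predictions):
    """Return {category: fraction of samples answered correctly}.

    `predictions` maps a question id to a predicted letter (hypothetical
    format); a missing prediction counts as wrong.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for sample in samples:
        total[sample["category"]] += 1
        predicted = predictions.get(sample["id"], "").strip().upper()
        if predicted == sample["answer"]:
            correct[sample["category"]] += 1
    return {category: correct[category] / total[category] for category in total}


# Rows from the dataset (image and question fields left out for brevity).
samples = [
    {"id": "q0002", "category": "identify_distance_long", "answer": "D"},
    {"id": "q0015", "category": "identify_distance_long", "answer": "C"},
    {"id": "q0013", "category": "pick_closer", "answer": "C"},
]

# Hypothetical model outputs, including a lowercase letter.
predictions = {"q0002": "D", "q0015": "B", "q0013": "c"}

scores = per_category_accuracy(samples, predictions)
print(scores)
```

With four choices per question, uniform guessing would score about 25% in each category, which is a useful baseline when reading these numbers.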

The images were collected by autonomous vehicles operated by PlusAI.

Data Fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique question identifier (e.g. q0002) |
| image | image | The driving image with numbered bounding boxes |
| category | string | The reasoning task type (see categories below) |
| question | string | The full question, including the reasoning protocol |
| choices | list[string] | Four answer options, prefixed A. through D. |
| answer | string | The correct answer letter (A, B, C, or D) |
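As an illustration of how these fields fit together, the sketch below resolves the answer letter to the text of the matching choice. It assumes every choice starts with a letter, a period, and a space, as in the rows shown for this dataset; the helper names `choices_by_letter` and `answer_text` are hypothetical, and the record is written out by hand rather than loaded from the hub.

```python
import re

# Matches "A. some text" and captures the letter and the text.
CHOICE_RE = re.compile(r"^([A-D])\.\s*(.+)$")


def choices_by_letter(choices):
    """Map each answer letter to the text of its choice."""
    parsed = {}
    for choice in choices:
        match = CHOICE_RE.match(choice)
        if match is None:
            raise ValueError(f"unexpected choice format: {choice!r}")
        parsed[match.group(1)] = match.group(2)
    return parsed


def answer_text(record):
    """Return the text of the correct choice for one record."""
    options = choices_by_letter(record["choices"])
    if len(options) != 4:
        raise ValueError("expected four distinct choices labelled A to D")
    return options[record["answer"]]


# One record from the dataset, without the image and question fields.
record = {
    "id": "q0013",
    "category": "pick_closer",
    "choices": ["A. <13>", "B. both are equidistant", "C. <10>", "D. cannot be determined"],
    "answer": "C",
}

print(answer_text(record))
```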

Question Categories

The dataset spans several spatial-reasoning task types, including:

| Category | What it tests |
| --- | --- |
| identify_distance_long | Estimate the absolute distance to an object (binned 0–20m / 20–50m / 50–80m / 80m+) |
| relative_distance_long | Estimate the 3D separation between two objects |
| pick_closer | Decide which of two objects is closer to the ego vehicle |
| identify_rightmost | Identify the object furthest to the right in true 3D space |
| order_leftmost | Order several objects left-to-right in 3D space |
| identify_position | Classify an object's position relative to ego (e.g. ahead-left, behind-right) |
| identify_heading | Determine an object's heading using clock directions (12 = forward, 3 = right) |
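To make the distance binning concrete, the sketch below maps a metric distance to the answer letter used by the identify_distance_long choices (A. Close, B. Medium, C. Far, D. Very far). The card does not say which bin a boundary value such as exactly 20 m falls into, so treating each bin as including its lower edge is an assumption, as is the function name `distance_letter`.

```python
import bisect
import math

# Upper edges of the first three bins, in metres; 80m+ is open-ended.
BIN_EDGES_M = [20.0, 50.0, 80.0]
LETTERS = "ABCD"


def distance_letter(distance_m):
    """Return the answer letter for a distance in metres.

    Assumption: bins are half-open, [0, 20), [20, 50), [50, 80), [80, inf).
    """
    if math.isnan(distance_m) or distance_m < 0:
        raise ValueError("distance must be a non-negative number")
    # bisect_right counts how many edges are <= distance_m.
    return LETTERS[bisect.bisect_right(BIN_EDGES_M, distance_m)]


for metres in (5.0, 35.0, 64.2, 120.0):
    print(metres, distance_letter(metres))
```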

Authors

Anurag Ganguli, Anshuman Lall, Abhishek Bhatia, Xiangyu Gao, Joe Yuan, Satish Vutukuru, Geoff Wolfe

Citation

If you use this dataset, please cite it:

@misc{driving_3d_spatial_reasoning,
  title  = {Open Spatial Reasoning},
  author = {Anurag Ganguli and Anshuman Lall and Abhishek Bhatia and Xiangyu Gao and Joe Yuan and Satish Vutukuru and Geoff Wolfe},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/datasets/reasoncore/open-spatial-reasoning}}
}

License

Released under CC BY 4.0. Images were collected by autonomous vehicles operated by PlusAI.
