| # DA-2 WebGPU Port |
|
|
| This repository contains a port of the **DA-2 (Depth Anything in Any Direction)** model that runs entirely in the browser using **WebGPU** and **ONNX Runtime Web** (`onnxruntime-web`). |
|
|
| The original work was developed by EnVision-Research. This port enables real-time, client-side depth estimation from panoramic images without requiring a backend server for inference. |
|
|
| ## Original Work |
|
|
| **DA<sup>2</sup>: Depth Anything in Any Direction** |
|
|
| * **Repository:** [EnVision-Research/DA-2](https://github.com/EnVision-Research/DA-2) |
| * **Paper:** [arXiv:2509.26618](http://arxiv.org/abs/2509.26618) |
| * **Project Page:** [depth-any-in-any-dir.github.io](https://depth-any-in-any-dir.github.io/) |
|
|
| Please cite the original paper if you use this work: |
|
|
| ```bibtex |
| @article{li2025da2, |
| title={DA2: Depth Anything in Any Direction}, |
| author={Li, Haodong and Zheng, Wangguangdong and He, Jing and Liu, Yuhao and Lin, Xin and Yang, Xin and Chen, Ying-Cong and Guo, Chunchao}, |
| journal={arXiv preprint arXiv:2509.26618}, |
| year={2025} |
| } |
| ``` |
|
|
| ## WebGPU Demo |
|
|
| This project includes a web-based demo that runs the model directly in your browser. |
|
|
| ### Prerequisites |
|
|
| * **Python 3.10+** (for model export) |
| * **Web Browser** with WebGPU support (Chrome 113+, Edge 113+, or Firefox Nightly). |
|
|
| ### Installation |
|
|
| 1. **Clone the repository:** |
| ```bash |
| git clone <your-repo-url> |
| cd DA-2-Web |
| ``` |
| |
| 2. **Set up Python environment:** |
| ```bash |
| python3 -m venv venv |
| source venv/bin/activate # On Windows: venv\Scripts\activate |
| pip install -r requirements.txt |
| ``` |
| |
| ### Model Preparation |
|
|
| To run the demo, you first need to convert the PyTorch model to ONNX format. |
|
|
| 1. **Download the model weights:** |
| Download `model.safetensors` from the [HuggingFace repository](https://huggingface.co/haodongli/DA-2) and place it in the root directory of this project. |
| |
| 2. **Export to ONNX:** |
| Run the export script. This script handles the conversion to FP16 and applies necessary fixes for WebGPU compatibility (e.g., replacing `clamp` with `max`/`min`). |
| ```bash |
| python export_onnx.py |
| ``` |
| This will generate `da2_model.onnx`. |
| |
| 3. **Merge ONNX files:** |
| The export process might generate external data files. Use the merge script to create a single `.onnx` file for easier web loading. |
| ```bash |
| python merge_onnx.py |
| ``` |
| This will generate `da2_model_single.onnx`. |
| |
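Before moving on, it can help to confirm that the three files named in the steps above exist. The following is a minimal sketch, not part of the repository: the helper name `missing_artifacts` is hypothetical, and it assumes all three files sit in the project root.

```python
from pathlib import Path

# File names come from the steps above. Assumption: they all live in the
# project root (the README only states this for model.safetensors).
EXPECTED_ARTIFACTS = ("model.safetensors", "da2_model.onnx", "da2_model_single.onnx")


def missing_artifacts(root: str = ".") -> list[str]:
    """Return the expected files that are absent or empty under `root`."""
    base = Path(root)
    return [
        name
        for name in EXPECTED_ARTIFACTS
        if not (base / name).is_file() or (base / name).stat().st_size == 0
    ]
```

Calling `missing_artifacts()` from the project root returns an empty list once the download, export, and merge steps have all succeeded.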
| ### Running the Demo |
|
|
| 1. **Start a local web server:** |
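| Browsers typically refuse to fetch the model from a `file://` page, and WebGPU is only exposed in secure contexts, so the files must be served over HTTP(S). `http://localhost` counts as a secure context. |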
| ```bash |
| python3 -m http.server 8000 |
| ``` |
| |
| 2. **Open in Browser:** |
| Navigate to `http://localhost:8000/web/` in your WebGPU-compatible browser. |
| |
| 3. **Usage:** |
| * Click "Choose File" to upload a panoramic image. |
| * Click "Run Inference" to generate the depth map. |
| * The process runs entirely locally on your GPU. |
|
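If you prefer to start the server from a script rather than the command line, step 1 can be sketched with the same standard-library module. The function name `make_server` is hypothetical. The sketch assumes you want to serve the project root, so that `/web/` resolves as in step 2, and it binds to the loopback interface only.

```python
import functools
import http.server


def make_server(root: str = ".", port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Create an HTTP server on 127.0.0.1 that serves the files under `root`."""
    # `directory` pins the served folder, so the current working directory
    # of the calling process does not matter.
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=root)
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


# Usage (blocks until interrupted with Ctrl+C):
#   make_server(".", 8000).serve_forever()
```

Binding to `127.0.0.1` keeps the demo unreachable from other machines, which differs from `python3 -m http.server`, which listens on all interfaces by default.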
|
| ## Technical Details of the Port |
|
|
| * **Precision:** The model was converted to **FP16 (Half Precision)** to reduce file size (from about 1.4 GB to about 700 MB) and improve performance on consumer GPUs. |
| * **Opset:** Exported using **ONNX Opset 17**. |
| * **Modifications:** |
| * The `SphereViT` and `ViT_w_Esphere` modules were modified to ensure strict FP16 compatibility. |
| * `torch.clamp` operations were replaced with `torch.max` and `torch.min` combinations to avoid `Clip` operator issues in `onnxruntime-web` when handling mixed scalar/tensor inputs. |
| * Sphere embeddings are pre-calculated and cast to FP16 within the model graph. |
|
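The size reduction in the precision note follows from storage width: FP32 uses 4 bytes per value and FP16 uses 2. The figures above are consistent with roughly 350 million parameters, assuming nearly all of the file is weights. The sketch below uses only Python's `struct` module to illustrate the general trade-off (half the bytes, less precision, a maximum finite value of 65504). It says nothing about the model's actual weights, and the helper names are hypothetical.

```python
import struct


def packed_size(values, fmt_char: str) -> int:
    """Bytes needed to store `values` as 'f' (FP32) or 'e' (FP16)."""
    return len(struct.pack(f"<{len(values)}{fmt_char}", *values))


def roundtrip_fp16(x: float) -> float:
    """Store `x` as IEEE 754 half precision and read it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fits_fp16(x: float) -> bool:
    """Return False when `x` is too large to represent in FP16."""
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False


weights = [0.1, 1.0, 3.14159, 1000.5]
print(packed_size(weights, "f"))  # 16
print(packed_size(weights, "e"))  # 8
print(roundtrip_fp16(0.1))        # 0.0999755859375
print(fits_fp16(70000.0))         # False
```

Values that overflow or lose too much precision in FP16 are one reason a model may need targeted changes before it runs correctly in half precision.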
|
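The `torch.clamp` replacement described above can be sketched as follows. This is an illustration of the general idea, not the code used in `export_onnx.py`: the function name `clamp_via_max_min` is hypothetical, and the sketch assumes scalar bounds that are turned into tensors with the input's dtype. During ONNX export, `torch.clamp` maps to the `Clip` operator, while elementwise `torch.max` and `torch.min` map to `Max` and `Min`.

```python
import torch


def clamp_via_max_min(x: torch.Tensor, lo: float, hi: float) -> torch.Tensor:
    """Equivalent of torch.clamp(x, lo, hi) built from elementwise max and min."""
    # The bounds become tensors with x's dtype and device, so every operand
    # is a tensor of the same type (no bare Python scalars in the graph).
    lo_t = torch.tensor(lo, dtype=x.dtype, device=x.device)
    hi_t = torch.tensor(hi, dtype=x.dtype, device=x.device)
    return torch.min(torch.max(x, lo_t), hi_t)


x = torch.tensor([-2.0, 0.5, 3.0])
same = torch.equal(clamp_via_max_min(x, -1.0, 1.0), torch.clamp(x, -1.0, 1.0))
print(same)  # True
```

The example runs in FP32 for portability. In an FP16 model the same function would produce FP16 bounds, because they inherit the dtype of `x`.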
| ## License |
|
|
| This project follows the license of the original [DA-2 repository](https://github.com/EnVision-Research/DA-2). Please refer to the original repository for license details. |
|
|