	# DA-2 WebGPU Port

	This repository contains a port of the DA-2 (Depth Anything in Any Direction) model to run entirely in the browser using WebGPU and ONNX Runtime.

	The original work was developed by EnVision-Research. This port enables real-time, client-side depth estimation from panoramic images without requiring a backend server for inference.

	## 🔗 Original Work

	DA<sup>2</sup>: Depth Anything in Any Direction

	* Repository: [EnVision-Research/DA-2](https://github.com/EnVision-Research/DA-2)
	* Paper: [arXiv:2509.26618](http://arxiv.org/abs/2509.26618)
	* Project Page: [depth-any-in-any-dir.github.io](https://depth-any-in-any-dir.github.io/)

	Please cite the original paper if you use this work:

	```bibtex
	@article{li2025da2,
	title={DA2: Depth Anything in Any Direction},
	author={Li, Haodong and Zheng, Wangguangdong and He, Jing and Liu, Yuhao and Lin, Xin and Yang, Xin and Chen, Ying-Cong and Guo, Chunchao},
	journal={arXiv preprint arXiv:2509.26618},
	year={2025}
	}
	```

	## 🚀 WebGPU Demo

	This project includes a web-based demo that runs the model directly in your browser.

	### Prerequisites

	* Python 3.10+ (for model export).
	* A web browser with WebGPU support (Chrome 113+, Edge 113+, or Firefox Nightly).

	### Installation

	1. Clone the repository:
	```bash
	git clone <your-repo-url>
	cd DA-2-Web
	```

	2. Set up Python environment:
	```bash
	python3 -m venv venv
	source venv/bin/activate # On Windows: venv\Scripts\activate
	pip install -r requirements.txt
	```

	### Model Preparation

	To run the demo, you first need to convert the PyTorch model to ONNX format.

	1. Download the model weights:
	Download `model.safetensors` from the [Hugging Face repository](https://huggingface.co/haodongli/DA-2) and place it in the root directory of this project.

	2. Export to ONNX:
	Run the export script. This script handles the conversion to FP16 and applies necessary fixes for WebGPU compatibility (e.g., replacing `clamp` with `max`/`min`).
	```bash
	python export_onnx.py
	```
	This will generate `da2_model.onnx`.

	3. Merge ONNX files:
	The export process may write the weights to external data files next to the `.onnx` graph. Use the merge script to combine them into a single `.onnx` file, which is easier to load from the web page.
	```bash
	python merge_onnx.py
	```
	This will generate `da2_model_single.onnx`.
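
After steps 2 and 3, a quick sanity check can confirm the merged file is in place before you start the web server. The sketch below is not part of this repository: `check_model_file` is a hypothetical helper, and the default path assumes the merge script writes `da2_model_single.onnx` to the project root. It also checks the 2 GiB ceiling that protobuf imposes on a single serialized message, which is the reason larger ONNX models are split into external data files in the first place.

```python
from pathlib import Path

# Protobuf cannot serialize a single message larger than 2 GiB, so a
# self-contained .onnx file has to stay below this size.
PROTOBUF_LIMIT_BYTES = 2 * 1024**3


def check_model_file(path="da2_model_single.onnx"):
    """Return the size of the merged model in MB, or raise if it looks wrong."""
    model_path = Path(path)
    if not model_path.is_file():
        raise FileNotFoundError(f"{model_path} not found; run merge_onnx.py first")
    size_bytes = model_path.stat().st_size
    if size_bytes == 0:
        raise ValueError(f"{model_path} is empty")
    if size_bytes >= PROTOBUF_LIMIT_BYTES:
        raise ValueError(f"{model_path} exceeds the 2 GiB single-file limit")
    return size_bytes / 1e6
```

Calling `check_model_file()` from the project root should report a size in the region of 700 MB if the FP16 export worked as intended.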

	### Running the Demo

	1. Start a local web server:
	Browsers generally block `fetch` requests from pages opened via `file://`, and WebGPU is only exposed in secure contexts (HTTPS or `localhost`). Serve the files over HTTP from the project root:
	```bash
	python3 -m http.server 8000
	```

	2. Open in Browser:
	Navigate to `http://localhost:8000/web/` in your WebGPU-compatible browser.

	3. Usage:
	* Click "Choose File" to upload a panoramic image.
	* Click "Run Inference" to generate the depth map.
	* Inference runs entirely on your local GPU; no backend server is involved.
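
`python3 -m http.server` is enough for the steps above. If the browser refuses to load a `.wasm` file that `onnxruntime-web` may fetch because it is served with the wrong MIME type, a small custom server is one option. This is a sketch using only the standard library, not a script shipped with this repository: `DemoHandler` and `make_server` are hypothetical names, and whether the extra MIME entries are needed depends on your Python version and on how the page loads ONNX Runtime.

```python
import functools
import http.server


class DemoHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler with explicit MIME types for model assets."""

    extensions_map = {
        **http.server.SimpleHTTPRequestHandler.extensions_map,
        # Streaming WebAssembly compilation requires this exact type.
        ".wasm": "application/wasm",
        # Serve the model as opaque binary data.
        ".onnx": "application/octet-stream",
    }


def make_server(directory=".", port=8000):
    """Build a server bound to localhost only; the caller starts it."""
    handler = functools.partial(DemoHandler, directory=directory)
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

To serve the project root, run `make_server().serve_forever()` and open `http://localhost:8000/web/` as in step 2. Binding to `127.0.0.1` keeps the server unreachable from other machines on the network.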

	## 🛠️ Technical Details of the Port

	* Precision: The model was converted to FP16 (half precision) to reduce file size (from about 1.4 GB to about 700 MB) and improve performance on consumer GPUs.
	* Opset: Exported using ONNX Opset 17.
	* Modifications:
		* The `SphereViT` and `ViT_w_Esphere` modules were modified to ensure strict FP16 compatibility.
		* `torch.clamp` operations were replaced with `torch.max` and `torch.min` combinations to avoid `Clip` operator issues in `onnxruntime-web` when handling mixed scalar/tensor inputs.
		* Sphere embeddings are pre-calculated and cast to FP16 within the model graph.
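
The size reduction in the Precision bullet follows from storage width alone: FP32 uses 4 bytes per weight and FP16 uses 2. The sketch below uses Python's `struct` module (format `e` is IEEE 754 half precision) to show that arithmetic, along with two FP16 limits that make careful casting necessary: reduced precision and a largest finite value of 65504. The parameter count is inferred from the ~1.4 GB figure rather than taken from the original model, and the helper names are illustrative only.

```python
import struct

FP32_BYTES = struct.calcsize("<f")  # 4 bytes per weight
FP16_BYTES = struct.calcsize("<e")  # 2 bytes per weight


def to_fp16(value):
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def fits_in_fp16(value):
    """Return True if the value can be stored as a finite FP16 number."""
    try:
        struct.pack("<e", value)
        return True
    except OverflowError:
        return False


# Rough parameter count implied by a ~1.4 GB FP32 checkpoint (an estimate).
approx_params = 1.4e9 / FP32_BYTES
fp16_size_mb = approx_params * FP16_BYTES / 1e6  # about 700 MB

print(f"FP16 size estimate: {fp16_size_mb:.0f} MB")
print(f"0.1 stored as FP16: {to_fp16(0.1)!r}")
print(f"65504 fits: {fits_in_fp16(65504.0)}, 70000 fits: {fits_in_fp16(70000.0)}")
```

Intermediate values that exceed the FP16 range are one plausible reason a port needs explicit bounds and casts in the graph before export.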

	## 📄 License

	This project follows the license of the original [DA-2 repository](https://github.com/EnVision-Research/DA-2). Please refer to the original repository for license details.