Improve model card with metadata, links, and description
This PR improves the model card by:
- Adding metadata: `pipeline_tag`, `library_name`, and confirming the `license`.
- Adding a link to the paper.
- Restructuring the content for better readability, providing a concise overview followed by detailed instructions.
README.md CHANGED
@@ -1,3 +1,43 @@

The previous README contained only the license front matter (`---`, `license: apache-2.0`, `---`); the updated file reads as follows.
---
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
---

# MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments

<img align="right" src="figure.jpg" alt="teaser" width="100%" style="margin-left: 10px">

This repository contains the MM2SG model, a multimodal large vision-language model for scene graph generation, as presented in the paper "MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments" (accepted at CVPR 2025). The model leverages multimodal inputs (including RGB-D data, detail views, audio, speech transcripts, robotic logs, and tracking data) to generate semantic scene graphs, enabling a more comprehensive understanding of complex operating room scenarios.

Paper: https://arxiv.org/abs/2503.02579

Code: https://github.com/egeozsoy/MM-OR

**Authors**: [Ege Özsoy][eo], Chantal Pellegrini, Tobias Czempiel, Felix Tristram, Kun Yuan, David Bani-Harouni, Ulrich Eck, Benjamin Busam, Matthias Keicher, [Nassir Navab][nassir]

[eo]: https://www.cs.cit.tum.de/camp/members/ege-oezsoy/
[nassir]: https://www.cs.cit.tum.de/camp/members/cv-nassir-navab/nassir-navab/
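
The metadata above tags this checkpoint as `image-text-to-text` with `library_name: transformers`. The authoritative loading and prompting code lives in the GitHub repository; the snippet below is only a minimal sketch of how such a checkpoint is typically driven through the generic transformers interface. The repo id, Auto classes, image path, and prompt are placeholders and assumptions, not the confirmed MM2SG API.

```python
# Minimal sketch only. Assumptions (not confirmed by this card): the checkpoint
# loads through the generic transformers vision-language classes, and the repo
# id, image path, and prompt below are placeholders.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "path/to/MM2SG-checkpoint"  # placeholder repo id

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# A single RGB view of the OR; the full MM2SG pipeline additionally consumes
# detail views, audio, speech transcripts, robotic logs, and tracking data.
image = Image.open("or_frame.jpg")
prompt = "Generate the semantic scene graph for this operating room scene."

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```
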
## MM-OR Dataset

- To download MM-OR, first fill out this form to get access to the download script: https://forms.gle/kj47QXEcraQdGidg6. By filling out the form, you agree to the terms of use of the dataset.
- Run the download script: it automatically downloads the entire dataset (multiple .zip files) and unzips them. Make sure you have "wget" and "unzip" installed (a minimal Python sketch of the unzip step follows this list).
- Put the newly created MM-OR_data folder into the root directory of this project.
- Optionally, also download the 4D-OR dataset, place it in the root directory, and rename it to 4D-OR_data. Instructions are in the official repo: https://github.com/egeozsoy/4D-OR. You can also find the newly added segmentation annotations, and how to configure them, in that repository.
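
The following is a minimal Python sketch of the unzip step referenced in the list above, assuming the download script (or a manual download) has already placed the dataset archives in a local `downloads/` folder; the folder name is a placeholder, and the provided download script remains the intended workflow.

```python
# Minimal sketch only. Assumption: the gated download script has already
# fetched the dataset .zip parts into ./downloads (placeholder location).
from pathlib import Path
import zipfile

download_dir = Path("downloads")  # placeholder: wherever the .zip files were saved
project_root = Path(".")          # MM-OR_data should end up in the project root

for archive in sorted(download_dir.glob("*.zip")):
    print(f"Extracting {archive.name} ...")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(project_root)

# The training and evaluation code expects MM-OR_data at the project root.
print("MM-OR_data present:", (project_root / "MM-OR_data").exists())
```
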
## Panoptic Segmentation and Scene Graph Generation Instructions

Detailed instructions for panoptic segmentation and scene graph generation training and evaluation are available within the respective subdirectories of this repository. Please refer to the README files within `panoptic_segmentation` and `scene_graph_generation` for specific instructions and requirements.

```bibtex
@inproceedings{ozsoy2024mmor,
  title={MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High Intensity Surgical Environments},
  author={Özsoy, Ege and Pellegrini, Chantal and Czempiel, Tobias and Tristram, Felix and Yuan, Kun and Bani-Harouni, David and Eck, Ulrich and Busam, Benjamin and Keicher, Matthias and Navab, Nassir},
  booktitle={CVPR},
  note={Accepted},
  year={2025}
}
```