Image-Text-to-Text
Transformers
nielsr (HF Staff) committed on
Commit 1e6f934 · verified · 1 Parent(s): cc05397

Improve model card with metadata, links, and description


This PR improves the model card by:

- Adding metadata: `pipeline_tag`, `library_name`, and confirming the `license`.
- Adding a link to the paper.
- Restructuring the content for better readability, providing a concise overview followed by detailed instructions.

Files changed (1)
  1. README.md +43 -3
README.md CHANGED
@@ -1,3 +1,43 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ pipeline_tag: image-text-to-text
+ library_name: transformers
+ ---
+
+ # MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments
+
+ <img align="right" src="figure.jpg" alt="teaser" width="100%" style="margin-left: 10px">
+
+ This repository contains the MM2SG model, a multimodal large vision-language model for scene graph generation, as presented in the paper "MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments" (accepted at CVPR 2025). The model leverages multimodal inputs (including RGB-D data, detail views, audio, speech transcripts, robotic logs, and tracking data) to generate semantic scene graphs, enabling a more comprehensive understanding of complex operating room scenarios.
+
+ Paper: https://arxiv.org/abs/2503.02579
+
+ Code: https://github.com/egeozsoy/MM-OR
+
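Since the metadata declares `pipeline_tag: image-text-to-text` and `library_name: transformers`, querying the model through the high-level pipeline might look like the sketch below. The model id `"egeozsoy/MM-OR"` and the prompt are illustrative assumptions, not taken from this repository; check the repo for the actual identifier and prompt format.

```python
# Hedged sketch: loading MM2SG via the transformers image-text-to-text
# pipeline. The model id and prompt below are assumptions for illustration.

def build_messages(image_path: str, question: str) -> list:
    """Chat-format input accepted by image-text-to-text pipelines."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_path},
                {"type": "text", "text": question},
            ],
        }
    ]

def run_inference(image_path: str, question: str) -> str:
    from transformers import pipeline  # deferred import: pulls in heavy deps
    # Downloads weights on first use; the model id is a guess, see the repo.
    pipe = pipeline("image-text-to-text", model="egeozsoy/MM-OR")
    out = pipe(text=build_messages(image_path, question))
    return out[0]["generated_text"]
```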
+ **Authors**: [Ege Özsoy][eo], Chantal Pellegrini, Tobias Czempiel, Felix Tristram, Kun Yuan, David Bani-Harouni, Ulrich Eck, Benjamin Busam, Matthias Keicher, [Nassir Navab][nassir]
+
+ [eo]: https://www.cs.cit.tum.de/camp/members/ege-oezsoy/
+ [nassir]: https://www.cs.cit.tum.de/camp/members/cv-nassir-navab/nassir-navab/
+
+ ## MM-OR Dataset
+ - To download MM-OR, first fill out this form https://forms.gle/kj47QXEcraQdGidg6 to get access to the download script. By filling out this form, you agree to the dataset's terms of use.
+ - Run the download script; it automatically downloads the entire dataset as multiple .zip files and unzips them. Make sure you have "wget" and "unzip" installed.
+ - Put the newly created MM-OR_data folder into the root directory of this project.
+ - Optionally, download the 4D-OR dataset, put it in the root directory, and rename it 4D-OR_data. Instructions are in the official repo: https://github.com/egeozsoy/4D-OR. You can also find the newly annotated segmentation annotations and how to configure them in that repository.
+
+ ## Panoptic Segmentation and Scene Graph Generation Instructions
+ Detailed instructions for panoptic segmentation and scene graph generation training and evaluation are available within the respective subdirectories of this repository. Please refer to the README files within `panoptic_segmentation` and `scene_graph_generation` for specific instructions and requirements.
+
+ ```bibtex
+ @inproceedings{ozsoy2024mmor,
+   title={MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments},
+   author={Özsoy, Ege and Pellegrini, Chantal and Czempiel, Tobias and Tristram, Felix and Yuan, Kun and Bani-Harouni, David and Eck, Ulrich and Busam, Benjamin and Keicher, Matthias and Navab, Nassir},
+   booktitle={CVPR},
+   note={Accepted},
+   year={2025}
+ }
+ ```