Image-Text-to-Text
Transformers
nielsr (HF Staff) committed on
Commit 1e6f934 · verified · 1 Parent(s): cc05397

Improve model card with metadata, links, and description


This PR improves the model card by:

- Adding metadata: `pipeline_tag`, `library_name`, and confirming the `license`.
- Adding a link to the paper.
- Restructuring the content for better readability, providing a concise overview followed by detailed instructions.

Files changed (1)
  1. README.md +43 -3
README.md CHANGED
@@ -1,3 +1,43 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ pipeline_tag: image-text-to-text
+ library_name: transformers
+ ---
+
+ # MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments
+
+ <img align="right" src="figure.jpg" alt="teaser" width="100%" style="margin-left: 10px">
+
+ This repository contains the MM2SG model, a multimodal large vision-language model for scene graph generation, as presented in the paper "MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments" (accepted at CVPR 2025). The model leverages multimodal inputs (including RGB-D data, detail views, audio, speech transcripts, robotic logs, and tracking data) to generate semantic scene graphs, enabling a more comprehensive understanding of complex operating room scenarios.
+
+ Paper: https://arxiv.org/abs/2503.02579
+
+ Code: https://github.com/egeozsoy/MM-OR
+
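Since the metadata declares `pipeline_tag: image-text-to-text` and `library_name: transformers`, querying the model through the high-level pipeline might look like the sketch below. The model id `"egeozsoy/MM-OR"` and the prompt are illustrative assumptions, not taken from this repository; check the repo for the actual identifier and prompt format.

```python
# Hedged sketch: loading MM2SG via the transformers image-text-to-text
# pipeline. The model id and prompt below are assumptions for illustration.

def build_messages(image_path: str, question: str) -> list:
    """Chat-format input accepted by image-text-to-text pipelines."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_path},
                {"type": "text", "text": question},
            ],
        }
    ]

def run_inference(image_path: str, question: str) -> str:
    from transformers import pipeline  # deferred import: pulls in heavy deps
    # Downloads weights on first use; the model id is a guess, see the repo.
    pipe = pipeline("image-text-to-text", model="egeozsoy/MM-OR")
    out = pipe(text=build_messages(image_path, question))
    return out[0]["generated_text"]
```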
+ **Authors**: [Ege Özsoy][eo], Chantal Pellegrini, Tobias Czempiel, Felix Tristram, Kun Yuan, David Bani-Harouni, Ulrich Eck, Benjamin Busam, Matthias Keicher, [Nassir Navab][nassir]
+
+ [eo]: https://www.cs.cit.tum.de/camp/members/ege-oezsoy/
+ [nassir]: https://www.cs.cit.tum.de/camp/members/cv-nassir-navab/nassir-navab/
+
+ ## MM-OR Dataset
+ - To download MM-OR, first fill out this form https://forms.gle/kj47QXEcraQdGidg6 to get access to the download script. By filling out this form, you agree to the dataset's terms of use.
+ - Run the download script; it automatically downloads the entire dataset as multiple .zip files and unzips them. Make sure you have "wget" and "unzip" installed.
+ - Put the newly created MM-OR_data folder into the root directory of this project.
+ - Optionally, download the 4D-OR dataset, put it in the root directory, and rename it 4D-OR_data. Instructions are in the official repo: https://github.com/egeozsoy/4D-OR. You can also find the newly annotated segmentation annotations and how to configure them in that repository.
+
+ ## Panoptic Segmentation and Scene Graph Generation Instructions
+ Detailed instructions for panoptic segmentation and scene graph generation training and evaluation are available within the respective subdirectories of this repository. Please refer to the README files within `panoptic_segmentation` and `scene_graph_generation` for specific instructions and requirements.
+
+ ```bibtex
+ @inproceedings{ozsoy2024mmor,
+   title={MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments},
+   author={Özsoy, Ege and Pellegrini, Chantal and Czempiel, Tobias and Tristram, Felix and Yuan, Kun and Bani-Harouni, David and Eck, Ulrich and Busam, Benjamin and Keicher, Matthias and Navab, Nassir},
+   booktitle={CVPR},
+   note={Accepted},
+   year={2025}
+ }
+ ```