Image-Text-to-Text
Transformers
Safetensors
Cosmos
English
qwen2_5_vl
nvidia
conversational
text-generation-inference
harrim-nv committed on
Commit 6554ea3 · verified · 1 Parent(s): b0b77be

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -204,7 +204,7 @@ We value you, the datasets, the diversity they represent, and what we have been
  | Field | Response |
  | :--------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------- |
  | Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing: | None |
- | Measures taken to mitigate against unwanted bias: | None |
+ | Measures taken to mitigate against unwanted bias: | The training video sources contain multiple physical embodiments and environments including human, car, single arm robot, bimanual robot in indoor and outdoor environments. By training on numerous and various physical interactions and curated datasets, we strive to provide a model that does not possess biases towards certain embodiments or environments. |
 
  ### Explainability
 
@@ -215,7 +215,7 @@ We value you, the datasets, the diversity they represent, and what we have been
  | Intended Users: | Physical AI developers |
  | Output: | Text |
  | Describe how the model works: | Generates text answers based on input text prompt and video |
- | Technical Limitations: | The model may not follow the video or text input accurately in challenging cases, where the input video shows complex scene composition and temporal dynamics. |
+ | Technical Limitations: | The model may not follow the video or text input accurately in challenging cases, where the input video shows complex scene composition and temporal dynamics. Examples of challenging scenes include: fast camera movements, overlapping human-object interactions, low lighting with high motion blur, and multiple people performing different actions simultaneously. |
  | Verified to have met prescribed NVIDIA quality standards: | Yes |
  | Performance Metrics: | Quantitative and Qualitative Evaluation. Cosmos-Reason1 proposes the embodied reasoning benchmark and physical common sense benchmark to evaluate accuracy with visual question answering. |
  | Potential Known Risks: | The model's output can generate all forms of texts, including what may be considered toxic, offensive, or indecent. |