Update README.md
README.md
CHANGED
@@ -204,7 +204,7 @@ We value you, the datasets, the diversity they represent, and what we have been
 | Field | Response |
 | :--------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------- |
 | Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing: | None |
-| Measures taken to mitigate against unwanted bias: |
+| Measures taken to mitigate against unwanted bias: | The training video sources contain multiple physical embodiments and environments, including humans, cars, single-arm robots, and bimanual robots in indoor and outdoor settings. By training on numerous and varied physical interactions and curated datasets, we strive to provide a model that does not possess biases towards certain embodiments or environments. |

 ### Explainability

@@ -215,7 +215,7 @@ We value you, the datasets, the diversity they represent, and what we have been
 | Intended Users: | Physical AI developers |
 | Output: | Text |
 | Describe how the model works: | Generates text answers based on an input text prompt and video |
-| Technical Limitations: | The model may not follow the video or text input accurately in challenging cases, where the input video shows complex scene composition and temporal dynamics. |
+| Technical Limitations: | The model may not follow the video or text input accurately in challenging cases, where the input video shows complex scene composition and temporal dynamics. Examples of challenging scenes include fast camera movements, overlapping human-object interactions, low lighting with high motion blur, and multiple people performing different actions simultaneously. |
 | Verified to have met prescribed NVIDIA quality standards: | Yes |
 | Performance Metrics: | Quantitative and qualitative evaluation. Cosmos-Reason1 proposes the embodied reasoning benchmark and physical common sense benchmark to evaluate accuracy with visual question answering. |
 | Potential Known Risks: | The model's output can generate all forms of text, including what may be considered toxic, offensive, or indecent. |