temnick committed on
Commit f724cf3 · 1 Parent(s): 1c86d18

Initial content
.gitattributes CHANGED
@@ -33,3 +33,10 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.mp4 filter=lfs diff=lfs merge=lfs -text
+ *.dds filter=lfs diff=lfs merge=lfs -text
+ *.vgf filter=lfs diff=lfs merge=lfs -text
+ *.dll filter=lfs diff=lfs merge=lfs -text
+ *.exe filter=lfs diff=lfs merge=lfs -text
+ *.spv filter=lfs diff=lfs merge=lfs -text
+ *.png filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,2 @@
+ out/
+ bin/linux-x86_64/
Arm_AI_Model_Community_License_v1_0_PRE-1154.pdf ADDED
Binary file (49.6 kB)
LICENSE ADDED
@@ -0,0 +1,2 @@
+ The license for the model source code can be found at https://github.com/arm/neural-graphics-model-gym/blob/main/LICENSES/Apache-2.0.txt
+ The license for the content of this repository can be found in Arm_AI_Model_Community_License_v1_0_PRE-1154.pdf
README.md ADDED
@@ -0,0 +1,150 @@
+ ---
+ license: other
+ pipeline_tag: image-to-image
+ tags:
+ - android
+ - neural-graphics
+ - gaming
+ - graphics
+ language:
+ - en
+ ---
+
+ # Neural Super Sampling (NSS)
+
+ Neural Super Sampling (NSS) is an innovative, efficient network for temporal super sampling on mobile devices. Content rendered at 540p can be upscaled to 1080p, resulting in up to 50% GPU savings. With our retraining tools, content creators and game studios can build derivatives of the model suited to their artwork style and performance requirements.
+
+ ### 🎥 Neural Super Sampling Demo
+
+ <video controls width="100%">
+ <source src="https://huggingface.co/Arm/neural-super-sampling/resolve/main/resources/Enchanted_Castle_NSS_Demo.mp4" type="video/mp4">
+ Your browser does not support the video tag.
+ </video>
+
+ ## Model Details
+
+ Neural Super Sampling (NSS) is a parameter prediction model for real-time temporal super sampling developed by Arm, optimized for execution on Neural Accelerators (NX) in mobile GPUs. It enables high-resolution rendering at a lower compute cost by reconstructing high-quality output frames from low-resolution temporal inputs. NSS is particularly suited for mobile gaming, XR, and other power-constrained graphics use cases.
+
+ - **Developed by:** Arm Limited
+ - **Model type:** Temporal image super sampling
+ - **License:** Other
+ - **Repository:** [Neural Graphics Model Gym](https://github.com/arm/neural-graphics-model-gym)
+ - **Paper:** [How Neural Super Sampling Works](https://community.arm.com/arm-community-blogs/b/mobile-graphics-and-gaming-blog/posts/how-arm-neural-super-sampling-works)
+ - **Quickstart with ML extensions for Vulkan®**: [ML extensions for Vulkan® Quickstart Guide](https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/vulkan-ml-sample/)
+ - **Quickstart with Unreal**: [Neural Super Sampling Quickstart Guide](https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/nss-unreal/) for NSS integration into Unreal Engine
+
+ NSS is under active development, with regular updates planned. It should not be considered production-grade at this stage. As we increase the size and diversity of the training dataset, we expect significant quality improvements. Follow Arm to stay up to date on the latest releases.
+
+ The model is released under Arm's [AI Model Community License](https://huggingface.co/Arm/neural-super-sampling/blob/main/Arm_AI_Model_Community_License_v1_0_PRE-1154.pdf), which allows NSS to be retrained on datasets captured from your own content. Future releases of the [Neural Graphics Model Gym](https://github.com/arm/neural-graphics-model-gym) will provide the tools to capture and convert content for use in (re)training.
+
+ ## Uses
+
+ NSS can be directly integrated into graphics pipelines using ML extensions for Vulkan®. See the included ML SDK for Vulkan [scenario](https://huggingface.co/Arm/neural-super-sampling/tree/main/scenario) for the simplest way to evaluate the model. The scenario includes the necessary pre- and post-processing compute shaders along with a single frame's worth of input data.
+
+ The recommended way of integrating the model into a graphics pipeline is with the [VGF Library](https://github.com/arm/ai-ml-sdk-vgf-library/tree/main) from the ML SDK for Vulkan.
+
+ NSS is released under a [permissive license](https://huggingface.co/Arm/neural-super-sampling/blob/main/Arm_AI_Model_Community_License_v1_0_PRE-1154.pdf) designed to foster innovation in the graphics industry and provide differentiation to content creators.
+
+ ### Direct Use
+
+ NSS has been integrated into Unreal Engine via two plugins, the [NSS Plugin for Unreal Engine](https://github.com/arm/neural-graphics-for-unreal/) and the [Unreal NNE Plugin for ML extensions for Vulkan](https://github.com/arm/ml-extensions-for-vulkan-unreal-plugin/). See our [quick start guide](https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/nss-unreal/) for step-by-step instructions on how to use NSS in Unreal® Engine.
+
+ ### Out-of-Scope Use
+
+ - Not suited for non-temporal tasks such as standalone single-image upsampling
+
+ ## Bias, Risks, and Limitations
+
+ - Requires accurate motion vectors and frame history for stable output
+ - May underperform in extremely low framerate scenarios (<10 FPS) with fast camera movement
+ - Inputs must be padded if their dimensions are not divisible by 8
+
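The padding rule in the limitations above can be sketched as follows. This is a hypothetical helper for illustration (the name `pad_to_multiple` is not from the NSS tooling); it simply rounds each dimension up to the next multiple of 8:

```python
def pad_to_multiple(width: int, height: int, multiple: int = 8) -> tuple[int, int]:
    """Round image dimensions up to the next multiple (8 for NSS inputs)."""
    def pad(n: int) -> int:
        return ((n + multiple - 1) // multiple) * multiple
    return pad(width), pad(height)

# 960x540 is not 8-aligned vertically (540 = 8 * 67.5), so it pads to 960x544,
# while 1920x1080 is already aligned and passes through unchanged.
print(pad_to_multiple(960, 540))    # (960, 544)
print(pad_to_multiple(1920, 1080))  # (1920, 1080)
```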
+ ### Recommendations
+
+ For ultra-low-FPS use cases, reduce the camera speed, acceleration, or both so that the relative motion between frames mimics the application running at a higher frame rate.
+
+ ## How to Get Started with the Model
+
+ This repository contains pre-trained weights and a compiled NSS model in VGF format, ready for integration with Vulkan applications.
+
+ The included scenario demonstrates full execution of the model on a Vulkan compute-capable system. An Emulation Layer is provided to implement ML extensions for Vulkan® where they are not supported by the native Vulkan driver.
+
+ ### Download and Prepare the Scenario
+
+ Clone the NSS model repository from Hugging Face:
+ ```powershell
+ git clone https://huggingface.co/Arm/neural-super-sampling
+ cd neural-super-sampling
+ ```
+
+ ### Run the Scenario
+
+ The NSS Hugging Face repository includes pre-built Windows® binaries for the ML Emulation Layer for Vulkan and the Scenario Runner. For other platforms,
+ - build from source following the instructions for [Building the Emulation Layer from source](https://github.com/arm/ai-ml-emulation-layer-for-vulkan/blob/main/README.md#building-the-emulation-layer-from-source) and [Building the Scenario Runner from source](https://github.com/arm/ai-ml-sdk-scenario-runner/blob/main/README.md#building-scenario-runner-from-source)
+ - adapt the instructions below accordingly
+
+ 1. Set the required environment variables:
+
+ On Windows:
+ ```powershell
+ $env:VK_LAYER_PATH="$PWD\bin\windows-x86_64"
+ $env:VK_INSTANCE_LAYERS="VK_LAYER_ML_Graph_Emulation;VK_LAYER_ML_Tensor_Emulation"
+ ```
+
+ On Linux (assuming the Emulation Layer binaries and JSON files and the Scenario Runner executable have been copied to `bin/linux-x86_64`):
+ ```bash
+ export LD_LIBRARY_PATH=$PWD/bin/linux-x86_64:$LD_LIBRARY_PATH
+ export VK_LAYER_PATH=$PWD/bin/linux-x86_64
+ export VK_INSTANCE_LAYERS=VK_LAYER_ML_Graph_Emulation:VK_LAYER_ML_Tensor_Emulation
+ ```
+
+ 2. Execute the scenario:
+
+ On Windows:
+ ```powershell
+ bin\windows-x86_64\scenario-runner.exe --scenario scenario\scenario.json --output out
+ ```
+
+ On Linux:
+ ```bash
+ bin/linux-x86_64/scenario-runner --scenario scenario/scenario.json --output out
+ ```
+
+ - Output images are encoded as `B10G11R11_UFLOAT`. This format is common for framebuffers but not widely supported by image viewers. Use [RenderDoc](https://renderdoc.org/) to view these images.
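For quick programmatic inspection without RenderDoc, the packed pixels can also be decoded by hand. A minimal sketch, assuming the standard Vulkan `B10G11R11_UFLOAT_PACK32` bit layout (R in bits 0-10 and G in bits 11-21 as unsigned 11-bit floats with a 5-bit exponent and 6-bit mantissa, B in bits 22-31 as an unsigned 10-bit float with a 5-bit mantissa); the function names are illustrative, not part of any NSS tooling:

```python
def decode_ufloat(bits: int, mant_bits: int) -> float:
    """Decode an unsigned small float: 5-bit exponent (bias 15), no sign bit."""
    exp = (bits >> mant_bits) & 0x1F
    mant = bits & ((1 << mant_bits) - 1)
    if exp == 0:                      # denormal range
        return mant / (1 << mant_bits) * 2.0 ** -14
    if exp == 31:                     # inf / NaN
        return float("inf") if mant == 0 else float("nan")
    return (1.0 + mant / (1 << mant_bits)) * 2.0 ** (exp - 15)

def unpack_b10g11r11(word: int) -> tuple[float, float, float]:
    """Unpack one 32-bit B10G11R11_UFLOAT_PACK32 pixel into (r, g, b)."""
    r = decode_ufloat(word & 0x7FF, 6)          # bits 0-10,  11-bit float
    g = decode_ufloat((word >> 11) & 0x7FF, 6)  # bits 11-21, 11-bit float
    b = decode_ufloat((word >> 22) & 0x3FF, 5)  # bits 22-31, 10-bit float
    return r, g, b

# 1.0 encodes as exponent 15, mantissa 0; placed in the R channel:
print(unpack_b10g11r11(15 << 6))  # (1.0, 0.0, 0.0)
```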
+ ## Training and Evaluation
+
+ For background on the NSS architecture and training, read our blog: [How Neural Super Sampling Works](https://community.arm.com/arm-community-blogs/b/mobile-graphics-and-gaming-blog/posts/how-arm-neural-super-sampling-works)
+
+ Training and evaluation details, including model architecture code, training pipeline, and test configurations, are available at:
+
+ - Model training code: https://github.com/arm/neural-graphics-model-gym
+ - Examples and tutorials: https://github.com/arm/neural-graphics-model-gym-examples
+ - Sample dataset: https://huggingface.co/datasets/Arm/neural-graphics-dataset
+
+ ### 🔎 Model Explorer VGF extension
+
+ The [VGF extension to Model Explorer](https://github.com/arm/vgf-adapter-model-explorer) provides a simple interface to visualize the model and analyse VGF composition.
+
+ ![Model Explorer screenshot](resources/model-explorer-screenshot.png)
+
+ ## License
+
+ - The license for the model source code can be found [here](https://github.com/arm/neural-graphics-model-gym/blob/main/LICENSES/Apache-2.0.txt).
+ - The license for the content of this repository can be found [here](https://huggingface.co/Arm/neural-super-sampling/blob/main/Arm_AI_Model_Community_License_v1_0_PRE-1154.pdf).
+
+ ## More Information
+
+ 🧑‍🔬 More technical details about the model can be found in the [NSS Guide](https://developer.arm.com/documentation/111009/latest/).
+
+ 👩🏽‍💻 Our [Neural Graphics Development Kit](https://developer.arm.com/mobile-graphics-and-gaming/neural-graphics) contains engine plugins, model training tools, code examples and extensive developer documentation.
+
+ 🙋🏻‍♀️ For questions or feedback, please [start a discussion](https://huggingface.co/Arm/neural-super-sampling/discussions).
+
+ ## Trademark notice
+ Arm® is a registered trademark of Arm Limited (or its subsidiaries) in the US and/or elsewhere.
+
+ Windows® is a trademark of the Microsoft group of companies.
+
+ Vulkan® is a registered trademark of the [Khronos® Group](https://www.khronos.org/legal/trademarks).
bin/windows-x86_64/VkLayer_Graph.dll ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0a2fd54f62bef850685bcc4331714e0590a0a3aef28ea2dd6aa8c9e6f68f4da0
+ size 7437312
bin/windows-x86_64/VkLayer_Graph.json ADDED
@@ -0,0 +1,29 @@
+ {
+     "file_format_version": "1.0.0",
+     "layer": {
+         "name": "VK_LAYER_ML_Graph_Emulation",
+         "type": "INSTANCE",
+         "library_path": ".\\VkLayer_Graph.dll",
+         "api_version": "1.3.0",
+         "implementation_version": "1",
+         "description": "ML Graph Emulation Layer",
+         "functions": {
+             "vkGetInstanceProcAddr": "graphGetInstanceProcAddr",
+             "vkGetDeviceProcAddr": "graphGetDeviceProcAddr"
+         },
+         "device_extensions": [
+             {
+                 "name": "VK_ARM_data_graph",
+                 "spec_version": "1",
+                 "entrypoints": [
+                     "vkGetPhysicalDeviceGraphInstructionSetsARM",
+                     "vkCreateGraphPipelinesARM",
+                     "vkCreateGraphPipelineSessionARM",
+                     "vkGetGraphPipelineSessionMemoryRequirementsARM",
+                     "vkBindGraphPipelineSessionMemoryARM",
+                     "vkDestroyGraphPipelineSessionARM",
+                     "vkCmdDispatchGraphARM"
+                 ]
+             }
+         ]
+     }
+ }
bin/windows-x86_64/VkLayer_Tensor.dll ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:15a7e7f9b4ff74f530a550669740f1c8b5295bdbe9a0474da3e3b9e906c4ce76
+ size 5689344
bin/windows-x86_64/VkLayer_Tensor.json ADDED
@@ -0,0 +1,31 @@
+ {
+     "file_format_version": "1.0.0",
+     "layer": {
+         "name": "VK_LAYER_ML_Tensor_Emulation",
+         "type": "INSTANCE",
+         "library_path": ".\\VkLayer_Tensor.dll",
+         "api_version": "1.3.0",
+         "implementation_version": "1",
+         "description": "ML Tensor Emulation Layer",
+         "functions": {
+             "vkGetInstanceProcAddr": "tensorGetInstanceProcAddr",
+             "vkGetDeviceProcAddr": "tensorGetDeviceProcAddr"
+         },
+         "device_extensions": [
+             {
+                 "name": "VK_ARM_tensors",
+                 "spec_version": "1",
+                 "entrypoints": [
+                     "vkCreateTensorARM",
+                     "vkDestroyTensorARM",
+                     "vkCreateTensorViewARM",
+                     "vkDestroyTensorViewARM",
+                     "vkGetTensorMemoryRequirementsARM",
+                     "vkBindTensorMemoryARM",
+                     "vkGetDeviceTensorMemoryRequirementsARM",
+                     "vkCmdCopyTensorARM"
+                 ]
+             }
+         ]
+     }
+ }
bin/windows-x86_64/scenario-runner.exe ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5a99f1076dec1f504d64387969a3d18dc687c7f95c505dca812e4fcbca60f3d2
+ size 5285376
nss_v0.1.0_fp32.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:391ad31f72783175afcb94180e0d8ffad7a34a6d848edfdea3409677236fc1da
+ size 553364
nss_v0.1.0_int8.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:57ebcd81596ca7720015fc42b8c3d509c45d0c031244b024c7d1056b671dce9d
+ size 665897
nss_v0.1.0_int8.vgf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f2bb554b54f186111150cbe3f80b258300d22e6e6e23610a6c519abe1962d8f9
+ size 162900
nss_v0.1.0_int8_metadata.json ADDED
@@ -0,0 +1,79 @@
+ {
+     "dm_scale_on_no_motion": [
+         0.617464542388916
+     ],
+     "inputs": {
+         "x": {
+             "SINT": {
+                 "scale": 0.003921568859368563,
+                 "zero_point": -128
+             },
+             "SNORM": {
+                 "scale": 0.49803924513980746,
+                 "zero_point": -1.0078740157480315
+             }
+         }
+     },
+     "outputs": {
+         "activation_post_process_45": {
+             "SINT": {
+                 "scale": 0.003937007859349251,
+                 "zero_point": -127
+             },
+             "SNORM": {
+                 "scale": 0.49999999813735485,
+                 "zero_point": -1.0
+             }
+         },
+         "activation_post_process_50": {
+             "SINT": {
+                 "scale": 0.003937007859349251,
+                 "zero_point": -127
+             },
+             "SNORM": {
+                 "scale": 0.49999999813735485,
+                 "zero_point": -1.0
+             }
+         },
+         "activation_post_process_55": {
+             "SINT": {
+                 "scale": 0.003937007859349251,
+                 "zero_point": -127
+             },
+             "SNORM": {
+                 "scale": 0.49999999813735485,
+                 "zero_point": -1.0
+             }
+         },
+         "activation_post_process_60": {
+             "SINT": {
+                 "scale": 0.003937007859349251,
+                 "zero_point": -127
+             },
+             "SNORM": {
+                 "scale": 0.49999999813735485,
+                 "zero_point": -1.0
+             }
+         },
+         "activation_post_process_65": {
+             "SINT": {
+                 "scale": 0.003937007859349251,
+                 "zero_point": -127
+             },
+             "SNORM": {
+                 "scale": 0.49999999813735485,
+                 "zero_point": -1.0
+             }
+         },
+         "activation_post_process_70": {
+             "SINT": {
+                 "scale": 0.003937007859349251,
+                 "zero_point": -127
+             },
+             "SNORM": {
+                 "scale": 0.49999999813735485,
+                 "zero_point": -1.0
+             }
+         }
+     }
+ }
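The `SINT` scale/zero_point pairs in this metadata follow the standard affine quantization scheme (quantize: `q = round(x / scale) + zero_point`, clamped to int8; dequantize: `x = (q - zero_point) * scale`). A minimal sketch of how a host application might apply the input parameters, using the `"x"` values above; the function names are illustrative:

```python
def quantize_sint8(x: float, scale: float, zero_point: int) -> int:
    """Affine quantization: real value -> int8, clamped to [-128, 127]."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize_sint8(q: int, scale: float, zero_point: int) -> float:
    """Inverse affine mapping: int8 -> real value."""
    return (q - zero_point) * scale

# Input "x" parameters from the metadata above: scale ~= 1/255, zero_point -128,
# mapping the [0, 1] range onto the full int8 range.
scale, zp = 0.003921568859368563, -128
q = quantize_sint8(0.5, scale, zp)          # -1
x = dequantize_sint8(q, scale, zp)          # ~0.498, within one step of 0.5
```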
resources/Enchanted_Castle_NSS_Demo.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:13cc07b5829e7335b94b548314dd189add180ae4b6fd4d1529db37de72a9c3d8
+ size 96767057
resources/model-explorer-screenshot.png ADDED

Git LFS Details

  • SHA256: 38bd4531f68626059c16e656f4d6a2df017ef074c26b3bce71b68405539f63da
  • Pointer size: 131 Bytes
  • Size of remote file: 153 kB
scenario/0_pre_process.comp ADDED
@@ -0,0 +1,572 @@
+ //
+ // -----------------------------------------------------------------------------
+ // The proprietary software and information contained in this file is
+ // confidential and may only be used by an authorized person under a valid
+ // licensing agreement from Arm Limited or its affiliates.
+ //
+ // Copyright (C) 2025. Arm Limited or its affiliates. All rights reserved.
+ //
+ // This entire notice must be reproduced on all copies of this file and
+ // copies of this file may only be made by an authorized person under a valid
+ // licensing agreement from Arm Limited or its affiliates.
+ // -----------------------------------------------------------------------------
+ //
+ #version 460
+ #extension GL_EXT_shader_8bit_storage : require
+ #extension GL_EXT_shader_16bit_storage : require
+ #extension GL_EXT_shader_explicit_arithmetic_types : require
+ #extension GL_EXT_shader_explicit_arithmetic_types_int8 : require
+ #extension GL_EXT_shader_explicit_arithmetic_types_float16 : require
+ #extension GL_EXT_shader_explicit_arithmetic_types_float32 : require
+ #extension GL_GOOGLE_include_directive : enable
+ #extension GL_ARM_tensors : require
+
+ // includes
+ #include "typedefs.h"
+ #include "common.h"
+
+ // types
+
+ struct TensorElement
+ {
+     int8_t4 wh_rgb_col_r;       // warped_history.rgb, jittered_colour.r
+     int8_t4 col_gb_dm_fback_r;  // jittered_colour.gb, disocclusion mask, feedback.r
+     int8_t4 fback_gba_ld;       // feedback.gba, luma derivative
+ };
+
+ // inputs
+ layout (set=0, binding=0) uniform mediump sampler2D _ColourTex;             // 540p  | R11G11B10 32bpp
+ layout (set=0, binding=1) uniform highp sampler2D _DepthTex;                // 540p  | R32_FLOAT 32bpp
+ layout (set=0, binding=2) uniform mediump sampler2D _MotionVectorTex;       // 540p  | RG_16 32bpp
+ layout (set=0, binding=3) uniform mediump sampler2D _HistoryTex;            // 1080p | R11G11B10 32bpp
+ layout (set=0, binding=4) uniform lowp sampler2D _FeedbackTensor;           // 1080p | R8G8B8A8_SNORM 32bpp | Tensor->Texture Alias (Linear)
+ layout (set=0, binding=5) uniform highp sampler2D _DepthTm1Tex;             // 540p  | R32_FLOAT 32bpp
+ layout (set=0, binding=6) uniform lowp sampler2D _LumaDerivTm1Tex;          // 540p  | R8G8_UNORM 16bpp
+ layout (set=0, binding=7) uniform lowp sampler2D _NearestDepthCoordTm1Tex;  // 540p  | R8_UNORM 8bpp
+
+ // outputs
+ layout (set=1, binding=0) uniform writeonly tensorARM<int8_t, 4> _PreprocessTensor;     // 540p | 12ch 96bpp
+ layout (set=1, binding=1, rg8) uniform writeonly lowp image2D _PreProcessLumaDerivOut;  // 540p | R8G8 16bpp
+ layout (set=1, binding=3, r8) uniform writeonly lowp image2D _NearestDepthCoordOut;     // 540p | R8 8bpp
+
+ // push-constants
+ layout(push_constant, std430) uniform PushConstants {
+     // ─────────────── 16-byte aligned ───────────────
+     layout(offset = 0)   float4 _DeviceToViewDepth;   // 16 B
+     layout(offset = 16)  float4 _JitterOffset;        // 16 B (.xy = pixels, .zw = uvs)
+     layout(offset = 32)  float4 _JitterOffsetTm1;     // 16 B (.xy = pixels, .zw = uvs)
+     layout(offset = 48)  float4 _ScaleFactor;         // 16 B (.xy = scale, .zw = inv scale)
+
+     // ─────────────── 8-byte aligned ───────────────
+     layout(offset = 64)  int32_t2 _OutputDims;        // 8 B
+     layout(offset = 72)  int32_t2 _InputDims;         // 8 B
+     layout(offset = 80)  float2 _InvOutputDims;       // 8 B
+     layout(offset = 88)  float2 _InvInputDims;        // 8 B
+     layout(offset = 96)  half4 _QuantParams;          // 8 B (.xy SINT, .zw SNORM)
+     layout(offset = 104) half4 _MotionDisThreshPad;   // 8 B (.xyzw = motion/disocclusion thresholds)
+
+     // ─────────────── 4-byte aligned ───────────────
+     layout(offset = 112) half2 _Exposure;             // 4 B (.x = exposure, .y = 1/exp)
+     layout(offset = 116) half2 _HistoryPad;           // 4 B
+
+     // ─────────────── padding to 16-byte struct size ────
+     layout(offset = 120) int32_t2 _Padding;           // 8 B
+
+     // Total: **128 bytes**
+ };
+
+ // Convenience mapping for accessing push constants
+ #define _Scale _ScaleFactor.xy
+ #define _InvScale _ScaleFactor.zw
+ #define _Exposure _Exposure.x
+ #define _InvExposure _Exposure.y
+ #define _JitterOffsetPix _JitterOffset.xy
+ #define _JitterOffsetUv _JitterOffset.zw
+ #define _JitterOffsetTm1Pix _JitterOffsetTm1.xy
+ #define _JitterOffsetTm1Uv _JitterOffsetTm1.zw
+ #define _MotionWarpThresh _MotionDisThreshPad.x
+ #define _MotionDisThresh _MotionDisThreshPad.y
+ #define _DisocclusionScale _MotionDisThreshPad.z
+ #define _NotHistoryReset _HistoryPad.x
+
+ // Quantization Parameters
+ // inside: `./parameters.json`
+ // these values are embedded inside the TOSA file and learnt during QAT
+
+ #ifndef _InputQuantParams
+ // inputs - x["SINT"]
+ #define _InputQuantParams _QuantParams.xy
+ #endif
+ #ifndef _FeedbackQuantParams
+ // outputs - activation_post_process_70["SNORM"]
+ #define _FeedbackQuantParams _QuantParams.zw
+ #endif
+
+ // constants
+
+ #ifdef INVERTED_DEPTH
+ #define MAX_DEPTH 0.f
+ #else
+ #define MAX_DEPTH 1.f
+ #endif
+
+
+ // methods
+
+ bool IsOnScreen(int32_t2 pos, int32_t2 size)
+ {
+     return all(lessThan(uint32_t2(pos), uint32_t2(size)));
+ }
+
+
+ half2 LoadMotion(int32_t2 pixel)
+ {
+     return half2(texelFetch(_MotionVectorTex, pixel, 0).rg);
+ }
+
+
+ half3 LoadColour(int32_t2 pixel)
+ {
+     return Tonemap(SafeColour(half3(texelFetch(_ColourTex, pixel, 0).rgb) * _Exposure));
+ }
+
+
+ int32_t2 LoadDepthNearestDepthOffsetTm1(int32_t2 pixel)
+ {
+     int32_t2 is_oob = int32_t2(IsOnScreen(pixel, _InputDims));
+     pixel = clamp(pixel, int32_t2(0), _InputDims - int32_t2(1));
+
+     half encNorm = half(texelFetch(_NearestDepthCoordTm1Tex, pixel, 0).r);
+     int32_t code = int32_t(encNorm * 255.0 + 0.5);
+
+     // map back to {-1,0,1}²
+     return DecodeNearestDepthCoord(code) * is_oob;
+ }
+
+ void GatherReconstructedPreviousDepthRQuad(float2 fUV, inout float4 depthQuad)
+ {
+     int32_t2 offset = LoadDepthNearestDepthOffsetTm1(int32_t2(fUV * _InputDims));
+     float2 offset_uv = float2(offset) * _InvInputDims;
+     depthQuad = textureGather(_DepthTm1Tex, fUV + offset_uv, 0).wzxy;
+ }
+
+
+ half3 WarpHistory(float2 uv)
+ {
+     return Tonemap(SafeColour(half3(textureLod(_HistoryTex, uv, 0).rgb) * _Exposure));
+ }
+
+
+ half4 WarpFeedback(float2 uv)
+ {
+     return Dequantize(half4(textureLod(_FeedbackTensor, uv, 0)), _FeedbackQuantParams);
+ }
+
+
+ half2 WarpLumaDerivative(float2 uv)
+ {
+     return half2(textureLod(_LumaDerivTm1Tex, uv, 0).rg);
+ }
+
+
+ half2 CalculateLumaDerivative(float2 reproj_uv, half3 jittered_colour, half disocclusion_mask)
+ {
+     const half DIS_THRESH = 0.01HF;
+     const half DERIV_MIN = 0.05HF;
+     const half DERIV_MAX = 0.3HF;
+     const half DERIV_POW = 1.5HF;
+     const half DERIV_ALPHA = 0.1HF;
+     const half DERIV_MAX_R = rcp(DERIV_MAX);
+     const half DERIV_MAX_POW_R = rcp(pow(DERIV_MAX, DERIV_POW));
+
+     //--------------------------------------------------------------------
+     // 1. Fetch history (luma + derivative)
+     //--------------------------------------------------------------------
+     half2 h = WarpLumaDerivative(reproj_uv);
+     half luma_tm1 = h.y;
+     half derivative_tm1 = h.x;
+
+     //--------------------------------------------------------------------
+     // 2. Current luma & raw derivative
+     //--------------------------------------------------------------------
+     half luma_t = Luminance(jittered_colour);
+     half derivative_t = abs(luma_t - luma_tm1);
+
+     //--------------------------------------------------------------------
+     // 3. Soft-clip & normalize
+     //--------------------------------------------------------------------
+     // Clip to `DERIV_MAX` which is ~typical max value,
+     // allows for better precision allocation when normalized
+     half clipped = min(derivative_t, DERIV_MAX);
+
+     // Discard values less than `DERIV_MIN` to reduce ghosting
+     clipped *= step(DERIV_MIN, derivative_t);
+
+     // Normalize with soft-clip
+     // x^1.5 = x * sqrt(x) | NOTE: only works because `DERIV_POW=1.5`
+     half curved = clipped * sqrt(clipped) * DERIV_MAX_POW_R;
+
+     //--------------------------------------------------------------------
+     // 4. Temporal accumulation
+     //--------------------------------------------------------------------
+     // Accumulate the new derivative into the history.
+     // We apply an adaptive alpha scaling, to ensure that if a derivative converges to a high value
+     // it becomes more difficult to reset that value, this provides temporally stable convergence
+     half alpha_scale = mix(DERIV_ALPHA,
+                            DERIV_ALPHA * 0.1HF,
+                            clamp(derivative_tm1, 0.HF, DERIV_MAX) * DERIV_MAX_R);
+
+     half derivative = mix(derivative_tm1, curved, alpha_scale);
+
+     //--------------------------------------------------------------------
+     // 5. Remove disoccluded pixels
+     //--------------------------------------------------------------------
+     derivative *= step(disocclusion_mask, DIS_THRESH);
+
+     // .x -> derivative for current frame, .y -> luma of current frame
+     return half2(derivative, luma_t);
+ }
+
+
+ void FindNearestDepth(int32_t2 iPxPos, int32_t2 iPxSize, out float fNearestDepth, out int32_t2 fNearestDepthOffset)
+ {
+     /*
+     Closely based on:
+     https://github.com/arm/accuracy-super-resolution-generic-library/blob/38697a58a6e7818ec9d28774bc073f537abb9178/
+     include/gpu/fsr2/ffxm_fsr2_reconstruct_dilated_velocity_and_previous_depth.h#L59
+     */
+
+     int32_t iSampleIndex = 0;
+     const int32_t iSampleCount = 9;
+     // x, y
+     const int32_t2 iSampleOffsets[iSampleCount] = {
+         int32_t2(+0, +0).yx,
+         int32_t2(+1, +0).yx,
+         int32_t2(+0, +1).yx,
+         int32_t2(+0, -1).yx,
+         int32_t2(-1, +0).yx,
+         int32_t2(-1, +1).yx,
+         int32_t2(+1, +1).yx,
+         int32_t2(-1, -1).yx,
+         int32_t2(+1, -1).yx,
+     };
+
+     // pull out the depth loads to allow SC to batch them
+     float depth[9];
+     depth[0] = float(texelFetchOffset(_DepthTex, iPxPos, 0, int32_t2(+0, +0).yx).r);
+     depth[1] = float(texelFetchOffset(_DepthTex, iPxPos, 0, int32_t2(+1, +0).yx).r);
+     depth[2] = float(texelFetchOffset(_DepthTex, iPxPos, 0, int32_t2(+0, +1).yx).r);
+     depth[3] = float(texelFetchOffset(_DepthTex, iPxPos, 0, int32_t2(+0, -1).yx).r);
+     depth[4] = float(texelFetchOffset(_DepthTex, iPxPos, 0, int32_t2(-1, +0).yx).r);
+     depth[5] = float(texelFetchOffset(_DepthTex, iPxPos, 0, int32_t2(-1, +1).yx).r);
+     depth[6] = float(texelFetchOffset(_DepthTex, iPxPos, 0, int32_t2(+1, +1).yx).r);
+     depth[7] = float(texelFetchOffset(_DepthTex, iPxPos, 0, int32_t2(-1, -1).yx).r);
+     depth[8] = float(texelFetchOffset(_DepthTex, iPxPos, 0, int32_t2(+1, -1).yx).r);
+
+     // find closest depth
+     fNearestDepth = depth[0];
+     fNearestDepthOffset = iSampleOffsets[0];
+     #pragma unroll
+     for (iSampleIndex = 1; iSampleIndex < iSampleCount; ++iSampleIndex) {
+
+         int32_t2 iPos = iPxPos + iSampleOffsets[iSampleIndex];
+         if (IsOnScreen(iPos, iPxSize)) {
+
+             float fNdDepth = depth[iSampleIndex];
+ #ifdef INVERTED_DEPTH
+             if (fNdDepth > fNearestDepth) {
+ #else
+             if (fNdDepth < fNearestDepth) {
+ #endif
+                 fNearestDepth = fNdDepth;
+                 fNearestDepthOffset = iSampleOffsets[iSampleIndex];
+             }
+         }
+     }
+ }
+
+
+ int32_t2 RenderSize()
+ {
+     return int32_t2(_InputDims);
+ }
+
+
+ float2 ComputeNdc(float2 fPxPos, int32_t2 iSize)
+ {
+     /*
+     Closely based on:
+     https://github.com/arm/accuracy-super-resolution-generic-library/blob/
+     38697a58a6e7818ec9d28774bc073f537abb9178/include/gpu/fsr2/ffxm_fsr2_common.h#L457
+     */
+
+     return fPxPos.yx / float2(iSize.yx) * float2(2.0f, -2.0f) + float2(-1.0f, 1.0f);
+ }
+
+
+ float GetViewSpaceDepth(float fDeviceDepth)
+ {
+     /*
+     Closely based on:
+     https://github.com/arm/accuracy-super-resolution-generic-library/blob/
+     38697a58a6e7818ec9d28774bc073f537abb9178/include/gpu/fsr2/ffxm_fsr2_common.h#L462
+
+     `fDeviceToViewDepth` / `_DeviceToViewDepth` details found in:
+     https://github.com/arm/accuracy-super-resolution-generic-library/blob/
+     0501f490bd9946a2e1806b5363d7ab8a9a6a5e0a/src/components/fsr2/ffxm_fsr2.cpp#L829
+     */
+
+     const float4 fDeviceToViewDepth = _DeviceToViewDepth;
+
+     return (fDeviceToViewDepth[1] / (fDeviceDepth - fDeviceToViewDepth[0]));
+ }
+
+
+ float3 GetViewSpacePosition(int32_t2 iViewportPos, int32_t2 iViewportSize, float fDeviceDepth)
+ {
+     /*
+     Closely based on:
+     https://github.com/arm/accuracy-super-resolution-generic-library/blob/
+     38697a58a6e7818ec9d28774bc073f537abb9178/include/gpu/fsr2/ffxm_fsr2_common.h#L475
+     */
+
+     const float4 fDeviceToViewDepth = _DeviceToViewDepth;
+
+     const float Z = GetViewSpaceDepth(fDeviceDepth);
+
+     const float2 fNdcPos = ComputeNdc(iViewportPos, iViewportSize);
+     const float X = fDeviceToViewDepth[2] * fNdcPos.x * Z;
+     const float Y = fDeviceToViewDepth[3] * fNdcPos.y * Z;
+
+     return float3(X, Y, Z);
+ }
+
+
+ struct BilinearSamplingData
+ {
+     int32_t2 iOffsets[4];
+     float fWeights[4];
+     int32_t2 iBasePos;
+     float2 fQuadCenterUv;
+ };
+
+
+ BilinearSamplingData GetBilinearSamplingData(float2 fUv, int32_t2 iSize)
+ {
+     /*
+     Closely based on:
+     https://github.com/arm/accuracy-super-resolution-generic-library/blob/
+     38697a58a6e7818ec9d28774bc073f537abb9178/include/gpu/fsr2/ffxm_fsr2_common.h#L548
+     */
+
+     BilinearSamplingData data;
+
+     float2 fPxSample = (fUv * iSize) - float2(0.5f, 0.5f);
+     data.iBasePos = int32_t2(floor(fPxSample));
+     data.fQuadCenterUv = (fPxSample + 0.5f) / float2(iSize);
+     float2 fPxFrac = fract(fPxSample);
+
+     data.iOffsets[0] = int32_t2(0, 0);
+     data.iOffsets[2] = int32_t2(1, 0);
+     data.iOffsets[1] = int32_t2(0, 1);
+     data.iOffsets[3] = int32_t2(1, 1);
+
+     data.fWeights[0] = (1.f - fPxFrac.x) * (1.f - fPxFrac.y);
+     data.fWeights[1] = (fPxFrac.x) * (1.f - fPxFrac.y);
+     data.fWeights[2] = (1.f - fPxFrac.x) * (fPxFrac.y);
+     data.fWeights[3] = (fPxFrac.x) * (fPxFrac.y);
+
+     return data;
+ }
+
+
+ float ComputeDepthClip(float2 fUvSample, float fCurrentDepthSample)
+ {
+     /*
+     Closely based on:
+     https://github.com/arm/accuracy-super-resolution-generic-library/blob/
+     38697a58a6e7818ec9d28774bc073f537abb9178/include/gpu/fsr2/ffxm_fsr2_depth_clip.h#L36
+     */
+
+     const float fReconstructedDepthBilinearWeightThreshold = 0.1f;
+     float fCurrentDepthViewSpace = GetViewSpaceDepth(fCurrentDepthSample);
+     BilinearSamplingData bilinearInfo = GetBilinearSamplingData(fUvSample, RenderSize());
+
+     float fDepth = 0.0f;
+     float fWeightSum = 0.0f;
+
+     float4 fPrevDepthSamples;
+     GatherReconstructedPreviousDepthRQuad(bilinearInfo.fQuadCenterUv, fPrevDepthSamples);
+
+     for (int32_t iSampleIndex = 0; iSampleIndex < 4; iSampleIndex++)
+     {
+         const int32_t2 iOffset = bilinearInfo.iOffsets[iSampleIndex];
+         const int32_t2 iSamplePos = bilinearInfo.iBasePos + iOffset;
+
+         const float fWeight = bilinearInfo.fWeights[iSampleIndex];
+         const bool onscreen = IsOnScreen(iSamplePos, RenderSize());
+         fWeightSum += onscreen ? 0.f : fWeight;
+         if (onscreen)
+         {
+             if (fWeight > fReconstructedDepthBilinearWeightThreshold)
+             {
+                 const float fPrevDepthSample = fPrevDepthSamples[iSampleIndex];
+                 const float fPrevNearestDepthViewSpace = GetViewSpaceDepth(fPrevDepthSample);
+                 const float fDepthDiff = fCurrentDepthViewSpace - fPrevNearestDepthViewSpace;
+
+                 if (fDepthDiff > 0.0f) {
+
+ #ifdef INVERTED_DEPTH
+                     const float fPlaneDepth = min(fPrevDepthSample, fCurrentDepthSample);
+ #else
424
+ const float fPlaneDepth = max(fPrevDepthSample, fCurrentDepthSample);
425
+ #endif
426
+
427
+ const float3 fCenter = GetViewSpacePosition(int32_t2(RenderSize() * 0.5f), RenderSize(), fPlaneDepth);
428
+ const float3 fCorner = GetViewSpacePosition(int32_t2(0, 0), RenderSize(), fPlaneDepth);
429
+
430
+ const float fHalfViewportWidth = length(float2(RenderSize()));
431
+ const float fDepthThreshold = max(fCurrentDepthViewSpace, fPrevNearestDepthViewSpace);
432
+
433
+ const float Ksep = 1.37e-05f;
434
+ const float Kfov = length(fCorner) / length(fCenter);
435
+ const float fRequiredDepthSeparation = Ksep * Kfov * fHalfViewportWidth * fDepthThreshold;
436
+
437
+ const float fResolutionFactor = saturate(length(float2(RenderSize())) / length(float2(1920.0f, 1080.0f)));
438
+ const float fPower = lerp(1.0f, 3.0f, fResolutionFactor);
439
+ fDepth += pow(saturate(float(fRequiredDepthSeparation / fDepthDiff)), fPower) * fWeight;
440
+ fWeightSum += fWeight;
441
+ }
442
+ }
443
+ }
444
+ }
445
+
446
+ return (fWeightSum > 0) ? saturate(1.0f - fDepth / fWeightSum) : 0.0f;
447
+ }
448
+
449
+
450
+ void WriteLumaDerivative(int32_t2 pixel, half2 derivative)
451
+ {
452
+ imageStore(_PreProcessLumaDerivOut, pixel, half4(derivative, half2(0.f, 1.f)));
453
+ }
454
+
455
+
456
+ void WriteNearestDepthOffset(int32_t2 pixel, uint8_t offset)
457
+ {
458
+ half enc_norm = half(offset) / 255.HF;
459
+ imageStore(_NearestDepthCoordOut, pixel, half4(enc_norm, 0.HF, 0.HF, 1.HF));
460
+ }
461
+
462
+
463
+ void WriteToTensor(int32_t2 outputPixel, half3 input_colour, half3 history, half disocclusion_mask, half luma_derivative, half4 temporal_feedback)
464
+ {
465
+ TensorElement te;
466
+ te.wh_rgb_col_r = Quantize(half4(history.rgb, input_colour.r), _InputQuantParams);
467
+ te.col_gb_dm_fback_r = Quantize(half4(input_colour.gb, disocclusion_mask, temporal_feedback.r), _InputQuantParams);
468
+ te.fback_gba_ld = Quantize(half4(temporal_feedback.gba, luma_derivative), _InputQuantParams);
469
+
470
+ int8_t t0[12] =
471
+ {
472
+ te.wh_rgb_col_r.x,
473
+ te.wh_rgb_col_r.y,
474
+ te.wh_rgb_col_r.z,
475
+ te.wh_rgb_col_r.w,
476
+ te.col_gb_dm_fback_r.x,
477
+ te.col_gb_dm_fback_r.y,
478
+ te.col_gb_dm_fback_r.z,
479
+ te.col_gb_dm_fback_r.w,
480
+ te.fback_gba_ld.x,
481
+ te.fback_gba_ld.y,
482
+ te.fback_gba_ld.z,
483
+ te.fback_gba_ld.w
484
+ };
485
+ tensorWriteARM(_PreprocessTensor, uint[](0, outputPixel.y, outputPixel.x, 0), t0);
486
+ }
487
+
488
+
489
+ // entry-point
490
+ layout(local_size_x = 16, local_size_y = 16) in;
491
+ void main()
492
+ {
493
+ int32_t2 input_pixel = int32_t2(gl_GlobalInvocationID.xy);
494
+ if (any(greaterThanEqual(input_pixel, _InputDims))) return;
495
+
496
+ float2 uv = (float2(input_pixel) + 0.5f) * _InvInputDims;
497
+
498
+ //-------------------------------------------------------------------------
499
+ // 1) Dilate depth, find nearest pixel coordinate
500
+ //-------------------------------------------------------------------------
501
+ float depth_dilated = float(0.f);
502
+ int32_t2 nearest_pixel_offset = int32_t2(0);
503
+ FindNearestDepth(input_pixel, RenderSize(), depth_dilated, nearest_pixel_offset);
504
+
505
+ //-------------------------------------------------------------------------
506
+ // 2) Load motion vectors
507
+ //-------------------------------------------------------------------------
508
+ half2 motion = LoadMotion(input_pixel + nearest_pixel_offset);
509
+
510
+ // Suppress very small motion - no value in resampling here
511
+ half2 motion_pix = motion * half2(RenderSize());
512
+ motion *= half(dot(motion_pix, motion_pix) > _MotionWarpThresh);
513
+
514
+ // Calculate sample position(s) for everything in `tm1` frame
515
+ float2 reproj_uv = uv - float2(motion);
516
+ float2 unjitter_tm1_uv = reproj_uv - _JitterOffsetTm1Uv;
517
+
518
+ //-------------------------------------------------------------------------
519
+ // 3) Calculate depth-based disocclusion mask
520
+ //-------------------------------------------------------------------------
521
+ half disocclusion_mask = half(ComputeDepthClip(unjitter_tm1_uv, depth_dilated));
522
+
523
+ // Scale the disocclusion mask on static frames so the network knows jitter is the only
524
+ // source of motion, reducing false flags from frame-to-frame jitter differences
525
+ half dm_scale = dot(motion_pix, motion_pix) > _MotionDisThresh ? half(1.0f) : _DisocclusionScale;
526
+ disocclusion_mask = disocclusion_mask * dm_scale;
527
+
528
+ //-------------------------------------------------------------------------
529
+ // 4) Downsample + warp history buffer
530
+ //-------------------------------------------------------------------------
531
+ half3 warped_history = WarpHistory(reproj_uv);
532
+
533
+ //-------------------------------------------------------------------------
534
+ // 5) Read current low-res / jittered / aliased colour
535
+ //-------------------------------------------------------------------------
536
+ half3 jittered_colour = LoadColour(input_pixel);
537
+
538
+ //-------------------------------------------------------------------------
539
+ // 6) Calculate derivative of `luma`
540
+ // helps identify high-frequency flicker due to jitter
541
+ //-------------------------------------------------------------------------
542
+ half2 luma_derivative = CalculateLumaDerivative(reproj_uv, jittered_colour, disocclusion_mask);
543
+
544
+ //-------------------------------------------------------------------------
545
+ // 7) Warp temporal feedback
546
+ //-------------------------------------------------------------------------
547
+ half4 temporal_feedback = WarpFeedback(reproj_uv);
548
+
549
+ //-------------------------------------------------------------------------
550
+ // 8) Convert dilated depth coord to a position offset
551
+ //-------------------------------------------------------------------------
552
+ uint8_t enc_depth_offset = EncodeNearestDepthCoord(nearest_pixel_offset);
553
+
554
+ //-------------------------------------------------------------------------
555
+ // 9) Write Outputs
556
+ //-------------------------------------------------------------------------
557
+ // Consumed by NE
558
+ WriteToTensor(
559
+ input_pixel,
560
+ jittered_colour, // 3ch
561
+ warped_history, // 3ch
562
+ disocclusion_mask, // 1ch
563
+ luma_derivative.x, // 1ch
564
+ temporal_feedback // 4ch
565
+ ); // total: 12ch
566
+
567
+ // Consumed by post process and frame t+1
568
+ WriteNearestDepthOffset(input_pixel, enc_depth_offset);
569
+
570
+ // Consumed at frame t+1
571
+ WriteLumaDerivative(input_pixel, luma_derivative);
572
+ }
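For readers following the depth-clip path above, the weight setup in `GetBilinearSamplingData` can be sketched outside the shader. The Python below is an illustrative re-derivation (the names and scalar layout are ours, not part of the shader), showing the base texel and the 2x2 weights that `ComputeDepthClip` iterates over:

```python
import math

def bilinear_sampling_data(u, v, width, height):
    """Mirror of GetBilinearSamplingData: base texel, 2x2 offsets and weights."""
    px, py = u * width - 0.5, v * height - 0.5
    base = (math.floor(px), math.floor(py))
    fx, fy = px - base[0], py - base[1]
    offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]
    weights = [
        (1 - fx) * (1 - fy),  # weight of the base texel
        fx * (1 - fy),        # one texel right
        (1 - fx) * fy,        # one texel down
        fx * fy,              # diagonal texel
    ]
    return base, offsets, weights
```

At a texel centre (fractional part 0.5) all four weights are 0.25, so every tap clears the 0.1 `fReconstructedDepthBilinearWeightThreshold`; near a texel corner, up to three taps fall below it and are skipped.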
scenario/0_pre_process.spv ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b03bcb283b73870daa0a540cfb8f1e8ec9c4842b38a711f52d31517569e79b87
3
+ size 29476
scenario/0_pre_process_push_consts.npy ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6319b912dd9ee3e1ce44794067ea57fd9eb01ff0e38b3f8a55ceea7be18e6412
3
+ size 256
scenario/1_nss.vgf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fd2a1bd13f156fcfa7a0cf132220ca39c2c2498f4af2c7c7da10a42ef4e555a7
3
+ size 163860
scenario/2_post_process.comp ADDED
@@ -0,0 +1,361 @@
1
+ //
2
+ // -----------------------------------------------------------------------------
3
+ // The proprietary software and information contained in this file is
4
+ // confidential and may only be used by an authorized person under a valid
5
+ // licensing agreement from Arm Limited or its affiliates.
6
+ //
7
+ // Copyright (C) 2025. Arm Limited or its affiliates. All rights reserved.
8
+ //
9
+ // This entire notice must be reproduced on all copies of this file and
10
+ // copies of this file may only be made by an authorized person under a valid
11
+ // licensing agreement from Arm Limited or its affiliates.
12
+ // -----------------------------------------------------------------------------
13
+ //
14
+ #version 460
15
+ #extension GL_EXT_shader_8bit_storage : require
16
+ #extension GL_EXT_shader_16bit_storage : require
17
+ #extension GL_EXT_shader_explicit_arithmetic_types : require
18
+ #extension GL_EXT_shader_explicit_arithmetic_types_int8 : require
19
+ #extension GL_EXT_shader_explicit_arithmetic_types_float16 : require
20
+ #extension GL_EXT_shader_explicit_arithmetic_types_float32 : require
21
+ #extension GL_GOOGLE_include_directive : enable
22
+
23
+ // defines
24
+ #define SCALE_1_0X 0
25
+ #define SCALE_1_3X 1
26
+ #define SCALE_1_5X 2
27
+ #define SCALE_2_0X 3
28
+
29
+ // settings
30
+ #define HISTORY_CATMULL
31
+ #define SCALE_MODE SCALE_2_0X
32
+
33
+ // includes
34
+ #include "typedefs.h"
35
+ #include "common.h"
36
+ #include "kernel_lut.h"
37
+
38
+ // inputs
39
+ layout (set=0, binding=0) uniform mediump sampler2D _ColourTex; // 540p | R11G11B10 32bpp
40
+ layout (set=0, binding=1) uniform mediump sampler2D _MotionVectorTex; // 540p | RG16_FLOAT 32bpp
41
+ layout (set=0, binding=2) uniform mediump sampler2D _HistoryTex; // 1080p | R11G11B10 32bpp
42
+ layout (set=0, binding=3) uniform lowp sampler2D _K0Tensor; // 540p | R8G8B8A8_SNORM 32bpp | Tensor->Texture Alias (Linear)
43
+ layout (set=0, binding=4) uniform lowp sampler2D _K1Tensor; // 540p | R8G8B8A8_SNORM 32bpp | Tensor->Texture Alias (Linear)
44
+ layout (set=0, binding=5) uniform lowp sampler2D _K2Tensor; // 540p | R8G8B8A8_SNORM 32bpp | Tensor->Texture Alias (Linear)
45
+ layout (set=0, binding=6) uniform lowp sampler2D _K3Tensor; // 540p | R8G8B8A8_SNORM 32bpp | Tensor->Texture Alias (Linear)
46
+ layout (set=0, binding=7) uniform lowp sampler2D _TemporalTensor; // 540p | R8G8B8A8_SNORM 32bpp | Tensor->Texture Alias (Linear)
47
+ layout (set=0, binding=8) uniform lowp sampler2D _NearestDepthCoordTex; // 540p | R8_UNORM 8bpp
48
+
49
+ // outputs
50
+ layout (set=1, binding=0, r11f_g11f_b10f) uniform writeonly mediump image2D _UpsampledColourOut; // 1080p | R11G11B10 32bpp
51
+
52
+ // push-constants
53
+ layout(push_constant, std430) uniform PushConstants {
54
+ // ─────────────── 8-byte aligned ───────────────
55
+ layout(offset = 0) int32_t2 _OutputDims; // 8 B
56
+ layout(offset = 8) int32_t2 _InputDims; // 8 B
57
+ layout(offset = 16) float2 _InvOutputDims; // 8 B
58
+ layout(offset = 24) float2 _InvInputDims; // 8 B
59
+ layout(offset = 32) float2 _Scale; // 8 B
60
+ layout(offset = 40) float2 _InvScale; // 8 B
61
+
62
+ // ─────────────── 4-byte aligned ───────────────
63
+ layout(offset = 48) int16_t2 _IndexModulo; // 4 B
64
+ layout(offset = 52) half2 _QuantParams; // 4 B
65
+ layout(offset = 56) int16_t2 _LutOffset; // 4 B
66
+ layout(offset = 60) half2 _ExposurePair; // 4 B
67
+ layout(offset = 64) half2 _HistoryPad; // 4 B
68
+ layout(offset = 68) half2 _MotionThreshPad; // 4 B (.x = motion, .y = unused)
69
+ layout(offset = 72) int32_t _Padding0; // 4 B (explicit pad for alignment)
70
+ // Total: **76 bytes**
71
+ };
72
+
73
+ // Convenience mapping for accessing push constants
74
+ #define _Exposure _ExposurePair.x
75
+ #define _InvExposure _ExposurePair.y
76
+ #define _NotHistoryReset _HistoryPad.x
77
+ #define _MotionThresh _MotionThreshPad.x
78
+
79
+ // Quantization Parameters
80
+ // inside: `./parameters.json`
81
+ // these values are embedded inside the TOSA file and learnt during QAT
82
+
83
+ #ifndef _K0QuantParams
84
+ // outputs - activation_post_process_45["SNORM"]
85
+ #define _K0QuantParams _QuantParams.xy
86
+ #endif
87
+ #ifndef _K1QuantParams
88
+ // outputs - activation_post_process_50["SNORM"]
89
+ #define _K1QuantParams _QuantParams.xy
90
+ #endif
91
+ #ifndef _K2QuantParams
92
+ // outputs - activation_post_process_55["SNORM"]
93
+ #define _K2QuantParams _QuantParams.xy
94
+ #endif
95
+ #ifndef _K3QuantParams
96
+ // outputs - activation_post_process_60["SNORM"]
97
+ #define _K3QuantParams _QuantParams.xy
98
+ #endif
99
+ #ifndef _TemporalQuantParams
100
+ // outputs - activation_post_process_65["SNORM"]
101
+ #define _TemporalQuantParams _QuantParams.xy
102
+ #endif
103
+
104
+
105
+ // methods
106
+
107
+ half2 LoadMotion(int32_t2 pixel)
108
+ {
109
+ return half2(texelFetch(_MotionVectorTex, pixel, 0).rg);
110
+ }
111
+
112
+
113
+ half3 LoadHistory(float2 uv)
114
+ {
115
+ return half3(textureLod(_HistoryTex, uv, 0).rgb);
116
+ }
117
+
118
+ half3 LoadHistoryCatmull(float2 uv)
119
+ {
120
+ //------------------------------------------------------------------------------------
121
+ // 1) Compute Catmull–Rom weights
122
+ //------------------------------------------------------------------------------------
123
+ float2 scaledUV = uv * _OutputDims;
124
+ float2 baseFloor = floor(scaledUV - 0.5) + 0.5;
125
+
126
+ half2 f = half2(scaledUV - baseFloor);
127
+ half2 f2 = f * f;
128
+ half2 f3 = f2 * f;
129
+
130
+ // Catmull–Rom basis
131
+ half2 w0 = f2 - 0.5HF * (f3 + f);
132
+ half2 w1 = 1.5HF * f3 - 2.5HF * f2 + 1.0HF;
133
+ half2 w3 = 0.5HF * (f3 - f2);
134
+ half2 w2 = (1.0HF - w0) - w1 - w3; // = 1 - (w0 + w1 + w3)
135
+
136
+ // Combine w1 and w2 for center axis
137
+ half2 w12 = w1 + w2;
138
+ half wx0 = w0.x, wy0 = w0.y;
139
+ half wx1 = w12.x, wy1 = w12.y;
140
+ half wx2 = w3.x, wy2 = w3.y;
141
+
142
+ // Final weights for the cross sample layout
143
+ half wUp = wx1 * wy0; // center in X, up in Y
144
+ half wDown = wx1 * wy2; // center in X, down in Y
145
+ half wLeft = wx0 * wy1; // left in X, center in Y
146
+ half wRight = wx2 * wy1; // right in X, center in Y
147
+ half wCenter = wx1 * wy1; // center in X, center in Y
148
+
149
+ // Fractional offsets for the center
150
+ half dx = w2.x / wx1;
151
+ half dy = w2.y / wy1;
152
+
153
+ //------------------------------------------------------------------------------------
154
+ // 2) Gather the 5 taps
155
+ //------------------------------------------------------------------------------------
156
+ half4 left = half4(LoadHistory((baseFloor + float2(-1.0, dy)) * _InvOutputDims ), 1.HF);
157
+ half4 up = half4(LoadHistory((baseFloor + float2(dx, -1.0)) * _InvOutputDims ), 1.HF);
158
+ half4 center = half4(LoadHistory((baseFloor + float2(dx, dy)) * _InvOutputDims ), 1.HF);
159
+ half4 right = half4(LoadHistory((baseFloor + float2(2.0, dy)) * _InvOutputDims ), 1.HF);
160
+ half4 down = half4(LoadHistory((baseFloor + float2(dx, 2.0)) * _InvOutputDims ), 1.HF);
161
+
162
+ //------------------------------------------------------------------------------------
163
+ // 3) Accumulate and track min/max
164
+ //------------------------------------------------------------------------------------
165
+ half4 accum = up * wUp +
166
+ left * wLeft +
167
+ center* wCenter +
168
+ right * wRight +
169
+ down * wDown;
170
+ half3 cmin3 = min(up.rgb,
171
+ min(left.rgb,
172
+ min(center.rgb,
173
+ min(right.rgb, down.rgb))));
174
+ half3 cmax3 = max(up.rgb,
175
+ max(left.rgb,
176
+ max(center.rgb,
177
+ max(right.rgb, down.rgb))));
178
+
179
+ //------------------------------------------------------------------------------------
180
+ // 4) Final color
181
+ //------------------------------------------------------------------------------------
182
+ half3 color = accum.rgb * rcp(accum.w);
183
+
184
+ // De-ring only when we see negative values; we don't do this unconditionally
185
+ // as it can impose unnecessary blurring on the output
186
+ return any(lessThan(color, half3(0.HF)))
187
+ ? clamp(color, cmin3, cmax3)
188
+ : color;
189
+ }
190
+
191
+
192
+ int32_t2 LoadNearestDepthOffset(int32_t2 pixel)
193
+ {
194
+ half encNorm = half(texelFetch(_NearestDepthCoordTex, pixel, 0).r);
195
+ int32_t code = int32_t(encNorm * 255.0 + 0.5);
196
+
197
+ // 3. map back to {-1,0,1}²
198
+ return DecodeNearestDepthCoord(code);
199
+ }
200
+
201
+
202
+ half3 LoadWarpedHistory(float2 uv, int32_t2 input_pixel, out half onscreen)
203
+ {
204
+ // Dilate motion vectors with previously calculated nearest depth coordinate
205
+ int32_t2 nearest_offset = LoadNearestDepthOffset(input_pixel);
206
+ half2 motion = LoadMotion(input_pixel + nearest_offset);
207
+
208
+ // Suppress very small motion - no need to resample
209
+ half2 motion_pix = motion * half2(_OutputDims);
210
+ motion *= half(dot(motion_pix, motion_pix) > _MotionThresh);
211
+
212
+ // UV coordinates in previous frame to resample history
213
+ float2 reproj_uv = uv - float2(motion);
214
+
215
+ // Mask to flag whether the motion vector is resampling from valid location onscreen
216
+ onscreen = half(
217
+ all(greaterThanEqual(reproj_uv, float2(0.0))) &&
218
+ all(lessThan(reproj_uv, float2(1.0)))
219
+ );
220
+
221
+ #ifdef HISTORY_CATMULL
222
+ half3 warped_history = LoadHistoryCatmull(reproj_uv);
223
+ #else
224
+ half3 warped_history = LoadHistory(reproj_uv);
225
+ #endif
226
+
227
+ return SafeColour(warped_history * _Exposure);
228
+ }
229
+
230
+ #if SCALE_MODE == SCALE_2_0X
231
+ /*
232
+ Optimised special case pattern for applying 4x4 kernel to
233
+ sparse jitter-aware 2x2 upsampled image
234
+ */
235
+
236
+
237
+ half4 LoadKPNWeight(float2 uv, int16_t lut_idx)
238
+ {
239
+ // Load 4 kernel slices (each with 4 taps)
240
+ half4 k0 = Dequantize(half4(textureLod(_K0Tensor, uv, 0)), _K0QuantParams);
241
+ half4 k1 = Dequantize(half4(textureLod(_K1Tensor, uv, 0)), _K1QuantParams);
242
+ half4 k2 = Dequantize(half4(textureLod(_K2Tensor, uv, 0)), _K2QuantParams);
243
+ half4 k3 = Dequantize(half4(textureLod(_K3Tensor, uv, 0)), _K3QuantParams);
244
+
245
+ // Precomputed swizzle patterns for KernelTile
246
+ half4 p0 = half4(k0.x, k2.x, k0.z, k2.z);
247
+ half4 p1 = half4(k1.x, k3.x, k1.z, k3.z);
248
+ half4 p2 = half4(k0.y, k2.y, k0.w, k2.w);
249
+ half4 p3 = half4(k1.y, k3.y, k1.w, k3.w);
250
+
251
+ // Return the correct pattern for this tile
252
+ return (lut_idx == 0) ? p0 :
253
+ (lut_idx == 1) ? p1 :
254
+ (lut_idx == 2) ? p2 :
255
+ p3;
256
+ }
257
+
258
+
259
+ half3 LoadAndFilterColour(int32_t2 output_pixel, float2 uv, out half4 col_to_accum)
260
+ {
261
+ //-------------------------------------------------------------------
262
+ // 1. Compute indexes, load correct pattern from LUT for given thread
263
+ //-------------------------------------------------------------------
264
+ float2 out_tex = float2(output_pixel) + 0.5f;
265
+
266
+ // Compute the LUT index for this pixel
267
+ int16_t2 tiled_idx = (int16_t2(output_pixel) + _LutOffset) % int16_t2(_IndexModulo);
268
+ int16_t lut_idx = tiled_idx.y * int16_t(_IndexModulo) + tiled_idx.x;
269
+ KernelTile lut = kernelLUT[lut_idx];
270
+
271
+ //------------------------------------------------------------------
272
+ // 2. Apply KPN
273
+ //------------------------------------------------------------------
274
+ // Dequantize the kernel weights
275
+ half4 kpn_weights = clamp(LoadKPNWeight(uv, lut_idx), half4(EPS), half4(1.HF));
276
+
277
+ // Calculate tap locations
278
+ int16_t4 tap_x = clamp(int16_t4(floor((float4(out_tex.x) + float4(lut.dx)) * _InvScale.x)), int16_t4(0), int16_t4(_InputDims.x - 1));
279
+ int16_t4 tap_y = clamp(int16_t4(floor((float4(out_tex.y) + float4(lut.dy)) * _InvScale.y)), int16_t4(0), int16_t4(_InputDims.y - 1));
280
+
281
+ // Gather taps
282
+ f16mat4x4 interm;
283
+ interm[0] = half4(SafeColour(half3(texelFetch(_ColourTex, int16_t2(tap_x[0], tap_y[0]), 0).rgb) * half3(_Exposure)), 1.HF);
284
+ interm[1] = half4(SafeColour(half3(texelFetch(_ColourTex, int16_t2(tap_x[1], tap_y[1]), 0).rgb) * half3(_Exposure)), 1.HF);
285
+ interm[2] = half4(SafeColour(half3(texelFetch(_ColourTex, int16_t2(tap_x[2], tap_y[2]), 0).rgb) * half3(_Exposure)), 1.HF);
286
+ interm[3] = half4(SafeColour(half3(texelFetch(_ColourTex, int16_t2(tap_x[3], tap_y[3]), 0).rgb) * half3(_Exposure)), 1.HF);
287
+
288
+ // Special case: grab the accumulation pixel, when it corresponds to current thread
289
+ half match = half(lut.dx[CENTER_TAP] == 0 && lut.dy[CENTER_TAP] == 0);
290
+ col_to_accum = interm[CENTER_TAP] * match;
291
+
292
+ // Apply filter
293
+ half4 out_colour = interm * kpn_weights;
294
+
295
+ return half3(out_colour.rgb * rcp(out_colour.w));
296
+ }
297
+ #else
298
+ #error "Unsupported SCALE_MODE"
299
+ #endif // SCALE_MODE == SCALE_2_0X
300
+
301
+
302
+ void LoadTemporalParameters(float2 uv, out half theta, out half alpha)
303
+ {
304
+ half2 tp = Dequantize(half2(textureLod(_TemporalTensor, uv, 0).xy), _TemporalQuantParams);
305
+ theta = tp.x * _NotHistoryReset; // {0 <= x <= 1}
306
+ alpha = tp.y * 0.35HF + 0.05HF; // { 0.05 <= x <= 0.4}
307
+ }
308
+
309
+
310
+ void WriteUpsampledColour(int32_t2 pixel, half3 colour)
311
+ {
312
+ half3 to_write = SafeColour(colour);
313
+ // Write with alpha = 1.0
314
+ imageStore(_UpsampledColourOut, pixel, half4(to_write, 1.0));
315
+ }
316
+
317
+
318
+ // entry-point
319
+ layout(local_size_x = 16, local_size_y = 16) in;
320
+ void main()
321
+ {
322
+ int32_t2 output_pixel = int32_t2(gl_GlobalInvocationID.xy);
323
+ if (any(greaterThanEqual(output_pixel, _OutputDims))) return;
324
+
325
+ float2 uv = (float2(output_pixel) + 0.5) * _InvOutputDims;
326
+ int32_t2 input_pixel = int32_t2(uv * _InputDims);
327
+
328
+ //-------------------------------------------------------------------------
329
+ // 1) Warp history
330
+ //-------------------------------------------------------------------------
331
+ half onscreen;
332
+ half3 history = LoadWarpedHistory(uv, input_pixel, onscreen);
333
+
334
+ //-------------------------------------------------------------------------
335
+ // 2) KPN filter → colour
336
+ //-------------------------------------------------------------------------
337
+ half4 col_to_accum;
338
+ half3 colour = LoadAndFilterColour(output_pixel, uv, col_to_accum);
339
+
340
+ //-------------------------------------------------------------------------
341
+ // 3) Load temporal parameters
342
+ //-------------------------------------------------------------------------
343
+ half theta, alpha;
344
+ LoadTemporalParameters(uv, theta, alpha);
345
+
346
+ //-------------------------------------------------------------------------
347
+ // 4) Rectify history, force reset when offscreen
348
+ //-------------------------------------------------------------------------
349
+ half3 rectified = lerp(colour, history, theta * onscreen);
350
+
351
+ //-------------------------------------------------------------------------
352
+ // 3) Accumulate new sample
353
+ //-------------------------------------------------------------------------
354
+ half3 accumulated = lerp(Tonemap(rectified), Tonemap(col_to_accum.rgb), alpha * col_to_accum.a);
355
+
356
+ //-------------------------------------------------------------------------
357
+ // 6) Inverse tonemap + exposure and write output
358
+ //-------------------------------------------------------------------------
359
+ half3 out_linear = InverseTonemap(accumulated) * _InvExposure;
360
+ WriteUpsampledColour(output_pixel, out_linear);
361
+ }
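The history blend at the end of `main` happens in tonemapped space and is inverted before write-out. A hedged Python sketch of that round trip, assuming the Karis operator `x / (1 + max(rgb))` and its exact inverse as defined by `Tonemap`/`InverseTonemap` in `common.h` (illustrative names, scalar code rather than fp16):

```python
def tonemap(rgb):
    """Karis tonemapper: x / (1 + max(r, g, b)); maps HDR into [0, 1)."""
    m = max(rgb)
    return tuple(c / (1.0 + m) for c in rgb)

def inverse_tonemap(rgb):
    """Exact inverse for finite inputs: x / (1 - max(r, g, b))."""
    m = max(rgb)
    return tuple(c / (1.0 - m) for c in rgb)

def accumulate(history_rgb, sample_rgb, alpha):
    """Blend in tonemapped space, then return to linear, as the shader does."""
    t = [h + (s - h) * alpha
         for h, s in zip(tonemap(history_rgb), tonemap(sample_rgb))]
    return inverse_tonemap(tuple(t))
```

Because the inverse is exact for inputs below the clamp, blending identical colours is a no-op, and the blend weight `alpha * col_to_accum.a` only moves the result between the rectified history and the new centre tap.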
scenario/2_post_process.spv ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d15c811db716f90606bb42710f5093bfd3dcfedb674ab7223b27909d8c3467a5
3
+ size 25780
scenario/2_post_process_push_consts.npy ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4fe5029783bc6bb2adaa1f8bafc9ab8fe73340bb0a3055d18f80ca3e6a99862a
3
+ size 204
scenario/common.h ADDED
@@ -0,0 +1,160 @@
1
+ //
2
+ // -----------------------------------------------------------------------------
3
+ // The proprietary software and information contained in this file is
4
+ // confidential and may only be used by an authorized person under a valid
5
+ // licensing agreement from Arm Limited or its affiliates.
6
+ //
7
+ // Copyright (C) 2025. Arm Limited or its affiliates. All rights reserved.
8
+ //
9
+ // This entire notice must be reproduced on all copies of this file and
10
+ // copies of this file may only be made by an authorized person under a valid
11
+ // licensing agreement from Arm Limited or its affiliates.
12
+ // -----------------------------------------------------------------------------
13
+ //
14
+ #ifndef NSS_COMMON
15
+ #define NSS_COMMON
16
+
17
+ #include "typedefs.h"
18
+
19
+ #define MAX_FP16 65504.HF
20
+ #define EPS 1e-7HF
21
+
22
+
23
+ // Activation Functions
24
+ // ──────────────────────────────────────────────────────────────────────────────────────────
25
+
26
+
27
+ half Sigmoid(half x)
28
+ {
29
+ return rcp(half(1.0) + exp(-x));
30
+ }
31
+
32
+
33
+ half2 Sigmoid(half2 x)
34
+ {
35
+ return rcp(half2(1.0) + exp(-x));
36
+ }
37
+
38
+
39
+ half3 Sigmoid(half3 x)
40
+ {
41
+ return rcp(half3(1.0) + exp(-x));
42
+ }
43
+
44
+
45
+ half4 Sigmoid(half4 x)
46
+ {
47
+ return rcp(half4(1.0) + exp(-x));
48
+ }
49
+
50
+
51
+ // Quantize/Dequantize
52
+ // ──────────────────────────────────────────────────────────────────────────────────────────
53
+ // all expect .x = scale, .y = zero point, quantize methods expect to receive: .x = rcp(scale)
54
+
55
+ half Dequantize(half i, half2 quant_params)
56
+ {
57
+ return (i - quant_params.y) * quant_params.x;
58
+ }
59
+
60
+
61
+ half2 Dequantize(half2 i, half2 quant_params)
62
+ {
63
+ return (i - quant_params.y) * quant_params.x;
64
+ }
65
+
66
+
67
+ half3 Dequantize(half3 i, half2 quant_params)
68
+ {
69
+ return (i - quant_params.y) * quant_params.x;
70
+ }
71
+
72
+
73
+ half4 Dequantize(half4 i, half2 quant_params)
74
+ {
75
+ return (i - quant_params.y) * quant_params.x;
76
+ }
77
+
78
+
79
+ int8_t Quantize(half f, half2 quant_params)
80
+ {
81
+ return int8_t(clamp(round(f * quant_params.x + quant_params.y), -128.HF, 127.HF));
82
+ }
83
+
84
+
85
+ int8_t2 Quantize(half2 f, half2 quant_params)
86
+ {
87
+ return int8_t2(clamp(round(f * quant_params.x + quant_params.y), -128.HF, 127.HF));
88
+ }
89
+
90
+
91
+ int8_t3 Quantize(half3 f, half2 quant_params)
92
+ {
93
+ return int8_t3(clamp(round(f * quant_params.x + quant_params.y), -128.HF, 127.HF));
94
+ }
95
+
96
+
97
+ int8_t4 Quantize(half4 f, half2 quant_params)
98
+ {
99
+ return int8_t4(clamp(round(f * quant_params.x + quant_params.y), -128.HF, 127.HF));
100
+ }
101
+
102
+
103
+ // Encode/Decode
104
+ // ──────────────────────────────────────────────────────────────────────────────────────────
105
+ // Note: both encode/decode methods are currently bound to 3x3 windows, they should be
106
+ // expandable in future if needed. The most likely to need this would be the jitter
107
+ // encoding, where 3x3 may not be enough for larger than 3x3 scale factors.
108
+
109
+
110
+ uint8_t EncodeNearestDepthCoord(int32_t2 o)
111
+ {
112
+ // o ∈ {-1, 0, 1}²
113
+ o = clamp(o, ivec2(-1), ivec2( 1));
114
+ return uint8_t((o.y + 1) << 2 | (o.x + 1)); // 0-15
115
+ }
116
+
117
+
118
+ int32_t2 DecodeNearestDepthCoord(int32_t code)
119
+ {
120
+ int32_t x = int32_t( code & 0x3) - 1; // bits 0-1
121
+ int32_t y = int32_t((code >> 2) & 0x3) - 1; // bits 2-3
122
+ return int32_t2(x, y);
123
+ }
124
+
125
+
126
+ // Image Operations
127
+ // ──────────────────────────────────────────────────────────────────────────────────────────
128
+
129
+ half Luminance(half3 rgb)
130
+ {
131
+ // ITU-R BT.709: `0.2126 * R + 0.7152 * G + 0.0722 * B`
132
+ return dot(rgb, half3(0.2126, 0.7152, 0.0722));
133
+ }
134
+
135
+
136
+ half3 Tonemap(half3 x)
137
+ {
138
+ // Karis tonemapper
139
+ // http://graphicrants.blogspot.com/2013/12/tone-mapping.html
140
+ x = max(x, half3(0.HF));
141
+ return x * rcp(half3(1.HF) + max(max(x.r, x.g), x.b));
142
+ }
143
+
144
+
145
+ half3 InverseTonemap(half3 x)
146
+ {
147
+ // Karis tonemapper inverse
148
+ // http://graphicrants.blogspot.com/2013/12/tone-mapping.html
149
+ x = clamp(x, half3(0.HF), Tonemap(half3(MAX_FP16)));
150
+ return x * rcp(half3(1.HF) - max(max(x.r, x.g), x.b));
151
+ }
152
+
153
+
154
+ half3 SafeColour(half3 x)
155
+ {
156
+ return clamp(x, half3(0.HF), half3(MAX_FP16));
157
+ }
158
+
159
+
160
+ #endif // NSS_COMMON
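A quick way to sanity-check the 4-bit offset packing defined above is to round-trip all nine offsets in Python (illustrative port; the shader additionally stores the code as an R8_UNORM normalised value, which this sketch skips):

```python
def encode_nearest_depth_coord(ox, oy):
    """Pack an offset in {-1,0,1}^2 into 4 bits, as EncodeNearestDepthCoord does."""
    ox = max(-1, min(1, ox))
    oy = max(-1, min(1, oy))
    return ((oy + 1) << 2) | (ox + 1)

def decode_nearest_depth_coord(code):
    """Unpack: bits 0-1 -> x, bits 2-3 -> y, each biased by -1."""
    return (code & 0x3) - 1, ((code >> 2) & 0x3) - 1
```

Codes occupy 0-10 of the available 0-15 range, leaving headroom if the window ever grows beyond 3x3, as the comment above anticipates.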
scenario/in_colour.dds ADDED

Git LFS Details

  • SHA256: 06ee236dd66b3a6843af2a0617a8186bc78dc0310cff4c3a21b44803ce742ecb
  • Pointer size: 132 Bytes
  • Size of remote file: 2.09 MB
scenario/in_depth.dds ADDED

Git LFS Details

  • SHA256: d68696b2e29f63999a65f9125b9788a7a68a89adfe105efe5b817dc71ab6137a
  • Pointer size: 132 Bytes
  • Size of remote file: 2.09 MB
scenario/in_depth_tm1.dds ADDED

Git LFS Details

  • SHA256: 49bc41de4eaee7b2fe0f486419bb5bcd65cca46b76c86003f9bcaf3f3e3fe6e4
  • Pointer size: 132 Bytes
  • Size of remote file: 2.09 MB
scenario/in_derivative_tm1.dds ADDED

Git LFS Details

  • SHA256: 993558f23469464d8cd517d695c6e85693c2580466d1c1d1779dddaecbc450e4
  • Pointer size: 132 Bytes
  • Size of remote file: 1.04 MB
scenario/in_feedback_tm1.dds ADDED

Git LFS Details

  • SHA256: a37afdcdbb350ebbe0b5c3a1fa610c84aafed2d31195e77e8cc77598625239d3
  • Pointer size: 132 Bytes
  • Size of remote file: 2.09 MB
scenario/in_history.dds ADDED

Git LFS Details

  • SHA256: cf6e0a5805a7abeb497eced1c22adff5ecb5e249ac66f6419fcf9e0b64418197
  • Pointer size: 132 Bytes
  • Size of remote file: 8.36 MB
scenario/in_motion.dds ADDED

Git LFS Details

  • SHA256: 7653e886a7a12b7c9ced950cba304144433bf755fa5010bb53a010ee56ae78ab
  • Pointer size: 132 Bytes
  • Size of remote file: 2.09 MB
scenario/in_nearest_offset_tm1.dds ADDED

Git LFS Details

  • SHA256: f94dccb9315cee17e25e98d822eb3234e44308c7838e8a6aafe57e38194e57e7
  • Pointer size: 131 Bytes
  • Size of remote file: 522 kB
scenario/kernel_lut.h ADDED
@@ -0,0 +1,83 @@
+ //
+ // -----------------------------------------------------------------------------
+ // The proprietary software and information contained in this file is
+ // confidential and may only be used by an authorized person under a valid
+ // licensing agreement from Arm Limited or its affiliates.
+ //
+ // Copyright (C) 2025. Arm Limited or its affiliates. All rights reserved.
+ //
+ // This entire notice must be reproduced on all copies of this file and
+ // copies of this file may only be made by an authorized person under a valid
+ // licensing agreement from Arm Limited or its affiliates.
+ // -----------------------------------------------------------------------------
+ //
+ #ifndef NSS_KERNEL_LUT
+ #define NSS_KERNEL_LUT
+ #include "typedefs.h"
+
+
+ struct KernelTile {
+ int16_t4 dy;
+ int16_t4 dx;
+ };
+
+
+ // Define actual scale value based on mode
+ #if SCALE_MODE == SCALE_2_0X
+
+ #define CENTER_TAP 0
+ #define NUM_PATTERNS 4
+
+ const KernelTile kernelLUT[NUM_PATTERNS] = {
+ {
+ // Pattern 0:
+ // Taps: 0, 2, 8, 10
+ // Grid:
+ // [● · ● ·]
+ // [· · · ·]
+ // [● · ● ·]
+ // [· · · ·]
+ int16_t4(-1, -1, +1, +1),
+ int16_t4(-1, +1, -1, +1)
+ },
+ {
+ // Pattern 1:
+ // Taps: 4, 6, 12, 14
+ // Grid:
+ // [· · · ·]
+ // [● · ● ·]
+ // [· · · ·]
+ // [● · ● ·]
+ int16_t4(-1, -1, +1, +1),
+ int16_t4(+0, +2, +0, +2)
+ },
+ {
+ // Pattern 2:
+ // Taps: 1, 3, 9, 11
+ // Grid:
+ // [· ● · ●]
+ // [· · · ·]
+ // [· ● · ●]
+ // [· · · ·]
+ int16_t4(+0, +0, +2, +2),
+ int16_t4(-1, +1, -1, +1)
+ },
+ {
+ // Pattern 3:
+ // Taps: 5, 7, 13, 15
+ // Grid:
+ // [· · · ·]
+ // [· ● · ●]
+ // [· · · ·]
+ // [· ● · ●]
+ int16_t4( 0, +0, +2, +2), // center-aligned
+ int16_t4( 0, +2, +0, +2)
+ }
+ };
+
+ #else
+ #error "Unsupported SCALE_MODE"
+ #endif
+
+
+ #endif //NSS_KERNEL_LUT
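The four patterns in `kernelLUT` together cover every cell of the 4×4 neighbourhood exactly once: taps {0, 2, 8, 10} ∪ {4, 6, 12, 14} ∪ {1, 3, 9, 11} ∪ {5, 7, 13, 15} = all 16 taps. A small sketch checking this coverage, assuming tap index = row·4 + col as the `Grid` comments suggest (an assumption; the `dy`/`dx` offsets themselves are relative to a per-pattern anchor and are not modelled here):

```python
# Tap indices copied from the Pattern 0..3 comments in kernel_lut.h
patterns = [
    [0, 2, 8, 10],   # Pattern 0
    [4, 6, 12, 14],  # Pattern 1
    [1, 3, 9, 11],   # Pattern 2
    [5, 7, 13, 15],  # Pattern 3
]

def grid(taps):
    """Render one pattern on the 4x4 grid ('x' = tap), mirroring the
    ASCII diagrams in the header (assumed mapping: tap = row * 4 + col)."""
    return [["x" if r * 4 + c in taps else "." for c in range(4)]
            for r in range(4)]

# Union of the four patterns covers the full 4x4 neighbourhood
covered = sorted(t for p in patterns for t in p)
```

Cycling through the patterns frame by frame lets a 2.0× upscaler sample the whole neighbourhood over four frames while touching only four taps per frame.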
scenario/parameters.json ADDED
@@ -0,0 +1,79 @@
+ {
+ "inputs": {
+ "x": {
+ "SINT": {
+ "scale": 0.003921568859368563,
+ "zero_point": -128
+ },
+ "SNORM": {
+ "scale": 0.49803924513980746,
+ "zero_point": -1.0078740157480315
+ }
+ }
+ },
+ "outputs": {
+ "activation_post_process_45": {
+ "SINT": {
+ "scale": 0.003937007859349251,
+ "zero_point": -127
+ },
+ "SNORM": {
+ "scale": 0.49999999813735485,
+ "zero_point": -1.0
+ }
+ },
+ "activation_post_process_50": {
+ "SINT": {
+ "scale": 0.003937007859349251,
+ "zero_point": -127
+ },
+ "SNORM": {
+ "scale": 0.49999999813735485,
+ "zero_point": -1.0
+ }
+ },
+ "activation_post_process_55": {
+ "SINT": {
+ "scale": 0.003937007859349251,
+ "zero_point": -127
+ },
+ "SNORM": {
+ "scale": 0.49999999813735485,
+ "zero_point": -1.0
+ }
+ },
+ "activation_post_process_60": {
+ "SINT": {
+ "scale": 0.003937007859349251,
+ "zero_point": -127
+ },
+ "SNORM": {
+ "scale": 0.49999999813735485,
+ "zero_point": -1.0
+ }
+ },
+ "activation_post_process_65": {
+ "SINT": {
+ "scale": 0.003937007859349251,
+ "zero_point": -127
+ },
+ "SNORM": {
+ "scale": 0.49999999813735485,
+ "zero_point": -1.0
+ }
+ },
+ "activation_post_process_70": {
+ "SINT": {
+ "scale": 0.003937007859349251,
+ "zero_point": -127
+ },
+ "SNORM": {
+ "scale": 0.49999999813735485,
+ "zero_point": -1.0
+ }
+ }
+ },
+ "learnt_constants": {
+ "dm_scale": 0.617464542388916
+ }
+ }
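The `SINT` entries pair a `scale` and `zero_point` for int8 data. Assuming the standard affine dequantization `real = scale * (q - zero_point)` (the formula is a common convention, not stated in the file), the input's `scale ≈ 1/255` and `zero_point = -128` map the full int8 range [-128, 127] onto [0, 1]. A sketch under that assumption:

```python
# Values copied from the "x" input entry of parameters.json; the affine
# quantization formula itself is an assumption, not stated in the file.
SCALE = 0.003921568859368563   # ~ 1/255
ZERO_POINT = -128

def dequantize(q):
    # real = scale * (q - zero_point)
    return SCALE * (q - ZERO_POINT)

def quantize(real):
    # Inverse mapping with rounding, clamped to the int8 range
    q = round(real / SCALE) + ZERO_POINT
    return max(-128, min(127, q))
```

The output entries use `scale ≈ 1/254` with `zero_point = -127`, the symmetric variant that keeps 0 exactly representable.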
scenario/scenario.json ADDED
@@ -0,0 +1,821 @@
+ {
+ "commands": [
+ {
+ "mark_boundary": {
+ "frame_id": "0",
+ "resources": []
+ }
+ },
+ {
+ "dispatch_compute": {
+ "shader_ref": "0_pre_process",
+ "push_data_ref": "push_data_1",
+ "rangeND": [
+ 60,
+ 34,
+ 1
+ ],
+ "implicit_barrier": false,
+ "bindings": [
+ {
+ "set": 0,
+ "id": 2,
+ "resource_ref": "in_motion"
+ },
+ {
+ "set": 0,
+ "id": 0,
+ "resource_ref": "in_colour"
+ },
+ {
+ "set": 0,
+ "id": 7,
+ "resource_ref": "in_nearest_offset_tm1"
+ },
+ {
+ "set": 0,
+ "id": 5,
+ "resource_ref": "in_depth_tm1"
+ },
+ {
+ "set": 0,
+ "id": 3,
+ "resource_ref": "in_history"
+ },
+ {
+ "set": 0,
+ "id": 4,
+ "resource_ref": "in_feedback_tm1"
+ },
+ {
+ "set": 0,
+ "id": 6,
+ "resource_ref": "in_derivative_tm1"
+ },
+ {
+ "set": 0,
+ "id": 1,
+ "resource_ref": "in_depth"
+ },
+ {
+ "set": 1,
+ "id": 1,
+ "resource_ref": "out_derivative",
+ "descriptor_type": "VK_DESCRIPTOR_TYPE_STORAGE_IMAGE"
+ },
+ {
+ "set": 1,
+ "id": 3,
+ "resource_ref": "out_nearest_offset",
+ "descriptor_type": "VK_DESCRIPTOR_TYPE_STORAGE_IMAGE"
+ },
+ {
+ "set": 1,
+ "id": 0,
+ "resource_ref": "out_input_tensor"
+ }
+ ]
+ }
+ },
+ {
+ "dispatch_barrier": {
+ "image_barrier_refs": [],
+ "tensor_barrier_refs": [
+ "barrier_14"
+ ],
+ "memory_barrier_refs": [],
+ "buffer_barrier_refs": []
+ }
+ },
+ {
+ "dispatch_graph": {
+ "graph_ref": "1_nss",
+ "implicit_barrier": false,
+ "bindings": [
+ {
+ "set": 0,
+ "id": 0,
+ "resource_ref": "out_input_tensor"
+ },
+ {
+ "set": 0,
+ "id": 1,
+ "resource_ref": "out_feedback"
+ },
+ {
+ "set": 0,
+ "id": 2,
+ "resource_ref": "out_tp_aliaser"
+ },
+ {
+ "set": 0,
+ "id": 3,
+ "resource_ref": "out_k3_aliaser"
+ },
+ {
+ "set": 0,
+ "id": 4,
+ "resource_ref": "out_k2_aliaser"
+ },
+ {
+ "set": 0,
+ "id": 5,
+ "resource_ref": "out_k1_aliaser"
+ },
+ {
+ "set": 0,
+ "id": 6,
+ "resource_ref": "out_k0_aliaser"
+ }
+ ]
+ }
+ },
+ {
+ "dispatch_barrier": {
+ "image_barrier_refs": [
+ "barrier_23",
+ "barrier_25",
+ "barrier_27",
+ "barrier_29",
+ "barrier_31",
+ "barrier_33"
+ ],
+ "tensor_barrier_refs": [],
+ "memory_barrier_refs": [],
+ "buffer_barrier_refs": []
+ }
+ },
+ {
+ "dispatch_compute": {
+ "shader_ref": "2_post_process",
+ "push_data_ref": "push_data_22",
+ "rangeND": [
+ 120,
+ 68,
+ 1
+ ],
+ "implicit_barrier": false,
+ "bindings": [
+ {
+ "set": 0,
+ "id": 1,
+ "resource_ref": "in_motion"
+ },
+ {
+ "set": 0,
+ "id": 2,
+ "resource_ref": "in_history"
+ },
+ {
+ "set": 0,
+ "id": 8,
+ "resource_ref": "out_nearest_offset"
+ },
+ {
+ "set": 0,
+ "id": 3,
+ "resource_ref": "out_k0"
+ },
+ {
+ "set": 0,
+ "id": 4,
+ "resource_ref": "out_k1"
+ },
+ {
+ "set": 0,
+ "id": 5,
+ "resource_ref": "out_k2"
+ },
+ {
+ "set": 0,
+ "id": 6,
+ "resource_ref": "out_k3"
+ },
+ {
+ "set": 0,
+ "id": 0,
+ "resource_ref": "in_colour"
+ },
+ {
+ "set": 0,
+ "id": 7,
+ "resource_ref": "out_tp"
+ },
+ {
+ "set": 1,
+ "id": 0,
+ "resource_ref": "out_colour",
+ "descriptor_type": "VK_DESCRIPTOR_TYPE_STORAGE_IMAGE"
+ }
+ ]
+ }
+ },
+ {
+ "mark_boundary": {
+ "frame_id": "1",
+ "resources": [
+ "out_colour"
+ ]
+ }
+ }
+ ],
+ "resources": [
+ {
+ "shader": {
+ "uid": "0_pre_process",
+ "src": "./0_pre_process.spv",
+ "entry": "main",
+ "type": "SPIR-V",
+ "push_constants_size": 128,
+ "specialization_constants": []
+ }
+ },
+ {
+ "raw_data": {
+ "uid": "push_data_1",
+ "src": "./0_pre_process_push_consts.npy"
+ }
+ },
+ {
+ "image": {
+ "uid": "in_motion",
+ "dims": [
+ 1,
+ 960,
+ 544,
+ 1
+ ],
+ "src": "./in_motion.dds",
+ "format": "VK_FORMAT_R16G16_SFLOAT",
+ "shader_access": "readonly",
+ "mips": 1,
+ "min_filter": "LINEAR",
+ "mag_filter": "LINEAR",
+ "mip_filter": "NEAREST",
+ "border_address_mode": "CLAMP_BORDER",
+ "border_color": "FLOAT_TRANSPARENT_BLACK",
+ "tiling": "OPTIMAL"
+ }
+ },
+ {
+ "image": {
+ "uid": "in_colour",
+ "dims": [
+ 1,
+ 960,
+ 544,
+ 1
+ ],
+ "src": "./in_colour.dds",
+ "format": "VK_FORMAT_B10G11R11_UFLOAT_PACK32",
+ "shader_access": "readonly",
+ "mips": 1,
+ "min_filter": "LINEAR",
+ "mag_filter": "LINEAR",
+ "mip_filter": "NEAREST",
+ "border_address_mode": "CLAMP_BORDER",
+ "border_color": "FLOAT_TRANSPARENT_BLACK",
+ "tiling": "OPTIMAL"
+ }
+ },
+ {
+ "image": {
+ "uid": "in_nearest_offset_tm1",
+ "dims": [
+ 1,
+ 960,
+ 544,
+ 1
+ ],
+ "src": "./in_nearest_offset_tm1.dds",
+ "format": "VK_FORMAT_R8_UNORM",
+ "shader_access": "readonly",
+ "mips": 1,
+ "min_filter": "LINEAR",
+ "mag_filter": "LINEAR",
+ "mip_filter": "NEAREST",
+ "border_address_mode": "CLAMP_BORDER",
+ "border_color": "FLOAT_CUSTOM_EXT",
+ "custom_border_color": [
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.0
+ ],
+ "tiling": "OPTIMAL"
+ }
+ },
+ {
+ "image": {
+ "uid": "in_depth_tm1",
+ "dims": [
+ 1,
+ 960,
+ 544,
+ 1
+ ],
+ "src": "./in_depth_tm1.dds",
+ "format": "VK_FORMAT_R32_SFLOAT",
+ "shader_access": "readonly",
+ "mips": 1,
+ "min_filter": "LINEAR",
+ "mag_filter": "LINEAR",
+ "mip_filter": "NEAREST",
+ "border_address_mode": "CLAMP_BORDER",
+ "border_color": "FLOAT_CUSTOM_EXT",
+ "custom_border_color": [
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.0
+ ],
+ "tiling": "OPTIMAL"
+ }
+ },
+ {
+ "image": {
+ "uid": "in_history",
+ "dims": [
+ 1,
+ 1920,
+ 1088,
+ 1
+ ],
+ "src": "./in_history.dds",
+ "format": "VK_FORMAT_B10G11R11_UFLOAT_PACK32",
+ "shader_access": "readonly",
+ "mips": 1,
+ "min_filter": "LINEAR",
+ "mag_filter": "LINEAR",
+ "mip_filter": "NEAREST",
+ "border_address_mode": "CLAMP_EDGE",
+ "tiling": "OPTIMAL"
+ }
+ },
+ {
+ "image": {
+ "uid": "in_feedback_tm1",
+ "dims": [
+ 1,
+ 960,
+ 544,
+ 1
+ ],
+ "src": "./in_feedback_tm1.dds",
+ "format": "VK_FORMAT_R8G8B8A8_SNORM",
+ "shader_access": "readonly",
+ "mips": 1,
+ "min_filter": "LINEAR",
+ "mag_filter": "LINEAR",
+ "mip_filter": "NEAREST",
+ "border_address_mode": "CLAMP_BORDER",
+ "border_color": "FLOAT_CUSTOM_EXT",
+ "custom_border_color": [
+ -1.0,
+ -1.0,
+ -1.0,
+ -1.0
+ ],
+ "tiling": "OPTIMAL"
+ }
+ },
+ {
+ "image": {
+ "uid": "in_derivative_tm1",
+ "dims": [
+ 1,
+ 960,
+ 544,
+ 1
+ ],
+ "src": "./in_derivative_tm1.dds",
+ "format": "VK_FORMAT_R8G8_UNORM",
+ "shader_access": "readonly",
+ "mips": 1,
+ "min_filter": "LINEAR",
+ "mag_filter": "LINEAR",
+ "mip_filter": "NEAREST",
+ "border_address_mode": "CLAMP_BORDER",
+ "border_color": "FLOAT_TRANSPARENT_BLACK",
+ "tiling": "OPTIMAL"
+ }
+ },
+ {
+ "image": {
+ "uid": "in_depth",
+ "dims": [
+ 1,
+ 960,
+ 544,
+ 1
+ ],
+ "src": "./in_depth.dds",
+ "format": "VK_FORMAT_R32_SFLOAT",
+ "shader_access": "readonly",
+ "mips": 1,
+ "min_filter": "LINEAR",
+ "mag_filter": "LINEAR",
+ "mip_filter": "NEAREST",
+ "border_address_mode": "CLAMP_BORDER",
+ "border_color": "FLOAT_TRANSPARENT_BLACK",
+ "tiling": "OPTIMAL"
+ }
+ },
+ {
+ "image": {
+ "uid": "out_derivative",
+ "dims": [
+ 1,
+ 960,
+ 544,
+ 1
+ ],
+ "dst": "./out_derivative.dds",
+ "format": "VK_FORMAT_R8G8_UNORM",
+ "shader_access": "writeonly",
+ "mips": 1,
+ "tiling": "LINEAR"
+ }
+ },
+ {
+ "image": {
+ "uid": "out_nearest_offset",
+ "dims": [
+ 1,
+ 960,
+ 544,
+ 1
+ ],
+ "dst": "./out_nearest_offset.dds",
+ "format": "VK_FORMAT_R8_UNORM",
+ "shader_access": "readwrite",
+ "mips": 1,
+ "min_filter": "LINEAR",
+ "mag_filter": "LINEAR",
+ "mip_filter": "NEAREST",
+ "border_address_mode": "CLAMP_BORDER",
+ "border_color": "FLOAT_TRANSPARENT_BLACK",
+ "tiling": "LINEAR"
+ }
+ },
+ {
+ "tensor": {
+ "uid": "out_input_tensor",
+ "dims": [
+ 1,
+ 544,
+ 960,
+ 12
+ ],
+ "dst": "./out_input_tensor.npy",
+ "format": "VK_FORMAT_R8_SINT",
+ "shader_access": "readwrite",
+ "tiling": "LINEAR"
+ }
+ },
+ {
+ "graph": {
+ "uid": "1_nss",
+ "src": "./1_nss.vgf"
+ }
+ },
+ {
+ "tensor_barrier": {
+ "uid": "barrier_14",
+ "src_access": "compute_shader_write",
+ "dst_access": "graph_read",
+ "src_stage": [
+ "compute"
+ ],
+ "dst_stage": [
+ "graph"
+ ],
+ "tensor_resource": "out_input_tensor"
+ }
+ },
+ {
+ "tensor": {
+ "uid": "out_feedback",
+ "dims": [
+ 1,
+ 544,
+ 960,
+ 4
+ ],
+ "dst": "./out_feedback.npy",
+ "format": "VK_FORMAT_R8_SINT",
+ "shader_access": "writeonly",
+ "tiling": "LINEAR"
+ }
+ },
+ {
+ "image": {
+ "uid": "out_tp",
+ "dims": [
+ 1,
+ 960,
+ 544,
+ 1
+ ],
+ "format": "VK_FORMAT_R8G8B8A8_SNORM",
+ "shader_access": "readonly",
+ "mips": 1,
+ "min_filter": "LINEAR",
+ "mag_filter": "LINEAR",
+ "mip_filter": "NEAREST",
+ "border_address_mode": "CLAMP_BORDER",
+ "border_color": "FLOAT_TRANSPARENT_BLACK",
+ "tiling": "LINEAR"
+ }
+ },
+ {
+ "tensor": {
+ "uid": "out_tp_aliaser",
+ "dims": [
+ 1,
+ 544,
+ 960,
+ 4
+ ],
+ "format": "VK_FORMAT_R8_SINT",
+ "shader_access": "readwrite",
+ "alias_target": {
+ "resource_ref": "out_tp"
+ },
+ "tiling": "LINEAR"
+ }
+ },
+ {
+ "image": {
+ "uid": "out_k3",
+ "dims": [
+ 1,
+ 960,
+ 544,
+ 1
+ ],
+ "format": "VK_FORMAT_R8G8B8A8_SNORM",
+ "shader_access": "readonly",
+ "mips": 1,
+ "min_filter": "LINEAR",
+ "mag_filter": "LINEAR",
+ "mip_filter": "NEAREST",
+ "border_address_mode": "CLAMP_EDGE",
+ "tiling": "LINEAR"
+ }
+ },
+ {
+ "tensor": {
+ "uid": "out_k3_aliaser",
+ "dims": [
+ 1,
+ 544,
+ 960,
+ 4
+ ],
+ "format": "VK_FORMAT_R8_SINT",
+ "shader_access": "readwrite",
+ "alias_target": {
+ "resource_ref": "out_k3"
+ },
+ "tiling": "LINEAR"
+ }
+ },
+ {
+ "image": {
+ "uid": "out_k2",
+ "dims": [
+ 1,
+ 960,
+ 544,
+ 1
+ ],
+ "format": "VK_FORMAT_R8G8B8A8_SNORM",
+ "shader_access": "readonly",
+ "mips": 1,
+ "min_filter": "LINEAR",
+ "mag_filter": "LINEAR",
+ "mip_filter": "NEAREST",
+ "border_address_mode": "CLAMP_EDGE",
+ "tiling": "LINEAR"
+ }
+ },
+ {
+ "tensor": {
+ "uid": "out_k2_aliaser",
+ "dims": [
+ 1,
+ 544,
+ 960,
+ 4
+ ],
+ "format": "VK_FORMAT_R8_SINT",
+ "shader_access": "readwrite",
+ "alias_target": {
+ "resource_ref": "out_k2"
+ },
+ "tiling": "LINEAR"
+ }
+ },
+ {
+ "image": {
+ "uid": "out_k1",
+ "dims": [
+ 1,
+ 960,
+ 544,
+ 1
+ ],
+ "format": "VK_FORMAT_R8G8B8A8_SNORM",
+ "shader_access": "readonly",
+ "mips": 1,
+ "min_filter": "LINEAR",
+ "mag_filter": "LINEAR",
+ "mip_filter": "NEAREST",
+ "border_address_mode": "CLAMP_EDGE",
+ "tiling": "LINEAR"
+ }
+ },
+ {
+ "tensor": {
+ "uid": "out_k1_aliaser",
+ "dims": [
+ 1,
+ 544,
+ 960,
+ 4
+ ],
+ "format": "VK_FORMAT_R8_SINT",
+ "shader_access": "readwrite",
+ "alias_target": {
+ "resource_ref": "out_k1"
+ },
+ "tiling": "LINEAR"
+ }
+ },
+ {
+ "image": {
+ "uid": "out_k0",
+ "dims": [
+ 1,
+ 960,
+ 544,
+ 1
+ ],
+ "format": "VK_FORMAT_R8G8B8A8_SNORM",
+ "shader_access": "readonly",
+ "mips": 1,
+ "min_filter": "LINEAR",
+ "mag_filter": "LINEAR",
+ "mip_filter": "NEAREST",
+ "border_address_mode": "CLAMP_EDGE",
+ "tiling": "LINEAR"
+ }
+ },
+ {
+ "tensor": {
+ "uid": "out_k0_aliaser",
+ "dims": [
+ 1,
+ 544,
+ 960,
+ 4
+ ],
+ "format": "VK_FORMAT_R8_SINT",
+ "shader_access": "readwrite",
+ "alias_target": {
+ "resource_ref": "out_k0"
+ },
+ "tiling": "LINEAR"
+ }
+ },
+ {
+ "shader": {
+ "uid": "2_post_process",
+ "src": "./2_post_process.spv",
+ "entry": "main",
+ "type": "SPIR-V",
+ "push_constants_size": 76,
+ "specialization_constants": []
+ }
+ },
+ {
+ "raw_data": {
+ "uid": "push_data_22",
+ "src": "./2_post_process_push_consts.npy"
+ }
+ },
+ {
+ "image_barrier": {
+ "uid": "barrier_23",
+ "src_access": "compute_shader_write",
+ "dst_access": "compute_shader_read",
+ "old_layout": "general",
+ "new_layout": "general",
+ "src_stage": [
+ "compute"
+ ],
+ "dst_stage": [
+ "compute"
+ ],
+ "image_resource": "out_nearest_offset"
+ }
+ },
+ {
+ "image_barrier": {
+ "uid": "barrier_25",
+ "src_access": "graph_write",
+ "dst_access": "compute_shader_read",
+ "old_layout": "general",
+ "new_layout": "general",
+ "src_stage": [
+ "graph"
+ ],
+ "dst_stage": [
+ "compute"
+ ],
+ "image_resource": "out_k0"
+ }
+ },
+ {
+ "image_barrier": {
+ "uid": "barrier_27",
+ "src_access": "graph_write",
+ "dst_access": "compute_shader_read",
+ "old_layout": "general",
+ "new_layout": "general",
+ "src_stage": [
+ "graph"
+ ],
+ "dst_stage": [
+ "compute"
+ ],
+ "image_resource": "out_k1"
+ }
+ },
+ {
+ "image_barrier": {
+ "uid": "barrier_29",
+ "src_access": "graph_write",
+ "dst_access": "compute_shader_read",
+ "old_layout": "general",
+ "new_layout": "general",
+ "src_stage": [
+ "graph"
+ ],
+ "dst_stage": [
+ "compute"
+ ],
+ "image_resource": "out_k2"
+ }
+ },
+ {
+ "image_barrier": {
+ "uid": "barrier_31",
+ "src_access": "graph_write",
+ "dst_access": "compute_shader_read",
+ "old_layout": "general",
+ "new_layout": "general",
+ "src_stage": [
+ "graph"
+ ],
+ "dst_stage": [
+ "compute"
+ ],
+ "image_resource": "out_k3"
+ }
+ },
+ {
+ "image_barrier": {
+ "uid": "barrier_33",
+ "src_access": "graph_write",
+ "dst_access": "compute_shader_read",
+ "old_layout": "general",
+ "new_layout": "general",
+ "src_stage": [
+ "graph"
+ ],
+ "dst_stage": [
+ "compute"
+ ],
+ "image_resource": "out_tp"
+ }
+ },
+ {
+ "image": {
+ "uid": "out_colour",
+ "dims": [
+ 1,
+ 1920,
+ 1088,
+ 1
+ ],
+ "dst": "./out_colour.dds",
+ "format": "VK_FORMAT_B10G11R11_UFLOAT_PACK32",
+ "shader_access": "writeonly",
+ "mips": 1,
+ "tiling": "LINEAR"
+ }
+ }
+ ]
+ }
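The `rangeND` values in the scenario line up with the render targets: the pre-process pass covers the 960×544 low-resolution inputs with [60, 34, 1] workgroups and the post-process pass covers the 1920×1088 output with [120, 68, 1]. A quick sketch, assuming a 16×16 workgroup size (the workgroup size is baked into the `.spv` shaders and is not visible in the JSON):

```python
import math

# Assumed workgroup size; 960/16 = 60, 544/16 = 34, 1920/16 = 120, 1088/16 = 68
WG = 16

def dispatch_range(width, height):
    # Number of workgroups needed to tile the target, as in rangeND
    return [math.ceil(width / WG), math.ceil(height / WG), 1]
```

This also shows why 1088 (a multiple of 16) is used as the surface height for nominally 1080p content.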
scenario/typedefs.h ADDED
@@ -0,0 +1,86 @@
+ //
+ // -----------------------------------------------------------------------------
+ // The proprietary software and information contained in this file is
+ // confidential and may only be used by an authorized person under a valid
+ // licensing agreement from Arm Limited or its affiliates.
+ //
+ // Copyright (C) 2025. Arm Limited or its affiliates. All rights reserved.
+ //
+ // This entire notice must be reproduced on all copies of this file and
+ // copies of this file may only be made by an authorized person under a valid
+ // licensing agreement from Arm Limited or its affiliates.
+ // -----------------------------------------------------------------------------
+ //
+ #ifndef NSS_TYPEDEFS
+ #define NSS_TYPEDEFS
+
+ // fp16 types
+ #define half float16_t
+ #define half2 f16vec2
+ #define half3 f16vec3
+ #define half4 f16vec4
+
+ // fp32 types
+ #define float float32_t
+ #define float2 f32vec2
+ #define float3 f32vec3
+ #define float4 f32vec4
+
+ // int8 types
+ #define int8_t int8_t
+ #define int8_t2 i8vec2
+ #define int8_t3 i8vec3
+ #define int8_t4 i8vec4
+
+ // int16 types
+ #define int16_t int16_t
+ #define int16_t2 i16vec2
+ #define int16_t3 i16vec3
+ #define int16_t4 i16vec4
+
+ // uint16 types
+ #define uint16_t uint16_t
+ #define uint16_t2 u16vec2
+ #define uint16_t3 u16vec3
+ #define uint16_t4 u16vec4
+
+ // int32 types
+ #define int32_t int32_t
+ #define int32_t2 i32vec2
+ #define int32_t3 i32vec3
+ #define int32_t4 i32vec4
+
+ // uint32 types
+ #define uint32_t uint32_t
+ #define uint32_t2 u32vec2
+ #define uint32_t3 u32vec3
+ #define uint32_t4 u32vec4
+
+ // methods
+ #define lerp mix
+
+ // --- RCP functions for float16 types ---
+ half rcp(half x) { return half( 1.HF) / x; }
+ half2 rcp(half2 x) { return half2(1.HF) / x; }
+ half3 rcp(half3 x) { return half3(1.HF) / x; }
+ half4 rcp(half4 x) { return half4(1.HF) / x; }
+
+ // --- RCP functions for float32 types ---
+ float rcp(float x) { return float( 1.0f) / x; }
+ float2 rcp(float2 x) { return float2(1.0f) / x; }
+ float3 rcp(float3 x) { return float3(1.0f) / x; }
+ float4 rcp(float4 x) { return float4(1.0f) / x; }
+
+ // --- Saturate functions for float16 types ---
+ half saturate(half x) { return clamp(x, half( 0.HF), half( 1.HF)); }
+ half2 saturate(half2 x) { return clamp(x, half2(0.HF), half2(1.HF)); }
+ half3 saturate(half3 x) { return clamp(x, half3(0.HF), half3(1.HF)); }
+ half4 saturate(half4 x) { return clamp(x, half4(0.HF), half4(1.HF)); }
+
+ // --- Saturate functions for float32 types ---
+ float saturate(float x) { return clamp(x, 0.f, 1.f); }
+ float2 saturate(float2 x) { return clamp(x, float2(0.f), float2(1.f)); }
+ float3 saturate(float3 x) { return clamp(x, float3(0.f), float3(1.f)); }
+ float4 saturate(float4 x) { return clamp(x, float4(0.f), float4(1.f)); }
+
+ #endif // NSS_TYPEDEFS
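`typedefs.h` ports HLSL-style names (`half`, `lerp`, `rcp`, `saturate`) onto GLSL types and intrinsics. The reference semantics of the two helpers it defines, as a scalar Python sketch:

```python
def rcp(x):
    # Reciprocal: 1 / x, as the rcp() overloads define it
    return 1.0 / x

def saturate(x):
    # Clamp to [0, 1], as the saturate() overloads define it
    return min(1.0, max(0.0, x))
```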
third_party_licenses_and_copyright_notices.txt ADDED
@@ -0,0 +1,15 @@
+ ML SDK Scenario Runner - revision 197a36e
+ Source Code: https://github.com/arm/ai-ml-sdk-scenario-runner
+ License: Apache-2.0 (https://github.com/arm/ai-ml-sdk-scenario-runner/blob/main/LICENSES/Apache-2.0.txt)
+ Copyright Notice: "Copyright 2022-2025 Arm Limited and/or its affiliates <[email protected]>"
+
+ ML Emulation Layer for Vulkan® - revision 788ac99
+ Source Code: https://github.com/arm/ai-ml-emulation-layer-for-vulkan
+ License: Apache-2.0 (https://github.com/arm/ai-ml-emulation-layer-for-vulkan/blob/main/LICENSES/Apache-2.0.txt)
+ Copyright Notice: "Copyright 2022-2025 Arm Limited and/or its affiliates <[email protected]>"
+
+ Amazon Lumberyard Bistro
+ Asset page: http://developer.nvidia.com/orca/amazon-lumberyard-bistro
+ Download page: https://casual-effects.com/g3d/data10/research/model/bistro/Exterior.zip
+ License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
+ Copyright Notice: "Copyright 2017 Amazon Lumberyard"