Initial content
- .gitattributes +7 -0
- .gitignore +2 -0
- Arm_AI_Model_Community_License_v1_0_PRE-1154.pdf +0 -0
- LICENSE +2 -0
- README.md +150 -0
- bin/windows-x86_64/VkLayer_Graph.dll +3 -0
- bin/windows-x86_64/VkLayer_Graph.json +29 -0
- bin/windows-x86_64/VkLayer_Tensor.dll +3 -0
- bin/windows-x86_64/VkLayer_Tensor.json +31 -0
- bin/windows-x86_64/scenario-runner.exe +3 -0
- nss_v0.1.0_fp32.pt +3 -0
- nss_v0.1.0_int8.pt +3 -0
- nss_v0.1.0_int8.vgf +3 -0
- nss_v0.1.0_int8_metadata.json +79 -0
- resources/Enchanted_Castle_NSS_Demo.mp4 +3 -0
- resources/model-explorer-screenshot.png +3 -0
- scenario/0_pre_process.comp +572 -0
- scenario/0_pre_process.spv +3 -0
- scenario/0_pre_process_push_consts.npy +3 -0
- scenario/1_nss.vgf +3 -0
- scenario/2_post_process.comp +361 -0
- scenario/2_post_process.spv +3 -0
- scenario/2_post_process_push_consts.npy +3 -0
- scenario/common.h +160 -0
- scenario/in_colour.dds +3 -0
- scenario/in_depth.dds +3 -0
- scenario/in_depth_tm1.dds +3 -0
- scenario/in_derivative_tm1.dds +3 -0
- scenario/in_feedback_tm1.dds +3 -0
- scenario/in_history.dds +3 -0
- scenario/in_motion.dds +3 -0
- scenario/in_nearest_offset_tm1.dds +3 -0
- scenario/kernel_lut.h +83 -0
- scenario/parameters.json +79 -0
- scenario/scenario.json +821 -0
- scenario/typedefs.h +86 -0
- third_party_licenses_and_copyright_notices.txt +15 -0
.gitattributes
CHANGED
@@ -33,3 +33,10 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
36 |
+
*.mp4 filter=lfs diff=lfs merge=lfs -text
|
37 |
+
*.dds filter=lfs diff=lfs merge=lfs -text
|
38 |
+
*.vgf filter=lfs diff=lfs merge=lfs -text
|
39 |
+
*.dll filter=lfs diff=lfs merge=lfs -text
|
40 |
+
*.exe filter=lfs diff=lfs merge=lfs -text
|
41 |
+
*.spv filter=lfs diff=lfs merge=lfs -text
|
42 |
+
*.png filter=lfs diff=lfs merge=lfs -text
|
.gitignore
ADDED
@@ -0,0 +1,2 @@
1 |
+
out/
|
2 |
+
bin/linux-x86_64/
|
Arm_AI_Model_Community_License_v1_0_PRE-1154.pdf
ADDED
Binary file (49.6 kB).
|
|
LICENSE
ADDED
@@ -0,0 +1,2 @@
1 |
+
The license for the model source code can be found at https://github.com/arm/neural-graphics-model-gym/blob/main/LICENSES/Apache-2.0.txt
|
2 |
+
The license for the content of this repository can be found in Arm_AI_Model_Community_License_v1_0_PRE-1154.pdf
|
README.md
ADDED
@@ -0,0 +1,150 @@
1 |
+
---
|
2 |
+
license: other
|
3 |
+
pipeline_tag: image-to-image
|
4 |
+
tags:
|
5 |
+
- android
|
6 |
+
- neural-graphics
|
7 |
+
- gaming
|
8 |
+
- graphics
|
9 |
+
language:
|
10 |
+
- en
|
11 |
+
---
|
12 |
+
|
13 |
+
# Neural Super Sampling (NSS)
|
14 |
+
|
15 |
+
Neural Super Sampling (NSS) is an efficient network for temporal super sampling on mobile devices. Content rendered at 540p can be upscaled to 1080p, resulting in up to 50% GPU savings. With our retraining tools, content creators and game studios can build derivatives of the model suited to their artwork style and performance requirements.
|
16 |
+
|
17 |
+
### 🎥 Neural Super Sampling Demo
|
18 |
+
|
19 |
+
<video controls width="100%">
|
20 |
+
<source src="https://huggingface.co/Arm/neural-super-sampling/resolve/main/resources/Enchanted_Castle_NSS_Demo.mp4" type="video/mp4">
|
21 |
+
Your browser does not support the video tag.
|
22 |
+
</video>
|
23 |
+
|
24 |
+
## Model Details
|
25 |
+
|
26 |
+
Neural Super Sampling (NSS) is a parameter prediction model for real-time temporal super sampling developed by Arm, optimized for execution on Neural Accelerators (NX) in mobile GPUs. It enables high-resolution rendering at a lower compute cost by reconstructing high-quality output frames from low-resolution temporal inputs. NSS is particularly suited for mobile gaming, XR, and other power-constrained graphics use cases.
|
27 |
+
|
28 |
+
- **Developed by:** Arm Limited
|
29 |
+
- **Model type:** Temporal image super sampling
|
30 |
+
- **License:** Other
|
31 |
+
- **Repository:** [Neural Graphics Model Gym](https://github.com/arm/neural-graphics-model-gym)
|
32 |
+
- **Paper:** [How Neural Super Sampling Works](https://community.arm.com/arm-community-blogs/b/mobile-graphics-and-gaming-blog/posts/how-arm-neural-super-sampling-works)
|
33 |
+
- **Quickstart with ML extensions for Vulkan®**: [ML extensions for Vulkan® Quickstart Guide](https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/vulkan-ml-sample/)
|
34 |
+
- **Quickstart with Unreal**: [Neural Super Sampling Quickstart Guide](https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/nss-unreal/) for NSS integration into Unreal Engine
|
35 |
+
|
36 |
+
NSS is under active development with regular updates planned. It should not be considered production-grade at this stage. As we increase the size and diversity of the training dataset we expect to see significant quality improvements. Follow Arm to stay up to date on the latest releases.
|
37 |
+
|
38 |
+
The model is released under Arm's [AI Model Community License](https://huggingface.co/Arm/neural-super-sampling/blob/main/Arm_AI_Model_Community_License_v1_0_PRE-1154.pdf), which allows NSS to be retrained on datasets captured from your own content. Future releases of the [Neural Graphics Model Gym](https://github.com/arm/neural-graphics-model-gym) will provide the tools to capture and convert content for use in (re)training.
|
39 |
+
|
40 |
+
## Uses
|
41 |
+
|
42 |
+
NSS can be directly integrated into graphics pipelines using ML extensions for Vulkan®. See the included ML SDK for Vulkan [scenario](https://huggingface.co/Arm/neural-super-sampling/tree/main/scenario) for the simplest way to evaluate the model. The scenario includes the necessary pre- and post-processing compute shaders along with a single frame's worth of input data.
|
43 |
+
|
44 |
+
The recommended way of integrating the model into a graphics pipeline is by using the [VGF Library](https://github.com/arm/ai-ml-sdk-vgf-library/tree/main) from the ML SDK for Vulkan.
|
45 |
+
|
46 |
+
NSS is released under a [permissive license](https://huggingface.co/Arm/neural-super-sampling/blob/main/Arm_AI_Model_Community_License_v1_0_PRE-1154.pdf) designed to foster innovation in the graphics industry and provide differentiation to content creators.
|
47 |
+
|
48 |
+
### Direct Use
|
49 |
+
|
50 |
+
NSS has been integrated into Unreal Engine via two plugins, the [NSS Plugin for Unreal Engine](https://github.com/arm/neural-graphics-for-unreal/) and [Unreal NNE Plugin for ML extensions for Vulkan](https://github.com/arm/ml-extensions-for-vulkan-unreal-plugin/). See our [quick start guide](https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/nss-unreal/) for step-by-step instructions on how to use NSS in Unreal® Engine.
|
51 |
+
|
52 |
+
### Out-of-Scope Use
|
53 |
+
|
54 |
+
- Not suited for non-temporal tasks such as standalone image upsampling
|
55 |
+
|
56 |
+
## Bias, Risks, and Limitations
|
57 |
+
|
58 |
+
- Requires accurate motion vectors and frame history for stable output
|
59 |
+
- May underperform in extremely low framerate scenarios (<10 FPS) with fast camera movement
|
60 |
+
- Padding of the input is needed if input dimensions are not divisible by 8
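
The padding requirement in the last item can be illustrated with a small size calculation. This is a minimal sketch, not part of the NSS tooling: the helper name `padded_dims` is hypothetical, and the 960x540 example assumes a 16:9 frame at 540p. Where the padding is placed (which edges) is not specified here and is left to the integration.

```python
def padded_dims(width: int, height: int, multiple: int = 8) -> tuple[int, int]:
    """Round each dimension up to the next multiple (hypothetical helper)."""
    def round_up(value: int) -> int:
        return ((value + multiple - 1) // multiple) * multiple

    return round_up(width), round_up(height)


# Assumed 16:9 frame at 540p: 960 is divisible by 8, 540 is not.
print(padded_dims(960, 540))  # (960, 544)
```

In this example only the height needs padding, by 4 rows.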
|
61 |
+
|
62 |
+
### Recommendations
|
63 |
+
|
64 |
+
For ultra-low-FPS use cases, reduce the camera speed, acceleration, or both so that the relative motion between frames mimics the
|
65 |
+
application running at a higher frame rate.
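
One way to read this recommendation is as a simple proportional scaling of camera speed. The sketch below is an assumption about how an application might apply it, not an NSS API: the function name and the idea of a fixed reference frame rate are hypothetical.

```python
def scaled_camera_speed(speed: float, actual_fps: float, reference_fps: float) -> float:
    """Scale camera speed so per-frame motion matches a higher reference frame rate."""
    if actual_fps <= 0 or reference_fps <= 0:
        raise ValueError("frame rates must be positive")
    # Per-frame displacement is speed / fps. Keeping it equal to
    # speed / reference_fps means multiplying speed by actual_fps / reference_fps.
    # The factor is capped at 1 so the camera is never sped up.
    return speed * min(1.0, actual_fps / reference_fps)


# Running at 10 FPS while mimicking the per-frame motion of 30 FPS.
print(scaled_camera_speed(6.0, actual_fps=10.0, reference_fps=30.0))
```

The same factor could be applied to acceleration if that is the quantity being limited.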
|
66 |
+
|
67 |
+
## How to Get Started with the Model
|
68 |
+
|
69 |
+
This repository contains pre-trained weights and a compiled NSS model in VGF format, ready for integration with Vulkan applications.
|
70 |
+
|
71 |
+
The included Scenario demonstrates full execution of the model on a Vulkan compute-capable system. An Emulation Layer is provided to implement ML Extensions for Vulkan where they are not supported by the native Vulkan driver.
|
72 |
+
|
73 |
+
### Download and Prepare the Scenario
|
74 |
+
|
75 |
+
Clone the NSS model repository from Hugging Face
|
76 |
+
```powershell
|
77 |
+
git clone https://huggingface.co/Arm/neural-super-sampling
|
78 |
+
cd neural-super-sampling
|
79 |
+
```
|
80 |
+
|
81 |
+
### Run the Scenario
|
82 |
+
|
83 |
+
The NSS Hugging Face repository includes pre-built Windows® binaries for ML Emulation Layer for Vulkan and Scenario Runner. For other platforms,
|
84 |
+
- build from source following the instructions for [Building the Emulation Layer from source](https://github.com/arm/ai-ml-emulation-layer-for-vulkan/blob/main/README.md#building-the-emulation-layer-from-source) and [Building the Scenario Runner from source](https://github.com/arm/ai-ml-sdk-scenario-runner/blob/main/README.md#building-scenario-runner-from-source)
|
85 |
+
- adapt the instructions below accordingly
|
86 |
+
|
87 |
+
1. Set the required environment variables:
|
88 |
+
|
89 |
+
On Windows:
|
90 |
+
```powershell
|
91 |
+
$env:VK_LAYER_PATH="$PWD\bin\windows-x86_64"
|
92 |
+
$env:VK_INSTANCE_LAYERS="VK_LAYER_ML_Graph_Emulation;VK_LAYER_ML_Tensor_Emulation"
|
93 |
+
```
|
94 |
+
|
95 |
+
On Linux (assuming the Emulation Layer binaries and JSON files and Scenario Runner executable are copied to `bin/linux-x86_64`):
|
96 |
+
```bash
|
97 |
+
export LD_LIBRARY_PATH=$PWD/bin/linux-x86_64:$LD_LIBRARY_PATH
|
98 |
+
export VK_LAYER_PATH=$PWD/bin/linux-x86_64
|
99 |
+
export VK_INSTANCE_LAYERS=VK_LAYER_ML_Graph_Emulation:VK_LAYER_ML_Tensor_Emulation
|
100 |
+
```
|
101 |
+
|
102 |
+
2. Execute the scenario:
|
103 |
+
|
104 |
+
On Windows:
|
105 |
+
```powershell
|
106 |
+
bin\windows-x86_64\scenario-runner.exe --scenario scenario\scenario.json --output out
|
107 |
+
```
|
108 |
+
|
109 |
+
On Linux:
|
110 |
+
```bash
|
111 |
+
bin/linux-x86_64/scenario-runner --scenario scenario/scenario.json --output out
|
112 |
+
```
|
113 |
+
|
114 |
+
- Output images are encoded as `B10G11R11_UFLOAT`. This format is common for framebuffers but not widely supported by image viewers. Use [RenderDoc](https://renderdoc.org/) to view these images.
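
For readers who want to inspect pixel values without RenderDoc, the packed format can be decoded by hand. The sketch below assumes the bit layout of Vulkan's `VK_FORMAT_B10G11R11_UFLOAT_PACK32` (R in bits 0 to 10, G in bits 11 to 21, B in bits 22 to 31, each an unsigned float with a 5-bit exponent and bias 15). The function names are hypothetical, and reading the image container written by the Scenario Runner is not covered.

```python
import math


def _decode_ufloat(bits: int, mantissa_bits: int) -> float:
    """Decode an unsigned small float with a 5-bit exponent (bias 15)."""
    exponent = bits >> mantissa_bits
    mantissa = bits & ((1 << mantissa_bits) - 1)
    fraction = mantissa / (1 << mantissa_bits)
    if exponent == 0:
        return 2.0 ** -14 * fraction  # zero and denormals
    if exponent == 31:
        return math.inf if mantissa == 0 else math.nan
    return 2.0 ** (exponent - 15) * (1.0 + fraction)


def decode_b10g11r11(packed: int) -> tuple[float, float, float]:
    """Unpack one 32-bit texel into (r, g, b) floats."""
    r = _decode_ufloat(packed & 0x7FF, 6)          # 11 bits: 5 exponent, 6 mantissa
    g = _decode_ufloat((packed >> 11) & 0x7FF, 6)  # 11 bits: 5 exponent, 6 mantissa
    b = _decode_ufloat((packed >> 22) & 0x3FF, 5)  # 10 bits: 5 exponent, 5 mantissa
    return r, g, b


# Exponent 15 with a zero mantissa encodes 1.0 in every channel.
white = (15 << 6) | ((15 << 6) << 11) | ((15 << 5) << 22)
print(decode_b10g11r11(white))  # (1.0, 1.0, 1.0)
```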
|
115 |
+
|
116 |
+
## Training and Evaluation
|
117 |
+
|
118 |
+
For background on NSS architecture and training read our blog: [How Neural Super Sampling Works](https://community.arm.com/arm-community-blogs/b/mobile-graphics-and-gaming-blog/posts/how-arm-neural-super-sampling-works)
|
119 |
+
|
120 |
+
Training and evaluation details, including model architecture code, training pipeline, and test configurations, are available at:
|
121 |
+
|
122 |
+
- Model training code: https://github.com/arm/neural-graphics-model-gym
|
123 |
+
- Examples and tutorials: https://github.com/arm/neural-graphics-model-gym-examples
|
124 |
+
- Sample dataset: https://huggingface.co/datasets/Arm/neural-graphics-dataset
|
125 |
+
|
126 |
+
### 🔎 Model Explorer VGF extension
|
127 |
+
|
128 |
+
The [VGF extension to Model Explorer](https://github.com/arm/vgf-adapter-model-explorer) provides a simple interface to visualize the model and analyse its VGF composition.
|
129 |
+
|
130 |
+

|
131 |
+
|
132 |
+
## License
|
133 |
+
|
134 |
+
- The license for the model source code can be found [here](https://github.com/arm/neural-graphics-model-gym/blob/main/LICENSES/Apache-2.0.txt).
|
135 |
+
- The license for the content of this repository can be found [here](https://huggingface.co/Arm/neural-super-sampling/blob/main/Arm_AI_Model_Community_License_v1_0_PRE-1154.pdf)
|
136 |
+
|
137 |
+
## More Information
|
138 |
+
|
139 |
+
🧑🔬 More technical details about the model can be found in the [NSS Guide](https://developer.arm.com/documentation/111009/latest/).
|
140 |
+
|
141 |
+
👩🏽💻 Our [Neural Graphics Development Kit](https://developer.arm.com/mobile-graphics-and-gaming/neural-graphics) contains engine plugins, model training tools, code examples and extensive developer documentation.
|
142 |
+
|
143 |
+
🙋🏻♀️ For questions or feedback please [start a discussion](https://huggingface.co/Arm/neural-super-sampling/discussions)
|
144 |
+
|
145 |
+
## Trademark notice
|
146 |
+
Arm® is a registered trademark of Arm Limited (or its subsidiaries) in the US and/or elsewhere.
|
147 |
+
|
148 |
+
Windows® is a trademark of the Microsoft group of companies.
|
149 |
+
|
150 |
+
Vulkan® is a registered trademark of the [Khronos® Group](https://www.khronos.org/legal/trademarks).
|
bin/windows-x86_64/VkLayer_Graph.dll
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:0a2fd54f62bef850685bcc4331714e0590a0a3aef28ea2dd6aa8c9e6f68f4da0
|
3 |
+
size 7437312
|
bin/windows-x86_64/VkLayer_Graph.json
ADDED
@@ -0,0 +1,29 @@
1 |
+
{
|
2 |
+
"file_format_version": "1.0.0",
|
3 |
+
"layer": {
|
4 |
+
"name": "VK_LAYER_ML_Graph_Emulation",
|
5 |
+
"type": "INSTANCE",
|
6 |
+
"library_path": ".\\VkLayer_Graph.dll",
|
7 |
+
"api_version": "1.3.0",
|
8 |
+
"implementation_version": "1",
|
9 |
+
"description": "ML Graph Emulation Layer",
|
10 |
+
"functions": {
|
11 |
+
"vkGetInstanceProcAddr": "graphGetInstanceProcAddr",
|
12 |
+
"vkGetDeviceProcAddr": "graphGetDeviceProcAddr"
|
13 |
+
},
|
14 |
+
"device_extensions": [
|
15 |
+
{
|
16 |
+
"name": "VK_ARM_data_graph",
|
17 |
+
"spec_version": "1",
|
18 |
+
"entrypoints": [
|
19 |
+
"vkGetPhysicalDeviceGraphInstructionSetsARM",
|
20 |
+
"vkCreateGraphPipelinesARM",
|
21 |
+
"vkCreateGraphPipelineSessionARM",
|
22 |
+
"vkGetGraphPipelineSessionMemoryRequirementsARM",
|
23 |
+
"vkBindGraphPipelineSessionMemoryARM",
|
24 |
+
"vkDestroyGraphPipelineSessionARM",
|
25 |
+
"vkCmdDispatchGraphARM"
|
26 |
+
]
|
27 |
+
}]
|
28 |
+
}
|
29 |
+
}
|
bin/windows-x86_64/VkLayer_Tensor.dll
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:15a7e7f9b4ff74f530a550669740f1c8b5295bdbe9a0474da3e3b9e906c4ce76
|
3 |
+
size 5689344
|
bin/windows-x86_64/VkLayer_Tensor.json
ADDED
@@ -0,0 +1,31 @@
1 |
+
{
|
2 |
+
"file_format_version": "1.0.0",
|
3 |
+
"layer": {
|
4 |
+
"name": "VK_LAYER_ML_Tensor_Emulation",
|
5 |
+
"type": "INSTANCE",
|
6 |
+
"library_path": ".\\VkLayer_Tensor.dll",
|
7 |
+
"api_version": "1.3.0",
|
8 |
+
"implementation_version": "1",
|
9 |
+
"description": "ML Tensor Emulation Layer",
|
10 |
+
"functions": {
|
11 |
+
"vkGetInstanceProcAddr": "tensorGetInstanceProcAddr",
|
12 |
+
"vkGetDeviceProcAddr": "tensorGetDeviceProcAddr"
|
13 |
+
},
|
14 |
+
"device_extensions": [
|
15 |
+
{
|
16 |
+
"name": "VK_ARM_tensors",
|
17 |
+
"spec_version": "1",
|
18 |
+
"entrypoints": [
|
19 |
+
"vkCreateTensorARM",
|
20 |
+
"vkDestroyTensorARM",
|
21 |
+
"vkCreateTensorViewARM",
|
22 |
+
"vkDestroyTensorViewARM",
|
23 |
+
"vkGetTensorMemoryRequirementsARM",
|
24 |
+
"vkBindTensorMemoryARM",
|
25 |
+
"vkGetDeviceTensorMemoryRequirementsARM",
|
26 |
+
"vkCmdCopyTensorARM"
|
27 |
+
]
|
28 |
+
}
|
29 |
+
]
|
30 |
+
}
|
31 |
+
}
|
bin/windows-x86_64/scenario-runner.exe
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:5a99f1076dec1f504d64387969a3d18dc687c7f95c505dca812e4fcbca60f3d2
|
3 |
+
size 5285376
|
nss_v0.1.0_fp32.pt
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:391ad31f72783175afcb94180e0d8ffad7a34a6d848edfdea3409677236fc1da
|
3 |
+
size 553364
|
nss_v0.1.0_int8.pt
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:57ebcd81596ca7720015fc42b8c3d509c45d0c031244b024c7d1056b671dce9d
|
3 |
+
size 665897
|
nss_v0.1.0_int8.vgf
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:f2bb554b54f186111150cbe3f80b258300d22e6e6e23610a6c519abe1962d8f9
|
3 |
+
size 162900
|
nss_v0.1.0_int8_metadata.json
ADDED
@@ -0,0 +1,79 @@
1 |
+
{
|
2 |
+
"dm_scale_on_no_motion": [
|
3 |
+
0.617464542388916
|
4 |
+
],
|
5 |
+
"inputs": {
|
6 |
+
"x": {
|
7 |
+
"SINT": {
|
8 |
+
"scale": 0.003921568859368563,
|
9 |
+
"zero_point": -128
|
10 |
+
},
|
11 |
+
"SNORM": {
|
12 |
+
"scale": 0.49803924513980746,
|
13 |
+
"zero_point": -1.0078740157480315
|
14 |
+
}
|
15 |
+
}
|
16 |
+
},
|
17 |
+
"outputs": {
|
18 |
+
"activation_post_process_45": {
|
19 |
+
"SINT": {
|
20 |
+
"scale": 0.003937007859349251,
|
21 |
+
"zero_point": -127
|
22 |
+
},
|
23 |
+
"SNORM": {
|
24 |
+
"scale": 0.49999999813735485,
|
25 |
+
"zero_point": -1.0
|
26 |
+
}
|
27 |
+
},
|
28 |
+
"activation_post_process_50": {
|
29 |
+
"SINT": {
|
30 |
+
"scale": 0.003937007859349251,
|
31 |
+
"zero_point": -127
|
32 |
+
},
|
33 |
+
"SNORM": {
|
34 |
+
"scale": 0.49999999813735485,
|
35 |
+
"zero_point": -1.0
|
36 |
+
}
|
37 |
+
},
|
38 |
+
"activation_post_process_55": {
|
39 |
+
"SINT": {
|
40 |
+
"scale": 0.003937007859349251,
|
41 |
+
"zero_point": -127
|
42 |
+
},
|
43 |
+
"SNORM": {
|
44 |
+
"scale": 0.49999999813735485,
|
45 |
+
"zero_point": -1.0
|
46 |
+
}
|
47 |
+
},
|
48 |
+
"activation_post_process_60": {
|
49 |
+
"SINT": {
|
50 |
+
"scale": 0.003937007859349251,
|
51 |
+
"zero_point": -127
|
52 |
+
},
|
53 |
+
"SNORM": {
|
54 |
+
"scale": 0.49999999813735485,
|
55 |
+
"zero_point": -1.0
|
56 |
+
}
|
57 |
+
},
|
58 |
+
"activation_post_process_65": {
|
59 |
+
"SINT": {
|
60 |
+
"scale": 0.003937007859349251,
|
61 |
+
"zero_point": -127
|
62 |
+
},
|
63 |
+
"SNORM": {
|
64 |
+
"scale": 0.49999999813735485,
|
65 |
+
"zero_point": -1.0
|
66 |
+
}
|
67 |
+
},
|
68 |
+
"activation_post_process_70": {
|
69 |
+
"SINT": {
|
70 |
+
"scale": 0.003937007859349251,
|
71 |
+
"zero_point": -127
|
72 |
+
},
|
73 |
+
"SNORM": {
|
74 |
+
"scale": 0.49999999813735485,
|
75 |
+
"zero_point": -1.0
|
76 |
+
}
|
77 |
+
}
|
78 |
+
}
|
79 |
+
}
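
The `SINT` and `SNORM` entries in this metadata appear to describe the same mapping in two encodings. The sketch below assumes the conventional affine form `real = scale * (q - zero_point)`, with the SNORM variant operating on `q / 127`; the numbers are copied from the `x` input above, while the function names and the convention itself are assumptions, since the shader's `Dequantize` helper is not shown here. The hardware clamp that SNORM applies to `-128` is ignored.

```python
# Values copied from inputs.x in the metadata above.
SINT_SCALE, SINT_ZERO_POINT = 0.003921568859368563, -128
SNORM_SCALE, SNORM_ZERO_POINT = 0.49803924513980746, -1.0078740157480315


def dequantize_sint(q: int) -> float:
    """Assumed affine convention: real = scale * (q - zero_point)."""
    return SINT_SCALE * (q - SINT_ZERO_POINT)


def dequantize_snorm(q: int) -> float:
    """Same mapping, expressed on the normalized value q / 127 (no -128 clamp)."""
    return SNORM_SCALE * (q / 127.0 - SNORM_ZERO_POINT)


def quantize_sint(x: float) -> int:
    """Inverse of dequantize_sint, rounded and clamped to the int8 range."""
    q = round(x / SINT_SCALE) + SINT_ZERO_POINT
    return max(-128, min(127, q))


print(dequantize_sint(-128), dequantize_sint(127))
print(max(abs(dequantize_sint(q) - dequantize_snorm(q)) for q in range(-128, 128)))
```

Under this reading the input tensor covers roughly `[0, 1]`, and the SNORM scale is the SINT scale multiplied by 127.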
|
resources/Enchanted_Castle_NSS_Demo.mp4
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:13cc07b5829e7335b94b548314dd189add180ae4b6fd4d1529db37de72a9c3d8
|
3 |
+
size 96767057
|
resources/model-explorer-screenshot.png
ADDED
|
scenario/0_pre_process.comp
ADDED
@@ -0,0 +1,572 @@
1 |
+
//
|
2 |
+
// -----------------------------------------------------------------------------
|
3 |
+
// The proprietary software and information contained in this file is
|
4 |
+
// confidential and may only be used by an authorized person under a valid
|
5 |
+
// licensing agreement from Arm Limited or its affiliates.
|
6 |
+
//
|
7 |
+
// Copyright (C) 2025. Arm Limited or its affiliates. All rights reserved.
|
8 |
+
//
|
9 |
+
// This entire notice must be reproduced on all copies of this file and
|
10 |
+
// copies of this file may only be made by an authorized person under a valid
|
11 |
+
// licensing agreement from Arm Limited or its affiliates.
|
12 |
+
// -----------------------------------------------------------------------------
|
13 |
+
//
|
14 |
+
#version 460
|
15 |
+
#extension GL_EXT_shader_8bit_storage : require
|
16 |
+
#extension GL_EXT_shader_16bit_storage : require
|
17 |
+
#extension GL_EXT_shader_explicit_arithmetic_types : require
|
18 |
+
#extension GL_EXT_shader_explicit_arithmetic_types_int8 : require
|
19 |
+
#extension GL_EXT_shader_explicit_arithmetic_types_float16 : require
|
20 |
+
#extension GL_EXT_shader_explicit_arithmetic_types_float32 : require
|
21 |
+
#extension GL_GOOGLE_include_directive : enable
|
22 |
+
#extension GL_ARM_tensors : require
|
23 |
+
|
24 |
+
// includes
|
25 |
+
#include "typedefs.h"
|
26 |
+
#include "common.h"
|
27 |
+
|
28 |
+
// types
|
29 |
+
|
30 |
+
struct TensorElement
|
31 |
+
{
|
32 |
+
int8_t4 wh_rgb_col_r; // warped_history.rgb, jittered_colour.r
|
33 |
+
int8_t4 col_gb_dm_fback_r; // jittered_colour.gb, disocclusion mask, feedback.r
|
34 |
+
int8_t4 fback_gba_ld; // feedback.gba, luma derivative
|
35 |
+
};
|
36 |
+
|
37 |
+
// inputs
|
38 |
+
layout (set=0, binding=0) uniform mediump sampler2D _ColourTex; // 540p | R11G11B10 32bpp
|
39 |
+
layout (set=0, binding=1) uniform highp sampler2D _DepthTex; // 540p | R32_FLOAT 32bpp
|
40 |
+
layout (set=0, binding=2) uniform mediump sampler2D _MotionVectorTex; // 540p | RG_16 32bpp
|
41 |
+
layout (set=0, binding=3) uniform mediump sampler2D _HistoryTex; // 1080p | R11G11B10 32bpp
|
42 |
+
layout (set=0, binding=4) uniform lowp sampler2D _FeedbackTensor; // 1080p | R8G8B8A8_SNORM 32bpp | Tensor->Texture Alias (Linear)
|
43 |
+
layout (set=0, binding=5) uniform highp sampler2D _DepthTm1Tex; // 540p | R32_FLOAT 32bpp
|
44 |
+
layout (set=0, binding=6) uniform lowp sampler2D _LumaDerivTm1Tex; // 540p | R8G8_UNORM 16bpp
|
45 |
+
layout (set=0, binding=7) uniform lowp sampler2D _NearestDepthCoordTm1Tex; // 540p | R8_UNORM 8bpp
|
46 |
+
|
47 |
+
// outputs
|
48 |
+
layout (set=1, binding=0) uniform writeonly tensorARM<int8_t, 4> _PreprocessTensor; // 540p | 12ch 96bpp
|
49 |
+
layout (set=1, binding=1, rg8) uniform writeonly lowp image2D _PreProcessLumaDerivOut; // 540p | R8G8 16bpp
|
50 |
+
layout (set=1, binding=3, r8) uniform writeonly lowp image2D _NearestDepthCoordOut; // 540p | R8 8bpp
|
51 |
+
|
52 |
+
// push-constants
|
53 |
+
layout(push_constant, std430) uniform PushConstants {
|
54 |
+
// ─────────────── 16-byte aligned ───────────────
|
55 |
+
layout(offset = 0) float4 _DeviceToViewDepth; // 16 B
|
56 |
+
layout(offset = 16) float4 _JitterOffset; // 16 B (.xy = pixels, .zw = uvs)
|
57 |
+
layout(offset = 32) float4 _JitterOffsetTm1; // 16 B (.xy = pixels, .zw = uvs)
|
58 |
+
layout(offset = 48) float4 _ScaleFactor; // 16 B (.xy = scale, .zw = inv scale)
|
59 |
+
|
60 |
+
// ─────────────── 8-byte aligned ───────────────
|
61 |
+
layout(offset = 64) int32_t2 _OutputDims; // 8 B
|
62 |
+
layout(offset = 72) int32_t2 _InputDims; // 8 B
|
63 |
+
layout(offset = 80) float2 _InvOutputDims; // 8 B
|
64 |
+
layout(offset = 88) float2 _InvInputDims; // 8 B
|
65 |
+
layout(offset = 96) half4 _QuantParams; // 8 B (.xy SINT, .zw SNORM)
|
66 |
+
layout(offset = 104) half4 _MotionDisThreshPad; // 8 B (.xyzw = motion/disocclusion thresholds)
|
67 |
+
|
68 |
+
// ─────────────── 4-byte aligned ───────────────
|
69 |
+
layout(offset = 112) half2 _Exposure; // 4 B (.x = exposure, .y = 1/exp)
|
70 |
+
layout(offset = 116) half2 _HistoryPad; // 4 B
|
71 |
+
|
72 |
+
// ─────────────── padding to 16-byte struct size ────
|
73 |
+
layout(offset = 120) int32_t2 _Padding; // 8 B
|
74 |
+
|
75 |
+
// Total: **128 bytes**
|
76 |
+
};
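
The byte layout declared by the `layout(offset = ...)` qualifiers above can be cross-checked on the CPU side. This is a minimal sketch using Python's `struct` module; it assumes `float4`, `float2`, `int32_t2`, `half4` and `half2` are vectors of 32-bit floats, 32-bit ints and 16-bit floats respectively (the definitions live in `typedefs.h`, which is not shown here). The `FIELDS` table and `check_layout` are hypothetical helpers.

```python
import struct

# (name, struct format, declared byte offset) transcribed from the block above.
FIELDS = [
    ("_DeviceToViewDepth", "4f", 0),
    ("_JitterOffset", "4f", 16),
    ("_JitterOffsetTm1", "4f", 32),
    ("_ScaleFactor", "4f", 48),
    ("_OutputDims", "2i", 64),
    ("_InputDims", "2i", 72),
    ("_InvOutputDims", "2f", 80),
    ("_InvInputDims", "2f", 88),
    ("_QuantParams", "4e", 96),         # 'e' is IEEE half precision
    ("_MotionDisThreshPad", "4e", 104),
    ("_Exposure", "2e", 112),
    ("_HistoryPad", "2e", 116),
    ("_Padding", "2i", 120),
]


def check_layout(fields) -> int:
    """Confirm fields are tightly packed at their declared offsets; return total size."""
    cursor = 0
    for name, fmt, offset in fields:
        if offset != cursor:
            raise ValueError(f"{name}: expected offset {cursor}, declared {offset}")
        cursor += struct.calcsize("<" + fmt)
    return cursor


TOTAL_SIZE = check_layout(FIELDS)
print(TOTAL_SIZE)  # 128
```

Under these assumptions the fields are contiguous and the total matches the 128 bytes stated in the comment.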
|
77 |
+
|
78 |
+
// Convenience mapping for accessing push constants
|
79 |
+
#define _Scale _ScaleFactor.xy
|
80 |
+
#define _InvScale _ScaleFactor.zw
|
81 |
+
#define _Exposure _Exposure.x
|
82 |
+
#define _InvExposure _Exposure.y
|
83 |
+
#define _JitterOffsetPix _JitterOffset.xy
|
84 |
+
#define _JitterOffsetUv _JitterOffset.zw
|
85 |
+
#define _JitterOffsetTm1Pix _JitterOffsetTm1.xy
|
86 |
+
#define _JitterOffsetTm1Uv _JitterOffsetTm1.zw
|
87 |
+
#define _MotionWarpThresh _MotionDisThreshPad.x
|
88 |
+
#define _MotionDisThresh _MotionDisThreshPad.y
|
89 |
+
#define _DisocclusionScale _MotionDisThreshPad.z
|
90 |
+
#define _NotHistoryReset _HistoryPad.x
|
91 |
+
|
92 |
+
// Quantization Parameters
|
93 |
+
// inside: `./parameters.json`
|
94 |
+
// these values are embedded inside the TOSA file and learnt during QAT
|
95 |
+
|
96 |
+
#ifndef _InputQuantParams
|
97 |
+
// inputs - x["SINT"]
|
98 |
+
#define _InputQuantParams _QuantParams.xy
|
99 |
+
#endif
|
100 |
+
#ifndef _FeedbackQuantParams
|
101 |
+
// outputs - activation_post_process_70["SNORM"]
|
102 |
+
#define _FeedbackQuantParams _QuantParams.zw
|
103 |
+
#endif
|
104 |
+
|
105 |
+
// constants
|
106 |
+
|
107 |
+
#ifdef INVERTED_DEPTH
|
108 |
+
#define MAX_DEPTH 0.f
|
109 |
+
#else
|
110 |
+
#define MAX_DEPTH 1.f
|
111 |
+
#endif
|
112 |
+
|
113 |
+
|
114 |
+
// methods
|
115 |
+
|
116 |
+
bool IsOnScreen(int32_t2 pos, int32_t2 size)
|
117 |
+
{
|
118 |
+
return all(lessThan(uint32_t2(pos), uint32_t2(size)));
|
119 |
+
}
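
`IsOnScreen` relies on an unsigned comparison: casting a negative coordinate to `uint32_t` wraps it to a very large value, so a single `lessThan` test covers both `pos >= 0` and `pos < size`. The following Python sketch mirrors that behaviour for illustration only; the function name is hypothetical and 32-bit wrapping is emulated with a mask.

```python
def is_on_screen(pos: tuple[int, int], size: tuple[int, int]) -> bool:
    """Emulate the unsigned compare: negatives wrap to large unsigned values."""
    def to_u32(value: int) -> int:
        return value & 0xFFFFFFFF  # two's complement reinterpretation as uint32

    return all(to_u32(p) < to_u32(s) for p, s in zip(pos, size))


print(is_on_screen((0, 0), (960, 540)))    # True
print(is_on_screen((-1, 10), (960, 540)))  # False
print(is_on_screen((960, 10), (960, 540))) # False
```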
|
120 |
+
|
121 |
+
|
122 |
+
half2 LoadMotion(int32_t2 pixel)
|
123 |
+
{
|
124 |
+
return half2(texelFetch(_MotionVectorTex, pixel, 0).rg);
|
125 |
+
}
|
126 |
+
|
127 |
+
|
128 |
+
half3 LoadColour(int32_t2 pixel)
|
129 |
+
{
|
130 |
+
return Tonemap(SafeColour(half3(texelFetch(_ColourTex, pixel, 0).rgb) * _Exposure));
|
131 |
+
}
|
132 |
+
|
133 |
+
|
134 |
+
int32_t2 LoadDepthNearestDepthOffsetTm1(int32_t2 pixel)
|
135 |
+
{
|
136 |
+
int32_t2 is_oob = int32_t2(IsOnScreen(pixel, _InputDims)); // NOTE: 1 when on screen, 0 when out of bounds (the name is inverted)
|
137 |
+
pixel = clamp(pixel, int32_t2(0), _InputDims - int32_t2(1));
|
138 |
+
|
139 |
+
half encNorm = half(texelFetch(_NearestDepthCoordTm1Tex, pixel, 0).r);
|
140 |
+
int32_t code = int32_t(encNorm * 255.0 + 0.5);
|
141 |
+
|
142 |
+
// 3. map back to {-1,0,1}²
|
143 |
+
return DecodeNearestDepthCoord(code) * is_oob;
|
144 |
+
}
|
145 |
+
|
146 |
+
void GatherReconstructedPreviousDepthRQuad(float2 fUV, inout float4 depthQuad)
|
147 |
+
{
|
148 |
+
int32_t2 offset = LoadDepthNearestDepthOffsetTm1(int32_t2(fUV * _InputDims));
|
149 |
+
float2 offset_uv = float2(offset) * _InvInputDims;
|
150 |
+
depthQuad = textureGather(_DepthTm1Tex, fUV + offset_uv, 0).wzxy;
|
151 |
+
}
|
152 |
+
|
153 |
+
|
154 |
+
half3 WarpHistory(float2 uv)
|
155 |
+
{
|
156 |
+
return Tonemap(SafeColour(half3(textureLod(_HistoryTex, uv, 0).rgb) * _Exposure));
|
157 |
+
}
|
158 |
+
|
159 |
+
|
160 |
+
half4 WarpFeedback(float2 uv)
|
161 |
+
{
|
162 |
+
return Dequantize(half4(textureLod(_FeedbackTensor, uv, 0)), _FeedbackQuantParams);
|
163 |
+
}
|
164 |
+
|
165 |
+
|
166 |
+
half2 WarpLumaDerivative(float2 uv)
|
167 |
+
{
|
168 |
+
return half2(textureLod(_LumaDerivTm1Tex, uv, 0).rg);
|
169 |
+
}
|
170 |
+
|
171 |
+
|
172 |
+
half2 CalculateLumaDerivative(float2 reproj_uv, half3 jittered_colour, half disocclusion_mask)
|
173 |
+
{
|
174 |
+
const half DIS_THRESH = 0.01HF;
|
175 |
+
const half DERIV_MIN = 0.05HF;
|
176 |
+
const half DERIV_MAX = 0.3HF;
|
177 |
+
const half DERIV_POW = 1.5HF;
|
178 |
+
const half DERIV_ALPHA = 0.1HF;
|
179 |
+
const half DERIV_MAX_R = rcp(DERIV_MAX);
|
180 |
+
const half DERIV_MAX_POW_R = rcp(pow(DERIV_MAX, DERIV_POW));
|
181 |
+
|
182 |
+
//--------------------------------------------------------------------
|
183 |
+
// 1. Fetch history (luma + derivative)
|
184 |
+
//--------------------------------------------------------------------
|
185 |
+
half2 h = WarpLumaDerivative(reproj_uv);
|
186 |
+
half luma_tm1 = h.y;
|
187 |
+
half derivative_tm1 = h.x;
|
188 |
+
|
189 |
+
//--------------------------------------------------------------------
|
190 |
+
// 2. Current luma & raw derivative
|
191 |
+
//--------------------------------------------------------------------
|
192 |
+
half luma_t = Luminance(jittered_colour);
|
193 |
+
half derivative_t = abs(luma_t - luma_tm1);
|
194 |
+
|
195 |
+
//--------------------------------------------------------------------
|
196 |
+
// 3. Soft-clip & normalize
|
197 |
+
//--------------------------------------------------------------------
|
198 |
+
// Clip to `DERIV_MAX` which is ~typical max value,
|
199 |
+
// allows for better precision allocation when normalized
|
200 |
+
half clipped = min(derivative_t, DERIV_MAX);
|
201 |
+
|
202 |
+
// Discard values less than `DERIV_MIN` to reduce ghosting
|
203 |
+
clipped *= step(DERIV_MIN, derivative_t);
|
204 |
+
|
205 |
+
// Normalize with soft-clip
|
206 |
+
// x^1.5 = x * sqrt(x) | NOTE: only works because `DERIV_POW=1.5`
|
207 |
+
half curved = clipped * sqrt(clipped) * DERIV_MAX_POW_R;
|
208 |
+
|
209 |
+
//--------------------------------------------------------------------
|
210 |
+
// 4. Temporal accumulation
|
211 |
+
//--------------------------------------------------------------------
|
212 |
+
// Accumulate the new derivative into the history.
|
213 |
+
// We apply an adaptive alpha scaling, to ensure that if a derivative converges to a high value
|
214 |
+
// it becomes more difficult to reset that value, this provides temporally stable convergence
|
215 |
+
half alpha_scale = mix(DERIV_ALPHA,
|
216 |
+
DERIV_ALPHA * 0.1HF,
|
217 |
+
clamp(derivative_tm1, 0.HF, DERIV_MAX) * DERIV_MAX_R);
|
218 |
+
|
219 |
+
half derivative = mix(derivative_tm1, curved, alpha_scale);
|
220 |
+
|
221 |
+
//--------------------------------------------------------------------
|
222 |
+
// 5. Remove disoccluded pixels
|
223 |
+
//--------------------------------------------------------------------
|
224 |
+
derivative *= step(disocclusion_mask, DIS_THRESH);
|
225 |
+
|
226 |
+
// .x -> derivative for current frame, .y -> luma of current frame
|
227 |
+
return half2(derivative, luma_t);
|
228 |
+
}
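The soft-clip and adaptive accumulation above can be checked on the CPU. The following is a minimal Python sketch, not the shader itself: the values of `DERIV_MIN`, `DERIV_MAX`, `DERIV_ALPHA` and `DIS_THRESH` are hypothetical (the real constants are defined outside this excerpt), and `DERIV_MAX_R` and `DERIV_MAX_POW_R` are assumed to be the reciprocals their names suggest.

```python
import math

# Hypothetical constants: the real values are defined outside this excerpt.
DERIV_MIN = 0.05
DERIV_MAX = 0.3
DERIV_ALPHA = 0.1
DIS_THRESH = 0.01
DERIV_MAX_R = 1.0 / DERIV_MAX              # assumed: reciprocal of DERIV_MAX
DERIV_MAX_POW_R = 1.0 / DERIV_MAX ** 1.5   # assumed: reciprocal of DERIV_MAX^1.5


def mix(a, b, t):
    """GLSL mix(): linear interpolation from a to b."""
    return a * (1.0 - t) + b * t


def step(edge, x):
    """GLSL step(): 0.0 if x < edge, else 1.0."""
    return 0.0 if x < edge else 1.0


def luma_derivative(luma_t, luma_tm1, derivative_tm1, disocclusion_mask):
    # Raw derivative, clipped to DERIV_MAX and zeroed below DERIV_MIN.
    derivative_t = abs(luma_t - luma_tm1)
    clipped = min(derivative_t, DERIV_MAX) * step(DERIV_MIN, derivative_t)
    # x^1.5 computed as x * sqrt(x), then normalised to [0, 1].
    curved = clipped * math.sqrt(clipped) * DERIV_MAX_POW_R
    # A high previous derivative lowers alpha, so the value is harder to reset.
    t = min(max(derivative_tm1, 0.0), DERIV_MAX) * DERIV_MAX_R
    alpha = mix(DERIV_ALPHA, DERIV_ALPHA * 0.1, t)
    derivative = mix(derivative_tm1, curved, alpha)
    # Zero the result where the disocclusion mask exceeds the threshold.
    return derivative * step(disocclusion_mask, DIS_THRESH)
```

With these assumed constants, a history value already at `DERIV_MAX` decays by only about 1% per static frame, whereas a fresh value would move by 10% of the way towards the new sample.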
|
229 |
+
|
230 |
+
|
231 |
+
void FindNearestDepth(int32_t2 iPxPos, int32_t2 iPxSize, out float fNearestDepth, out int32_t2 fNearestDepthOffset)
|
232 |
+
{
|
233 |
+
/*
|
234 |
+
Closely based on:
|
235 |
+
https://github.com/arm/accuracy-super-resolution-generic-library/blob/38697a58a6e7818ec9d28774bc073f537abb9178/
|
236 |
+
include/gpu/fsr2/ffxm_fsr2_reconstruct_dilated_velocity_and_previous_depth.h#L59
|
237 |
+
*/
|
238 |
+
|
239 |
+
int32_t iSampleIndex = 0;
|
240 |
+
const int32_t iSampleCount = 9;
|
241 |
+
// x, y
|
242 |
+
const int32_t2 iSampleOffsets[iSampleCount] = {
|
243 |
+
int32_t2(+0, +0).yx,
|
244 |
+
int32_t2(+1, +0).yx,
|
245 |
+
int32_t2(+0, +1).yx,
|
246 |
+
int32_t2(+0, -1).yx,
|
247 |
+
int32_t2(-1, +0).yx,
|
248 |
+
int32_t2(-1, +1).yx,
|
249 |
+
int32_t2(+1, +1).yx,
|
250 |
+
int32_t2(-1, -1).yx,
|
251 |
+
int32_t2(+1, -1).yx,
|
252 |
+
};
|
253 |
+
|
254 |
+
// Pull out the depth loads to allow the shader compiler (SC) to batch them
|
255 |
+
float depth[9];
|
256 |
+
depth[0] = float(texelFetchOffset(_DepthTex, iPxPos, 0, int32_t2(+0, +0).yx).r);
|
257 |
+
depth[1] = float(texelFetchOffset(_DepthTex, iPxPos, 0, int32_t2(+1, +0).yx).r);
|
258 |
+
depth[2] = float(texelFetchOffset(_DepthTex, iPxPos, 0, int32_t2(+0, +1).yx).r);
|
259 |
+
depth[3] = float(texelFetchOffset(_DepthTex, iPxPos, 0, int32_t2(+0, -1).yx).r);
|
260 |
+
depth[4] = float(texelFetchOffset(_DepthTex, iPxPos, 0, int32_t2(-1, +0).yx).r);
|
261 |
+
depth[5] = float(texelFetchOffset(_DepthTex, iPxPos, 0, int32_t2(-1, +1).yx).r);
|
262 |
+
depth[6] = float(texelFetchOffset(_DepthTex, iPxPos, 0, int32_t2(+1, +1).yx).r);
|
263 |
+
depth[7] = float(texelFetchOffset(_DepthTex, iPxPos, 0, int32_t2(-1, -1).yx).r);
|
264 |
+
depth[8] = float(texelFetchOffset(_DepthTex, iPxPos, 0, int32_t2(+1, -1).yx).r);
|
265 |
+
|
266 |
+
// find closest depth
|
267 |
+
fNearestDepth = depth[0];
|
268 |
+
fNearestDepthOffset = iSampleOffsets[0];
|
269 |
+
#pragma unroll
|
270 |
+
for (iSampleIndex = 1; iSampleIndex < iSampleCount; ++iSampleIndex) {
|
271 |
+
|
272 |
+
int32_t2 iPos = iPxPos + iSampleOffsets[iSampleIndex];
|
273 |
+
if (IsOnScreen(iPos, iPxSize)) {
|
274 |
+
|
275 |
+
float fNdDepth = depth[iSampleIndex];
|
276 |
+
#ifdef INVERTED_DEPTH
|
277 |
+
if (fNdDepth > fNearestDepth) {
|
278 |
+
#else
|
279 |
+
if (fNdDepth < fNearestDepth) {
|
280 |
+
#endif
|
281 |
+
fNearestDepth = fNdDepth;
|
282 |
+
fNearestDepthOffset = iSampleOffsets[iSampleIndex];
|
283 |
+
}
|
284 |
+
}
|
285 |
+
}
|
286 |
+
}
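The 3x3 nearest-depth search above can be mirrored on the CPU for testing. This is a rough Python sketch under stated assumptions: `is_on_screen` is assumed to be a plain bounds check (the shader's `IsOnScreen` is not shown in this excerpt), offsets are kept as `(dx, dy)` without the shader's `.yx` swizzle, and `inverted_depth` stands in for the `INVERTED_DEPTH` define.

```python
# (dx, dy) offsets in the same order as the shader's sample table,
# before the `.yx` swizzle that the shader applies.
SAMPLE_OFFSETS = [(0, 0), (1, 0), (0, 1), (0, -1), (-1, 0),
                  (-1, 1), (1, 1), (-1, -1), (1, -1)]


def is_on_screen(pos, size):
    """Assumed behaviour of IsOnScreen(): a plain bounds check."""
    return 0 <= pos[0] < size[0] and 0 <= pos[1] < size[1]


def find_nearest_depth(depth, px, inverted_depth=False):
    """depth is a list of rows (depth[y][x]); px is (x, y)."""
    size = (len(depth[0]), len(depth))
    nearest = depth[px[1]][px[0]]
    nearest_offset = SAMPLE_OFFSETS[0]
    for dx, dy in SAMPLE_OFFSETS[1:]:
        pos = (px[0] + dx, px[1] + dy)
        if not is_on_screen(pos, size):
            continue  # off-screen neighbours never win the comparison
        d = depth[pos[1]][pos[0]]
        closer = d > nearest if inverted_depth else d < nearest
        if closer:
            nearest, nearest_offset = d, (dx, dy)
    return nearest, nearest_offset
```

The returned offset is what the pre-process pass later uses to fetch the motion vector of the nearest surface instead of the pixel's own.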
|
287 |
+
|
288 |
+
|
289 |
+
int32_t2 RenderSize()
|
290 |
+
{
|
291 |
+
return int32_t2(_InputDims);
|
292 |
+
}
|
293 |
+
|
294 |
+
|
295 |
+
float2 ComputeNdc(float2 fPxPos, int32_t2 iSize)
|
296 |
+
{
|
297 |
+
/*
|
298 |
+
Closely based on:
|
299 |
+
https://github.com/arm/accuracy-super-resolution-generic-library/blob/
|
300 |
+
38697a58a6e7818ec9d28774bc073f537abb9178/include/gpu/fsr2/ffxm_fsr2_common.h#L457
|
301 |
+
*/
|
302 |
+
|
303 |
+
return fPxPos.yx / float2(iSize.yx) * float2(2.0f, -2.0f) + float2(-1.0f, 1.0f);
|
304 |
+
}
|
305 |
+
|
306 |
+
|
307 |
+
float GetViewSpaceDepth(float fDeviceDepth)
|
308 |
+
{
|
309 |
+
/*
|
310 |
+
Closely based on:
|
311 |
+
https://github.com/arm/accuracy-super-resolution-generic-library/blob/
|
312 |
+
38697a58a6e7818ec9d28774bc073f537abb9178/include/gpu/fsr2/ffxm_fsr2_common.h#L462
|
313 |
+
|
314 |
+
`fDeviceToViewDepth` / `_DeviceToViewDepth` details found in:
|
315 |
+
https://github.com/arm/accuracy-super-resolution-generic-library/blob/
|
316 |
+
0501f490bd9946a2e1806b5363d7ab8a9a6a5e0a/src/components/fsr2/ffxm_fsr2.cpp#L829
|
317 |
+
*/
|
318 |
+
|
319 |
+
const float4 fDeviceToViewDepth = _DeviceToViewDepth;
|
320 |
+
|
321 |
+
return (fDeviceToViewDepth[1] / (fDeviceDepth - fDeviceToViewDepth[0]));
|
322 |
+
}
|
323 |
+
|
324 |
+
|
325 |
+
float3 GetViewSpacePosition(int32_t2 iViewportPos, int32_t2 iViewportSize, float fDeviceDepth)
|
326 |
+
{
|
327 |
+
/*
|
328 |
+
Closely based on:
|
329 |
+
https://github.com/arm/accuracy-super-resolution-generic-library/blob/
|
330 |
+
38697a58a6e7818ec9d28774bc073f537abb9178/include/gpu/fsr2/ffxm_fsr2_common.h#L475
|
331 |
+
*/
|
332 |
+
|
333 |
+
const float4 fDeviceToViewDepth = _DeviceToViewDepth;
|
334 |
+
|
335 |
+
const float Z = GetViewSpaceDepth(fDeviceDepth);
|
336 |
+
|
337 |
+
const float2 fNdcPos = ComputeNdc(iViewportPos, iViewportSize);
|
338 |
+
const float X = fDeviceToViewDepth[2] * fNdcPos.x * Z;
|
339 |
+
const float Y = fDeviceToViewDepth[3] * fNdcPos.y * Z;
|
340 |
+
|
341 |
+
return float3(X, Y, Z);
|
342 |
+
}
|
343 |
+
|
344 |
+
|
345 |
+
struct BilinearSamplingData
|
346 |
+
{
|
347 |
+
int32_t2 iOffsets[4];
|
348 |
+
float fWeights[4];
|
349 |
+
int32_t2 iBasePos;
|
350 |
+
float2 fQuadCenterUv;
|
351 |
+
};
|
352 |
+
|
353 |
+
|
354 |
+
BilinearSamplingData GetBilinearSamplingData(float2 fUv, int32_t2 iSize)
|
355 |
+
{
|
356 |
+
/*
|
357 |
+
Closely based on:
|
358 |
+
https://github.com/arm/accuracy-super-resolution-generic-library/blob/
|
359 |
+
38697a58a6e7818ec9d28774bc073f537abb9178/include/gpu/fsr2/ffxm_fsr2_common.h#L548
|
360 |
+
*/
|
361 |
+
|
362 |
+
BilinearSamplingData data;
|
363 |
+
|
364 |
+
float2 fPxSample = (fUv * iSize) - float2(0.5f, 0.5f);
|
365 |
+
data.iBasePos = int32_t2(floor(fPxSample));
|
366 |
+
data.fQuadCenterUv = (fPxSample + 0.5f) / float2(iSize);
|
367 |
+
float2 fPxFrac = fract(fPxSample);
|
368 |
+
|
369 |
+
data.iOffsets[0] = int32_t2(0, 0);
|
370 |
+
data.iOffsets[2] = int32_t2(1, 0);
|
371 |
+
data.iOffsets[1] = int32_t2(0, 1);
|
372 |
+
data.iOffsets[3] = int32_t2(1, 1);
|
373 |
+
|
374 |
+
data.fWeights[0] = (1.f - fPxFrac.x) * (1.f - fPxFrac.y);
|
375 |
+
data.fWeights[1] = (fPxFrac.x) * (1.f - fPxFrac.y);
|
376 |
+
data.fWeights[2] = (1.f - fPxFrac.x) * (fPxFrac.y);
|
377 |
+
data.fWeights[3] = (fPxFrac.x) * (fPxFrac.y);
|
378 |
+
|
379 |
+
return data;
|
380 |
+
}
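As a sanity check on the weights computed above, here is a small Python sketch of the same arithmetic. It only reproduces the base position and the four weights; the function name and the `(width, height)` tuple convention are illustrative choices, not part of the shader.

```python
import math


def bilinear_sampling_data(uv, size):
    """uv in [0, 1]^2, size = (width, height) in pixels."""
    # Shift by half a texel so integer positions land on texel centres.
    px = (uv[0] * size[0] - 0.5, uv[1] * size[1] - 0.5)
    base = (math.floor(px[0]), math.floor(px[1]))
    fx, fy = px[0] - base[0], px[1] - base[1]
    # Same four products as fWeights[0..3] in the shader.
    weights = [(1 - fx) * (1 - fy), fx * (1 - fy), (1 - fx) * fy, fx * fy]
    return base, weights
```

The four weights always sum to 1, which is why `ComputeDepthClip` can skip taps below its 0.1 threshold and renormalise by the weight it actually accumulated.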
|
381 |
+
|
382 |
+
|
383 |
+
float ComputeDepthClip(float2 fUvSample, float fCurrentDepthSample)
|
384 |
+
{
|
385 |
+
/*
|
386 |
+
Closely based on:
|
387 |
+
https://github.com/arm/accuracy-super-resolution-generic-library/blob/
|
388 |
+
38697a58a6e7818ec9d28774bc073f537abb9178/include/gpu/fsr2/ffxm_fsr2_depth_clip.h#L36
|
389 |
+
*/
|
390 |
+
|
391 |
+
const float fReconstructedDepthBilinearWeightThreshold = 0.1f;
|
392 |
+
float fCurrentDepthViewSpace = GetViewSpaceDepth(fCurrentDepthSample);
|
393 |
+
BilinearSamplingData bilinearInfo = GetBilinearSamplingData(fUvSample, RenderSize());
|
394 |
+
|
395 |
+
float fDepth = 0.0f;
|
396 |
+
float fWeightSum = 0.0f;
|
397 |
+
|
398 |
+
float4 fPrevDepthSamples;
|
399 |
+
GatherReconstructedPreviousDepthRQuad(bilinearInfo.fQuadCenterUv, fPrevDepthSamples);
|
400 |
+
|
401 |
+
|
402 |
+
|
403 |
+
for (int32_t iSampleIndex = 0; iSampleIndex < 4; iSampleIndex++)
|
404 |
+
{
|
405 |
+
const int32_t2 iOffset = bilinearInfo.iOffsets[iSampleIndex];
|
406 |
+
const int32_t2 iSamplePos = bilinearInfo.iBasePos + iOffset;
|
407 |
+
|
408 |
+
const float fWeight = bilinearInfo.fWeights[iSampleIndex];
|
409 |
+
const bool onscreen = IsOnScreen(iSamplePos, RenderSize());
|
410 |
+
fWeightSum += onscreen ? 0.f : fWeight;
|
411 |
+
if (onscreen)
|
412 |
+
{
|
413 |
+
if (fWeight > fReconstructedDepthBilinearWeightThreshold)
|
414 |
+
{
|
415 |
+
const float fPrevDepthSample = fPrevDepthSamples[iSampleIndex];
|
416 |
+
const float fPrevNearestDepthViewSpace = GetViewSpaceDepth(fPrevDepthSample);
|
417 |
+
const float fDepthDiff = fCurrentDepthViewSpace - fPrevNearestDepthViewSpace;
|
418 |
+
|
419 |
+
if (fDepthDiff > 0.0f) {
|
420 |
+
|
421 |
+
#ifdef INVERTED_DEPTH
|
422 |
+
const float fPlaneDepth = min(fPrevDepthSample, fCurrentDepthSample);
|
423 |
+
#else
|
424 |
+
const float fPlaneDepth = max(fPrevDepthSample, fCurrentDepthSample);
|
425 |
+
#endif
|
426 |
+
|
427 |
+
const float3 fCenter = GetViewSpacePosition(int32_t2(RenderSize() * 0.5f), RenderSize(), fPlaneDepth);
|
428 |
+
const float3 fCorner = GetViewSpacePosition(int32_t2(0, 0), RenderSize(), fPlaneDepth);
|
429 |
+
|
430 |
+
const float fHalfViewportWidth = length(float2(RenderSize()));
|
431 |
+
const float fDepthThreshold = max(fCurrentDepthViewSpace, fPrevNearestDepthViewSpace);
|
432 |
+
|
433 |
+
const float Ksep = 1.37e-05f;
|
434 |
+
const float Kfov = length(fCorner) / length(fCenter);
|
435 |
+
const float fRequiredDepthSeparation = Ksep * Kfov * fHalfViewportWidth * fDepthThreshold;
|
436 |
+
|
437 |
+
const float fResolutionFactor = saturate(length(float2(RenderSize())) / length(float2(1920.0f, 1080.0f)));
|
438 |
+
const float fPower = lerp(1.0f, 3.0f, fResolutionFactor);
|
439 |
+
fDepth += pow(saturate(float(fRequiredDepthSeparation / fDepthDiff)), fPower) * fWeight;
|
440 |
+
fWeightSum += fWeight;
|
441 |
+
}
|
442 |
+
}
|
443 |
+
}
|
444 |
+
}
|
445 |
+
|
446 |
+
return (fWeightSum > 0) ? saturate(1.0f - fDepth / fWeightSum) : 0.0f;
|
447 |
+
}
|
448 |
+
|
449 |
+
|
450 |
+
void WriteLumaDerivative(int32_t2 pixel, half2 derivative)
|
451 |
+
{
|
452 |
+
imageStore(_PreProcessLumaDerivOut, pixel, half4(derivative, half2(0.f, 1.f)));
|
453 |
+
}
|
454 |
+
|
455 |
+
|
456 |
+
void WriteNearestDepthOffset(int32_t2 pixel, uint8_t offset)
|
457 |
+
{
|
458 |
+
half enc_norm = half(offset) / 255.HF;
|
459 |
+
imageStore(_NearestDepthCoordOut, pixel, half4(enc_norm, 0.HF, 0.HF, 1.HF));
|
460 |
+
}
|
461 |
+
|
462 |
+
|
463 |
+
void WriteToTensor(int32_t2 outputPixel, half3 input_colour, half3 history, half disocclusion_mask, half luma_derivative, half4 temporal_feedback)
|
464 |
+
{
|
465 |
+
TensorElement te;
|
466 |
+
te.wh_rgb_col_r = Quantize(half4(history.rgb, input_colour.r), _InputQuantParams);
|
467 |
+
te.col_gb_dm_fback_r = Quantize(half4(input_colour.gb, disocclusion_mask, temporal_feedback.r), _InputQuantParams);
|
468 |
+
te.fback_gba_ld = Quantize(half4(temporal_feedback.gba, luma_derivative), _InputQuantParams);
|
469 |
+
|
470 |
+
int8_t t0[12] =
|
471 |
+
{
|
472 |
+
te.wh_rgb_col_r.x,
|
473 |
+
te.wh_rgb_col_r.y,
|
474 |
+
te.wh_rgb_col_r.z,
|
475 |
+
te.wh_rgb_col_r.w,
|
476 |
+
te.col_gb_dm_fback_r.x,
|
477 |
+
te.col_gb_dm_fback_r.y,
|
478 |
+
te.col_gb_dm_fback_r.z,
|
479 |
+
te.col_gb_dm_fback_r.w,
|
480 |
+
te.fback_gba_ld.x,
|
481 |
+
te.fback_gba_ld.y,
|
482 |
+
te.fback_gba_ld.z,
|
483 |
+
te.fback_gba_ld.w
|
484 |
+
};
|
485 |
+
tensorWriteARM(_PreprocessTensor, uint[](0, outputPixel.y, outputPixel.x, 0), t0);
|
486 |
+
}
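The channel order written by `WriteToTensor` is easy to get wrong when preparing reference inputs offline. The Python sketch below only reproduces the ordering of the 12 channels as they appear in `t0`; quantisation is deliberately left out because `Quantize` and `_InputQuantParams` are defined outside this excerpt, and the function name is illustrative.

```python
def pack_channels(history, colour, disocclusion_mask, feedback, luma_derivative):
    """Return the 12 channels in the order they are written to the tensor."""
    return [
        # wh_rgb_col_r: warped history RGB, then colour R
        history[0], history[1], history[2], colour[0],
        # col_gb_dm_fback_r: colour GB, disocclusion mask, feedback R
        colour[1], colour[2], disocclusion_mask, feedback[0],
        # fback_gba_ld: feedback GBA, then luma derivative
        feedback[1], feedback[2], feedback[3], luma_derivative,
    ]
```

Note that the warped history comes first in the tensor even though the jittered colour is the first colour argument at the call site.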
|
487 |
+
|
488 |
+
|
489 |
+
// entry-point
|
490 |
+
layout(local_size_x = 16, local_size_y = 16) in;
|
491 |
+
void main()
|
492 |
+
{
|
493 |
+
int32_t2 input_pixel = int32_t2(gl_GlobalInvocationID.xy);
|
494 |
+
if (any(greaterThanEqual(input_pixel, _InputDims))) return;
|
495 |
+
|
496 |
+
float2 uv = (float2(input_pixel) + 0.5f) * _InvInputDims;
|
497 |
+
|
498 |
+
//-------------------------------------------------------------------------
|
499 |
+
// 1) Dilate depth, find nearest pixel coordinate
|
500 |
+
//-------------------------------------------------------------------------
|
501 |
+
float depth_dilated = float(0.f);
|
502 |
+
int32_t2 nearest_pixel_offset = int32_t2(0);
|
503 |
+
FindNearestDepth(input_pixel, RenderSize(), depth_dilated, nearest_pixel_offset);
|
504 |
+
|
505 |
+
//-------------------------------------------------------------------------
|
506 |
+
// 2) Load motion vectors
|
507 |
+
//-------------------------------------------------------------------------
|
508 |
+
half2 motion = LoadMotion(input_pixel + nearest_pixel_offset);
|
509 |
+
|
510 |
+
// Suppress very small motion - no value in resampling here
|
511 |
+
half2 motion_pix = motion * half2(RenderSize());
|
512 |
+
motion *= half(dot(motion_pix, motion_pix) > _MotionWarpThresh);
|
513 |
+
|
514 |
+
// Calculate sample position(s) for everything in `tm1` frame
|
515 |
+
float2 reproj_uv = uv - float2(motion);
|
516 |
+
float2 unjitter_tm1_uv = reproj_uv - _JitterOffsetTm1Uv;
|
517 |
+
|
518 |
+
//-------------------------------------------------------------------------
|
519 |
+
// 3) Calculate depth-based disocclusion mask
|
520 |
+
//-------------------------------------------------------------------------
|
521 |
+
half disocclusion_mask = half(ComputeDepthClip(unjitter_tm1_uv, depth_dilated));
|
522 |
+
|
523 |
+
// Scale the disocclusion mask on static frames so the network knows this is happening under
|
524 |
+
// static conditions; this reduces false flags caused by jitter differences across frames
|
525 |
+
half dm_scale = dot(motion_pix, motion_pix) > _MotionDisThresh ? half(1.0f) : _DisocclusionScale;
|
526 |
+
disocclusion_mask = disocclusion_mask * dm_scale;
|
527 |
+
|
528 |
+
//-------------------------------------------------------------------------
|
529 |
+
// 4) Downsample + warp history buffer
|
530 |
+
//-------------------------------------------------------------------------
|
531 |
+
half3 warped_history = WarpHistory(reproj_uv);
|
532 |
+
|
533 |
+
//-------------------------------------------------------------------------
|
534 |
+
// 5) Read current low-res / jittered / aliased colour
|
535 |
+
//-------------------------------------------------------------------------
|
536 |
+
half3 jittered_colour = LoadColour(input_pixel);
|
537 |
+
|
538 |
+
//-------------------------------------------------------------------------
|
539 |
+
// 6) Calculate derivative of `luma`
|
540 |
+
// helps identify high-frequency flicker due to jitter
|
541 |
+
//-------------------------------------------------------------------------
|
542 |
+
half2 luma_derivative = CalculateLumaDerivative(reproj_uv, jittered_colour, disocclusion_mask);
|
543 |
+
|
544 |
+
//-------------------------------------------------------------------------
|
545 |
+
// 7) Warp temporal feedback
|
546 |
+
//-------------------------------------------------------------------------
|
547 |
+
half4 temporal_feedback = WarpFeedback(reproj_uv);
|
548 |
+
|
549 |
+
//-------------------------------------------------------------------------
|
550 |
+
// 8) Convert dilated depth coord to a position offset
|
551 |
+
//-------------------------------------------------------------------------
|
552 |
+
uint8_t enc_depth_offset = EncodeNearestDepthCoord(nearest_pixel_offset);
|
553 |
+
|
554 |
+
//-------------------------------------------------------------------------
|
555 |
+
// 9) Write Outputs
|
556 |
+
//-------------------------------------------------------------------------
|
557 |
+
// Consumed by NE
|
558 |
+
WriteToTensor(
|
559 |
+
input_pixel,
|
560 |
+
jittered_colour, // 3ch
|
561 |
+
warped_history, // 3ch
|
562 |
+
disocclusion_mask, // 1ch
|
563 |
+
luma_derivative.x, // 1ch
|
564 |
+
temporal_feedback // 4ch
|
565 |
+
); // total: 12ch
|
566 |
+
|
567 |
+
// Consumed by post process and frame t+1
|
568 |
+
WriteNearestDepthOffset(input_pixel, enc_depth_offset);
|
569 |
+
|
570 |
+
// Consumed at frame t+1
|
571 |
+
WriteLumaDerivative(input_pixel, luma_derivative);
|
572 |
+
}
|
scenario/0_pre_process.spv
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:b03bcb283b73870daa0a540cfb8f1e8ec9c4842b38a711f52d31517569e79b87
|
3 |
+
size 29476
|
scenario/0_pre_process_push_consts.npy
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:6319b912dd9ee3e1ce44794067ea57fd9eb01ff0e38b3f8a55ceea7be18e6412
|
3 |
+
size 256
|
scenario/1_nss.vgf
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:fd2a1bd13f156fcfa7a0cf132220ca39c2c2498f4af2c7c7da10a42ef4e555a7
|
3 |
+
size 163860
|
scenario/2_post_process.comp
ADDED
@@ -0,0 +1,361 @@
|
1 |
+
//
|
2 |
+
// -----------------------------------------------------------------------------
|
3 |
+
// The proprietary software and information contained in this file is
|
4 |
+
// confidential and may only be used by an authorized person under a valid
|
5 |
+
// licensing agreement from Arm Limited or its affiliates.
|
6 |
+
//
|
7 |
+
// Copyright (C) 2025. Arm Limited or its affiliates. All rights reserved.
|
8 |
+
//
|
9 |
+
// This entire notice must be reproduced on all copies of this file and
|
10 |
+
// copies of this file may only be made by an authorized person under a valid
|
11 |
+
// licensing agreement from Arm Limited or its affiliates.
|
12 |
+
// -----------------------------------------------------------------------------
|
13 |
+
//
|
14 |
+
#version 460
|
15 |
+
#extension GL_EXT_shader_8bit_storage : require
|
16 |
+
#extension GL_EXT_shader_16bit_storage : require
|
17 |
+
#extension GL_EXT_shader_explicit_arithmetic_types : require
|
18 |
+
#extension GL_EXT_shader_explicit_arithmetic_types_int8 : require
|
19 |
+
#extension GL_EXT_shader_explicit_arithmetic_types_float16 : require
|
20 |
+
#extension GL_EXT_shader_explicit_arithmetic_types_float32 : require
|
21 |
+
#extension GL_GOOGLE_include_directive : enable
|
22 |
+
|
23 |
+
// defines
|
24 |
+
#define SCALE_1_0X 0
|
25 |
+
#define SCALE_1_3X 1
|
26 |
+
#define SCALE_1_5X 2
|
27 |
+
#define SCALE_2_0X 3
|
28 |
+
|
29 |
+
// settings
|
30 |
+
#define HISTORY_CATMULL
|
31 |
+
#define SCALE_MODE SCALE_2_0X
|
32 |
+
|
33 |
+
// includes
|
34 |
+
#include "typedefs.h"
|
35 |
+
#include "common.h"
|
36 |
+
#include "kernel_lut.h"
|
37 |
+
|
38 |
+
// inputs
|
39 |
+
layout (set=0, binding=0) uniform mediump sampler2D _ColourTex; // 540p | R11G11B10 32bpp
|
40 |
+
layout (set=0, binding=1) uniform mediump sampler2D _MotionVectorTex; // 540p | RG16_FLOAT 32bpp
|
41 |
+
layout (set=0, binding=2) uniform mediump sampler2D _HistoryTex; // 1080p | R11G11B10 32bpp
|
42 |
+
layout (set=0, binding=3) uniform lowp sampler2D _K0Tensor; // 540p | R8G8B8A8_SNORM 32bpp | Tensor->Texture Alias (Linear)
|
43 |
+
layout (set=0, binding=4) uniform lowp sampler2D _K1Tensor; // 540p | R8G8B8A8_SNORM 32bpp | Tensor->Texture Alias (Linear)
|
44 |
+
layout (set=0, binding=5) uniform lowp sampler2D _K2Tensor; // 540p | R8G8B8A8_SNORM 32bpp | Tensor->Texture Alias (Linear)
|
45 |
+
layout (set=0, binding=6) uniform lowp sampler2D _K3Tensor; // 540p | R8G8B8A8_SNORM 32bpp | Tensor->Texture Alias (Linear)
|
46 |
+
layout (set=0, binding=7) uniform lowp sampler2D _TemporalTensor; // 540p | R8G8B8A8_SNORM 32bpp | Tensor->Texture Alias (Linear)
|
47 |
+
layout (set=0, binding=8) uniform lowp sampler2D _NearestDepthCoordTex; // 540p | R8_UNORM 8bpp
|
48 |
+
|
49 |
+
// outputs
|
50 |
+
layout (set=1, binding=0, r11f_g11f_b10f) uniform writeonly mediump image2D _UpsampledColourOut; // 1080p | R11G11B10 32bpp
|
51 |
+
|
52 |
+
// push-constants
|
53 |
+
layout(push_constant, std430) uniform PushConstants {
|
54 |
+
// ─────────────── 8-byte aligned ───────────────
|
55 |
+
layout(offset = 0) int32_t2 _OutputDims; // 8 B
|
56 |
+
layout(offset = 8) int32_t2 _InputDims; // 8 B
|
57 |
+
layout(offset = 16) float2 _InvOutputDims; // 8 B
|
58 |
+
layout(offset = 24) float2 _InvInputDims; // 8 B
|
59 |
+
layout(offset = 32) float2 _Scale; // 8 B
|
60 |
+
layout(offset = 40) float2 _InvScale; // 8 B
|
61 |
+
|
62 |
+
// ─────────────── 4-byte aligned ───────────────
|
63 |
+
layout(offset = 48) int16_t2 _IndexModulo; // 4 B
|
64 |
+
layout(offset = 52) half2 _QuantParams; // 4 B
|
65 |
+
layout(offset = 56) int16_t2 _LutOffset; // 4 B
|
66 |
+
layout(offset = 60) half2 _ExposurePair; // 4 B
|
67 |
+
layout(offset = 64) half2 _HistoryPad; // 4 B
|
68 |
+
layout(offset = 68) half2 _MotionThreshPad; // 4 B (.x = motion, .y = unused)
|
69 |
+
layout(offset = 72) int32_t _Padding0; // 4 B (explicit pad for alignment)
|
70 |
+
// Total: **76 bytes**
|
71 |
+
};
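The explicit offsets in the push-constant block can be cross-checked on the host before packing a buffer. The following Python sketch uses only the standard `struct` module; the field list is transcribed from the block above, while the helper name `field_offsets` and the choice of struct codes (`i` for `int32_t`, `f` for `float`, `h` for `int16_t`, `e` for `half`) are assumptions made for illustration.

```python
import struct

# One struct code per member, in declaration order. The "<" prefix disables
# implicit padding, so the computed offsets depend only on member sizes.
PUSH_CONSTANT_FIELDS = [
    ("_OutputDims", "2i"), ("_InputDims", "2i"),
    ("_InvOutputDims", "2f"), ("_InvInputDims", "2f"),
    ("_Scale", "2f"), ("_InvScale", "2f"),
    ("_IndexModulo", "2h"), ("_QuantParams", "2e"),
    ("_LutOffset", "2h"), ("_ExposurePair", "2e"),
    ("_HistoryPad", "2e"), ("_MotionThreshPad", "2e"),
    ("_Padding0", "i"),
]


def field_offsets():
    """Return ({member: byte offset}, total size in bytes)."""
    offsets, cursor = {}, 0
    for name, code in PUSH_CONSTANT_FIELDS:
        offsets[name] = cursor
        cursor += struct.calcsize("<" + code)
    return offsets, cursor
```

Because the members are declared in offset order with no gaps, the running sum of sizes reproduces the `layout(offset = ...)` values and the 76-byte total stated in the block.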
|
72 |
+
|
73 |
+
// Convenience mapping for accessing push constants
|
74 |
+
#define _Exposure _ExposurePair.x
|
75 |
+
#define _InvExposure _ExposurePair.y
|
76 |
+
#define _NotHistoryReset _HistoryPad.x
|
77 |
+
#define _MotionThresh _MotionThreshPad.x
|
78 |
+
|
79 |
+
// Quantization Parameters
|
80 |
+
// inside: `./parameters.json`
|
81 |
+
// these values are embedded inside the TOSA file and learnt during QAT
|
82 |
+
|
83 |
+
#ifndef _K0QuantParams
|
84 |
+
// outputs - activation_post_process_45["SNORM"]
|
85 |
+
#define _K0QuantParams _QuantParams.xy
|
86 |
+
#endif
|
87 |
+
#ifndef _K1QuantParams
|
88 |
+
// outputs - activation_post_process_50["SNORM"]
|
89 |
+
#define _K1QuantParams _QuantParams.xy
|
90 |
+
#endif
|
91 |
+
#ifndef _K2QuantParams
|
92 |
+
// outputs - activation_post_process_55["SNORM"]
|
93 |
+
#define _K2QuantParams _QuantParams.xy
|
94 |
+
#endif
|
95 |
+
#ifndef _K3QuantParams
|
96 |
+
// outputs - activation_post_process_60["SNORM"]
|
97 |
+
#define _K3QuantParams _QuantParams.xy
|
98 |
+
#endif
|
99 |
+
#ifndef _TemporalQuantParams
|
100 |
+
// outputs - activation_post_process_65["SNORM"]
|
101 |
+
#define _TemporalQuantParams _QuantParams.xy
|
102 |
+
#endif
|
103 |
+
|
104 |
+
|
105 |
+
// methods
|
106 |
+
|
107 |
+
half2 LoadMotion(int32_t2 pixel)
|
108 |
+
{
|
109 |
+
return half2(texelFetch(_MotionVectorTex, pixel, 0).rg);
|
110 |
+
}
|
111 |
+
|
112 |
+
|
113 |
+
half3 LoadHistory(float2 uv)
|
114 |
+
{
|
115 |
+
return half3(textureLod(_HistoryTex, uv, 0).rgb);
|
116 |
+
}
|
117 |
+
|
118 |
+
half3 LoadHistoryCatmull(float2 uv)
|
119 |
+
{
|
120 |
+
//------------------------------------------------------------------------------------
|
121 |
+
// 1) Compute Catmull–Rom weights
|
122 |
+
//------------------------------------------------------------------------------------
|
123 |
+
float2 scaledUV = uv * _OutputDims;
|
124 |
+
float2 baseFloor = floor(scaledUV - 0.5) + 0.5;
|
125 |
+
|
126 |
+
half2 f = half2(scaledUV - baseFloor);
|
127 |
+
half2 f2 = f * f;
|
128 |
+
half2 f3 = f2 * f;
|
129 |
+
|
130 |
+
// Catmull–Rom basis
|
131 |
+
half2 w0 = f2 - 0.5HF * (f3 + f);
|
132 |
+
half2 w1 = 1.5HF * f3 - 2.5HF * f2 + 1.0HF;
|
133 |
+
half2 w3 = 0.5HF * (f3 - f2);
|
134 |
+
half2 w2 = (1.0HF - w0) - w1 - w3; // = 1 - (w0 + w1 + w3)
|
135 |
+
|
136 |
+
// Combine w1 and w2 for center axis
|
137 |
+
half2 w12 = w1 + w2;
|
138 |
+
half wx0 = w0.x, wy0 = w0.y;
|
139 |
+
half wx1 = w12.x, wy1 = w12.y;
|
140 |
+
half wx2 = w3.x, wy2 = w3.y;
|
141 |
+
|
142 |
+
// Final weights for the cross sample layout
|
143 |
+
half wUp = wx1 * wy0; // center in X, up in Y
|
144 |
+
half wDown = wx1 * wy2; // center in X, down in Y
|
145 |
+
half wLeft = wx0 * wy1; // left in X, center in Y
|
146 |
+
half wRight = wx2 * wy1; // right in X, center in Y
|
147 |
+
half wCenter = wx1 * wy1; // center in X, center in Y
|
148 |
+
|
149 |
+
// Fractional offsets for the center
|
150 |
+
half dx = w2.x / wx1;
|
151 |
+
half dy = w2.y / wy1;
|
152 |
+
|
153 |
+
//------------------------------------------------------------------------------------
|
154 |
+
// 2) Gather the 5 taps
|
155 |
+
//------------------------------------------------------------------------------------
|
156 |
+
half4 left = half4(LoadHistory((baseFloor + float2(-1.0, dy)) * _InvOutputDims ), 1.HF);
|
157 |
+
half4 up = half4(LoadHistory((baseFloor + float2(dx, -1.0)) * _InvOutputDims ), 1.HF);
|
158 |
+
half4 center = half4(LoadHistory((baseFloor + float2(dx, dy)) * _InvOutputDims ), 1.HF);
|
159 |
+
half4 right = half4(LoadHistory((baseFloor + float2(2.0, dy)) * _InvOutputDims ), 1.HF);
|
160 |
+
half4 down = half4(LoadHistory((baseFloor + float2(dx, 2.0)) * _InvOutputDims ), 1.HF);
|
161 |
+
|
162 |
+
//------------------------------------------------------------------------------------
|
163 |
+
// 3) Accumulate and track min/max
|
164 |
+
//------------------------------------------------------------------------------------
|
165 |
+
half4 accum = up * wUp +
|
166 |
+
left * wLeft +
|
167 |
+
center* wCenter +
|
168 |
+
right * wRight +
|
169 |
+
down * wDown;
|
170 |
+
half3 cmin3 = min(up.rgb,
|
171 |
+
min(left.rgb,
|
172 |
+
min(center.rgb,
|
173 |
+
min(right.rgb, down.rgb))));
|
174 |
+
half3 cmax3 = max(up.rgb,
|
175 |
+
max(left.rgb,
|
176 |
+
max(center.rgb,
|
177 |
+
max(right.rgb, down.rgb))));
|
178 |
+
|
179 |
+
//------------------------------------------------------------------------------------
|
180 |
+
// 4) Final color
|
181 |
+
//------------------------------------------------------------------------------------
|
182 |
+
half3 color = accum.rgb * rcp(accum.w);
|
183 |
+
|
184 |
+
// De-ring only when we have negative values; we don't do this all the time
|
185 |
+
// as it can impose unnecessary blurring on the output
|
186 |
+
return any(lessThan(color, half3(0.HF)))
|
187 |
+
? clamp(color, cmin3, cmax3)
|
188 |
+
: color;
|
189 |
+
}
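The Catmull-Rom weights and the five-tap cross layout used above can be verified numerically. This is a minimal Python sketch of the same polynomials; the function names and the dictionary of tap labels are illustrative, and the texture fetches themselves are not modelled.

```python
def catmull_rom_weights(f):
    """Weights for the four taps around a sample at fractional offset f."""
    f2, f3 = f * f, f * f * f
    w0 = f2 - 0.5 * (f3 + f)
    w1 = 1.5 * f3 - 2.5 * f2 + 1.0
    w3 = 0.5 * (f3 - f2)
    w2 = 1.0 - w0 - w1 - w3  # the four weights sum to 1 by construction
    return w0, w1, w2, w3


def cross_tap_weights(fx, fy):
    """Five-tap cross weights and the offset of the merged centre taps."""
    x0, x1, x2, x3 = catmull_rom_weights(fx)
    y0, y1, y2, y3 = catmull_rom_weights(fy)
    # The two middle taps per axis are merged into one bilinear fetch.
    x12, y12 = x1 + x2, y1 + y2
    weights = {
        "up": x12 * y0, "down": x12 * y3,
        "left": x0 * y12, "right": x3 * y12,
        "center": x12 * y12,
    }
    return weights, (x2 / x12, y2 / y12)
```

Dropping the four corner taps means the five weights no longer sum to exactly 1 (0.984375 at `f = 0.5` on both axes), which is why the shader carries a 1 in the fourth channel of each tap and divides by `accum.w` at the end.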
|
190 |
+
|
191 |
+
|
192 |
+
int32_t2 LoadNearestDepthOffset(int32_t2 pixel)
|
193 |
+
{
|
194 |
+
half encNorm = half(texelFetch(_NearestDepthCoordTex, pixel, 0).r);
|
195 |
+
int32_t code = int32_t(encNorm * 255.0 + 0.5);
|
196 |
+
|
197 |
+
// Map the code back to {-1,0,1}²
|
198 |
+
return DecodeNearestDepthCoord(code);
|
199 |
+
}
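The nearest-depth code survives the trip through an `R8_UNORM` texture because the half-precision error stays well below half a quantisation step. The Python sketch below illustrates that round trip under stated assumptions: half precision is emulated with the `struct` module's `e` format, the texture write is modelled as round-to-nearest 8-bit UNORM, and `store_code` / `load_code` are hypothetical names standing in for `WriteNearestDepthOffset` and the first two lines of `LoadNearestDepthOffset`. The mapping between codes and offsets (`EncodeNearestDepthCoord` / `DecodeNearestDepthCoord`) is not shown in this excerpt and is not modelled.

```python
import struct


def to_half(x):
    """Round a Python float to IEEE half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def store_code(code):
    """Model half(code) / 255 followed by an R8_UNORM write."""
    enc_norm = to_half(to_half(float(code)) / 255.0)
    return round(enc_norm * 255.0) / 255.0  # 8-bit UNORM quantisation


def load_code(texel):
    """Model int(encNorm * 255 + 0.5) on the half value read back."""
    return int(to_half(texel) * 255.0 + 0.5)
```

The `+ 0.5` before truncation turns the cast into round-to-nearest, which absorbs the small half-precision error in `encNorm`.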
|
200 |
+
|
201 |
+
|
202 |
+
half3 LoadWarpedHistory(float2 uv, int32_t2 input_pixel, out half onscreen)
|
203 |
+
{
|
204 |
+
// Dilate motion vectors with previously calculated nearest depth coordinate
|
205 |
+
int32_t2 nearest_offset = LoadNearestDepthOffset(input_pixel);
|
206 |
+
half2 motion = LoadMotion(input_pixel + nearest_offset);
|
207 |
+
|
208 |
+
// Suppress very small motion - no need to resample
|
209 |
+
half2 motion_pix = motion * half2(_OutputDims);
|
210 |
+
motion *= half(dot(motion_pix, motion_pix) > _MotionThresh);
|
211 |
+
|
212 |
+
// UV coordinates in previous frame to resample history
|
213 |
+
float2 reproj_uv = uv - float2(motion);
|
214 |
+
|
215 |
+
// Mask to flag whether the motion vector is resampling from valid location onscreen
|
216 |
+
onscreen = half(
|
217 |
+
all(greaterThanEqual(reproj_uv, float2(0.0))) &&
|
218 |
+
all(lessThan(reproj_uv, float2(1.0)))
|
219 |
+
);
|
220 |
+
|
221 |
+
#ifdef HISTORY_CATMULL
|
222 |
+
half3 warped_history = LoadHistoryCatmull(reproj_uv);
|
223 |
+
#else
|
224 |
+
half3 warped_history = LoadHistory(reproj_uv);
|
225 |
+
#endif
|
226 |
+
|
227 |
+
return SafeColour(warped_history * _Exposure);
|
228 |
+
}
|
229 |
+
|
230 |
+
#if SCALE_MODE == SCALE_2_0X
|
231 |
+
/*
|
232 |
+
Optimised special case pattern for applying 4x4 kernel to
|
233 |
+
sparse jitter-aware 2x2 upsampled image
|
234 |
+
*/
|
235 |
+
|
236 |
+
|
237 |
+
half4 LoadKPNWeight(float2 uv, int16_t lut_idx)
|
238 |
+
{
|
239 |
+
// Load 4 kernel slices (each with 4 taps)
|
240 |
+
half4 k0 = Dequantize(half4(textureLod(_K0Tensor, uv, 0)), _K0QuantParams);
|
241 |
+
half4 k1 = Dequantize(half4(textureLod(_K1Tensor, uv, 0)), _K1QuantParams);
|
242 |
+
half4 k2 = Dequantize(half4(textureLod(_K2Tensor, uv, 0)), _K2QuantParams);
|
243 |
+
half4 k3 = Dequantize(half4(textureLod(_K3Tensor, uv, 0)), _K3QuantParams);
|
244 |
+
|
245 |
+
// Precomputed swizzle patterns for KernelTile
|
246 |
+
half4 p0 = half4(k0.x, k2.x, k0.z, k2.z);
|
247 |
+
half4 p1 = half4(k1.x, k3.x, k1.z, k3.z);
|
248 |
+
half4 p2 = half4(k0.y, k2.y, k0.w, k2.w);
|
249 |
+
half4 p3 = half4(k1.y, k3.y, k1.w, k3.w);
|
250 |
+
|
251 |
+
// Return the correct pattern for this tile
|
252 |
+
return (lut_idx == 0) ? p0 :
|
253 |
+
(lut_idx == 1) ? p1 :
|
254 |
+
(lut_idx == 2) ? p2 :
|
255 |
+
p3;
|
256 |
+
}
|
257 |
+
|
258 |
+
|
259 |
+
half3 LoadAndFilterColour(int32_t2 output_pixel, float2 uv, out half4 col_to_accum)
|
260 |
+
{
|
261 |
+
//-------------------------------------------------------------------
|
262 |
+
// 1. Compute indexes, load correct pattern from LUT for given thread
|
263 |
+
//-------------------------------------------------------------------
|
264 |
+
float2 out_tex = float2(output_pixel) + 0.5f;
|
265 |
+
|
266 |
+
// Compute the LUT index for this pixel
|
267 |
+
int16_t2 tiled_idx = (int16_t2(output_pixel) + _LutOffset) % int16_t2(_IndexModulo);
|
268 |
+
int16_t lut_idx = tiled_idx.y * int16_t(_IndexModulo) + tiled_idx.x;
|
269 |
+
KernelTile lut = kernelLUT[lut_idx];
|
270 |
+
|
271 |
+
//------------------------------------------------------------------
|
272 |
+
// 2. Apply KPN
|
273 |
+
//------------------------------------------------------------------
|
274 |
+
// Dequantize the kernel weights
|
275 |
+
half4 kpn_weights = clamp(LoadKPNWeight(uv, lut_idx), half4(EPS), half4(1.HF));
|
276 |
+
|
277 |
+
// Calculate tap locations
|
278 |
+
int16_t4 tap_x = clamp(int16_t4(floor((float4(out_tex.x) + float4(lut.dx)) * _InvScale.x)), int16_t4(0), int16_t4(_InputDims.x - 1));
|
279 |
+
int16_t4 tap_y = clamp(int16_t4(floor((float4(out_tex.y) + float4(lut.dy)) * _InvScale.y)), int16_t4(0), int16_t4(_InputDims.y - 1));
|
280 |
+
|
281 |
+
// Gather taps
|
282 |
+
f16mat4x4 interm;
|
283 |
+
interm[0] = half4(SafeColour(half3(texelFetch(_ColourTex, int16_t2(tap_x[0], tap_y[0]), 0).rgb) * half3(_Exposure)), 1.HF);
|
284 |
+
interm[1] = half4(SafeColour(half3(texelFetch(_ColourTex, int16_t2(tap_x[1], tap_y[1]), 0).rgb) * half3(_Exposure)), 1.HF);
|
285 |
+
interm[2] = half4(SafeColour(half3(texelFetch(_ColourTex, int16_t2(tap_x[2], tap_y[2]), 0).rgb) * half3(_Exposure)), 1.HF);
|
286 |
+
interm[3] = half4(SafeColour(half3(texelFetch(_ColourTex, int16_t2(tap_x[3], tap_y[3]), 0).rgb) * half3(_Exposure)), 1.HF);
|
287 |
+
|
288 |
+
// Special case: grab the accumulation pixel, when it corresponds to current thread
|
289 |
+
half match = half(lut.dx[CENTER_TAP] == 0 && lut.dy[CENTER_TAP] == 0);
|
290 |
+
col_to_accum = interm[CENTER_TAP] * match;
|
291 |
+
|
292 |
+
// Apply filter
|
293 |
+
half4 out_colour = interm * kpn_weights;
|
294 |
+
|
295 |
+
return half3(out_colour.rgb * rcp(out_colour.w));
|
296 |
+
}
|
297 |
+
#else
|
298 |
+
#error "Unsupported SCALE_MODE"
|
299 |
+
#endif // SCALE_MODE == SCALE_2_0X
|
300 |
+
|
301 |
+
|
302 |
+
void LoadTemporalParameters(float2 uv, out half theta, out half alpha)
|
303 |
+
{
|
304 |
+
half2 tp = Dequantize(half2(textureLod(_TemporalTensor, uv, 0).xy), _TemporalQuantParams);
|
305 |
+
theta = tp.x * _NotHistoryReset; // {0 <= x <= 1}
|
306 |
+
alpha = tp.y * 0.35HF + 0.05HF; // {0.05 <= x <= 0.4}
|
307 |
+
}
|
308 |
+
|
309 |
+
|
310 |
+
void WriteUpsampledColour(int32_t2 pixel, half3 colour)
|
311 |
+
{
|
312 |
+
half3 to_write = SafeColour(colour);
|
313 |
+
// Write with alpha = 1.0
|
314 |
+
imageStore(_UpsampledColourOut, pixel, half4(to_write, 1.0));
|
315 |
+
}
|
316 |
+
|
317 |
+
|
318 |
+
// entry-point
|
319 |
+
layout(local_size_x = 16, local_size_y = 16) in;
|
320 |
+
void main()
|
321 |
+
{
|
322 |
+
int32_t2 output_pixel = int32_t2(gl_GlobalInvocationID.xy);
|
323 |
+
if (any(greaterThanEqual(output_pixel, _OutputDims))) return;
|
324 |
+
|
325 |
+
float2 uv = (float2(output_pixel) + 0.5) * _InvOutputDims;
|
326 |
+
int32_t2 input_pixel = int32_t2(uv * _InputDims);
|
327 |
+
|
328 |
+
//-------------------------------------------------------------------------
|
329 |
+
// 1) Warp history
|
330 |
+
//-------------------------------------------------------------------------
|
331 |
+
half onscreen;
|
332 |
+
half3 history = LoadWarpedHistory(uv, input_pixel, onscreen);
|
333 |
+
|
334 |
+
//-------------------------------------------------------------------------
|
335 |
+
// 2) KPN filter → col
|
336 |
+
//-------------------------------------------------------------------------
|
337 |
+
half4 col_to_accum;
|
338 |
+
half3 colour = LoadAndFilterColour(output_pixel, uv, col_to_accum);
|
339 |
+
|
340 |
+
//-------------------------------------------------------------------------
|
341 |
+
// 3) Load temporal parameters
|
342 |
+
//-------------------------------------------------------------------------
|
343 |
+
half theta, alpha;
|
344 |
+
LoadTemporalParameters(uv, theta, alpha);
|
345 |
+
|
346 |
+
//-------------------------------------------------------------------------
|
347 |
+
// 4) Rectify history, force reset when offscreen
|
348 |
+
//-------------------------------------------------------------------------
|
349 |
+
half3 rectified = lerp(colour, history, theta * onscreen);
|
350 |
+
|
351 |
+
//-------------------------------------------------------------------------
|
352 |
+
// 5) Accumulate new sample
|
353 |
+
//-------------------------------------------------------------------------
|
354 |
+
half3 accumulated = lerp(Tonemap(rectified), Tonemap(col_to_accum.rgb), alpha * col_to_accum.a);
|
355 |
+
|
356 |
+
//-------------------------------------------------------------------------
|
357 |
+
// 6) Inverse tonemap + exposure and write output
|
358 |
+
//-------------------------------------------------------------------------
|
359 |
+
half3 out_linear = InverseTonemap(accumulated) * _InvExposure;
|
360 |
+
WriteUpsampledColour(output_pixel, out_linear);
|
361 |
+
}
|
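The entry point above reduces to a few blends per output pixel. The following is a minimal Python sketch of that arithmetic, not the shader itself. It assumes `lerp(a, b, t) = a + (b - a) * t` (the HLSL convention; the actual definition is presumably in typedefs.h, which is not shown here), uses the Karis max-channel tonemapper as written in scenario/common.h, and works on plain float tuples rather than fp16 textures. The function name `accumulate` and its argument list are illustrative only.

```python
MAX_FP16 = 65504.0  # largest finite half-precision value, as in common.h


def lerp(a, b, t):
    """Per-channel linear blend (assumed HLSL-style: a + (b - a) * t)."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def tonemap(c):
    """Karis tonemapper: clamp to >= 0, then divide by 1 + max channel."""
    c = tuple(max(x, 0.0) for x in c)
    return tuple(x / (1.0 + max(c)) for x in c)


def inverse_tonemap(c):
    """Inverse of tonemap, clamped so the result stays within fp16 range."""
    hi = MAX_FP16 / (1.0 + MAX_FP16)  # tonemap(MAX_FP16) per channel
    c = tuple(min(max(x, 0.0), hi) for x in c)
    return tuple(x / (1.0 - max(c)) for x in c)


def accumulate(colour, history, col_to_accum, theta, alpha, onscreen, inv_exposure):
    # Rectify: keep history only where theta allows it and the
    # reprojected sample is on screen (onscreen is 0 or 1).
    rectified = lerp(colour, history, theta * onscreen)
    # Accumulate in tonemapped space; the alpha channel of col_to_accum
    # is the 0/1 "match" flag set when the centre tap hits this pixel.
    rgb, a = col_to_accum[:3], col_to_accum[3]
    accumulated = lerp(tonemap(rectified), tonemap(rgb), alpha * a)
    # Return to linear space and undo the exposure scaling.
    return tuple(x * inv_exposure for x in inverse_tonemap(accumulated))
```

With `theta = 0` (or `onscreen = 0`) and no matching centre tap, the sketch returns the filtered colour scaled by `inv_exposure`, which is the "force reset" behaviour the comments describe.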
scenario/2_post_process.spv
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:d15c811db716f90606bb42710f5093bfd3dcfedb674ab7223b27909d8c3467a5
|
3 |
+
size 25780
|
scenario/2_post_process_push_consts.npy
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:4fe5029783bc6bb2adaa1f8bafc9ab8fe73340bb0a3055d18f80ca3e6a99862a
|
3 |
+
size 204
|
scenario/common.h
ADDED
@@ -0,0 +1,160 @@
|
1 |
+
//
|
2 |
+
// -----------------------------------------------------------------------------
|
3 |
+
// The proprietary software and information contained in this file is
|
4 |
+
// confidential and may only be used by an authorized person under a valid
|
5 |
+
// licensing agreement from Arm Limited or its affiliates.
|
6 |
+
//
|
7 |
+
// Copyright (C) 2025. Arm Limited or its affiliates. All rights reserved.
|
8 |
+
//
|
9 |
+
// This entire notice must be reproduced on all copies of this file and
|
10 |
+
// copies of this file may only be made by an authorized person under a valid
|
11 |
+
// licensing agreement from Arm Limited or its affiliates.
|
12 |
+
// -----------------------------------------------------------------------------
|
13 |
+
//
|
14 |
+
#ifndef NSS_COMMON
|
15 |
+
#define NSS_COMMON
|
16 |
+
|
17 |
+
#include "typedefs.h"
|
18 |
+
|
19 |
+
#define MAX_FP16 65504.HF
|
20 |
+
#define EPS 1e-7HF
|
21 |
+
|
22 |
+
|
23 |
+
// Activation Functions
|
24 |
+
// ──────────────────────────────────────────────────────────────────────────────────────────
|
25 |
+
|
26 |
+
|
27 |
+
half Sigmoid(half x)
|
28 |
+
{
|
29 |
+
return rcp(half(1.0) + exp(-x));
|
30 |
+
}
|
31 |
+
|
32 |
+
|
33 |
+
half2 Sigmoid(half2 x)
|
34 |
+
{
|
35 |
+
return rcp(half2(1.0) + exp(-x));
|
36 |
+
}
|
37 |
+
|
38 |
+
|
39 |
+
half3 Sigmoid(half3 x)
|
40 |
+
{
|
41 |
+
return rcp(half3(1.0) + exp(-x));
|
42 |
+
}
|
43 |
+
|
44 |
+
|
45 |
+
half4 Sigmoid(half4 x)
|
46 |
+
{
|
47 |
+
return rcp(half4(1.0) + exp(-x));
|
48 |
+
}
|
49 |
+
|
50 |
+
|
51 |
+
// Quantize/Dequantize
|
52 |
+
// ──────────────────────────────────────────────────────────────────────────────────────────
|
53 |
+
// Dequantize expects quant_params.x = scale and .y = zero point; Quantize expects .x = rcp(scale) and .y = zero point
|
54 |
+
|
55 |
+
half Dequantize(half i, half2 quant_params)
|
56 |
+
{
|
57 |
+
return (i - quant_params.y) * quant_params.x;
|
58 |
+
}
|
59 |
+
|
60 |
+
|
61 |
+
half2 Dequantize(half2 i, half2 quant_params)
|
62 |
+
{
|
63 |
+
return (i - quant_params.y) * quant_params.x;
|
64 |
+
}
|
65 |
+
|
66 |
+
|
67 |
+
half3 Dequantize(half3 i, half2 quant_params)
|
68 |
+
{
|
69 |
+
return (i - quant_params.y) * quant_params.x;
|
70 |
+
}
|
71 |
+
|
72 |
+
|
73 |
+
half4 Dequantize(half4 i, half2 quant_params)
|
74 |
+
{
|
75 |
+
return (i - quant_params.y) * quant_params.x;
|
76 |
+
}
|
77 |
+
|
78 |
+
|
79 |
+
int8_t Quantize(half f, half2 quant_params)
|
80 |
+
{
|
81 |
+
return int8_t(clamp(round(f * quant_params.x + quant_params.y), -128.HF, 127.HF));
|
82 |
+
}
|
83 |
+
|
84 |
+
|
85 |
+
int8_t2 Quantize(half2 f, half2 quant_params)
|
86 |
+
{
|
87 |
+
return int8_t2(clamp(round(f * quant_params.x + quant_params.y), -128.HF, 127.HF));
|
88 |
+
}
|
89 |
+
|
90 |
+
|
91 |
+
int8_t3 Quantize(half3 f, half2 quant_params)
|
92 |
+
{
|
93 |
+
return int8_t3(clamp(round(f * quant_params.x + quant_params.y), -128.HF, 127.HF));
|
94 |
+
}
|
95 |
+
|
96 |
+
|
97 |
+
int8_t4 Quantize(half4 f, half2 quant_params)
|
98 |
+
{
|
99 |
+
return int8_t4(clamp(round(f * quant_params.x + quant_params.y), -128.HF, 127.HF));
|
100 |
+
}
|
101 |
+
|
102 |
+
|
103 |
+
// Encode/Decode
|
104 |
+
// ──────────────────────────────────────────────────────────────────────────────────────────
|
105 |
+
// Note: both encode/decode methods are currently bound to 3x3 windows; they can be
|
106 |
+
// expanded in future if needed. The jitter encoding is the most likely to need this,
|
107 |
+
// since a 3x3 window may not be enough for scale factors larger than 3x3.
|
108 |
+
|
109 |
+
|
110 |
+
uint8_t EncodeNearestDepthCoord(int32_t2 o)
|
111 |
+
{
|
112 |
+
// o ∈ {-1, 0, 1}²
|
113 |
+
o = clamp(o, ivec2(-1), ivec2( 1));
|
114 |
+
return uint8_t((o.y + 1) << 2 | (o.x + 1)); // 4-bit code: y + 1 in bits 2-3, x + 1 in bits 0-1 (0-10 after the clamp)
|
115 |
+
}
|
116 |
+
|
117 |
+
|
118 |
+
int32_t2 DecodeNearestDepthCoord(int32_t code)
|
119 |
+
{
|
120 |
+
int32_t x = int32_t( code & 0x3) - 1; // bits 0-1
|
121 |
+
int32_t y = int32_t((code >> 2) & 0x3) - 1; // bits 2-3
|
122 |
+
return int32_t2(x, y);
|
123 |
+
}
|
124 |
+
|
125 |
+
|
126 |
+
// Image Operations
|
127 |
+
// ──────────────────────────────────────────────────────────────────────────────────────────
|
128 |
+
|
129 |
+
half Luminance(half3 rgb)
|
130 |
+
{
|
131 |
+
// ITU-R BT.709: `0.2126 * R + 0.7152 * G + 0.0722 * B`
|
132 |
+
return dot(rgb, half3(0.2126, 0.7152, 0.0722));
|
133 |
+
}
|
134 |
+
|
135 |
+
|
136 |
+
half3 Tonemap(half3 x)
|
137 |
+
{
|
138 |
+
// Karis tonemapper
|
139 |
+
// http://graphicrants.blogspot.com/2013/12/tone-mapping.html
|
140 |
+
x = max(x, half3(0.HF));
|
141 |
+
return x * rcp(half3(1.HF) + max(max(x.r, x.g), x.b));
|
142 |
+
}
|
143 |
+
|
144 |
+
|
145 |
+
half3 InverseTonemap(half3 x)
|
146 |
+
{
|
147 |
+
// Karis tonemapper inverse
|
148 |
+
// http://graphicrants.blogspot.com/2013/12/tone-mapping.html
|
149 |
+
x = clamp(x, half3(0.HF), Tonemap(half3(MAX_FP16)));
|
150 |
+
return x * rcp(half3(1.HF) - max(max(x.r, x.g), x.b));
|
151 |
+
}
|
152 |
+
|
153 |
+
|
154 |
+
half3 SafeColour(half3 x)
|
155 |
+
{
|
156 |
+
return clamp(x, half3(0.HF), half3(MAX_FP16));
|
157 |
+
}
|
158 |
+
|
159 |
+
|
160 |
+
#endif // NSS_COMMON
|
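The quantisation and offset-encoding helpers in common.h can be checked outside the shader. Below is a small Python sketch that mirrors their arithmetic on scalars. The function names are Python-style renderings of the header's names, the scale of 1/255 in the usage note is only an illustrative value, and Python's `round` resolves exact .5 ties to the even integer, which may differ from what a given GLSL implementation does.

```python
def quantize(f, inv_scale, zero_point):
    """Float -> int8: round(f * (1 / scale) + zero_point), clamped to [-128, 127]."""
    q = round(f * inv_scale + zero_point)
    return int(min(max(q, -128), 127))


def dequantize(q, scale, zero_point):
    """int8 -> float: (q - zero_point) * scale."""
    return (q - zero_point) * scale


def encode_nearest_depth_coord(ox, oy):
    """Pack an offset in {-1, 0, 1}^2 into 4 bits (x + 1 in bits 0-1, y + 1 in bits 2-3)."""
    ox = min(max(ox, -1), 1)
    oy = min(max(oy, -1), 1)
    return ((oy + 1) << 2) | (ox + 1)


def decode_nearest_depth_coord(code):
    """Unpack the 4-bit code back into an (x, y) offset."""
    x = (code & 0x3) - 1
    y = ((code >> 2) & 0x3) - 1
    return x, y
```

For example, `quantize(1.0, 255.0, -128)` gives 127 and `dequantize(127, 1 / 255, -128)` gives approximately 1.0. Every offset in the 3x3 window survives an encode/decode round trip, and the largest code produced is 10.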
scenario/in_colour.dds
ADDED
|
Git LFS Details
|
scenario/in_depth.dds
ADDED
|
Git LFS Details
|
scenario/in_depth_tm1.dds
ADDED
|
Git LFS Details
|
scenario/in_derivative_tm1.dds
ADDED
|
Git LFS Details
|
scenario/in_feedback_tm1.dds
ADDED
|
Git LFS Details
|
scenario/in_history.dds
ADDED
|
Git LFS Details
|
scenario/in_motion.dds
ADDED
|
Git LFS Details
|
scenario/in_nearest_offset_tm1.dds
ADDED
|
Git LFS Details
|
scenario/kernel_lut.h
ADDED
@@ -0,0 +1,83 @@
|
1 |
+
//
|
2 |
+
// -----------------------------------------------------------------------------
|
3 |
+
// The proprietary software and information contained in this file is
|
4 |
+
// confidential and may only be used by an authorized person under a valid
|
5 |
+
// licensing agreement from Arm Limited or its affiliates.
|
6 |
+
//
|
7 |
+
// Copyright (C) 2025. Arm Limited or its affiliates. All rights reserved.
|
8 |
+
//
|
9 |
+
// This entire notice must be reproduced on all copies of this file and
|
10 |
+
// copies of this file may only be made by an authorized person under a valid
|
11 |
+
// licensing agreement from Arm Limited or its affiliates.
|
12 |
+
// -----------------------------------------------------------------------------
|
13 |
+
//
|
14 |
+
#ifndef NSS_KERNEL_LUT
|
15 |
+
#define NSS_KERNEL_LUT
|
16 |
+
#include "typedefs.h"
|
17 |
+
|
18 |
+
|
19 |
+
struct KernelTile {
|
20 |
+
int16_t4 dy;
|
21 |
+
int16_t4 dx;
|
22 |
+
};
|
23 |
+
|
24 |
+
|
25 |
+
// Define actual scale value based on mode
|
26 |
+
#if SCALE_MODE == SCALE_2_0X
|
27 |
+
|
28 |
+
#define CENTER_TAP 0
|
29 |
+
#define NUM_PATTERNS 4
|
30 |
+
|
31 |
+
const KernelTile kernelLUT[NUM_PATTERNS] = {
|
32 |
+
{
|
33 |
+
// Pattern 0:
|
34 |
+
// Taps: 0, 2, 8, 10
|
35 |
+
// Grid:
|
36 |
+
// [● · ● ·]
|
37 |
+
// [· · · ·]
|
38 |
+
// [● · ● ·]
|
39 |
+
// [· · · ·]
|
40 |
+
int16_t4(-1, -1, +1, +1),
|
41 |
+
int16_t4(-1, +1, -1, +1)
|
42 |
+
},
|
43 |
+
{
|
44 |
+
// Pattern 1:
|
45 |
+
// Taps: 4, 6, 12, 14
|
46 |
+
// Grid:
|
47 |
+
// [· · · ·]
|
48 |
+
// [● · ● ·]
|
49 |
+
// [· · · ·]
|
50 |
+
// [● · ● ·]
|
51 |
+
int16_t4(-1, -1, +1, +1),
|
52 |
+
int16_t4(+0, +2, +0, +2)
|
53 |
+
},
|
54 |
+
{
|
55 |
+
// Pattern 2:
|
56 |
+
// Taps: 1, 3, 9, 11
|
57 |
+
// Grid:
|
58 |
+
// [· ● · ●]
|
59 |
+
// [· · · ·]
|
60 |
+
// [· ● · ●]
|
61 |
+
// [· · · ·]
|
62 |
+
int16_t4(+0, +0, +2, +2),
|
63 |
+
int16_t4(-1, +1, -1, +1)
|
64 |
+
},
|
65 |
+
{
|
66 |
+
// Pattern 3:
|
67 |
+
// Taps: 5, 7, 13, 15
|
68 |
+
// Grid:
|
69 |
+
// [· · · ·]
|
70 |
+
// [· ● · ●]
|
71 |
+
// [· · · ·]
|
72 |
+
// [· ● · ●]
|
73 |
+
int16_t4( 0, +0, +2, +2), // center-aligned
|
74 |
+
int16_t4( 0, +2, +0, +2)
|
75 |
+
}
|
76 |
+
};
|
77 |
+
|
78 |
+
#else
|
79 |
+
#error "Unsupported SCALE_MODE"
|
80 |
+
#endif
|
81 |
+
|
82 |
+
|
83 |
+
#endif //NSS_KERNEL_LUT
|
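To see which input texels a pattern selects, the tap arithmetic from the post-process shader (`floor((out_tex + offset) * _InvScale)`, clamped to the input bounds) can be replayed in Python. This is a sketch under stated assumptions: `out_px` is taken to be the integer output pixel coordinate, `inv_scale` is assumed to be 0.5 for the 2x mode, and the two vectors of each `KernelTile` are read in the struct's declared field order (`dy` first, then `dx`). Only patterns 0 and 3 are reproduced, and how the shader chooses a pattern per pixel is not shown in this excerpt.

```python
import math

# (dy, dx) per tap, in the declared field order of KernelTile.
PATTERN_0 = ((-1, -1, +1, +1), (-1, +1, -1, +1))
PATTERN_3 = ((0, 0, +2, +2), (0, +2, 0, +2))


def tap_coords(out_px, pattern, input_dims, inv_scale=(0.5, 0.5)):
    """Return the four (x, y) input texels gathered for one output pixel."""
    dy, dx = pattern
    taps = []
    for i in range(4):
        # Offset in output space, scale down to input space, then floor.
        x = math.floor((out_px[0] + dx[i]) * inv_scale[0])
        y = math.floor((out_px[1] + dy[i]) * inv_scale[1])
        # Clamp to the valid texel range, as the shader does.
        x = min(max(x, 0), input_dims[0] - 1)
        y = min(max(y, 0), input_dims[1] - 1)
        taps.append((x, y))
    return taps
```

Under these assumptions, `tap_coords((5, 7), PATTERN_3, (960, 544))` yields the 2x2 block `[(2, 3), (3, 3), (2, 4), (3, 4)]`, and taps that would fall outside the image collapse onto the border texel.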
scenario/parameters.json
ADDED
@@ -0,0 +1,79 @@
|
1 |
+
{
|
2 |
+
"inputs": {
|
3 |
+
"x": {
|
4 |
+
"SINT": {
|
5 |
+
"scale": 0.003921568859368563,
|
6 |
+
"zero_point": -128
|
7 |
+
},
|
8 |
+
"SNORM": {
|
9 |
+
"scale": 0.49803924513980746,
|
10 |
+
"zero_point": -1.0078740157480315
|
11 |
+
}
|
12 |
+
}
|
13 |
+
},
|
14 |
+
"outputs": {
|
15 |
+
"activation_post_process_45": {
|
16 |
+
"SINT": {
|
17 |
+
"scale": 0.003937007859349251,
|
18 |
+
"zero_point": -127
|
19 |
+
},
|
20 |
+
"SNORM": {
|
21 |
+
"scale": 0.49999999813735485,
|
22 |
+
"zero_point": -1.0
|
23 |
+
}
|
24 |
+
},
|
25 |
+
"activation_post_process_50": {
|
26 |
+
"SINT": {
|
27 |
+
"scale": 0.003937007859349251,
|
28 |
+
"zero_point": -127
|
29 |
+
},
|
30 |
+
"SNORM": {
|
31 |
+
"scale": 0.49999999813735485,
|
32 |
+
"zero_point": -1.0
|
33 |
+
}
|
34 |
+
},
|
35 |
+
"activation_post_process_55": {
|
36 |
+
"SINT": {
|
37 |
+
"scale": 0.003937007859349251,
|
38 |
+
"zero_point": -127
|
39 |
+
},
|
40 |
+
"SNORM": {
|
41 |
+
"scale": 0.49999999813735485,
|
42 |
+
"zero_point": -1.0
|
43 |
+
}
|
44 |
+
},
|
45 |
+
"activation_post_process_60": {
|
46 |
+
"SINT": {
|
47 |
+
"scale": 0.003937007859349251,
|
48 |
+
"zero_point": -127
|
49 |
+
},
|
50 |
+
"SNORM": {
|
51 |
+
"scale": 0.49999999813735485,
|
52 |
+
"zero_point": -1.0
|
53 |
+
}
|
54 |
+
},
|
55 |
+
"activation_post_process_65": {
|
56 |
+
"SINT": {
|
57 |
+
"scale": 0.003937007859349251,
|
58 |
+
"zero_point": -127
|
59 |
+
},
|
60 |
+
"SNORM": {
|
61 |
+
"scale": 0.49999999813735485,
|
62 |
+
"zero_point": -1.0
|
63 |
+
}
|
64 |
+
},
|
65 |
+
"activation_post_process_70": {
|
66 |
+
"SINT": {
|
67 |
+
"scale": 0.003937007859349251,
|
68 |
+
"zero_point": -127
|
69 |
+
},
|
70 |
+
"SNORM": {
|
71 |
+
"scale": 0.49999999813735485,
|
72 |
+
"zero_point": -1.0
|
73 |
+
}
|
74 |
+
}
|
75 |
+
},
|
76 |
+
"learnt_constants": {
|
77 |
+
"dm_scale": 0.617464542388916
|
78 |
+
}
|
79 |
+
}
|
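Each entry in parameters.json lists the same quantisation twice, once as `SINT` and once as `SNORM`. The numbers appear to be related by a factor of 127, which is what one would expect if an 8-bit SNORM texel `q` reaches the shader as `q / 127`: `(q - zp) * s = (q / 127 - zp / 127) * (127 * s)`. The Python check below verifies that relation against the values in the file. The helper name `sint_to_snorm` is illustrative and not part of the repository.

```python
def sint_to_snorm(scale, zero_point):
    """Rescale int8 quantisation parameters for data sampled as SNORM (q / 127)."""
    return scale * 127.0, zero_point / 127.0


# Values copied from parameters.json: (SINT scale, SINT zero point, SNORM scale, SNORM zero point).
INPUT_X = (0.003921568859368563, -128, 0.49803924513980746, -1.0078740157480315)
OUTPUT = (0.003937007859349251, -127, 0.49999999813735485, -1.0)

for sint_scale, sint_zp, snorm_scale, snorm_zp in (INPUT_X, OUTPUT):
    got_scale, got_zp = sint_to_snorm(sint_scale, sint_zp)
    assert abs(got_scale - snorm_scale) < 1e-12
    assert abs(got_zp - snorm_zp) < 1e-12
```

Both forms dequantise to the same real value, so which one a shader needs depends on whether it reads the data as raw int8 or through an SNORM image view.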
scenario/scenario.json
ADDED
@@ -0,0 +1,821 @@
|
1 |
+
{
|
2 |
+
"commands": [
|
3 |
+
{
|
4 |
+
"mark_boundary": {
|
5 |
+
"frame_id": "0",
|
6 |
+
"resources": []
|
7 |
+
}
|
8 |
+
},
|
9 |
+
{
|
10 |
+
"dispatch_compute": {
|
11 |
+
"shader_ref": "0_pre_process",
|
12 |
+
"push_data_ref": "push_data_1",
|
13 |
+
"rangeND": [
|
14 |
+
60,
|
15 |
+
34,
|
16 |
+
1
|
17 |
+
],
|
18 |
+
"implicit_barrier": false,
|
19 |
+
"bindings": [
|
20 |
+
{
|
21 |
+
"set": 0,
|
22 |
+
"id": 2,
|
23 |
+
"resource_ref": "in_motion"
|
24 |
+
},
|
25 |
+
{
|
26 |
+
"set": 0,
|
27 |
+
"id": 0,
|
28 |
+
"resource_ref": "in_colour"
|
29 |
+
},
|
30 |
+
{
|
31 |
+
"set": 0,
|
32 |
+
"id": 7,
|
33 |
+
"resource_ref": "in_nearest_offset_tm1"
|
34 |
+
},
|
35 |
+
{
|
36 |
+
"set": 0,
|
37 |
+
"id": 5,
|
38 |
+
"resource_ref": "in_depth_tm1"
|
39 |
+
},
|
40 |
+
{
|
41 |
+
"set": 0,
|
42 |
+
"id": 3,
|
43 |
+
"resource_ref": "in_history"
|
44 |
+
},
|
45 |
+
{
|
46 |
+
"set": 0,
|
47 |
+
"id": 4,
|
48 |
+
"resource_ref": "in_feedback_tm1"
|
49 |
+
},
|
50 |
+
{
|
51 |
+
"set": 0,
|
52 |
+
"id": 6,
|
53 |
+
"resource_ref": "in_derivative_tm1"
|
54 |
+
},
|
55 |
+
{
|
56 |
+
"set": 0,
|
57 |
+
"id": 1,
|
58 |
+
"resource_ref": "in_depth"
|
59 |
+
},
|
60 |
+
{
|
61 |
+
"set": 1,
|
62 |
+
"id": 1,
|
63 |
+
"resource_ref": "out_derivative",
|
64 |
+
"descriptor_type": "VK_DESCRIPTOR_TYPE_STORAGE_IMAGE"
|
65 |
+
},
|
66 |
+
{
|
67 |
+
"set": 1,
|
68 |
+
"id": 3,
|
69 |
+
"resource_ref": "out_nearest_offset",
|
70 |
+
"descriptor_type": "VK_DESCRIPTOR_TYPE_STORAGE_IMAGE"
|
71 |
+
},
|
72 |
+
{
|
73 |
+
"set": 1,
|
74 |
+
"id": 0,
|
75 |
+
"resource_ref": "out_input_tensor"
|
76 |
+
}
|
77 |
+
]
|
78 |
+
}
|
79 |
+
},
|
80 |
+
{
|
81 |
+
"dispatch_barrier": {
|
82 |
+
"image_barrier_refs": [],
|
83 |
+
"tensor_barrier_refs": [
|
84 |
+
"barrier_14"
|
85 |
+
],
|
86 |
+
"memory_barrier_refs": [],
|
87 |
+
"buffer_barrier_refs": []
|
88 |
+
}
|
89 |
+
},
|
90 |
+
{
|
91 |
+
"dispatch_graph": {
|
92 |
+
"graph_ref": "1_nss",
|
93 |
+
"implicit_barrier": false,
|
94 |
+
"bindings": [
|
95 |
+
{
|
96 |
+
"set": 0,
|
97 |
+
"id": 0,
|
98 |
+
"resource_ref": "out_input_tensor"
|
99 |
+
},
|
100 |
+
{
|
101 |
+
"set": 0,
|
102 |
+
"id": 1,
|
103 |
+
"resource_ref": "out_feedback"
|
104 |
+
},
|
105 |
+
{
|
106 |
+
"set": 0,
|
107 |
+
"id": 2,
|
108 |
+
"resource_ref": "out_tp_aliaser"
|
109 |
+
},
|
110 |
+
{
|
111 |
+
"set": 0,
|
112 |
+
"id": 3,
|
113 |
+
"resource_ref": "out_k3_aliaser"
|
114 |
+
},
|
115 |
+
{
|
116 |
+
"set": 0,
|
117 |
+
"id": 4,
|
118 |
+
"resource_ref": "out_k2_aliaser"
|
119 |
+
},
|
120 |
+
{
|
121 |
+
"set": 0,
|
122 |
+
"id": 5,
|
123 |
+
"resource_ref": "out_k1_aliaser"
|
124 |
+
},
|
125 |
+
{
|
126 |
+
"set": 0,
|
127 |
+
"id": 6,
|
128 |
+
"resource_ref": "out_k0_aliaser"
|
129 |
+
}
|
130 |
+
]
|
131 |
+
}
|
132 |
+
},
|
133 |
+
{
|
134 |
+
"dispatch_barrier": {
|
135 |
+
"image_barrier_refs": [
|
136 |
+
"barrier_23",
|
137 |
+
"barrier_25",
|
138 |
+
"barrier_27",
|
139 |
+
"barrier_29",
|
140 |
+
"barrier_31",
|
141 |
+
"barrier_33"
|
142 |
+
],
|
143 |
+
"tensor_barrier_refs": [],
|
144 |
+
"memory_barrier_refs": [],
|
145 |
+
"buffer_barrier_refs": []
|
146 |
+
}
|
147 |
+
},
|
148 |
+
{
|
149 |
+
"dispatch_compute": {
|
150 |
+
"shader_ref": "2_post_process",
|
151 |
+
"push_data_ref": "push_data_22",
|
152 |
+
"rangeND": [
|
153 |
+
120,
|
154 |
+
68,
|
155 |
+
1
|
156 |
+
],
|
157 |
+
"implicit_barrier": false,
|
158 |
+
"bindings": [
|
159 |
+
{
|
160 |
+
"set": 0,
|
161 |
+
"id": 1,
|
162 |
+
"resource_ref": "in_motion"
|
163 |
+
},
|
164 |
+
{
|
165 |
+
"set": 0,
|
166 |
+
"id": 2,
|
167 |
+
"resource_ref": "in_history"
|
168 |
+
},
|
169 |
+
{
|
170 |
+
"set": 0,
|
171 |
+
"id": 8,
|
172 |
+
"resource_ref": "out_nearest_offset"
|
173 |
+
},
|
174 |
+
{
|
175 |
+
"set": 0,
|
176 |
+
"id": 3,
|
177 |
+
"resource_ref": "out_k0"
|
178 |
+
},
|
179 |
+
{
|
180 |
+
"set": 0,
|
181 |
+
"id": 4,
|
182 |
+
"resource_ref": "out_k1"
|
183 |
+
},
|
184 |
+
{
|
185 |
+
"set": 0,
|
186 |
+
"id": 5,
|
187 |
+
"resource_ref": "out_k2"
|
188 |
+
},
|
189 |
+
{
|
190 |
+
"set": 0,
|
191 |
+
"id": 6,
|
192 |
+
"resource_ref": "out_k3"
|
193 |
+
},
|
194 |
+
{
|
195 |
+
"set": 0,
|
196 |
+
"id": 0,
|
197 |
+
"resource_ref": "in_colour"
|
198 |
+
},
|
199 |
+
{
|
200 |
+
"set": 0,
|
201 |
+
"id": 7,
|
202 |
+
"resource_ref": "out_tp"
|
203 |
+
},
|
204 |
+
{
|
205 |
+
"set": 1,
|
206 |
+
"id": 0,
|
207 |
+
"resource_ref": "out_colour",
|
208 |
+
"descriptor_type": "VK_DESCRIPTOR_TYPE_STORAGE_IMAGE"
|
209 |
+
}
|
210 |
+
]
|
211 |
+
}
|
212 |
+
},
|
213 |
+
{
|
214 |
+
"mark_boundary": {
|
215 |
+
"frame_id": "1",
|
216 |
+
"resources": [
|
217 |
+
"out_colour"
|
218 |
+
]
|
219 |
+
}
|
220 |
+
}
|
221 |
+
],
|
222 |
+
"resources": [
|
223 |
+
{
|
224 |
+
"shader": {
|
225 |
+
"uid": "0_pre_process",
|
226 |
+
"src": "./0_pre_process.spv",
|
227 |
+
"entry": "main",
|
228 |
+
"type": "SPIR-V",
|
229 |
+
"push_constants_size": 128,
|
230 |
+
"specialization_constants": []
|
231 |
+
}
|
232 |
+
},
|
233 |
+
{
|
234 |
+
"raw_data": {
|
235 |
+
"uid": "push_data_1",
|
236 |
+
"src": "./0_pre_process_push_consts.npy"
|
237 |
+
}
|
238 |
+
},
|
239 |
+
{
|
240 |
+
"image": {
|
241 |
+
"uid": "in_motion",
|
242 |
+
"dims": [
|
243 |
+
1,
|
244 |
+
960,
|
245 |
+
544,
|
246 |
+
1
|
247 |
+
],
|
248 |
+
"src": "./in_motion.dds",
|
249 |
+
"format": "VK_FORMAT_R16G16_SFLOAT",
|
250 |
+
"shader_access": "readonly",
|
251 |
+
"mips": 1,
|
252 |
+
"min_filter": "LINEAR",
|
253 |
+
"mag_filter": "LINEAR",
|
254 |
+
"mip_filter": "NEAREST",
|
255 |
+
"border_address_mode": "CLAMP_BORDER",
|
256 |
+
"border_color": "FLOAT_TRANSPARENT_BLACK",
|
257 |
+
"tiling": "OPTIMAL"
|
258 |
+
}
|
259 |
+
},
|
260 |
+
{
|
261 |
+
"image": {
|
262 |
+
"uid": "in_colour",
|
263 |
+
"dims": [
|
264 |
+
1,
|
265 |
+
960,
|
266 |
+
544,
|
267 |
+
1
|
268 |
+
],
|
269 |
+
"src": "./in_colour.dds",
|
270 |
+
"format": "VK_FORMAT_B10G11R11_UFLOAT_PACK32",
|
271 |
+
"shader_access": "readonly",
|
272 |
+
"mips": 1,
|
273 |
+
"min_filter": "LINEAR",
|
274 |
+
"mag_filter": "LINEAR",
|
275 |
+
"mip_filter": "NEAREST",
|
276 |
+
"border_address_mode": "CLAMP_BORDER",
|
277 |
+
"border_color": "FLOAT_TRANSPARENT_BLACK",
|
278 |
+
"tiling": "OPTIMAL"
|
279 |
+
}
|
280 |
+
},
|
281 |
+
{
|
282 |
+
"image": {
|
283 |
+
"uid": "in_nearest_offset_tm1",
|
284 |
+
"dims": [
|
285 |
+
1,
|
286 |
+
960,
|
287 |
+
544,
|
288 |
+
1
|
289 |
+
],
|
290 |
+
"src": "./in_nearest_offset_tm1.dds",
|
291 |
+
"format": "VK_FORMAT_R8_UNORM",
|
292 |
+
"shader_access": "readonly",
|
293 |
+
"mips": 1,
|
294 |
+
"min_filter": "LINEAR",
|
295 |
+
"mag_filter": "LINEAR",
|
296 |
+
"mip_filter": "NEAREST",
|
297 |
+
"border_address_mode": "CLAMP_BORDER",
|
298 |
+
"border_color": "FLOAT_CUSTOM_EXT",
|
299 |
+
"custom_border_color": [
|
300 |
+
0.0,
|
301 |
+
0.0,
|
302 |
+
0.0,
|
303 |
+
0.0
|
304 |
+
],
|
305 |
+
"tiling": "OPTIMAL"
|
306 |
+
}
|
307 |
+
},
|
308 |
+
{
|
309 |
+
"image": {
|
310 |
+
"uid": "in_depth_tm1",
|
311 |
+
"dims": [
|
312 |
+
1,
|
313 |
+
960,
|
314 |
+
544,
|
315 |
+
1
|
316 |
+
],
|
317 |
+
"src": "./in_depth_tm1.dds",
|
318 |
+
"format": "VK_FORMAT_R32_SFLOAT",
|
319 |
+
"shader_access": "readonly",
|
320 |
+
"mips": 1,
|
321 |
+
"min_filter": "LINEAR",
|
322 |
+
"mag_filter": "LINEAR",
|
323 |
+
"mip_filter": "NEAREST",
|
324 |
+
"border_address_mode": "CLAMP_BORDER",
|
325 |
+
"border_color": "FLOAT_CUSTOM_EXT",
|
326 |
+
"custom_border_color": [
|
327 |
+
0.0,
|
328 |
+
0.0,
|
329 |
+
0.0,
|
330 |
+
0.0
|
331 |
+
],
|
332 |
+
"tiling": "OPTIMAL"
|
333 |
+
}
|
334 |
+
},
|
335 |
+
{
|
336 |
+
"image": {
|
337 |
+
"uid": "in_history",
|
338 |
+
"dims": [
|
339 |
+
1,
|
340 |
+
1920,
|
341 |
+
1088,
|
342 |
+
1
|
343 |
+
],
|
344 |
+
"src": "./in_history.dds",
|
345 |
+
"format": "VK_FORMAT_B10G11R11_UFLOAT_PACK32",
|
346 |
+
"shader_access": "readonly",
|
347 |
+
"mips": 1,
|
348 |
+
"min_filter": "LINEAR",
|
349 |
+
"mag_filter": "LINEAR",
|
350 |
+
"mip_filter": "NEAREST",
|
351 |
+
"border_address_mode": "CLAMP_EDGE",
|
352 |
+
"tiling": "OPTIMAL"
|
353 |
+
}
|
354 |
+
},
|
355 |
+
{
|
356 |
+
"image": {
|
357 |
+
"uid": "in_feedback_tm1",
|
358 |
+
"dims": [
|
359 |
+
1,
|
360 |
+
960,
|
361 |
+
544,
|
362 |
+
1
|
363 |
+
],
|
364 |
+
"src": "./in_feedback_tm1.dds",
|
365 |
+
"format": "VK_FORMAT_R8G8B8A8_SNORM",
|
366 |
+
"shader_access": "readonly",
|
367 |
+
"mips": 1,
|
368 |
+
"min_filter": "LINEAR",
|
369 |
+
"mag_filter": "LINEAR",
|
370 |
+
"mip_filter": "NEAREST",
|
371 |
+
"border_address_mode": "CLAMP_BORDER",
|
372 |
+
"border_color": "FLOAT_CUSTOM_EXT",
|
373 |
+
"custom_border_color": [
|
374 |
+
-1.0,
|
375 |
+
-1.0,
|
376 |
+
-1.0,
|
377 |
+
-1.0
|
378 |
+
],
|
379 |
+
"tiling": "OPTIMAL"
|
380 |
+
}
|
381 |
+
},
|
382 |
+
{
|
383 |
+
"image": {
|
384 |
+
"uid": "in_derivative_tm1",
|
385 |
+
"dims": [
|
386 |
+
1,
|
387 |
+
960,
|
388 |
+
544,
|
389 |
+
1
|
390 |
+
],
|
391 |
+
"src": "./in_derivative_tm1.dds",
|
392 |
+
"format": "VK_FORMAT_R8G8_UNORM",
|
393 |
+
"shader_access": "readonly",
|
394 |
+
"mips": 1,
|
395 |
+
"min_filter": "LINEAR",
|
396 |
+
"mag_filter": "LINEAR",
|
397 |
+
"mip_filter": "NEAREST",
|
398 |
+
"border_address_mode": "CLAMP_BORDER",
|
399 |
+
"border_color": "FLOAT_TRANSPARENT_BLACK",
|
400 |
+
"tiling": "OPTIMAL"
|
401 |
+
}
|
402 |
+
},
|
403 |
+
{
|
404 |
+
"image": {
|
405 |
+
"uid": "in_depth",
|
406 |
+
"dims": [
|
407 |
+
1,
|
408 |
+
960,
|
409 |
+
544,
|
410 |
+
1
|
411 |
+
],
|
412 |
+
"src": "./in_depth.dds",
|
413 |
+
"format": "VK_FORMAT_R32_SFLOAT",
|
414 |
+
"shader_access": "readonly",
|
415 |
+
"mips": 1,
|
416 |
+
"min_filter": "LINEAR",
|
417 |
+
"mag_filter": "LINEAR",
|
418 |
+
"mip_filter": "NEAREST",
|
419 |
+
"border_address_mode": "CLAMP_BORDER",
|
420 |
+
"border_color": "FLOAT_TRANSPARENT_BLACK",
|
421 |
+
"tiling": "OPTIMAL"
|
422 |
+
}
|
423 |
+
},
|
424 |
+
{
|
425 |
+
"image": {
|
426 |
+
"uid": "out_derivative",
|
427 |
+
"dims": [
|
428 |
+
1,
|
429 |
+
960,
|
430 |
+
544,
|
431 |
+
1
|
432 |
+
],
|
433 |
+
"dst": "./out_derivative.dds",
|
434 |
+
"format": "VK_FORMAT_R8G8_UNORM",
|
435 |
+
"shader_access": "writeonly",
|
436 |
+
"mips": 1,
|
437 |
+
"tiling": "LINEAR"
|
438 |
+
}
|
439 |
+
},
|
440 |
+
{
|
441 |
+
"image": {
|
442 |
+
"uid": "out_nearest_offset",
|
443 |
+
"dims": [
|
444 |
+
1,
|
445 |
+
960,
|
446 |
+
544,
|
447 |
+
1
|
448 |
+
],
|
449 |
+
"dst": "./out_nearest_offset.dds",
|
450 |
+
"format": "VK_FORMAT_R8_UNORM",
|
451 |
+
"shader_access": "readwrite",
|
452 |
+
"mips": 1,
|
453 |
+
"min_filter": "LINEAR",
|
454 |
+
"mag_filter": "LINEAR",
|
455 |
+
"mip_filter": "NEAREST",
|
456 |
+
"border_address_mode": "CLAMP_BORDER",
|
457 |
+
"border_color": "FLOAT_TRANSPARENT_BLACK",
|
458 |
+
"tiling": "LINEAR"
|
459 |
+
}
|
460 |
+
},
|
461 |
+
{
|
462 |
+
"tensor": {
|
463 |
+
"uid": "out_input_tensor",
|
464 |
+
"dims": [
|
465 |
+
1,
|
466 |
+
544,
|
467 |
+
960,
|
468 |
+
12
|
469 |
+
],
|
470 |
+
"dst": "./out_input_tensor.npy",
|
471 |
+
"format": "VK_FORMAT_R8_SINT",
|
472 |
+
"shader_access": "readwrite",
|
473 |
+
"tiling": "LINEAR"
|
474 |
+
}
|
475 |
+
},
|
476 |
+
{
|
477 |
+
"graph": {
|
478 |
+
"uid": "1_nss",
|
479 |
+
"src": "./1_nss.vgf"
|
480 |
+
}
|
481 |
+
},
|
482 |
+
{
|
483 |
+
"tensor_barrier": {
|
484 |
+
"uid": "barrier_14",
|
485 |
+
"src_access": "compute_shader_write",
|
486 |
+
"dst_access": "graph_read",
|
487 |
+
"src_stage": [
|
488 |
+
"compute"
|
489 |
+
],
|
490 |
+
"dst_stage": [
|
491 |
+
"graph"
|
492 |
+
],
|
493 |
+
"tensor_resource": "out_input_tensor"
|
494 |
+
}
|
495 |
+
},
|
496 |
+
{
|
497 |
+
"tensor": {
|
498 |
+
"uid": "out_feedback",
|
499 |
+
"dims": [
|
500 |
+
1,
|
501 |
+
544,
|
502 |
+
960,
|
503 |
+
4
|
504 |
+
],
|
505 |
+
"dst": "./out_feedback.npy",
|
506 |
+
"format": "VK_FORMAT_R8_SINT",
|
507 |
+
"shader_access": "writeonly",
|
508 |
+
"tiling": "LINEAR"
|
509 |
+
}
|
510 |
+
},
|
511 |
+
{
|
512 |
+
"image": {
|
513 |
+
"uid": "out_tp",
|
514 |
+
"dims": [
|
515 |
+
1,
|
516 |
+
960,
|
517 |
+
544,
|
518 |
+
1
|
519 |
+
],
|
520 |
+
"format": "VK_FORMAT_R8G8B8A8_SNORM",
|
521 |
+
"shader_access": "readonly",
|
522 |
+
"mips": 1,
|
523 |
+
"min_filter": "LINEAR",
|
524 |
+
"mag_filter": "LINEAR",
|
525 |
+
"mip_filter": "NEAREST",
|
526 |
+
"border_address_mode": "CLAMP_BORDER",
|
527 |
+
"border_color": "FLOAT_TRANSPARENT_BLACK",
|
528 |
+
"tiling": "LINEAR"
|
529 |
+
}
|
530 |
+
},
|
531 |
+
{
|
532 |
+
"tensor": {
|
533 |
+
"uid": "out_tp_aliaser",
|
534 |
+
"dims": [
|
535 |
+
1,
|
536 |
+
544,
|
537 |
+
960,
|
538 |
+
4
|
539 |
+
],
|
540 |
+
"format": "VK_FORMAT_R8_SINT",
|
541 |
+
"shader_access": "readwrite",
|
542 |
+
"alias_target": {
|
543 |
+
"resource_ref": "out_tp"
|
544 |
+
},
|
545 |
+
"tiling": "LINEAR"
|
546 |
+
}
|
547 |
+
},
|
548 |
+
{
|
549 |
+
"image": {
|
550 |
+
"uid": "out_k3",
|
551 |
+
"dims": [
|
552 |
+
1,
|
553 |
+
960,
|
554 |
+
544,
|
555 |
+
1
|
556 |
+
],
|
557 |
+
"format": "VK_FORMAT_R8G8B8A8_SNORM",
|
558 |
+
"shader_access": "readonly",
|
559 |
+
"mips": 1,
|
560 |
+
"min_filter": "LINEAR",
|
561 |
+
"mag_filter": "LINEAR",
|
562 |
+
"mip_filter": "NEAREST",
|
563 |
+
"border_address_mode": "CLAMP_EDGE",
|
564 |
+
"tiling": "LINEAR"
|
565 |
+
}
|
566 |
+
},
|
567 |
+
{
|
568 |
+
"tensor": {
|
569 |
+
"uid": "out_k3_aliaser",
|
570 |
+
"dims": [
|
571 |
+
1,
|
572 |
+
544,
|
573 |
+
960,
|
574 |
+
4
|
575 |
+
],
|
576 |
+
"format": "VK_FORMAT_R8_SINT",
|
577 |
+
"shader_access": "readwrite",
|
578 |
+
"alias_target": {
|
579 |
+
"resource_ref": "out_k3"
|
580 |
+
},
|
581 |
+
"tiling": "LINEAR"
|
582 |
+
}
|
583 |
+
},
|
584 |
+
{
|
585 |
+
"image": {
|
586 |
+
"uid": "out_k2",
|
587 |
+
"dims": [
|
588 |
+
1,
|
589 |
+
960,
|
590 |
+
544,
|
591 |
+
1
|
592 |
+
],
|
593 |
+
"format": "VK_FORMAT_R8G8B8A8_SNORM",
|
594 |
+
"shader_access": "readonly",
|
595 |
+
"mips": 1,
|
596 |
+
"min_filter": "LINEAR",
|
597 |
+
"mag_filter": "LINEAR",
|
598 |
+
"mip_filter": "NEAREST",
|
599 |
+
"border_address_mode": "CLAMP_EDGE",
|
600 |
+
"tiling": "LINEAR"
|
601 |
+
}
|
602 |
+
},
|
603 |
+
{
|
604 |
+
"tensor": {
|
605 |
+
"uid": "out_k2_aliaser",
|
606 |
+
"dims": [
|
607 |
+
1,
|
608 |
+
544,
|
609 |
+
960,
|
610 |
+
4
|
611 |
+
],
|
612 |
+
"format": "VK_FORMAT_R8_SINT",
|
613 |
+
"shader_access": "readwrite",
|
614 |
+
"alias_target": {
|
615 |
+
"resource_ref": "out_k2"
|
616 |
+
},
|
617 |
+
"tiling": "LINEAR"
|
618 |
+
}
|
619 |
+
},
|
620 |
+
{
|
621 |
+
"image": {
|
622 |
+
"uid": "out_k1",
|
623 |
+
"dims": [
|
624 |
+
1,
|
625 |
+
960,
|
626 |
+
544,
|
627 |
+
1
|
628 |
+
],
|
629 |
+
"format": "VK_FORMAT_R8G8B8A8_SNORM",
|
630 |
+
"shader_access": "readonly",
|
631 |
+
"mips": 1,
|
632 |
+
"min_filter": "LINEAR",
|
633 |
+
"mag_filter": "LINEAR",
|
634 |
+
"mip_filter": "NEAREST",
|
635 |
+
"border_address_mode": "CLAMP_EDGE",
|
636 |
+
"tiling": "LINEAR"
|
637 |
+
}
|
638 |
+
},
|
639 |
+
{
|
640 |
+
"tensor": {
|
641 |
+
"uid": "out_k1_aliaser",
|
642 |
+
"dims": [
|
643 |
+
1,
|
644 |
+
544,
|
645 |
+
960,
|
646 |
+
4
|
647 |
+
],
|
648 |
+
"format": "VK_FORMAT_R8_SINT",
|
649 |
+
"shader_access": "readwrite",
|
650 |
+
"alias_target": {
|
651 |
+
"resource_ref": "out_k1"
|
652 |
+
},
|
653 |
+
"tiling": "LINEAR"
|
654 |
+
}
|
655 |
+
},
|
656 |
+
{
|
657 |
+
"image": {
|
658 |
+
"uid": "out_k0",
|
659 |
+
"dims": [
|
660 |
+
1,
|
661 |
+
960,
|
662 |
+
544,
|
663 |
+
1
|
664 |
+
],
|
665 |
+
"format": "VK_FORMAT_R8G8B8A8_SNORM",
|
666 |
+
"shader_access": "readonly",
|
667 |
+
"mips": 1,
|
668 |
+
"min_filter": "LINEAR",
|
669 |
+
"mag_filter": "LINEAR",
|
670 |
+
"mip_filter": "NEAREST",
|
671 |
+
"border_address_mode": "CLAMP_EDGE",
|
672 |
+
"tiling": "LINEAR"
|
673 |
+
}
|
674 |
+
},
|
675 |
+
{
|
676 |
+
"tensor": {
|
677 |
+
"uid": "out_k0_aliaser",
|
678 |
+
"dims": [
|
679 |
+
1,
|
680 |
+
544,
|
681 |
+
960,
|
682 |
+
4
|
683 |
+
],
|
684 |
+
"format": "VK_FORMAT_R8_SINT",
|
685 |
+
"shader_access": "readwrite",
|
686 |
+
"alias_target": {
|
687 |
+
"resource_ref": "out_k0"
|
688 |
+
},
|
689 |
+
"tiling": "LINEAR"
|
690 |
+
}
|
691 |
+
},
|
692 |
+
{
|
693 |
+
"shader": {
|
694 |
+
"uid": "2_post_process",
|
695 |
+
"src": "./2_post_process.spv",
|
696 |
+
"entry": "main",
|
697 |
+
"type": "SPIR-V",
|
698 |
+
"push_constants_size": 76,
|
699 |
+
"specialization_constants": []
|
700 |
+
}
|
701 |
+
},
|
702 |
+
{
|
703 |
+
"raw_data": {
|
704 |
+
"uid": "push_data_22",
|
705 |
+
"src": "./2_post_process_push_consts.npy"
|
706 |
+
}
|
707 |
+
},
|
708 |
+
{
|
709 |
+
"image_barrier": {
|
710 |
+
"uid": "barrier_23",
|
711 |
+
"src_access": "compute_shader_write",
|
712 |
+
"dst_access": "compute_shader_read",
|
713 |
+
"old_layout": "general",
|
714 |
+
"new_layout": "general",
|
715 |
+
"src_stage": [
|
716 |
+
"compute"
|
717 |
+
],
|
718 |
+
"dst_stage": [
|
719 |
+
"compute"
|
720 |
+
],
|
721 |
+
"image_resource": "out_nearest_offset"
|
722 |
+
}
|
723 |
+
},
|
724 |
+
{
|
725 |
+
"image_barrier": {
|
726 |
+
"uid": "barrier_25",
|
727 |
+
"src_access": "graph_write",
|
728 |
+
"dst_access": "compute_shader_read",
|
729 |
+
"old_layout": "general",
|
730 |
+
"new_layout": "general",
|
731 |
+
"src_stage": [
|
732 |
+
"graph"
|
733 |
+
],
|
734 |
+
"dst_stage": [
|
735 |
+
"compute"
|
736 |
+
],
|
737 |
+
"image_resource": "out_k0"
|
738 |
+
}
|
739 |
+
},
|
740 |
+
{
|
741 |
+
"image_barrier": {
|
742 |
+
"uid": "barrier_27",
|
743 |
+
"src_access": "graph_write",
|
744 |
+
"dst_access": "compute_shader_read",
|
745 |
+
"old_layout": "general",
|
746 |
+
"new_layout": "general",
|
747 |
+
"src_stage": [
|
748 |
+
"graph"
|
749 |
+
],
|
750 |
+
"dst_stage": [
|
751 |
+
"compute"
|
752 |
+
],
|
753 |
+
"image_resource": "out_k1"
|
754 |
+
}
|
755 |
+
},
|
756 |
+
{
|
757 |
+
"image_barrier": {
|
758 |
+
"uid": "barrier_29",
|
759 |
+
"src_access": "graph_write",
|
760 |
+
"dst_access": "compute_shader_read",
|
761 |
+
"old_layout": "general",
|
762 |
+
"new_layout": "general",
|
763 |
+
"src_stage": [
|
764 |
+
"graph"
|
765 |
+
],
|
766 |
+
"dst_stage": [
|
767 |
+
"compute"
|
768 |
+
],
|
769 |
+
"image_resource": "out_k2"
|
770 |
+
}
|
771 |
+
},
|
772 |
+
{
|
773 |
+
"image_barrier": {
|
774 |
+
"uid": "barrier_31",
|
775 |
+
"src_access": "graph_write",
|
776 |
+
"dst_access": "compute_shader_read",
|
777 |
+
"old_layout": "general",
|
778 |
+
"new_layout": "general",
|
779 |
+
"src_stage": [
|
780 |
+
"graph"
|
781 |
+
],
|
782 |
+
"dst_stage": [
|
783 |
+
"compute"
|
784 |
+
],
|
785 |
+
"image_resource": "out_k3"
|
786 |
+
}
|
787 |
+
},
|
788 |
+
{
|
789 |
+
"image_barrier": {
|
790 |
+
"uid": "barrier_33",
|
791 |
+
"src_access": "graph_write",
|
792 |
+
"dst_access": "compute_shader_read",
|
793 |
+
"old_layout": "general",
|
794 |
+
"new_layout": "general",
|
795 |
+
"src_stage": [
|
796 |
+
"graph"
|
797 |
+
],
|
798 |
+
"dst_stage": [
|
799 |
+
"compute"
|
800 |
+
],
|
801 |
+
"image_resource": "out_tp"
|
802 |
+
}
|
803 |
+
},
|
804 |
+
{
|
805 |
+
"image": {
|
806 |
+
"uid": "out_colour",
|
807 |
+
"dims": [
|
808 |
+
1,
|
809 |
+
1920,
|
810 |
+
1088,
|
811 |
+
1
|
812 |
+
],
|
813 |
+
"dst": "./out_colour.dds",
|
814 |
+
"format": "VK_FORMAT_B10G11R11_UFLOAT_PACK32",
|
815 |
+
"shader_access": "writeonly",
|
816 |
+
"mips": 1,
|
817 |
+
"tiling": "LINEAR"
|
818 |
+
}
|
819 |
+
}
|
820 |
+
]
|
821 |
+
}
|
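The resource entries above pair each `out_k*` image with an `*_aliaser` tensor that points back at it through `alias_target.resource_ref`. A minimal sketch of a consistency check for such pairs follows, assuming that an aliasing tensor should cover the same number of bytes as its target image. That rule is inferred from the dims and formats shown here, not taken from the Scenario Runner documentation. The `BYTES_PER_ELEMENT` table and the `check_aliases` helper are hypothetical names introduced for illustration.

```python
import math

# Bytes per texel/element for the two formats used by the entries above.
# Hypothetical helper table: extend it before checking other formats.
BYTES_PER_ELEMENT = {
    "VK_FORMAT_R8G8B8A8_SNORM": 4,  # four 8-bit channels per texel
    "VK_FORMAT_R8_SINT": 1,         # one 8-bit value per tensor element
}


def byte_size(resource):
    """Total bytes implied by a resource's dims and format."""
    return math.prod(resource["dims"]) * BYTES_PER_ELEMENT[resource["format"]]


def check_aliases(resources):
    """Return uids of aliasing tensors whose byte size differs from their target image."""
    # Index images first so a tensor may appear before or after its target.
    images = {e["image"]["uid"]: e["image"] for e in resources if "image" in e}
    mismatched = []
    for entry in resources:
        tensor = entry.get("tensor")
        if tensor is None or "alias_target" not in tensor:
            continue
        target = images.get(tensor["alias_target"]["resource_ref"])
        if target is None or byte_size(tensor) != byte_size(target):
            mismatched.append(tensor["uid"])
    return mismatched


# Two entries copied from the diff above, reduced to the fields the check needs.
resources = [
    {"image": {"uid": "out_k3", "dims": [1, 960, 544, 1],
               "format": "VK_FORMAT_R8G8B8A8_SNORM"}},
    {"tensor": {"uid": "out_k3_aliaser", "dims": [1, 544, 960, 4],
                "format": "VK_FORMAT_R8_SINT",
                "alias_target": {"resource_ref": "out_k3"}}},
]
print(check_aliases(resources))  # prints []
```

Both entries describe 960 x 544 x 4 = 2,088,960 bytes, so the check reports no mismatch. A tensor whose `resource_ref` names an image missing from the list is reported as well.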
scenario/typedefs.h
ADDED
@@ -0,0 +1,86 @@
|
1 |
+
//
|
2 |
+
// -----------------------------------------------------------------------------
|
3 |
+
// The proprietary software and information contained in this file is
|
4 |
+
// confidential and may only be used by an authorized person under a valid
|
5 |
+
// licensing agreement from Arm Limited or its affiliates.
|
6 |
+
//
|
7 |
+
// Copyright (C) 2025. Arm Limited or its affiliates. All rights reserved.
|
8 |
+
//
|
9 |
+
// This entire notice must be reproduced on all copies of this file and
|
10 |
+
// copies of this file may only be made by an authorized person under a valid
|
11 |
+
// licensing agreement from Arm Limited or its affiliates.
|
12 |
+
// -----------------------------------------------------------------------------
|
13 |
+
//
|
14 |
+
#ifndef NSS_TYPEDEFS
|
15 |
+
#define NSS_TYPEDEFS
|
16 |
+
|
17 |
+
// fp16 types
|
18 |
+
#define half float16_t
|
19 |
+
#define half2 f16vec2
|
20 |
+
#define half3 f16vec3
|
21 |
+
#define half4 f16vec4
|
22 |
+
|
23 |
+
// fp32 types
|
24 |
+
#define float float32_t
|
25 |
+
#define float2 f32vec2
|
26 |
+
#define float3 f32vec3
|
27 |
+
#define float4 f32vec4
|
28 |
+
|
29 |
+
// int8 types
|
30 |
+
#define int8_t int8_t
|
31 |
+
#define int8_t2 i8vec2
|
32 |
+
#define int8_t3 i8vec3
|
33 |
+
#define int8_t4 i8vec4
|
34 |
+
|
35 |
+
// int16 types
|
36 |
+
#define int16_t int16_t
|
37 |
+
#define int16_t2 i16vec2
|
38 |
+
#define int16_t3 i16vec3
|
39 |
+
#define int16_t4 i16vec4
|
40 |
+
|
41 |
+
// uint16 types
|
42 |
+
#define uint16_t uint16_t
|
43 |
+
#define uint16_t2 u16vec2
|
44 |
+
#define uint16_t3 u16vec3
|
45 |
+
#define uint16_t4 u16vec4
|
46 |
+
|
47 |
+
// int32 types
|
48 |
+
#define int32_t int32_t
|
49 |
+
#define int32_t2 i32vec2
|
50 |
+
#define int32_t3 i32vec3
|
51 |
+
#define int32_t4 i32vec4
|
52 |
+
|
53 |
+
// uint32 types
|
54 |
+
#define uint32_t uint32_t
|
55 |
+
#define uint32_t2 u32vec2
|
56 |
+
#define uint32_t3 u32vec3
|
57 |
+
#define uint32_t4 u32vec4
|
58 |
+
|
59 |
+
// methods
|
60 |
+
#define lerp mix
|
61 |
+
|
62 |
+
// --- RCP functions for float16 types ---
|
63 |
+
half rcp(half x) { return half( 1.HF) / x; }
|
64 |
+
half2 rcp(half2 x) { return half2(1.HF) / x; }
|
65 |
+
half3 rcp(half3 x) { return half3(1.HF) / x; }
|
66 |
+
half4 rcp(half4 x) { return half4(1.HF) / x; }
|
67 |
+
|
68 |
+
// --- RCP functions for float32 types ---
|
69 |
+
float rcp(float x) { return float( 1.0f) / x; }
|
70 |
+
float2 rcp(float2 x) { return float2(1.0f) / x; }
|
71 |
+
float3 rcp(float3 x) { return float3(1.0f) / x; }
|
72 |
+
float4 rcp(float4 x) { return float4(1.0f) / x; }
|
73 |
+
|
74 |
+
// --- Saturate functions for float16 types ---
|
75 |
+
half saturate(half x) { return clamp(x, half( 0.HF), half( 1.HF)); }
|
76 |
+
half2 saturate(half2 x) { return clamp(x, half2(0.HF), half2(1.HF)); }
|
77 |
+
half3 saturate(half3 x) { return clamp(x, half3(0.HF), half3(1.HF)); }
|
78 |
+
half4 saturate(half4 x) { return clamp(x, half4(0.HF), half4(1.HF)); }
|
79 |
+
|
80 |
+
// --- Saturate functions for float32 types ---
|
81 |
+
float saturate(float x) { return clamp(x, 0.f, 1.f); }
|
82 |
+
float2 saturate(float2 x) { return clamp(x, float2(0.f), float2(1.f)); }
|
83 |
+
float3 saturate(float3 x) { return clamp(x, float3(0.f), float3(1.f)); }
|
84 |
+
float4 saturate(float4 x) { return clamp(x, float4(0.f), float4(1.f)); }
|
85 |
+
|
86 |
+
#endif // NSS_TYPEDEFS
|
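The header above gives GLSL code HLSL-style names: `lerp` maps to the GLSL built-in `mix`, `rcp` is a plain reciprocal, and `saturate` clamps to the range [0, 1]. As a rough scalar reference for what these helpers compute, here is a minimal Python sketch. It uses Python floats and does not model fp16 or fp32 rounding, so it is only useful for sanity-checking expected values, not for bit-exact comparison against shader output.

```python
def rcp(x):
    """Reciprocal, mirroring `rcp(x) = 1 / x` in the header."""
    return 1.0 / x


def saturate(x):
    """Clamp to [0, 1], mirroring `clamp(x, 0, 1)` in the header."""
    return min(max(x, 0.0), 1.0)


def lerp(a, b, t):
    """Linear blend, mirroring GLSL `mix(a, b, t) = a * (1 - t) + b * t`."""
    return a * (1.0 - t) + b * t


print(rcp(4.0))              # prints 0.25
print(saturate(1.5))         # prints 1.0
print(lerp(0.0, 10.0, 0.25)) # prints 2.5
```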
third_party_licenses_and_copyright_notices.txt
ADDED
@@ -0,0 +1,15 @@
|
1 |
+
ML SDK Scenario Runner - revision 197a36e
|
2 |
+
Source Code: https://github.com/arm/ai-ml-sdk-scenario-runner
|
3 |
+
License: Apache-2.0 (https://github.com/arm/ai-ml-sdk-scenario-runner/blob/main/LICENSES/Apache-2.0.txt)
|
4 |
+
Copyright Notice: "Copyright 2022-2025 Arm Limited and/or its affiliates <[email protected]>"
|
5 |
+
|
6 |
+
ML Emulation Layer for Vulkan® - revision 788ac99
|
7 |
+
Source Code: https://github.com/arm/ai-ml-emulation-layer-for-vulkan
|
8 |
+
License: Apache-2.0 (https://github.com/arm/ai-ml-emulation-layer-for-vulkan/blob/main/LICENSES/Apache-2.0.txt)
|
9 |
+
Copyright Notice: "Copyright 2022-2025 Arm Limited and/or its affiliates <[email protected]>"
|
10 |
+
|
11 |
+
Amazon Lumberyard Bistro
|
12 |
+
Asset page: http://developer.nvidia.com/orca/amazon-lumberyard-bistro
|
13 |
+
Download page: https://casual-effects.com/g3d/data10/research/model/bistro/Exterior.zip
|
14 |
+
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
|
15 |
+
Copyright Notice: "Copyright 2017 Amazon Lumberyard"
|