Release notes
JianRong JIN edited this page May 25, 2018 · 53 revisions
- Add fp16 interpolation intrinsics and register settings for AMD_gpu_shader_half_float
- Add extension VK_AMD_gpu_shader_half_float_fetch (not enabled)
- [LLPC] Support dual-source blending
- [LLPC] Enable on-chip GS by default for GFX6-8
- Check VkPhysicalDeviceFeatures2 on device create
- Barrier optimization: move the decision about whether to apply layout transitions for ownership-transfer barriers into the ImageBarrierPolicy class
- [LLPC] SPIR-V reader: fix clang compile error in image code
- Remove support for PRT depth/stencil formats; single-aspect depth and stencil formats are still supported
- Report per-aspect sparse image format properties for depth/stencil
- [LLPC] Fix an issue with MRT color output
- Simplify sparse texture bind virtual offset calculation
- Remove the implicit null sparse bind on queue 0
- Disable loop unrolling for TombRaider to work around incorrect lighting in the main menu and in the benchmark
- [LLPC] Support new dimension-aware image intrinsics
- Support general Fmask loading
- Fix SubpassDataArray dimension in GL_EXT_multiview
- Fix assert caused by missing image layout in renderpass logger
- Add IHashProvider and IHashContext to PAL Util namespace
- Add a new flag, sampleLocsAlwaysKnown, to enable the deferred MSAA depth expand optimization for GFX6-9
- Null-initialize the FMASK SRD in CreateFmaskViewSrdsInternal() if the image has no FMASK
- Fix 2D array image to 3D image copies (and vice versa) not working on the SDMA queue with VK_KHR_maintenance1
- Fix copies of BCn mip levels where the HW determines an incorrect size for the mip level
- Revert api_version in the JSON file to 1.1.70
- Enable extension VK_KHR_display
- [LLPC] Add a missing int64 function
- [LLPC] Support new dimension-aware image intrinsics
- Add a runtime option to switch between the dimension-aware image intrinsics and the old image intrinsics
- Add a dimension-aware version of fetch fmaskvalue
- Fix fmask loading failure in VulkanCTS dEQP-VK.amd.shader_fragment_mask group
- Expose the subgroup arithmetic capabilities
- Fix pipeline stats crashing the GPU profiler
- Only disable DE workload IB when PAL MCBP is off
- Fix buffer-to-image copies truncating the output written to the image when a 2D R32G32B32 linear 48x240 image is used
- Add a new interface, CmdDrawOpaque(), in PAL
- Update LLVM to trunk @329887
- Fix incorrect results from shader_ballot writelane
- Update Vulkan headers to 1.1.73
- Implement VK_EXT_descriptor_indexing (not enabled)
- [LLPC] Begin adding support (ImageRead and ImageFetch) for the dimension-aware image intrinsics newly added to the LLVM backend, which will replace the old hardware-oriented image intrinsics
- [LLPC] Use the wqm intrinsic for ds_swizzle derivatives
- [LLPC] Update SPIR-V header
- Fix bugs in RGB10A2 fetch
- Barrier optimization: move the responsibility of handling image layouts to the barrier policy classes
- Set PARTIAL_VS_WAVE_ON to 1 for off-chip GS to work around a system hang
- Remove support for image atomics from formats that should not support them
- Remove the per-device ring buffers for CE RAM dumps
- Make internal CE RAM dumps cacheline-aligned
- Fix GPU scratch memory allocation bug
- Enable AMD_shader_ballot and AMD_gpu_shader_half_float extensions
- Expose the subgroup shuffle capabilities; implement 16-bit and 64-bit arithmetic operations
- Enable app_shader_optimizer in LLPC path
- Barrier optimization
- Work around a hang in the third TombRaider benchmark
- Fix allocation granularity issue
- Add max mask enum for ImageLayoutUsageFlags and CacheCoherencyUsageFlags
- Fix issues in FragColorExport::ComputeExportFormat()
- Remove 32-bit CTS workaround
- Set the unboundDescriptorDebugSrdCount PAL setting to 0 to avoid CTS issues when multiple devices are used during testing
- Fix the driver reporting a currentExtent of (N, 0) for a surface with zero width or height. According to Vulkan spec 1.1.70.1, the currentExtent of a valid window surface (Win32/Xlib/Xcb) must have both width and height greater than 0, or both equal to 0
- Fix LLPC assert on image type
- Use runtime cache mode to get contextCache and reduce CTS run time
- Command buffer dumping fixes: provide the correct engine ID for SDMA command buffers
- Set DropIfSameContext for the CE preamble stream.
- Add max mask enum for ImageLayoutUsageFlags and CacheCoherencyUsageFlags
- Fix assert and build error for PAL null device
- Fix app crash when reading amdPalSettings.cfg
- Fix source image descriptors for graphics depth/stencil copies.
- Fix dEQP-VK.api.external.semaphore.opaque_fd.import_twice_temporary CTS test hang on Vega
- Partially revert an earlier clean-up of the user data table management code. Most of the original change is kept; only the portion that moved some common structures from each HWL to the independent layer for universal command buffers is reverted. The compute command buffer changes were left as-is.
- Move the DescribeDraw calls after validateDraw in all CmdDraw calls
- Implement PAL support needed for KHR_Display extension
- Update LLVM to trunk @329887, fixing the MadMax corruption issue introduced in the last update
- Enable extension VK_AMD_shader_image_load_store_lod
- Enable extension VK_AMD_gcn_shader
- Implement subgroup arithmetic operations
- [LLPC] Add missing pipeline member in pipeline dump
- Optimize subgroup function name generation; generate functions based on the subgroup arithmetic group operation
- Enable SyncobjFence and choose which fence type to use during runtime
- Implement SYNC_FD handle type for External Fence and Semaphore
- Stop releasing the stack allocator in CmdBuffer::End()
- Set UseRingBufferForCeRamDumps default back to true
- Stop allocating memory for sampler descriptors for all GPUs in the device group
- Fix verification error using R32ui image format
- Fix pipeline compilation failure when running ManiaPlanet on Wine
- Fix and optimize the use of some of the barrier flags which were noted to be handled incorrectly or inconsistently
- Expand reporting of CmdBindTargets in the logger
- Enable support for IL_OP_LOAD_DWORD_AT_ADDR in ILP
- Add logic to memtracker to detect when someone corrupts the allocation list by scribbling into the heap
- Remove CE/DE counter syncs from the postamble command streams on gfxip8+
- Fix vkAcquireNextImageKHR returning VK_TIMEOUT without waiting for the timeout duration
- Update LLVM to be based on trunk 328191. The new LLVM code introduced a rendering corruption issue with the game MadMax, which will be fixed in the next update
- Reduce unnecessary malloc/free calls
- [LLPC] Use 0 or 0.0 instead of an undef value for unsupported functions, because undef values block constant folding in LLVM, and the resulting nested constant expressions after lowering are time-consuming for the backend to analyze
- [LLPC] Support int64 atomic operations
- Tweak the way the driver handles load op clears in render passes to fix excessive barriers in render pass clears
- Add error handling where AddMemReference() is used; Add vk::Memory::CreateGpuMemory() and vk::Memory::CreateGpuPinnedMemory()
- Fix assertion when running DOOM 2016 in Wine
- Set "vm" flag for all fragment outputs
- Add FMASK shadow table support to the Vulkan driver, which changes how descriptors are stored in memory. This allows the FMASK descriptors to be written at the same corresponding upper 32 bits of the STA descriptors' VA address
- Fix missing cmd scratch memory heap in gpasession. Prevents a divide by zero exception when initializing driver for RGP traces
- Explicitly acquire and release ownership of the queue context in PAL's preamble and postamble command streams
- PAL no longer tries to chain from the last command buffer to the postamble command streams
- Fix interfaceLogger access violation. DataAllocNames array does not match CmdAllocType enum.
- Fix graphics pipeline creation failing on GFX9 when functionality depending on both VK_AMD_gpu_shader_int16 and VK_AMD_shader_trinary_minmax is used
- Rewrite VamMgrSingleton to avoid static members
- Clean up user data table management code
- Add int16 support to AMD_shader_ballot and AMD_trinary_minmax extensions
- [LLPC] Enable RetBlock in GS to make sure only one return is used in GS
- Refine Pipeline dump
- Simplify pipeline panel options
- Update variable name in llpcAbiMetadata.h to match palPipelineAbi original name
- Remove metadata names from RegNameMap; Util::Abi::PipelineMetadataNameStrings is used instead
- Fix a bug in PipelineCompiler::ApplyBilConvertOptions, the return value of GetRuntimeSettings must be a reference
- AMD_shader_ballot:
  - Rename glslSpecialOpEmuF16 to glslSpecialOpEmuD16
  - Add stubs of subgroup arithmetic operations for i64 and f16
- Use tbuffer_load_d16 for vertex fetching
- Implement a consistent dispatch table mechanism across the driver:
  - Now we have separate global, per-instance, and per-device dispatch tables
  - We can override individual entry points in each dispatch table to enable optimizations based on app profile or any other criteria
  - Entry points can now have complex requirement criteria, and we now clearly distinguish between instance-level and device-level functions
  - SQTT layer handling is still a bit clumsy because it operates more like a device-only layer, but at least its injection code is less intrusive now
  - Also fixed a number of unrelated bugs and missing implementations along the way, as the new code revealed them
- Update Pipeline Dump service to inherit from IService instead of URIService (which is deprecated and being removed).
- Change ValidateDraw to reserve its own space rather than sharing the reservation with the rest of the draw-related packets. This avoids running out of reserved space in TimeSpy
- Move MetroHash and jemalloc to pal/src/util/imported from pal/src/core/imported
- Enable the following extensions:
  - AMD_shader_explicit_vertex_parameter
  - AMD_shader_trinary_minmax
  - AMD_mixed_attachment_samples
  - AMD_shader_fragment_mask
  - EXT_queue_family_foreign
- Enable AMD_gpu_shader_int16 for gfx9
- Enable shaderInt64
- Disable the AMD_gpu_shader_half_float extension because interpolation in the FS is not implemented
- Add arithmetic operations of AMD_shader_ballot
- Implement subgroup arithmetic reduce int ops
- Remove KHR suffixes for promoted extensions: replace some of the KHXs with KHRs; the rest should go away once the device group KHXs are removed
- Remove the Vulkan 1.0 headers because the 1.1 headers are backward compatible; 1.0 driver functionality can still be built with USE_NEXT_SDK=0
- Fix an issue where an incorrect buffer caused a compute shader to loop infinitely
- Disable FmaskBasedMsaaRead for Dota2, which brings a ~1% performance gain for Dota2 at 4K with best-looking settings on Fiji
- Add FMASK shadow table support to LLVM/LLPC
- Fix Wolfenstein 2 failing to compile a compute shader
- Fix dEQP-VK.api.image_clearing.core.clear_color_image.3d.* CTS tests failure
- Clarifies an existing 3D color target interface requirement and fixes a bug which can cause DCC corruption.
- Fix an issue related to fast clear eliminate
- Do a late HTILE expand if fixed-function resolve is used
- Fix a LLVM issue (zext of f16 to i32) for Dawn of War III corruption on Radeon™ RX Vega
- Eliminate stalls between command buffers, Phase #1
- Enable Wayland extension
- Enable AMD_texture_gather_bias_lod extension
- Implement the following AMD extensions:
  - AMD_shader_fragment_mask
  - AMD_gcn_shader
  - AMD_shader_trinary_minmax
  - AMD_shader_explicit_vertex_parameter
  - AMD_shader_ballot
- Enable subgroupQuadSwapHorizontal, subgroupQuadSwapVertical, subgroupQuadSwapDiagonal, subgroupQuadBroadcast(uint/int, uint)
- Fix issues when grouping all identical devices into a single device group, and enable grouping of devices with matching Pal::DeviceProperties::deviceIds; CTS device group testing now passes
- Hide VK_AMD_negative_viewport_height in Vulkan 1.1: using the extension is no longer legal, because 1.1 core includes VK_KHR_maintenance1
- [LLPC] Make SPIR-V bool-in-memory i8 rather than i1
- Enable shader prefetcher for Serious Sam Fusion and Dota2, about 2.5% performance gain
- Remove a redundant divide in BindVertexBuffers() (PAL does the same divide), along with the extra bookkeeping it required
- Fix some issues in the RGP command buffer tag based capture code
- Move pipeline and user-data binding to draw time; several applications showed good gains, and the others were performance-neutral
- Fix an order of initialization issue related to public settings
- VK_KHR_image_format_list for swapchains: add the necessary PAL support for deciding image compression policy for presentable images based on a list of possible view formats
- Report to clients that GFX OFF may reset the GFX timestamp to 0 after an idle period
- Fix some issues in command buffer dumping
- Implement COND_EXEC style predication for CP DMA path in CmdCopyMemory on compute command buffers
- Change CreateTypedBufferViewSrds() and CreateUntypedBufferViewSrds() to remove the requirement that the range is a multiple of the stride
- Make the PAL Linux VA manager support multi-device cases
- Fix dEQP-VK.api.object_management.multithreaded_per_thread_resources.instance random crash
- Fix validation bug with computing PBB bin sizes
- Don't allow LayoutCopySrc on images of a format that doesn't support buffers
- Add call to DevDriver ShowOverlay() function to determine if the developer driver overlay should be displayed.
- Handle unaligned memory to image and image to memory copies on the DMA Queue
- Convert some PAL inline utility functions into constexpr functions and fix some const-correctness issues.
- Resolve potential HW bug with SDMA copy overlap syncs on GFX9
- Temporarily disable the SDMA copy overlap sync feature on GFX9 for a suspected HW ucode bug with SDMA's ability to detect certain hazards which results in race conditions in SDMA stress tests.
- Fix bug in PA_SC_MODE_CNTL_1 validation
- Improve hotspots related to color-target and depth/stencil views, improving CPU performance when creating color-target and depth/stencil view objects in PAL
- Make GFX9's BuildSetSeqContextRegs() and BuildSetSeqConfigRegs() avoid reading from the command buffer, as is already done for GFX6. This removes large spikes when Vulkan uses write-combined command buffers (they were small bumps with cacheable command buffers)
- Add Instance- and Device-specific dispatch tables. Comply with spec requirements
- Handle unaligned memory to image and image to memory copies on the DMA Queue
- Use included headers to determine apiVersion instead of manual bumps
- Complete VK_EXT_sampler_filter_minmax extension, allows more formats and is completely driven by the formats spreadsheet
- Enable VK_EXT_shader_subgroup_vote and VK_EXT_shader_subgroup_ballot support
- VK_KHR_subgroup support:
  - Add missing subgroup builtins in compute shaders
  - Move the implementation of gl_SubGroupSize from the patch phase to the .ll library
  - Support the shufflexor, shuffleup, and shuffledown functions
- VK_KHR_multiview support:
  - LoadOp clears implementation
  - Rewrite the function ConfigBuilder::BuildUserDataConfig to support merged shaders
  - Adjust the position of the SGPR used to emulate ViewIndex
  - Set the user data configuration of ViewId even if the stage is not the last vertex processing stage
- Implement interaction between VK_KHR_multiview and VK_KHR_device_group by adding support for VK_PIPELINE_CREATE_VIEW_INDEX_FROM_DEVICE_INDEX_BIT.
- Change implementation of KHR_descriptor_update_template to move work from vkUpdateDescriptorSetWithTemplateKHR to vkCreateDescriptorUpdateTemplateKHR
- Batch large numbers of copy/clear/etc. image regions to avoid OOM errors
- Rearrange the loop in DescriptorSet::InitImmutableDescriptors() to avoid looking up the descriptor sizes in the device unless necessary. This cuts the time in DescriptorSet::Reassign() in half.
- Remove DescriptorSetHeap::m_pHandles. We can compute the handle with a little arithmetic instead of a memory lookup. Cuts the time in AllocDescriptorSets() in half.
- [LLPC] Implement sparse texture residency
- [LLPC] Fix a crash when parsing hull shaders
- [LLPC] Fix problems with address space mapping
- [LLPC] Restore the correct address space for the GS-VS ring buffer descriptor load
- Fix an assert when running DOOM in Wine
- Don't treat MSAA image as pure shader resolve/read src if CB fixed function resolve method is preferred
- Change the fast clear code from using the three special values ((0,0,0,1), (1,1,1,1) and (1,1,1,0)) to using ClearColorReg when signed and unsigned format views are mixed for a resource
- Don't write IA_MULTI_VGT_PARAM and VGT_LS_HS_CONFIG in ValidateDrawTimeHwState
- Remove unnecessary calls to SetContextRollDetected() during GFX9 command buffer generation
- Remove the software-based dynamic primgroup optimization on GFX9
- Fix GpuProfiler ThreadTrace shader hashes (64-bit to 128-bit)
- Optimize path with depth clamp disabled. Set DISABLE_VIEWPORT_CLAMP only if depth clamp is disabled in pipeline and depth is exported in fragment shader
- Fix a driver access violation when tracing SQTT with a too-small sqtt.gpuMemoryLimit
- Enable Vulkan 1.1 support
- Enable VK_AMD_shader_core_properties extension
- Enable VmAlwaysValid feature for kernel 4.16 and above
- Force per-sample shading if the shader is using per-sample features
- [LLPC] Add an address space translation pass
- Handle OOM errors during command buffer recording
- Fix the problem that driver unbinds vertex buffers when binding a new pipeline
- Fix a gpuProfiler crash when starting capture from the first frame
- [gfx6] Update DB with correct address for PERFCOUNTERx_SELECT1 register, fixing GPU hang on issuing spm traces with more than 2 events for DB
- Fix a CmdClearDepthStencil bug and add validation to avoid 3D depth/stencil images
- Expose perSampleShading PS parameter in PipelineInfo
- Complete Geometry shader and tessellation support for gfx9
- Clear v1.0 CTS failures for Radeon™ RX Vega Series
- Generate extension-related source files at driver build time
- Enable VK_EXT_depth_range_unrestricted extension
- Fix vrcompositor startup crash issue
- Fix random failure in AMD_buffer_marker tests
- Reduce time to clear AllGpuRenderState structure by removing Pal::DynamicGraphicsShaderInfos graphicsShaderInfo and Pal::DynamicComputeShaderInfo computeShaderInfo and making them local variables
- [LLPC] use PassManagerBuilder instead of a forked and modified copy of opt
- Vulkan queue marker to trigger RGP capture (Frame terminator)
- Re-order the PreciseAnisoMode enum for clarity; Change the PreciseAnisoMode value based on the public Radeon Settings Texture filter quality (TFQ) setting
- Fix Vulkan CTS failures of dEQP-VK.api.external.memory.opaque_fd.dedicated with VM-always-valid enabled
- Fix a multi-thread segfault issue
- Fix some Coverity warnings
- Improve CPU performance by removing read modify writes in CreateUntypedBufferViewSrds
- Enhance GFX9 support
- Texture filtering quality changes
- Sample mask input to the FS shouldn't force per-sample execution
- Fix LLVM error when using both OpImageSampleDref* and OpImageSample* on the same image
- CPU optimization for Dota2: reduces the time spent in CmdBuffer::RPSyncPoint() and its callees from 3.1% to 0.4%.
- [LLPC] Enable fastMathMode for floating point
- [LLPC] Enable NoSignedZero for FP math to activate omod modifiers
- Program CHKSUM register with the value obtained from the pipeline binary for SPP.
- Fix implicit prim shader controls.
- Fix "all" null device creation to skip undefined devices
- Add "virtual" to some destructors in PAL
- Add a new field to struct DynamicComputeShaderInfo to support updating the LDS size when binding a compute pipeline
- Implement VK_EXT_external_memory_host extension, enable the extension by default
- Enhance on-chip GS support
- Avoid redundant lookups of Pal::ICmdBuffer* in GraphicsPipeline::BindToCmdBuffer()
- Save PAL pipeline hash when pipeline is created
- Device group: trim the number of structs returned from EnumeratePhysicalDeviceGroups
- Cleanup and remove redundant code for ImageToBuffer and ImageToImage copies as the two step copies are detected and handled in PAL for SDMA queues
- Remove deprecated metadata PsRunsAtSampleRate
- Fix a crash caused by a mismatch between the OpName of the entry point and the OpEntryPoint name
- LLPC: Remove module printing in each pass; move module verification to debug builds
- LLPC: Stop AShr and LShr from having exact flag set in SPIRV translation
- Add streaming Perf counter support in PAL for gfxip-9
- Implement optimal sharing
- Expose texture 3D PRTs for queryable tile shapes for GFX 7/8
- Update GetMaxGpuMemoryAlignment to account for metadata alignment
- Reduce the number of small surfaces that need CMASK or DCC
- Improve efficiency of MsaaState::SetCentroidPriorities()
- Unbind vertex buffers when binding a new pipeline
- Fix issue when running with mode setting driver
- Fix issue when running on XWayland
- Upgrade LLVM to new code base
- Implement VK_AMD_buffer_marker extension
- Implement VK_EXT_debug_report extension
- Pass layout to InitImmutableDescriptors(), removes 80% of the time in DescriptorSet::Reassign()
- Calculate location of bindings for descriptor set layout to avoid a memory lookup
- Disable depth clamping when enableDepthClamp is set to false
- Fix CTS dEQP-VK.tessellation.shader_input_output.barrier failure, simplify the TessFactorToBuffer offset calculation
- Fix CTS dEQP-VK.glsl.440.linkage.varying.component group testing failure
- Fix dEQP-VK.api.external.semaphore.opaque_fd.signal_wait_import_permanent
- Fix dEQP-VK.spirv_assembly.instruction.compute.image_sampler.imagefetch.* hang issue.
- Pass image format list to PAL to allow enabling DCC for certain cases.
- llpc: Use llvm build's llvm-as and llvm-link to save driver build time
- llpc: update merged shader implementation
- Get rid of unnecessary synchronization for present jobs.
- Fix a couple of asserts when loading llvm-generated ELF
- [PAL Util] Enhancements for containers.
- Fix a PalAlert when creating a null PS
- Enable VK_EXT_global_priority extension when libdrm version >= 3.22
- Enable VK_KHR_maintenance2 extension
- MadMax performance tuning, ~3% gain
- LLPC refine
- Fix a bug when multiple devices access the same shader cache disk file by creating an internal shader cache instance per gfxip
- Fix compile error with GCC7.2
- Fix Dynamic WaveLimits for Graphics Pipelines
- Cache robustBufferAccess in the descriptor set
- Hook up present timing events in RGP queue traces
- Add pass llpcSpirvLowerZero to optimize float zero operations
- Add implementation of AMD_gpu_shader_half_float and AMD_gpu_shader_int16 (not completed)
- Add implementation of VK_EXT_global_priority (not completed)