Android/ARM64: Optimize AreQuaternionsOpposite
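Adds an ARM64-specific path to AreQuaternionsOpposite that computes the quaternion dot product with NEON intrinsics (one vectorized multiply followed by a horizontal add), which is intended to speed up the quaternion comparison. Other architectures keep the existing Common::Dot implementation.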

Fixes deadlocks in Batman Arkham Origins: Blackgate and possibly in other games.
weihuoya authored and Gamer64ytb committed Jul 25, 2024
1 parent 0ceaa1b commit 45923c3
Showing 1 changed file with 13 additions and 1 deletion.
14 changes: 13 additions & 1 deletion src/video_core/rasterizer_accelerated.cpp
@@ -2,6 +2,9 @@
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.

#ifdef ARCHITECTURE_ARM64
#include <arm_neon.h>
#endif
#include "common/alignment.h"
#include "core/memory.h"
#include "video_core/pica/pica_core.h"
@@ -75,10 +78,19 @@ RasterizerAccelerated::RasterizerAccelerated(Memory::MemorySystem& memory_, Pica
* these issues, making this basic implementation actually more accurate to the hardware.
*/
static bool AreQuaternionsOpposite(Common::Vec4<f24> qa, Common::Vec4<f24> qb) {
#ifdef ARCHITECTURE_ARM64
const float32_t a[4] = {qa.x.ToFloat32(), qa.y.ToFloat32(), qa.z.ToFloat32(), qa.w.ToFloat32()};
const float32_t b[4] = {qb.x.ToFloat32(), qb.y.ToFloat32(), qb.z.ToFloat32(), qb.w.ToFloat32()};
const float32x4_t aa = vld1q_f32(a);
const float32x4_t bb = vld1q_f32(b);
const float32x4_t mm = vmulq_f32(aa, bb);
const float32x2_t s2 = vadd_f32(vget_high_f32(mm), vget_low_f32(mm));
return (vget_lane_f32(vpadd_f32(s2, s2), 0) < 0.f);
#else
Common::Vec4f a{qa.x.ToFloat32(), qa.y.ToFloat32(), qa.z.ToFloat32(), qa.w.ToFloat32()};
Common::Vec4f b{qb.x.ToFloat32(), qb.y.ToFloat32(), qb.z.ToFloat32(), qb.w.ToFloat32()};

return (Common::Dot(a, b) < 0.f);
#endif
}

void RasterizerAccelerated::AddTriangle(const Pica::OutputVertex& v0, const Pica::OutputVertex& v1,
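
To make the reduction in the NEON path easier to follow, here is a minimal standalone sketch of the same dot-product-and-sign test. It is not part of the commit: `Dot4` and `AreOpposite` are hypothetical names, plain `float` stands in for the emulator's `f24` type, and the compiler-defined `__aarch64__` macro is assumed in place of the project's `ARCHITECTURE_ARM64`. The scalar fallback adds the products in the same order as the intrinsics so the two paths can be compared lane by lane.

```cpp
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical helper: 4-component dot product of two float arrays.
inline float Dot4(const float a[4], const float b[4]) {
#if defined(__aarch64__)
    // mm = {a0*b0, a1*b1, a2*b2, a3*b3}
    const float32x4_t mm = vmulq_f32(vld1q_f32(a), vld1q_f32(b));
    // s2 = {m2 + m0, m3 + m1}: add the high half to the low half.
    const float32x2_t s2 = vadd_f32(vget_high_f32(mm), vget_low_f32(mm));
    // vpadd_f32 adds adjacent lanes; lane 0 becomes s2[0] + s2[1].
    return vget_lane_f32(vpadd_f32(s2, s2), 0);
#else
    // Scalar fallback using the same summation order as the NEON path.
    const float m0 = a[0] * b[0];
    const float m1 = a[1] * b[1];
    const float m2 = a[2] * b[2];
    const float m3 = a[3] * b[3];
    const float s0 = m2 + m0;  // lane 0 of s2
    const float s1 = m3 + m1;  // lane 1 of s2
    return s0 + s1;
#endif
}

// Hypothetical helper: two quaternions are treated as "opposite" when
// their dot product is negative, matching the test in the diff above.
inline bool AreOpposite(const float qa[4], const float qb[4]) {
    return Dot4(qa, qb) < 0.f;
}
```

With this sketch, a quaternion and its negation (for example `{0, 0, 0, 1}` and `{0, 0, 0, -1}`) compare as opposite, while a quaternion compared with itself does not. Note that the pairwise summation order differs from a left-to-right scalar sum, so the last bits of the result may differ from `Common::Dot` for some inputs; whether that matters for the sign test is not something the commit states.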