Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Android Hybrid - Search - App crashes when switching between expenses of the same report. #51584

Open
1 of 8 tasks
lanitochka17 opened this issue Oct 28, 2024 · 32 comments
Open
1 of 8 tasks
Assignees
Labels
Engineering Reviewing Has a PR in review Weekly KSv2

Comments

@lanitochka17
Copy link

lanitochka17 commented Oct 28, 2024

If you haven’t already, check out our contributing guidelines for onboarding and email [email protected] to request to join our Slack channel!


Version Number: 9.0.54-2
Reproducible in staging?: Y
Reproducible in production?: N
If this was caught on HybridApp, is this reproducible on New Expensify Standalone?: Y
If this was caught during regression testing, add the test name, ID and link from TestRail: https://expensify.testrail.io/index.php?/tests/view/5135640&group_by=cases:section_id&group_order=asc&group_id=296775
Issue reported by: Applause - Internal Team

Action Performed:

  1. Open the Expensify app
  2. Open the search section
  3. Tap on "Outstanding"
  4. Open any report with several expenses on it. (At least 3 or 4)
  5. Open any expense on it
  6. Tap on the arrow on the top left corner
  7. Open another expense
  8. Repeat steps 5 to 7 a few more times
  9. Verify that you are able to switch between expenses without the app closing

Expected Result:

The user should be able to switch between expenses of the same report, all the necessary times, without the app closing

Actual Result:

The app crashes when switching between expenses of the same report in "Outstanding" section in search

Workaround:

Unknown

Platforms:

Which of our officially supported platforms is this issue occurring on?

  • Android: Standalone
  • Android: HybridApp
  • Android: mWeb Chrome
  • iOS: Standalone
  • iOS: HybridApp
  • iOS: mWeb Safari
  • MacOS: Chrome / Safari
  • MacOS: Desktop

Screenshots/Videos

Add any screenshot/video evidence
Bug6648292_1730123864600.Outstanding.mp4

logs (2).txt

View all open jobs on GitHub

@lanitochka17 lanitochka17 added DeployBlockerCash This issue or pull request should block deployment DeployBlocker Indicates it should block deploying the API labels Oct 28, 2024
Copy link

melvin-bot bot commented Oct 28, 2024

Triggered auto assignment to @jasperhuangg (DeployBlockerCash), see https://stackoverflowteams.com/c/expensify/questions/9980/ for more details.

Copy link

melvin-bot bot commented Oct 28, 2024

💬 A slack conversation has been started in #expensify-open-source

Copy link
Contributor

👋 Friendly reminder that deploy blockers are time-sensitive ⏱ issues! Check out the open `StagingDeployCash` deploy checklist to see the list of PRs included in this release, then work quickly to do one of the following:

  1. Identify the pull request that introduced this issue and revert it.
  2. Find someone who can quickly fix the issue.
  3. Fix the issue yourself.

@lanitochka17
Copy link
Author

Production:

Screenrecorder-2024-10-28-16-48-40-587.mp4

@lanitochka17
Copy link
Author

We think that this bug might be related to #wave-collect - Release 1

@jasperhuangg
Copy link
Contributor

jasperhuangg commented Oct 28, 2024

At the dentist, will tackle this when I'm back

@jasperhuangg
Copy link
Contributor

Building a revert that might be related

@hannojg
Copy link
Contributor

hannojg commented Oct 28, 2024

@lanitochka17 for debugging this issue, are you able to reproduce the bug on your device, connect it to a computer and run adb bugreport from a terminal and share the zip file here with us?

(The crash from the logs is an ANR, and in that log file we have no info to what happened. This bugreport zip file will help us to understand what caused the crash)

@jasperhuangg
Copy link
Contributor

@hannojg I'm testing right now, I'll try that out

@hannojg
Copy link
Contributor

hannojg commented Oct 28, 2024

Thanks! (Btw, the pr hasn't touched any UI related stuff and nothing in regards to reports, so i am not sure how likely it is related - but lets see!)

@jasperhuangg
Copy link
Contributor

dumpstate-2024-10-28-11-25-07.zip

I was able to reproduce on staging, here's the dumpstate, although I'm not entirely sure if the command succeeded on my device @hannojg

@jasperhuangg
Copy link
Contributor

I was also able to reproduce the bug in the NewDot App (not HybridApp), so we should be able to test some reverts.

@jasperhuangg
Copy link
Contributor

Yeah seems like the adb bugreport command is having permissions errors, looking into it.

@luacmartins luacmartins removed the DeployBlocker Indicates it should block deploying the API label Oct 28, 2024
@jasperhuangg
Copy link
Contributor

bugreport-beyond1ltexx-SP1A.210812.016-2024-10-28-11-43-45.zip

Here's the bug report after running it on New Expensify

@jasperhuangg
Copy link
Contributor

10-28 11:24:45.415  1058   500   500 I tombstoned: received crash request for pid 3837
10-28 11:24:45.415  1058   500   500 I tombstoned: found intercept fd 512 for pid 3837 and type kDebuggerdJavaBacktrace
10-28 11:24:45.416 10326  3837  3842 I putmethod.lati: Wrote stack traces to tombstoned
10-28 11:24:45.416  1058   500   500 W tombstoned: missing output fd
10-28 11:24:45.418  1000   971 32163 I system_server: libdebuggerd_client: done dumping process 3837
10-28 11:24:45.425  1000   971 32163 I ActivityManager: Collecting stacks for pid 3968
10-28 11:24:45.430  1000   971 32163 I system_server: libdebuggerd_client: started dumping process 3968
10-28 11:24:45.431  1058   500   500 I tombstoned: registered intercept for pid 3968 and type kDebuggerdJavaBacktrace
10-28 11:24:45.433  1000  3968  3973 I c.android.sdhm: Thread[6,tid=3973,WaitingInMainSignalCatcherLoop,Thread*=0x7b93c50f50,peer=0x13240d60,"Signal Catcher"]: reacting to signal 3
10-28 11:24:45.433  1000  3968  3973 I c.android.sdhm: 

Doesn't seem to be very useful 🤔

@jasperhuangg
Copy link
Contributor

I am fairly certain that the crash is being caused by #51332

I reproduced the bug on NewDot staging after navigating to/back from roughly 5-10 expense reports.

After testing the AdHoc build of the revert, I've navigated to/back expense reports > 50 times and haven't experienced a crash yet.

Sorry @hannojg 😓 I think I'm going to have to revert the PR again.

@jasperhuangg
Copy link
Contributor

Waiting on app deploys to be fixed before CPing

@jasperhuangg
Copy link
Contributor

Retested and seems to have fixed the issue, navigated to/from expenses in the report a bunch of times and no crash.

@hannojg
Copy link
Contributor

hannojg commented Oct 29, 2024

I reproduced the bug on NewDot staging after navigating to/back from roughly 5-10 expense reports.

@jasperhuangg were all those expense reports part of the same chat/report?

@hannojg
Copy link
Contributor

hannojg commented Oct 29, 2024

This is so weird. The code I have added is only used in SearchRouter. The code i added isn't even invoked in the flow that this bug describes.

@hannojg
Copy link
Contributor

hannojg commented Oct 29, 2024

Looking at the ANRs from your device we can see that the main/UI and JS thread are in a deadlock, presumably due to some sync calls caused by reanimated:

JS Thread Main Thread
CleanShot 2024-10-29 at 15 26 44 CleanShot 2024-10-29 at 15 23 57

Another thing i can see from your report is that your device was pretty low on memory when the ANR happens, only 200mb left. The code i added might introduced issues with high memory, but then, that code never gets executed in the bug's flow (it only gets executed when opening the search page):

CleanShot 2024-10-29 at 15 28 40

@hannojg
Copy link
Contributor

hannojg commented Oct 29, 2024

Aha, interesting, i can reproduce on release, but not on debug

@hannojg
Copy link
Contributor

hannojg commented Oct 29, 2024

Okay, i can reproduce this bug when building a release app from main (where the accused PR is reverted on), so i don't really think its caused by the PR. Let me test the adhoc build from the revert PR though

(happy to help investigate whats causing the bug though!)

@hannojg
Copy link
Contributor

hannojg commented Oct 29, 2024

I can reproduce this with the adhoc build as well:

@hannojg
Copy link
Contributor

hannojg commented Oct 29, 2024

I think the issue is caused by our native stack PR/bridgeless PR/or some general bug in reanimated that we've only discovered now for some reason

@jasperhuangg
Copy link
Contributor

@hannojg Hmm, shoot. I tried reproducing the bug for nearly 10 minutes on the AdHoc build and couldn't get it, but I was able to reproduce it within a minute on staging. I'm so sorry if I made a mistake reverting your PR.

Feel free to revert the revert again 😅 and I'd be happy to help you merge it. If this bug is really hard to reproduce, it makes sense that we could only find it now.

Do you have any recommendations on which PRs specifically would be causing this regression?

@jasperhuangg jasperhuangg reopened this Oct 29, 2024
@jasperhuangg
Copy link
Contributor

Ah hmm, that being said, @NikkiWines mentioned that the revert solved a deploy blocker that she was assigned to #51527, so we may still want to investigate that. It also doesn't really seem like your code is invoked there, so it may be unrelated and just flaky like this one.

@hannojg
Copy link
Contributor

hannojg commented Oct 29, 2024

Yes, i will make a new PR that addresses the other blockers soon!

For the crash: i can reproduce the bug very easily 100% of the time, so it should be easy for me to investigate. I can test a few likely PRs by reverting them to identify the faulty PR tmrw!

(I think you guys have a set of git commands that you use to revert a PR right, if so, could you share that one please?)

@hannojg
Copy link
Contributor

hannojg commented Oct 30, 2024

Note: i am still able to reproduce this bug on version 9.0.53-0, which is over one week old. Going further back …

// Edit: Also reproducible on 9.0.50-0, two weeks old

// Edit: also reproducible on 9.0.45-0, one month old

at this point i am not sure if it makes sense to go further back (i have the feeling i would find that its since we turned on the new arch). Maybe instead we should try to just fix the bug?

Copy link

melvin-bot bot commented Oct 30, 2024

⚠️ Looks like this issue was linked to a Deploy Blocker here

If you are the assigned CME please investigate whether the linked PR caused a regression and leave a comment with the results.

If a regression has occurred and you are the assigned CM follow the instructions here.

If this regression could have been avoided please consider also proposing a recommendation to the PR checklist so that we can avoid it in the future.

@Beamanator Beamanator removed the DeployBlockerCash This issue or pull request should block deployment label Oct 31, 2024
@Beamanator
Copy link
Contributor

I'm marking this NAB since @hannojg confirmed this can be reproduced on older builds, cool? 👍

Thanks so much for your investigation, both of y'all! 🙏

@hannojg
Copy link
Contributor

hannojg commented Nov 4, 2024

Let us know if you want us to investigate whats going on (feel free to assign to me, either i will work on it or someone else from our margelo team can take a look!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Engineering Reviewing Has a PR in review Weekly KSv2
Projects
None yet
Development

No branches or pull requests

5 participants