
Performance issues #118

Open
3 of 5 tasks
diego3g opened this issue Aug 13, 2021 · 9 comments
Labels: enhancement (New feature or request)


diego3g commented Aug 13, 2021

I sent an email to 18k contacts and noticed some CPU and memory problems.

Summary

  • The HTTP service hit high CPU usage handling AWS SNS notifications;
  • The queue service hit high CPU usage processing the queue to send emails;
  • The HTTP service hit high memory usage when adding the batch of jobs to the queue;

Errors

When receiving some AWS events, the HTTP service logged errors with this trace:

PrismaClientKnownRequestError2 [PrismaClientKnownRequestError]: Unique constraint failed on the fields: (`message_id`,`contact_id`)
    at cb (/home/node/app/node_modules/@prisma/client/runtime/index.js:34800:17)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at async PrismaRecipientsRepository.saveWithEvents (/home/node/app/dist/modules/broadcasting/repositories/prisma/PrismaRecipientsRepository.js:41:5)
    at async RegisterEvent.execute (/home/node/app/dist/modules/broadcasting/useCases/RegisterEvent/RegisterEvent.js:60:5)
    at async RegisterEventController.handle (/home/node/app/dist/modules/broadcasting/useCases/RegisterEvent/RegisterEventController.js:90:22)
    at async /home/node/app/dist/infra/http/adapters/ExpressRouteAdapter.js:15:26 {
  code: 'P2002',
  clientVersion: '2.28.0',
  meta: { target: [ 'message_id', 'contact_id' ] }
} 
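The P2002 code means the same (message_id, contact_id) pair was written twice; since SNS delivers notifications at least once, duplicate deliveries can race into that unique index. A minimal sketch of an idempotent handler, with hypothetical names (this is an illustration, not the repo's actual fix):

```javascript
// P2002 is Prisma's "unique constraint failed" error code. Treating it
// as "this event was already stored" turns a duplicate SNS delivery
// into a no-op instead of a logged error.
function isDuplicateRecipientError(err) {
  return Boolean(err) && err.code === 'P2002';
}

async function registerEventIdempotently(saveWithEvents, event) {
  try {
    await saveWithEvents(event);
    return 'created';
  } catch (err) {
    if (isDuplicateRecipientError(err)) return 'duplicate'; // already stored
    throw err;
  }
}
```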

Remediations

Infrastructure

  • Split the HTTP service that receives SNS notifications into a new service (100m CPU, 144mb RAM and a 35% CPU autoscale threshold);
  • Increased the queue service's CPU allocation to 400m;

Application

  • Do some benchmarks with the Bee-Queue module and different concurrency factors;
  • Add jobs to the queue in batches instead of creating one huge jobs array;
  • Try to reduce the number of queries made while receiving notifications;
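The batching idea above can be sketched like this: instead of building one huge jobs array for all 18k contacts and handing it to the queue at once, slice the recipient list into fixed-size chunks. The addBulk shape matches BullMQ, but enqueueInBatches and the batch size of 500 are illustrative assumptions, not this repo's actual code:

```javascript
// Split an array into fixed-size batches.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Enqueue recipients batch by batch so only `size` job objects
// are materialized per iteration, instead of one 18k-element array.
async function enqueueInBatches(queue, recipients, size = 500) {
  for (const batch of chunk(recipients, size)) {
    await queue.addBulk(batch.map((r) => ({ name: 'send-email', data: r })));
  }
}
```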
@diego3g diego3g added the enhancement New feature or request label Aug 13, 2021
@diego3g diego3g self-assigned this Aug 13, 2021

diego3g commented Aug 13, 2021

Did some benchmarking of queue job insertion using BullMQ and got the following results:

  • Inserting 20k jobs in a single batch on the queue uses ~190mb of RAM and takes around 956ms;
  • Inserting 20k jobs individually on the queue uses ~13mb of RAM and takes around 430ms (this uses more CPU, but maybe it's better to trade higher CPU for lower memory usage);

Why is inserting in batch more expensive than inserting one by one? I don't know, but it is.

My guesses that the memory usage came from building a big recipients array on the JavaScript side were wrong; the problem was inserting into the queue.
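A minimal harness of the kind that produces numbers like the ones above (a sketch, not the actual benchmark script): wall time via process.hrtime.bigint() and heap delta via process.memoryUsage(). For steadier memory numbers, run node with --expose-gc and call global.gc() before each sample.

```javascript
// Time a function and report elapsed wall time plus heap growth.
function benchmark(label, fn) {
  const heapBefore = process.memoryUsage().heapUsed;
  const start = process.hrtime.bigint();
  const result = fn();
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  const heapDeltaMb =
    (process.memoryUsage().heapUsed - heapBefore) / 1024 / 1024;
  console.log(
    `${label}: ${elapsedMs.toFixed(0)}ms, ${heapDeltaMb.toFixed(1)}mb heap delta`
  );
  return result;
}

// usage: measure one strategy under the harness
benchmark('build 20k jobs', () =>
  Array.from({ length: 20000 }, (_, i) => ({ foo: 'bar', i }))
);
```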


diego3g commented Aug 13, 2021

Did some benchmarks with Bee Queue, but it didn't result in better performance; actually, it changed nothing.


diego3g commented Aug 13, 2021

Removed two queries when receiving notifications from Amazon SNS.

Now we don't need to check for recipient existence or create the events separately from upserting the recipient.
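The single-query shape might look like the sketch below: Prisma's upsert can create-or-update the recipient and create the event in the same statement, replacing the earlier find + create + create round-trips. Field and key names here are assumptions inferred from the error trace (message_id, contact_id), not copied from the repo:

```javascript
// Build the argument object for a prisma.recipient.upsert() call that
// also creates the event via a nested write, so one query replaces three.
function buildRecipientUpsertArgs(messageId, contactId, eventType) {
  return {
    where: {
      message_id_contact_id: { message_id: messageId, contact_id: contactId },
    },
    create: {
      message_id: messageId,
      contact_id: contactId,
      events: { create: { type: eventType } },
    },
    update: {
      events: { create: { type: eventType } },
    },
  };
}

// usage (would hit the database, shown for shape only):
// await prisma.recipient.upsert(buildRecipientUpsertArgs(mId, cId, 'DELIVERY'));
```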


diego3g commented Aug 13, 2021

Split out the SNS webhook HTTP service in c810331.


pellizzetti commented Aug 16, 2021

@diego3g

Inserting 20k jobs individually on the queue leads to 13mb RAM usage around 430ms (This uses high CPU, but maybe it's better to have higher CPU than memory usage);

Touchy subject: memory is usually considerably cheaper than vCPU [0]. Some numbers on CPU usage in both tests would be nice!

It's also easier to manage/scale/allocate memory than CPU. Besides all that, Node.js applications tend to consume more CPU than memory. Since we [currently] work almost exclusively with this technology, depending on the usage of each resource it may be worth sacrificing some RAM. (:

[0]

Item     Price
vCPU     $0.033174 / vCPU hour
Memory   $0.004446 / GB hour
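For scale, the ratio between the two prices works out as follows (simple arithmetic on the quoted rates):

```javascript
// At these rates, one vCPU-hour costs about as much as 7.5 GB-hours of memory.
const vcpuPerHour = 0.033174;  // $ per vCPU hour
const memPerGbHour = 0.004446; // $ per GB hour
const ratio = vcpuPerHour / memPerGbHour; // ~7.46 GB-hours per vCPU-hour
console.log(ratio);
```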


diego3g commented Aug 16, 2021

Hey @pellizzetti, I tried doing some CPU benchmarking but couldn't manage to get actual CPU-consumption numbers. Do you have any tips for me on that?

@pellizzetti

@diego3g

If you created new, isolated scripts for your benchmarking, you won't need to handle startup/setup time, so I guess a simple time would do it, e.g. time node <script>. Just a quick reminder: it would be nice to run it a few times to get the average CPU usage of each script.
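Besides timing the whole process from the shell, Node can report its own CPU time; a sketch using process.cpuUsage(), which returns user and system time in microseconds and so separates CPU actually burned from wall-clock time spent waiting on Redis or the network:

```javascript
// Measure the CPU time consumed while running fn.
function measureCpu(fn) {
  const start = process.cpuUsage();
  fn();
  const { user, system } = process.cpuUsage(start); // microseconds
  return { userMs: user / 1000, systemMs: system / 1000 };
}

// usage: burn some CPU and read back the cost
const cost = measureCpu(() => {
  let acc = 0;
  for (let i = 0; i < 1e7; i += 1) acc += i;
});
console.log(cost);
```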

@pellizzetti

If you want to take a deeper dive, take some time to look over perf/perf_events[0].

[0] https://www.brendangregg.com/perf.html


Deividy commented Jan 25, 2022

Thought this would be a fun quest, so I decided to jump in, and I got your answer :)

Why is inserting in batch more expensive than inserting one by one? I don't know, but it is.

tl;dr:
because of extraneous loops that allocate lots of objects and strings (JSON.stringify()) before sending to Redis; when inserting jobs one by one you don't allocate all those objects, and you save some resources with fewer iterations.

here is the full story:

  • calling addBulk() in BullMQ triggers multiple loops over the whole data set, initializing a lot of new objects.
    Testing on my Macbook with 20k simple objects ({ foo: 'bar' }), this loop alone takes approximately ~150ms of pure CPU

  • and it gets worse at line #207

const result = (await multi.exec()) as [null | Error, string][];

this basically loops over all 20k objects we just added, checking some flags and building a scripts[] array

then it loops over all 20k objects again and sends them one by one to Redis

so there are a bunch of loops over all the data.

In the end you have a bunch of copies of the original array: every .map() allocates a new array, BullMQ keeps the full queue, and ioredis also keeps the full queue with all the objects.
That's why addBulk() is memory-expensive; and it's CPU-expensive because it loops over all the data many times, and every loop blocks the event loop and the CPU.
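A simplified model of the passes described above (not BullMQ's actual code): each stage walks all 20k jobs and allocates a fresh copy, so the data exists several times in memory before anything reaches Redis.

```javascript
const jobs = Array.from({ length: 20000 }, (_, i) => ({ foo: 'bar', i }));

// pass 1: wrap each job (a new array plus 20k new objects)
const wrapped = jobs.map((data) => ({ name: 'send-email', data }));

// pass 2: serialize each payload (20k new strings)
const serialized = wrapped.map((job) => JSON.stringify(job));

// pass 3: build the command list the client then flushes one by one
const commands = serialized.map((payload) => ['RPUSH', 'queue', payload]);
```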


Performance issues are tricky; you might think that using time will give you the exact time used by the process, or that cpuUsage() will give the right answers, but most of the time it's not that simple: you'll have other processes on the machine, other tasks running in your Node application, and so on.

To understand this kind of issue you must dig into the code and read and understand what folks are doing; debugger and console.log are your best friends here. External tools will only point at the symptom and will rarely show you the issue. Sometimes it's better to actually read the code and fully understand what's going on than to benchmark different modules, since they might be using the same strategy.

Some ground rules when working with big data in JavaScript:

  • every loop blocks the CPU/event loop for N ms (depending on the complexity of the loop and the array size)
  • every .map/reduce/split/JSON.stringify/anything that allocates a new object over this data will use more memory, sometimes doubling the usage or even tripling it; be careful and work on reducing the number of iterations over this data.

✌️
