Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Question] Hints on best practices / what to avoid #369

Open
Penaz91 opened this issue Aug 24, 2022 · 4 comments
Open

[Question] Hints on best practices / what to avoid #369

Penaz91 opened this issue Aug 24, 2022 · 4 comments

Comments

@Penaz91
Copy link

Penaz91 commented Aug 24, 2022

Greetings,

I have been a user of Pylint-Django for about a year and I really appreciate how this tool gives me a hand when it comes to code quality.

I wanted to ask if it would be possible to implement some suggestions on best practices and how to avoid some common pitfalls that may come from being in a rush or just plain carelessness (I am guilty of both).

For instance:

  • Creating/updating models inside a for loop could have bad performance, instead bulk_create/bulk_update could work better;
  • Querying models inside a for loop should be avoided, a single query using the __in= operator could be better performing.

This are a couple examples on top of my head that gave me a bit of a headache when debugging performance issues on a software I was coding. Having a tool remind me about these possible performance hogs could have saved me a lot of effort.

Thank you for your attention.

@atodorov
Copy link
Contributor

At the top of my head I think these are definitely doable from a technical POV. However I think most if the maintainers and contributors are quite busy ATM to be able to devote significant time to new features. Although the patterns seem relatively simple implementation and managing tests & corner cases (in order to actually have confidence that it works) become non-trivial.

However you may find the following resources helpful and give it a try yourself:

From what I can see your pattern is:

For 1) for loop statement, then calling .save() inside of it - both easy to match. Bonus points for checking if the variable we call .save() on inherits from Model.

For 2) For loop + QuerySet - that's a bit tricky b/c there are many methods which return a QuerySet and/or it could be coming in outside the loop as a variable, result of a function, etc. A hack could be to search for .filter indiscriminately inside the loop body.

@Penaz91
Copy link
Author

Penaz91 commented Aug 25, 2022

Thanks for your response, no worries about being busy: it's understandable.

I will keep these suggestions in mind for a rainy day, It could be fun to delve into ASTs and write something others may be able to use.

@carlio
Copy link
Member

carlio commented Aug 25, 2022

For future reference, I'll add that the "is it a queryset" detection is a bit fuzzy - https://github.com/PyCQA/pylint-django/blob/master/pylint_django/augmentations/__init__.py#L550 - because as @atodorov says, it's not super easy to know for sure that it is a Queryset.

The AST parsing is all done by astroid so the docs there would help too understanding the Node class and similar.

Conceptually, "am I in a loop, am I calling .get or .create a lot on something that looks like a queryset" is very doable.

@Penaz91
Copy link
Author

Penaz91 commented Sep 1, 2022

I'm slowly studying how PyLint checkers are written and in the meantime I got another couple of ideas while working at my day job:

  • Using queryset.all() in a for loop may be slower than using queryset.iterator()
  • Using queryset.all() in a list comprehension may be slower than using queryset.iterator()

Hopefully I'll be able to integrate Pylint with a "git version" of Pylint-Django without too many headaches.

Thank you all for the resources and helpful tips!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants