Applied Schema Caching Fix | Canidate V0.7.1 #121

SoundsSerious · 2023-02-08T05:20:01Z

use python __{class.__name__} prefix to store schema caches similar to the python double underscore, name mangling approach.
added recache option in schema to allow changing schema [ schema(...,recache:bool=False) ]

This PR should address these issues, while preserving the memory overflow fix of 0.7.0
#120
#115
#114

@kashifpk I did run these through the testing system and found that all tests passed. I think it might be worthwhile to check the relation's schema since I don't really use those in my application.

@overbost you may find this useful for your issues.

commit 2086167 Merge: d13f7eb 91e1a19 Author: Kashif Iftikhar <[email protected]> Date: Sat Nov 19 12:26:40 2022 +0500 Merge pull request threatify#117 from anthony-poddubny/feature/add_full_count_support Add full count support to the Query commit 91e1a19 Author: Anthony Poddubny <[email protected]> Date: Sat Nov 19 00:08:11 2022 +0200 add full count support commit d13f7eb Merge: 8de2c46 84acdd3 Author: Kashif Iftikhar <[email protected]> Date: Fri Jun 10 14:39:36 2022 +0500 Merge pull request threatify#110 from valentingregoire/patch-1 fix: Typo in README fixed. commit 84acdd3 Author: Valentin Grégoire <[email protected]> Date: Wed Jun 8 09:23:43 2022 +0200 fix: Typo in README fixed. commit 8de2c46 Merge: 641c2cf 202d4e7 Author: Kashif Iftikhar <[email protected]> Date: Thu Apr 21 10:45:50 2022 +0500 Merge pull request threatify#108 from SoundsSerious/fix_mem_schema Added Conditional Checking Of Marshmallow Fields

- use python `__{class}` double underscore behavior to prevent subclassing issues. - added recache option in schema to allow changing schema [ schema(...,recache:bool=False) ]

overbost · 2023-02-08T10:23:28Z

Thank you for the contribution.

I belive you tested the solution but i have some doubts about double underscore properties.
Check this samples i run in online-python.com

class A():
  @classmethod
  def schema(cls):
    if not hasattr(cls, "__cls_schema"):
        print("generating schema...")
        cls.__cls_schema = cls.__name__ + "-Schema"
        
print(A.schema())
---
output:

generating schema...
None
Traceback (most recent call last):
  File "main.py", line 13, in <module>
    print(A.__cls_schema)
AttributeError: type object 'A' has no attribute '__cls_schema'

Double score properties is not defined in class.
And there are another issue with class inheritation see here

This is the cause of the issue i reported about class inheritation

class A():
  @classmethod
  def schema(cls):
    if not hasattr(cls, "_cls_schema"):
        print("generating schema...")
        cls._cls_schema = cls.__name__ + "-Schema"
        
print(A.schema())

class B(A):
  pass
    
print(A._cls_schema)

print(B.schema())
print(B._cls_schema)

---
output:
generating schema...
ASchema
ASchema

Thank you for reporting me the solution where i can use CollectionBase, but doesn't fix the issue because _load method from collection is still used and is buggy.
I suggest to recycle the same properties names and simple check if schema name is referable to current class:

class A():
  @classmethod
  def schema(cls):
    if not hasattr(cls, "_cls_schema") or cls._cls_schema != cls.__name__ + "Schema":
        print("generating schema...")
        cls._cls_schema = cls.__name__ + "Schema"
        
A.schema()
A.schema()
class B(A):
  pass
    
print(A._cls_schema)

B.schema()
print(B._cls_schema)

--- 
output:
generating schema...
ASchema
generating schema...
BSchema

This solution allow schema cache, A.schema() is called twice but not generated twice.

simple and clean

SoundsSerious · 2023-02-08T19:25:32Z

@overbost You will see that I am using the same hasattr/setattr approach with cls.__name__, this is done by g() and f() lambda funtions.

The name mangling approach is purely to apply to the conventions of python.

It would be great if you could checkout this code and evaluate the code in this commit since there are additional factors for partial schema instantiation with the only keyword.

I have edited the text of the PR to reflect this

overbost · 2023-02-08T22:04:57Z

setattr is a nice workaroud, i have tested your PR and it works.
This solve my issue about class inheritance but as expected there are unused data inherited in derived class, check the code below

from arango_orm import Collection

class A(Collection):
    __collection__ = "A"
    
class B(A):
    __collection__ = "B"

a = A()

b = B._load(a._dump())

print(a.__collection__)
print(a)
print(a.__A__cls_schema)

print(b.__collection__)   
print(b)
print(b.__B__cls_schema)
print(b.schema())
print(b.__A__cls_schema)

---
output

 A
<A(_key=None)>
<class 'marshmallow.schema.ASchema'>
B
<B(_key=None)>
<class 'marshmallow.schema.BSchema'>
<BSchema(many=False)>
<class 'marshmallow.schema.ASchema'>   <-- __A__cls_schema shouldn't exist in B class

of course is not causing issues and the waste memory is not too much, but is technically incorrect.
Fetch a list of collections can be waste more memory.
My comment is only to provide contribution, i have no power here.

SoundsSerious · 2023-02-09T00:16:27Z

@overbost This is an interesting example. I have not observed this behavior, checking the __dict__ of each class.

I suspect this has to do with accessing any subclass property? Ie. any subclass can access its parent class variables?

How would the solution you're proposing handle this differently?

I don't think this affects memory at all since the variable from the parent class is defined in the parent class dict. Also good to remember that the point of this schema caching is reduce memory consumption. Every time schema() is called a new set of marshmallow objects is created creating a pretty significant bloat per object which affects CPU as well

overbost · 2023-02-09T08:24:00Z

If you check my first comment here there are the code example of the issue and my proposal of solution.
Is not a problem with _load/_dump process but caching logic in schema() method.
A.schema() is called before B class definition/creation, so B class inherit _cls_schema and _cls_schema_cache properties from A.
This properties are already setted in B, and B schema is not generated because the properties exists and contain A schema.
My solution is simply to reuse the same name properties, and simply check if name of marshmallow schema is cls.__name__ + "Schema" as reported in my first comment.
Last but not least i don't understand why you put a flag to force schema regeneration, since the schema should be constant at runtime, and not modified manually by the user. It must be modified at runtime only in this edge case of class inheritance, but is just a workaround.

I tryed the solution of settattr you use, but there aren't benefit in class inheritance

class A():
  @classmethod
  def schema(cls):
    if not hasattr(cls, "__cls_schema"):
        print("generating schema...")
        setattr(cls, "__cls_schema", cls.__name__ + "-Schema")
        
print(A.schema())

class B(A):
  pass
    
print(A.__cls_schema)

print(B.schema())
print(B.__cls_schema)

---
output

generating schema...
None
A-Schema
None <-- missed schema generation
A-Schema <-- It should be B-Schema

SoundsSerious · 2023-02-09T09:35:27Z

I think these are essentially the same solution. Yours has a dynamic flag, whereas mine has a dynamic variable. The authoritative schema comes from the schema() call so in the end, it's all the same.

I agree schema should be static at run time, but it's not a stated requirement in the library. The reason I added the regeneration ability was to address one potential issue causing #114, which caused a revert of 0.7.0

There were some issues with the entire concept of schema caching, so I am more concerned about those. I am also concerned about not having it, ie, memory leaks, as per issue #105.

I think it might take some time to figure this out but with @kashifpk's input and yours, I'm sure we can determine a solution :)

overbost · 2023-02-09T11:13:56Z

I appreciate your work, but i made a PR for my solution, included other changes you made in your PR.
#114 have the same problem with class inheritation.
I also added a simple bulk_delete.
Tests passed.

I apologize if i made some mistake, it's my first PR.

SoundsSerious · 2023-02-10T20:49:02Z

@volodymyrko We are discussing the schema caching issue as it relates to a number of issues including one opened by you #114

Its unclear the steps needed to reproduce this error, would you be able to test this PR or let us know how to setup to test for your issue?

volodymyrko · 2023-02-11T16:31:07Z

@SoundsSerious

this is the code from README and two asserts to reproduce the issue

from arango import ArangoClient
from arango_orm import Database

from arango_orm import Collection, Relation, Graph, GraphConnection
from arango_orm.fields import String, Date, Integer, Boolean


client = ArangoClient(hosts='http://localhost:8529')
test_db = client.db('main', username='test_user', password='qwerty')

db = Database(test_db)


class Student(Collection):

    __collection__ = 'students'

    _key = String(required=True)  # registration number
    name = String(required=True, allow_none=False)
    age = Integer()

    def __str__(self):
        return "<Student({})>".format(self.name)


class Teacher(Collection):

    __collection__ = 'teachers'

    _key = String(required=True)  # employee id
    name = String(required=True)

    def __str__(self):
        return "<Teacher({})>".format(self.name)


class Subject(Collection):

    __collection__ = 'subjects'

    _key = String(required=True)  # subject code
    name = String(required=True)
    credit_hours = Integer()
    has_labs = Boolean(missing=True)

    def __str__(self):
        return "<Subject({})>".format(self.name)




class SpecializesIn(Relation):

    __collection__ = 'specializes_in'

    expertise_level = String(required=True, options=["expert", "medium", "basic"])

    def __str__(self):
        return "<SpecializesIn(_key={}, expertise_level={}, _from={}, _to={})>".format(
            self._key, self.expertise_level, self._from, self._to)


class UniversityGraph(Graph):

    __graph__ = 'university_graph'

    graph_connections = [
        # Using general Relation class for relationship
        GraphConnection(Student, Relation("studies"), Subject),
        GraphConnection(Teacher, Relation("teaches"), Subject),

        # Using specific classes for vertex and edges
        GraphConnection(Teacher, SpecializesIn, Subject),
    ]

uni_graph = UniversityGraph(connection=db)
 db.create_graph(uni_graph)

# THESE ARE 2 asserts that reproduce the issue 
assert list(uni_graph.edges['studies'].schema().fields.keys()) == []
# the next line fails, because `[]` is already cached (this is actually the issue)
assert list(uni_graph.edges['specializes_in'].schema().fields.keys()) == ['expertise_level']

the order of assert is important ! if you swap them than the error will disappear (but still it is not correct behavior ).

volodymyrko · 2023-02-11T16:55:52Z

I've tested PR in terms of the issue I faced - everything seems to be fine.

SoundsSerious · 2023-02-15T05:44:47Z

@volodymyrko Awesome this is great! I think this shows that the class name-based lazy caching can handle edge cases like relationship instantiation.

@kashifpk any other concerns you think this PR might present?

kashifpk · 2023-02-15T08:16:29Z

@SoundsSerious @overbost @volodymyrko - Thank you for all the work on this. Let me go through this one and #122. I'll report back here. Just want to test out a few scenarios with both approaches.

SoundsSerious · 2023-02-28T21:56:54Z

@kashifpk would you like me to close out this PR since the other PR effectively supports the same features?

kashifpk · 2023-03-02T10:55:39Z

@SoundsSerious - sure, I couldn't find time to look into other caching PRs, I leave it to your best judgement.

SoundsSerious added 2 commits February 7, 2023 20:52

Applied Schema Caching Fix | V0.7.1

768ea83

- use python `__{class}` double underscore behavior to prevent subclassing issues. - added recache option in schema to allow changing schema [ schema(...,recache:bool=False) ]

SoundsSerious closed this Mar 7, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Applied Schema Caching Fix | Canidate V0.7.1 #121

Applied Schema Caching Fix | Canidate V0.7.1 #121

SoundsSerious commented Feb 8, 2023 •

edited

Loading

overbost commented Feb 8, 2023

SoundsSerious commented Feb 8, 2023 •

edited

Loading

overbost commented Feb 8, 2023

SoundsSerious commented Feb 9, 2023 •

edited

Loading

overbost commented Feb 9, 2023 •

edited

Loading

SoundsSerious commented Feb 9, 2023 •

edited

Loading

overbost commented Feb 9, 2023 •

edited

Loading

SoundsSerious commented Feb 10, 2023

volodymyrko commented Feb 11, 2023

volodymyrko commented Feb 11, 2023

SoundsSerious commented Feb 15, 2023

kashifpk commented Feb 15, 2023

SoundsSerious commented Feb 28, 2023

kashifpk commented Mar 2, 2023

Applied Schema Caching Fix | Canidate V0.7.1 #121

Applied Schema Caching Fix | Canidate V0.7.1 #121

Conversation

SoundsSerious commented Feb 8, 2023 • edited Loading

overbost commented Feb 8, 2023

SoundsSerious commented Feb 8, 2023 • edited Loading

overbost commented Feb 8, 2023

SoundsSerious commented Feb 9, 2023 • edited Loading

overbost commented Feb 9, 2023 • edited Loading

SoundsSerious commented Feb 9, 2023 • edited Loading

overbost commented Feb 9, 2023 • edited Loading

SoundsSerious commented Feb 10, 2023

volodymyrko commented Feb 11, 2023

volodymyrko commented Feb 11, 2023

SoundsSerious commented Feb 15, 2023

kashifpk commented Feb 15, 2023

SoundsSerious commented Feb 28, 2023

kashifpk commented Mar 2, 2023

SoundsSerious commented Feb 8, 2023 •

edited

Loading

SoundsSerious commented Feb 8, 2023 •

edited

Loading

SoundsSerious commented Feb 9, 2023 •

edited

Loading

overbost commented Feb 9, 2023 •

edited

Loading

SoundsSerious commented Feb 9, 2023 •

edited

Loading

overbost commented Feb 9, 2023 •

edited

Loading