Standard DB interfaces #61
peewee is a good lightweight ORM that may help. It abstracts the interaction with the underlying db (sqlite, pgsql etc.) |
However, we want to use no-sql databases to store json docs, and coleifer/peewee#434 suggests that peewee doesn't support them. |
Yes, it is sql only. |
The sanskrit_util module makes use of sqlite and ORM for its data. Would switching over to use that fully be an acceptable solution? The data we are currently using on the sanskrit_util branch includes an old version of the INRIA data. I have filed an issue to update it to include the latest INRIA data. Given that it is SQL, will that work, or do we continue to think about nosql solutions? |
If you're using SQL and intend to continue to use SQL (which I think is a bad choice given that nosql dbs are a simpler design than sql + orm) - yes. Otherwise, no. |
(just my 2 cents :)) I'd prefer sqlite for the following reasons:
|
I should point out that the following makes zero difference in the sql-nosql choice, and should be disregarded:
And the following actually favors nosql:
From my ancient sql experience, I did not understand what was meant by "Data may be versioned, sql has well established servicing patterns". A good choice would have to consider the factors mentioned in https://www.couchbase.com/resources/why-nosql - mainly flexibility and non-ugliness (i.e. why one finds json naturally more intuitive than data produced by a few joins). |
Initially I thought about a dependency on mongodb or the likes. Installed size comes to
In my opinion, these tradeoffs are probably one way to make a sql vs nosql choice. I have used both in production in different cases (the most recent one is a sql PaaS that stores TBs). The ugliness factor is limited to the database layer, and often the customer doesn't see it :) They do notice performance and reliability. Downside of sql: ACID provides guarantees, but costs performance. Ars Technica covers this well in the first part of this article.
Both SQL and NoSQL setups may use an ORM. This is probably just good design, rather than a pro or con of any db technology choice. It's a simple layered architecture where the database concepts are limited to one layer and all the other components deal with plain old objects. Objects stay decoupled for good.
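To make the layering point concrete, here is a minimal sketch, assuming a hypothetical `Dhatu` object and a hypothetical `SqliteDhatuStore` class (neither exists in sanskrit_parser). Only the store knows about SQL; everything else handles plain objects.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dhatu:
    """Plain old object; knows nothing about any database (hypothetical)."""
    root: str
    meaning: str


class SqliteDhatuStore:
    """The only layer that contains SQL (hypothetical name and schema)."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS dhatu (root TEXT PRIMARY KEY, meaning TEXT)"
        )

    def save(self, dhatu: Dhatu) -> None:
        # Insert the row, or overwrite it if the root is already stored.
        self._conn.execute(
            "INSERT OR REPLACE INTO dhatu (root, meaning) VALUES (?, ?)",
            (dhatu.root, dhatu.meaning),
        )
        self._conn.commit()

    def get(self, root: str) -> Optional[Dhatu]:
        # Return a plain object, or None when the root is unknown.
        row = self._conn.execute(
            "SELECT root, meaning FROM dhatu WHERE root = ?", (root,)
        ).fetchone()
        return Dhatu(*row) if row else None
```

Callers would write `store.save(Dhatu("gam", "to go"))` and `store.get("gam")` without ever seeing a table or a query, which is the decoupling described above.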
Sorry if it appeared so, my intention isn't that. I am only attempting to evaluate quantitatively based on a possible use case. As I mentioned earlier, having used nosql, I am not against it at all :)
We face an interesting problem in production with Coming back to our use case. Are these assumptions correct? How do they map to the future vision of this library?
|
Well, we just have to focus on the main thing that matters to us - programmers who are contributors and users of this package. As you rightly said: "Data storage format: json, binary, etc.. This is how the technology stores the data, not how we show it to the user or another library." Speed, ACID and such concerns fade into the background in comparison. sanskrit_parser will interact with the rest of the world mostly through json (or something like it, say a protocol buffer). This is simplest if it used, produced and consumed json natively. We should not have to spend our time mucking around with sql (it is the library's headache how it stores this stuff internally): we should be able to say "give me details of this pada or dhAtu or sentence" and get such detail in the most convenient (json-like) form, which can then be mechanically deserialized into python objects (using jsonpickle or a wrapper thereof, which I suppose is subsumed by "ORM"?). You rightly say: "The ugliness factor is limited to only database layer, often the customer doesn't see it", to which I say - even we shouldn't need to deal with it. My experience is that sanskrit data gets mutated a lot, and flexibility is important - with json this becomes as simple as adding or moving a sub-object or a field, while in sql you'd add a new table, define a join and then write a module to make a json object out of it. I've looked at versioning in the context of mongodb, which does not natively support versions - the solutions seemed quite simple (basically have a separate version db). In our case, I think that any release of sanskrit_parser code will expect a certain specific version of some data - else it will prompt an upgrade - so there is no real need to go back and forth on versioned data. (This has been the same in the case of https://github.com/sanskrit-coders/stardict-sanskrit/ as well) |
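As a rough illustration of the "json in, python objects out" idea, here is a sketch that uses only the standard library `json` module and a dataclass rather than jsonpickle. The `Pada` class, its fields and the helper names are assumptions made up for this example.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Pada:
    """Hypothetical object for a word form; the field names are assumptions."""
    text: str
    tags: List[str] = field(default_factory=list)


def pada_from_json(doc: str) -> Pada:
    # Read only the fields we know about; extra fields in the document are
    # ignored, so the data can grow without breaking this reader.
    data = json.loads(doc)
    return Pada(text=data["text"], tags=list(data.get("tags", [])))


def pada_to_json(pada: Pada) -> str:
    # ensure_ascii=False keeps any non-ASCII text readable in the output.
    return json.dumps(asdict(pada), ensure_ascii=False)
```

A document that later gains a new field (say a hypothetical `"source"` key) still loads with `pada_from_json`, which is the kind of flexibility argued for above. An equivalent change in sql would typically need a schema migration.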
@vvasuki : Are you talking of the API mode, or do you think it's better to have a JSON wrapper for programmatic python access as well?
Data will be dhatu related (dhatupAtha etc.) and form related (INRIA, sanskrit_utils, the neural net L0 that we've plans for ...)
Can't see how at the moment
Yes - I could see multiple sources being used. We already use dhAtupAtha for some dhatu information, and another db for forms of the same dhAtu.
Take a look at the current ~/.sanskrit_parser/data directory, I expect filesizes to stay similar.
My expectation would be reliability, followed by performance. @vvasuki has more experience in the interactions between various such projects |
|
Quoth @vvasuki in #9
"
Taking the flexibility preference to a slightly higher level - it is a good idea not to be "married" to any database technology. Access it via an interface (such as DbInterface and ClientInterface here - PS: you don't have to implement every method). Switching to a different database tool should be as simple as calling a different class's constructor - one shouldn't have to go messing about anywhere else."
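A minimal sketch of that idea, assuming a hypothetical `DocDb` interface with two hypothetical backends. This is not the actual DbInterface or ClientInterface referred to in the quote, just an illustration of swapping backends by calling a different constructor.

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Optional


class DocDb(ABC):
    """Hypothetical interface: the rest of the code depends only on this."""

    @abstractmethod
    def put(self, key: str, doc: dict) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[dict]: ...


class InMemoryDocDb(DocDb):
    """Keeps serialized json documents in a dict."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def put(self, key: str, doc: dict) -> None:
        self._docs[key] = json.dumps(doc)

    def get(self, key: str) -> Optional[dict]:
        raw = self._docs.get(key)
        return json.loads(raw) if raw is not None else None


class SqliteDocDb(DocDb):
    """Stores the same json documents as text in a single sqlite table."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS docs (key TEXT PRIMARY KEY, body TEXT)"
        )

    def put(self, key: str, doc: dict) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO docs (key, body) VALUES (?, ?)",
            (key, json.dumps(doc)),
        )
        self._conn.commit()

    def get(self, key: str) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT body FROM docs WHERE key = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else None
```

Code written against `DocDb` works unchanged with either backend: replacing `InMemoryDocDb()` with `SqliteDocDb("some_file.db")` is the only line that would change.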