Licensing Hanzi + Data sets #18
Comments
Hello @tony, thanks so much for bringing this up. The license was long overdue. I've updated the readme and added an MIT license. For the data it gets a bit trickier. You can see I listed all of the data sets that were not created by me or generated by HanziJS. Most of the data should be okay for commercial use, since their licenses allow it, but it depends on how you use it. For example, I contacted the creators of the Leiden University Word Frequency Corpus and they seemed happy to let me use it in HanziCraft.com as long as I don't sell the actual data. What are you building? I'm very curious! I would love to add it to the list of projects that use HanziJS.
@nieldlr I'm happy to meet you. Hanzi looks great. I think it would be appropriate for the licenses of the data sets to be in the documentation. Since Hanzi is a distribution, the license of each data set should be stated officially. I kind of want to act as a missionary right now: ODC / Open Data Commons Attribution License (ODC-By) v1.0 - http://opendatacommons.org/licenses/by/summary/ - http://opendatacommons.org/licenses/by/1.0/ - Simple, guarantees attribution.
I am following the idea of http://okfn.org/opendata/. IANAL, but from what I understand, this would technically allow the data to be sold wholesale. I understand why the owner of a dataset would be concerned. However, in real life, we could just as well burn FreeBSD CDs and hang out on street corners all day like Jay and Silent Bob. I think that moving datasets to the Open Data Commons format seems nice, since ultimately it's about what the implementer does with the data. In practice, it protects the real interests of the data provider: it gives attribution to the original provider and assures that the public data can be used by the world. I am in the process of a similar project for Python. I am wrangling together CJK datasets and trying to get them under MIT/ODC licenses. cburgmer/cjklib#6. When it's the right time, I will chime in with what I'm working on. Would you be interested in some sort of collaboration to cover the rest of the useful hanzi data sets? I can help fill in / PR datasets you are missing, and maybe you / we can contact the data providers and see if they can release theirs under ODC? Edit: I noticed the LICENSE section in the README linking to http://lwc.daanvanesch.nl/legal.php, http://lingua.mtsu.edu/chinese-computing/copyright.html. For these two providers, I think it'd be preferable to see them under an ODC/CC-type license. BTW, the Chinese character decomposition by Gavin is ODC now.
Hey @tony, this is great! Thanks so much for spearheading this. I've had a similar idea in the past: open-sourcing data collections and learning materials for learners and developers. I really like the Open Data Commons. First time I've seen it! I'll release my data under ODC. In fact, I'll clean up the directory a bit to clear out data that is no longer being used. I'll then state which data sets are mine and released under ODC, and list the other datasets as well. It's great to see that Gavin has released his under ODC. I'll contact the other two and update you. Just one question: I had a look at the licenses, and I like this one a bit more: http://opendatacommons.org/licenses/odbl/summary/ Thanks again for this. I really like these ideas!
@nieldlr: I am ok with ODBL. In practice, assurances granted in writing are common sense, but in open source, permissiveness tends to be best. I go into it here: ScottDuckworth/python-anyvcs#32 (comment). GPL is a hardcore example of compliance. My point is that, in practice, it's in a user's own best interest to PR / patch upstream, because it takes additional energy and time to maintain a fork. At worst, there is someone with a DRM'd version of a library who's wasting his time keeping it in sync with the main dataset. The common moocher isn't the type who is going to be holding back a useful patch anyway. If there's an off chance a genius grabs the data set and creates a better derivative, good for them. It may be in their best interest to patch back to the original to 1.) get the glory and 2.) avoid having to synchronize the diff. They are still required to give attribution. I'm fine with ODBL and ODC. I prefer ODC.
@nieldlr: https://github.com/tony/cihai/tree/master/cihai/datasets/unihan is worth keeping an eye on. I have a README there that outlines some standards I found for the cjkdata. I will probably separate it into a different repo soon. Edit: Now https://github.com/cihai, https://github.com/cihai/cihai-handbook, https://github.com/cihai/cihaidata-python.
Greetings @nieldlr!
What is the license for Hanzi? I prefer MIT, BSD and Apache.
Separately, what are the copyright, licensing and source of the data sets at https://github.com/nieldlr/Hanzi/tree/master/lib/data?