Skip to content

Latest commit

 

History

History
144 lines (117 loc) · 5.61 KB

meeting_00011_20201006.md

File metadata and controls

144 lines (117 loc) · 5.61 KB

Rust ML WG Meeting 00011

hackmd-github-sync-badge

Meeting Info

Date: 20201006

Start time: 1600 ET

Zoom: https://ucsb.zoom.us/j/6601852842

Agenda

Participants

  • chrism
  • ricky
  • andraz
  • manuel

Minutes

Transfer ZuseZ4/datasets to Rust-ML-WG

  • Datasets repository to Rust ML Group Repository
  • manuel is interested in transfering that over Zuse24 dataset repository
  • talking about having a list of datasets or places to go for Rust data loaders for ML set in the group
  • chrism mentioned that it would be nice to have multiple loaders for the datasets so it is widely availble to people
    • if it is a specific loader for something we might want to just link to it

Rust ML Optimisation

  • jamal is looking into optimisations with FPGAs and GPUs
    • for a lot of FPGAs with open source we're not there yet
    • cool project called symbiflow
      • https://symbiflow.github.io/
      • open source project that is taking that process and make it simplier
        • standarizing how to deploy to an FPGA
        • but that's a ways off
  • took a step back and looking at CPU optimizations
    • RISCV
    • wants to start here, anything beyond FPGAs for this is a no-starter since the environment is too new
  • here he was looking at arrayfire and ndarray a framework for neural networks
    • as they relate to embedded hardware

SmartCore

  • chrism looked at the documentation for the code
    • docs are really good
    • mostly classical ML algorithms
      • bunch of different regressions
      • decision trees and random forest
      • nearest neighbor
    • linfa has one more clustering algorithm
      • SmartCore might only have k-means
  • Have not looked into their model evaluation
  • it is on arewelearningyet.com!

arewelearningyet.com updates

  • updated info
  • direct links to linfa and smartcore
    • calling them out specifically as meta-repos for doing ML in Rust
  • we have green on the board!
    • one category has been updated from red -> yellow
  • we need to have a sort order to the libraries on arewelearningyet.com
    • talk with anthony on how to get this done
    • best sorting order
      • maybe a function of crates.io downloads and last commit?
  • datasets on arewelearningyet.com
    • another category for those
    • right now there's not enough repositories for adding another category
  • linfa datasets
  • Talking about how crates.io blocked a publish crate with a github dependency
    • just chatting about the technical aspects around loading a dataset with a crate

Open Source Licensing in Machine Learning

  • looking into licensing, ethics, privacy, and everythign around ML
  • reached out to the group of responsible AI licensesing
    • not helpful responses from them
  • nearing the end of the first draft of the blog post on licensing around ML
    • overview, in-the-weeds, and a good use case of licenses
    • description of why he feels using Apache2.0 and MIT is not sufficient for the libraries
      • the distinct lack of responsibility of the technology that we're not comfortable with in publishing a ML library
  • talked to a lawyer family member
    • IP lawyer is needed, they said it's magical and very specific
  • considering writing a first version of a license for open source ML libraries
    • however writing your own license is very very difficult
  • if anyone is interested to help review, reach out
    • he will find a way to send it so we can help out

Online Learning Machinery in rust

  • high veloicty machine learning
  • written implemetnation of regession
    • online learning
      • most well known tool is Vowpal Wabbit
        • written in C++
        • heavily specalized and very fast
      • however, if you narrowdown the problem and use Rust
        • you can make it 3x faster ;)
  • they wrote an internal tool to make this very fast
    • trying to open source it, licensing right now though and it's moving along
  • why it is interesting
    • this is written in Rust that others don't have
    • probably the fastest linear regression on normal CPUs
  • used internal for data science research
    • hyperparameter tuning
    • maxing out at memory bandwidth it's that fast
  • going to move from internal repository to github
    • so they are forced to maintain it there
  • optmized a lot
    • like instead of gzip using lz4
  • downside is it's very narrow in what it takes and gives

Rust ML Meetup November 27th

Actions

  • chrism: talk to anthony about sort order for arewelearningyet.com
  • ricky: Possibly give some time to Andraz during Rust ML meetup talk to talk about the online learning tool they are open sourcing