Project Status #209

Closed
siavash-babaei opened this issue Nov 19, 2020 · 15 comments

@siavash-babaei

siavash-babaei commented Nov 19, 2020

Hi,

Many thanks to the authors and maintainers for such a brilliant feature. It seems this type provider has not really worked since R 3.5.

  • Any updates, or is the project dead?
  • Is there any realistic hope of a fix within, say, 6 months?

Plenty Appreciated ...,
Cheerio

@hmansell
Contributor

I wrote the original version but I am no longer actively developing for F#. @tpetricek might still be involved.

@dsyme
Member

dsyme commented Nov 20, 2020

We need an active maintainer. Any volunteers?

@dsyme
Member

dsyme commented Nov 20, 2020

(Ping me directly on my email if needed - I don't always see notifications here)

@siavash-babaei
Author

siavash-babaei commented Nov 20, 2020

Dear @hmansell, since it is now part of FsLab, @tpetricek should be involved although I suppose FsLab itself needs revamping perhaps as a whole.

Incidentally, a good comprehensive FsLab environment would certainly make F# more competitive with the likes of Python and R. Even Julia has pulled ahead in data analytics capabilities in many respects, which is a pity for F#, a language that is very mathematical at its core and brilliantly suited to everything data.

  1. Maybe add a few other type providers, such as SQLProvider, and support the big NoSQL names as well.
  2. When it comes to machine learning, .Net and F# are very verbose and use complex syntax for prototyping. This makes R interop and RProvider a most valuable tool.
  3. A simple regression model in R is specified by lm(Y ~ X1 + X2).
  4. In comparison, F# .Net code is still very clunky, making prototyping cumbersome in this area.
  5. Compared with, say, R's ggplot, F# charting abilities are barely OK for exploratory analysis and far short of the quality needed in production.
  6. Ports of Spark, Keras, TensorFlow and ML.NET in idiomatic F# would boost competitiveness 1000-fold. For the life of me, I don't get why everything available is so C#-ish, with C# being inherently unsuitable for prototyping and data science pipelines.
  7. Other stuff like Deedle, FSharp.Data, Literate Coding, Jupyter Notebook support, Math.NET, Accord.NET, etc. is just great, although I personally despise the Accord syntax for prototyping.

I am not sure if or how much R and .Net APIs have changed but I doubt by much. As far as I have seen as a user, R Core has not changed much on the face of it for many years, and .Net 4.0 code can be consumed in .Net 5.0 with minimal changes. Hopefully, updating it shouldn't be a major rework ... just the thought is encouraging!!!

Our very dear BDFL @dsyme: I would offer my help except I don't have the experience of maintaining repos. In other areas perhaps once it takes off ...

@zyzhu
Contributor

zyzhu commented Nov 20, 2020

@siavash-babaei, I'm glad to see your enthusiasm. Please take a look at an old issue from 2018 discussing FsLab and data science using F# in general.
https://github.com/fslaborg/FsLab/issues/137

A lot of progress has happened since then, especially in Jupyter notebooks through the dotnet/interactive kernel. I think interop with Python is in the pipeline, according to some talks from Microsoft. I hope interop with R will come some day too, but that might be too big an ask of the Microsoft team.

@zyzhu
Contributor

zyzhu commented Nov 20, 2020

About linear regression: a pull request was added to Deedle early this year to support something like R's lm.
fslaborg/Deedle#496

Take a look at some testing samples
https://github.com/fslaborg/Deedle/blob/master/tests/Deedle.Math.Tests/LinearRegression.fs

let actualCoeffs =
    LinearRegression.ols ["MSFT";"WMT"] "AES" true stockReturns
    |> LinearRegression.Fit.coefficients

@siavash-babaei
Author

siavash-babaei commented Nov 21, 2020

Thanks @zyzhu. I doubt Microsoft would get involved in something like RProvider, and I am not sure how helpful simple interop with Python would be. I mean, for C/C++/Fortran it makes sense to provide some simple interop so that you can switch and let them handle the intensive bits of code, but Python?! Not to mention that, while RProvider was working just fine for a few years, no comparable type provider for Python ever really took off. Microsoft has already invested heavily in R, buying Revolution Analytics for a hefty price, rebranding it as the Microsoft R distribution, and adding the ability to script in R directly within SQL Server, before doing the same for Python.

Now, through ML.NET, Math.NET, and Accord.NET, you get most of what you need from a machine learning perspective, and they appear to be actively maintained. The problem with all of them, including the example posted above by @zyzhu, is awkwardness and verbosity.

Again, assume we have a data frame scores containing the variables score, age and sex. In R, you would write:

model <- lm(data = scores, score ~ age * sex)

and then, from this model object, you can extract whatever you need, including statistics, coefficients and confidence intervals, error estimates, even diagnostic plots, all under pretty intuitive names.

To me, an almost perfect F# equivalent of the above would look like:

    let model = 
        let data = scores
        let response = [ "score" ]
        let predictors = [ "age"; "sex" ]

        (data, response, predictors)
        |> linearModel ModelType.OLS CrossEffects.Multiplicative

with the model object perhaps being a record type whose fields correspond to the coefficients table, error estimates, basic statistics, etc.
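
As a rough illustration of the record-returning idea above, here is a minimal self-contained sketch. Every name in it (LinearModelFit, solve, fitOls and its argument order) is hypothetical and comes from no existing library. It fits by ordinary least squares through the normal equations, which is fine for a toy example but probably not what a numerically careful library would do, and the data is made up and noise-free.

```fsharp
// Hypothetical sketch: none of these names come from Deedle, Math.NET or RProvider.

/// Result of a fit, exposed as a plain record so its fields are discoverable.
type LinearModelFit =
    { Coefficients : Map<string, float>   // term name -> estimate
      Residuals    : float[]
      RSquared     : float }

/// Solve the square system a * x = b by Gaussian elimination with partial pivoting.
let solve (a : float[][]) (b : float[]) : float[] =
    let n = b.Length
    // Work on an augmented copy so the inputs are left untouched.
    let m = Array.init n (fun i -> Array.append a.[i] [| b.[i] |])
    for col in 0 .. n - 1 do
        // Swap in the row with the largest pivot for numerical stability.
        let pivot = [ col .. n - 1 ] |> List.maxBy (fun r -> abs (m.[r].[col]))
        let tmp = m.[col]
        m.[col] <- m.[pivot]
        m.[pivot] <- tmp
        for row in col + 1 .. n - 1 do
            let factor = m.[row].[col] / m.[col].[col]
            for k in col .. n do
                m.[row].[k] <- m.[row].[k] - factor * m.[col].[k]
    // Back substitution.
    let x = Array.zeroCreate<float> n
    for i in n - 1 .. -1 .. 0 do
        let known = Seq.sum (seq { for k in i + 1 .. n - 1 -> m.[i].[k] * x.[k] })
        x.[i] <- (m.[i].[n] - known) / m.[i].[i]
    x

/// Ordinary least squares via the normal equations (X'X) beta = X'y.
/// `terms` pairs a term name with its column of values; an intercept is always added.
let fitOls (response : float[]) (terms : (string * float[]) list) : LinearModelFit =
    let names = "(Intercept)" :: List.map fst terms
    let columns = Array.ofList (Array.create response.Length 1.0 :: List.map snd terms)
    let p = columns.Length
    let dot (u : float[]) (v : float[]) = Array.fold2 (fun acc a b -> acc + a * b) 0.0 u v
    let xtx = Array.init p (fun i -> Array.init p (fun j -> dot columns.[i] columns.[j]))
    let xty = Array.init p (fun i -> dot columns.[i] response)
    let beta = solve xtx xty
    let fitted =
        Array.init response.Length (fun r ->
            Seq.sum (seq { for j in 0 .. p - 1 -> beta.[j] * columns.[j].[r] }))
    let residuals = Array.map2 (fun y f -> y - f) response fitted
    let mean = Array.average response
    let ssTot = response |> Array.sumBy (fun y -> (y - mean) * (y - mean))
    let ssRes = residuals |> Array.sumBy (fun e -> e * e)
    { Coefficients = Map.ofList (List.zip names (List.ofArray beta))
      Residuals = residuals
      RSquared = 1.0 - ssRes / ssTot }

// Made-up data generated exactly from score = 1 + 2*age + 3*sex + 0.5*age*sex.
let age = [| 20.0; 30.0; 40.0; 25.0; 35.0; 45.0 |]
let sex = [| 0.0; 0.0; 0.0; 1.0; 1.0; 1.0 |]
let score = Array.map2 (fun a s -> 1.0 + 2.0 * a + 3.0 * s + 0.5 * a * s) age sex

// R's `score ~ age * sex` expands to age + sex + age:sex, spelled out by hand here.
let model =
    fitOls score
        [ ("age", age)
          ("sex", sex)
          ("age:sex", Array.map2 (fun a s -> a * s) age sex) ]
```

Because the data contains no noise, model.Coefficients should come back close to 1, 2, 3 and 0.5, and model.RSquared close to 1. The interaction column is built by hand in this sketch; a fuller API would presumably derive it from something like the CrossEffects.Multiplicative option suggested above.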

@siavash-babaei
Author

siavash-babaei commented Nov 21, 2020

Looking at it from a business perspective:
Up to now, F# was primarily a Windows thing. Even though it was open source, it was not properly supported on Linux, where a lot of the open-source community resides. With .Net 5.0 and F# 5.0, things have changed and .Net is now properly multiplatform, although I suppose the tooling on Linux still has some way to go. So it is almost like a new start, with the opportunity to expand both the language and the user base.
Something worth noting is the economic principle of competitive advantage: basically, how entities from nations to corporations to life itself stick to their strengths to survive and grow.
Take WebSharper or SAFE Stack: absolutely necessary for a modern language, but have they really made a dent in the current market? I don't think even TypeScript is making significant headway in attracting existing JavaScript users or new ones.
In my opinion, any product requires a few killer features that make it indispensable, and for F# that could easily be the entire data analytics and data science workload, the same thing that greatly helped propel Python to the front. That user base is more mathematically inclined and comfortable with the syntax (I just adore it, but for some reason it makes lots of people uncomfortable), with immutability, and with the core of the language being input -> function -> output, so they would be much better adopters than, say, developers active in GUI or web. There are other areas, I am sure, for example business applications that fit nicely with Domain-Driven Design. But data science workloads, incidentally a perfect match for DDD, are certainly worth the investment, especially as they seem to be growing exponentially in both volume and utilisation. If you think about it, one of the most active open-source big data projects, Spark, is only 7 years old. The community seems to be quite accepting of new tech that makes their life easier.

@dsyme
Member

dsyme commented Nov 21, 2020

There are many questions being discussed here. Let's just deal with the question of FsLab and its pieces.

Here are my opinions:

  • The whole idea of a curated, unified collection like FsLab has turned out to be suspect as it doesn't really allow for change, evolution and deprecation unless very actively curated.

  • The curation stopped because FsLab as a collection was based on .NET Framework, and some parts of the collection suffered badly in the transition to .NET Core. It only took one part to be still stuck on Mono or .NET Framework to render the whole thing stuck. That's what happened.

  • With Mono out of the loop, things are easier once we establish a reasonable landing point.

  • MSFT is active in the parts of FsLab we directly care about: XPlot, fsdocs (which now generates .NET Interactive notebooks), .NET Interactive, and F# literate scripting. It is also active in many related technologies. Other companies also contribute.

  • Reorienting to join forces with SciSharp, .NET Interactive and similar seems much more practical.

FsLab certainly needs to be taken down and/or revamped on .NET Core only and/or wound up as a "one-stop shop technology". That will create space for better approaches I think. I'm open to suggestions but we need to rethink things.

Note I'm not interested in discussing this from a "future of F#" perspective (this has nothing to do with F# and web programming, for example) but rather just practical steps to get things cleaned up on a good, sustainable, coherent basis going forward.

@siavash-babaei
Author

I added some notes and thoughts that seemed more appropriate to FsLab as a whole in https://github.com/fslaborg/FsLab/issues/137. I hope they are helpful, certainly don't mean to be criticising or anything ....

@dsyme
Member

dsyme commented Nov 21, 2020

Cool let's discuss in https://github.com/fslaborg/FsLab/issues/137

@siavash-babaei
Author

Guidance for Newbies:

Suppose a person with a decent working knowledge of both R and F# wants to restart this RProvider project. What steps should be taken, and what should be learnt, before attempting to update/fix it so that it works with, say, the latest version of Microsoft R Open as a stable LTS version? I checked an intro to type provider design on MSDN, but the examples didn't make much sense with regard to interop with a different language.

@siavash-babaei
Author

siavash-babaei commented Aug 10, 2021

I am wondering what has fundamentally changed since R 3.4 such that RProvider no longer works with later versions. Is it a matter of updating the libraries and packages used from .Net 4 to .Net 6, or has something in the R API completely changed?
After all, R is a decades-old language that has not changed much at its core, at least as far as users are concerned …

@hmansell
Contributor

@siavash-babaei I was the original author but haven't kept up with developments in the .NET community the last few years. Hopefully the following will be helpful:

  • Re your question on type provider design: Think of APIs from the language you are interoperating with as being equivalent to tables/schemas in a database/dataset you are exposing. The two concepts are basically equivalent - you are dynamically constructing types that mirror the external resource - either a module in R or a data schema. The main difference is that exposing data usually results in properties to read the data (and some functions) and exposing APIs results in more functions (and some properties).
  • Re why RProvider doesn't work: I haven't looked at this at all, but the first thing to check is whether RDotNet works with the new versions. RProvider is really just providing a typed layer on top of RDotNet; most of the hard stuff (exposing R APIs and native data types to .NET) is done inside RDotNet. RDotNet takes advantage of C#'s ability to map unmanaged memory into C# data types that can be used from C# or F#. If anything changes about how that works, or how memory management works, RDotNet will no longer work. Once RDotNet works, getting RProvider to work is probably quite simple.
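
To make the "typed layer on top of an untyped one" idea concrete, here is a small self-contained sketch. It does not use RDotNet or RProvider at all: UntypedValue, callByName and the Base module are hypothetical stand-ins, with callByName playing the role of the untyped interop layer and Base showing, written by hand, roughly the kind of functions a type provider would generate from an external module's list of functions.

```fsharp
// Stand-in for an untyped interop layer. This is NOT the RDotNet API; every
// name below is hypothetical and exists only to illustrate the layering.
type UntypedValue =
    | Numeric of float[]
    | Text of string[]

/// Untyped entry point: functions are looked up by name at run time, so a
/// misspelt name or a wrong argument type only fails when the call executes.
let callByName (name : string) (args : UntypedValue list) : UntypedValue =
    match name, args with
    | "mean", [ Numeric xs ] -> Numeric [| Array.average xs |]
    | "sum", [ Numeric xs ] -> Numeric [| Array.sum xs |]
    | "toupper", [ Text xs ] -> Text (xs |> Array.map (fun s -> s.ToUpperInvariant()))
    | _ -> failwithf "no function '%s' accepting these arguments" name

/// Typed layer: each wrapper fixes the name and the argument/result types,
/// so mistakes are caught by the compiler instead of at run time.
module Base =
    let private numeric v =
        match v with
        | Numeric xs -> xs
        | Text _ -> failwith "expected a numeric result"

    let mean (xs : float[]) : float = (callByName "mean" [ Numeric xs ] |> numeric).[0]
    let sum (xs : float[]) : float = (callByName "sum" [ Numeric xs ] |> numeric).[0]

// Callers only ever see ordinary F# functions.
let average = Base.mean [| 1.0; 2.0; 3.0; 6.0 |]
let total = Base.sum [| 1.0; 2.0; 3.0; 6.0 |]
```

In this picture, if the untyped layer breaks, every wrapper breaks with it, while the wrappers themselves stay trivial, which matches the point above that the untyped layer is where the hard work lives.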

@AndrewIOM
Collaborator

I'm going to close this issue, as we have just released the v2.0.0-beta NuGet package. Hopefully this should address the issues raised in this thread. Also see #218 for discussion about project maintenance and contribution guidelines. Thanks!
