Coursera Scala Course's Capstone Uses Your Library, but it may not work in that condition. #48
Hi @codeaperature, thank you for opening the issue! To use our encoders, all you need is `import scala3encoders.given`. I can adapt your stackoverflow snippets as follows:

```scala
import scala3encoders.given
import org.apache.spark.sql.Encoder

case class StationX(stnId: Int, wbanId: Int, lat: Double, lon: Double)

object Station extends App:
  val ss = summon[Encoder[StationX]]
  println(ss.schema)
```

and

```scala
package observatory

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.Encoder
import scala.reflect.ClassTag
import scala.deriving.Mirror
import scala3udf.{Udf => udf}
import scala3encoders.given

case class CC(i: Int)

object SparkInstance extends App {
  val spark = SparkSession
    .builder()
    .appName("Spark SQL UDF scalar example")
    .getOrCreate()

  def getSchema[T: Mirror.ProductOf: ClassTag] = summon[Encoder[T]].schema

  val random = udf(() => Math.random())
  val plusOne = udf((x: Int) => x + 1)

  val ss = getSchema[CC]
}
```

You should not need to write a function such as …
I'm a little flustered and worried that an actual course uses Spark together with Scala 3. I would consider this combination experimental and not suited for beginners (although Scala 3 is, IMHO, much better than Scala 2).
@michael72 IIRC the course is offered in both Scala 2 and Scala 3. But it has been out for a while; maybe the course manager should investigate whether the Scala 3 version has caused more problems...
I finally got back to this (I have a regular data engineering job too). I do not believe the parameters of the project allow me to add extra libraries, and it seems that this part does not work in the project:

.../observatory/src/main/scala/observatory/SparkInstance.scala:8:8

Maybe I made some other changes. BTW, did you download the project, or just check this another way?

There is no requirement to use Spark, and the assignment actually uses a jarred resource; per the course suggestion, the data needs to be stream-loaded into memory and then pushed into a Spark dataframe/dataset to be processed. I think that is just unnecessary overhead in terms of memory, code, and socket open/close time. I can simply use parallel collections to do a simple join.

I'm going to drop this issue, as I am taking a different path, but I am still curious whether Coursera provided a bunk suggestion to use your library without supplying the proper tooling in the build.sbt. Thanks for your past attention in looking into this item.
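For reference, the Spark-free approach mentioned above (a plain join over parallel collections) could look roughly like this. This is only a sketch: the `Reading` type, field names, and sample data are made up for illustration, and the `scala-parallel-collections` module is assumed to be on the classpath.

```scala
// Sketch only: Reading and the sample data are hypothetical, not the course's types.
// Requires "org.scala-lang.modules" %% "scala-parallel-collections" in build.sbt.
import scala.collection.parallel.CollectionConverters._

case class StationX(stnId: Int, wbanId: Int, lat: Double, lon: Double)
case class Reading(stnId: Int, tempF: Double)

object JoinWithoutSpark extends App:
  val stations = Vector(StationX(1, 10, 47.6, -122.3), StationX(2, 20, 37.8, -122.4))
  val readings = Vector(Reading(1, 72.5), Reading(2, 64.0), Reading(1, 68.1))

  // Index the smaller side once, then join the readings in parallel.
  val byId = stations.map(s => s.stnId -> s).toMap
  val joined = readings.par.flatMap(r => byId.get(r.stnId).map(s => (s, r.tempF)))

  joined.seq.foreach(println)
```

Building the `Map` first keeps each parallel lookup O(1), which is the usual hash-join shape; no SparkSession, sockets, or dataframes are involved.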
I think I understand the issue better now. The assignment does not involve UDFs; @michael72 implemented the udf support long after the release of the course. I will also ask whether other people have reported this issue. I am sorry for the frustration this has caused you.
Yeah, I tried to do some things differently; for example, a UDF to convert deg C -> F, though this could be done another way. I also wanted to use datasets with StructTypes automatically derived from case classes. Thanks for looking into this item for me.
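For what it's worth, a deg C -> F udf in the style of the `SparkInstance` snippet above might look like the following. This is a sketch under assumptions: the object and function names are made up, and it presumes the same `scala3udf`/`scala3encoders` setup shown earlier in the thread plus a Spark runtime.

```scala
// Sketch only: assumes scala3udf and scala3encoders on the classpath,
// following the udf pattern shown earlier in this thread.
import org.apache.spark.sql.SparkSession
import scala3udf.{Udf => udf}
import scala3encoders.given

object TemperatureUdfs extends App:
  val spark = SparkSession
    .builder()
    .appName("degC to degF udf sketch")
    .getOrCreate()

  // Plain conversion function; trivially testable without Spark
  // (e.g. cToF(100.0) == 212.0, cToF(0.0) == 32.0).
  def cToF(c: Double): Double = c * 9.0 / 5.0 + 32.0

  // Wrapped as a udf in the same style as plusOne above.
  val celsiusToFahrenheit = udf((c: Double) => cToF(c))
```

Keeping the arithmetic in a plain `def` means the conversion can be unit-tested without starting a SparkSession, with the udf wrapper added only at the edge.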
Hi Vincenzo,
To me, it's unclear how to use your library, and it's possible that the Coursera Scala Course's Capstone (in the build file) points to information that's no longer valid in the README. I posted this to stackoverflow. This course is hard without being able to do the simple things; it would be nice if you updated your README markdown to help work out this TypeTags issue. Note that I tried to make the code on stackoverflow match Spark's advice, and I also tried to follow the markdown, but didn't post that. In the Coursera project, I don't think we can change the build file.
Stefan