Clojask: Inviting Data Scientists to Distributed Computing


Appears in Collections:
- Faculty of Business & Economics: Conference papers

Title	Clojask: Inviting Data Scientists to Distributed Computing
Authors	Buehlmaier, M; Liu, Y
Issue Date	2022
Citation	reClojure Conference (Virtual), December 3, 2022
Abstract	Clojask is a distributed dataframe with a focus on usability and scalability. On the one hand, Clojask is simple to use, so data scientists without any distributed-systems experience can start using it immediately. Its API design is inspired by R's data.table and SQL, so the learning curve is gentle. On the other hand, Clojask is optimized for larger-than-memory datasets, so memory overflow will not be a problem even for tasks involving massive datasets. Both design choices aim to attract users with prior data science experience to Clojure and to benefit them. In our session, we would like to cover a functionality walkthrough (with reference to R data.table and SQL), comparisons with Dask (in Python) and Spark, and what Clojask can bring to the Clojure data science community.
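The abstract does not say how Clojask avoids memory overflow. The sketch below is offered only as a rough illustration of the general out-of-core idea. It is neither Clojask's API nor its implementation, and it is written in Python rather than Clojure. It evaluates a data.table/SQL-style filter, derived value and group-by sum in a single pass over rows, so memory grows with the number of groups rather than the number of rows. The function names (`read_rows`, `streaming_group_sum`) and the sample data are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def read_rows(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one CSV row at a time as a dict; earlier rows are not retained."""
    yield from csv.DictReader(lines)


def streaming_group_sum(
    rows: Iterable[dict],
    where: Callable[[dict], bool],
    key: str,
    value: Callable[[dict], float],
) -> dict:
    """Roughly: SELECT key, SUM(value) FROM rows WHERE where(row) GROUP BY key.

    Only one accumulator per distinct key is kept, so memory use depends on
    the number of groups, not on the number of input rows.
    """
    totals = defaultdict(float)
    for row in rows:
        if where(row):
            totals[row[key]] += value(row)
    return dict(totals)


# Hypothetical sample data; a real larger-than-memory file would instead be
# streamed with open(path, newline="").
sample = io.StringIO("dept,salary,bonus\nA,100,10\nB,200,0\nA,50,5\nB,30,3\n")

result = streaming_group_sum(
    read_rows(sample),
    where=lambda r: float(r["salary"]) >= 50,
    key="dept",
    value=lambda r: float(r["salary"]) + float(r["bonus"]),
)
print(result)  # {'A': 165.0, 'B': 200.0}
```

A distributed engine would additionally partition the input and merge per-partition totals, which this single-process sketch leaves out.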
Persistent Identifier	http://hdl.handle.net/10722/323235

DC Field	Value	Language
dc.contributor.author	Buehlmaier, M	-
dc.contributor.author	Liu, Y	-
dc.date.accessioned	2022-12-02T14:06:14Z	-
dc.date.available	2022-12-02T14:06:14Z	-
dc.date.issued	2022	-
dc.identifier.citation	reClojure Conference (Virtual), December 3, 2022	-
dc.identifier.uri	http://hdl.handle.net/10722/323235	-
dc.language	eng	-
dc.title	Clojask: Inviting Data Scientists to Distributed Computing	-
dc.type	Conference_Paper	-
dc.identifier.email	Buehlmaier, M: buehl@hku.hk	-
dc.identifier.authority	Buehlmaier, M=rp01305	-
dc.identifier.hkuros	342731	-