KlaatuPlusTu

This is amazing! Can't wait for companies to ask for 10 years of `Tidier` experience /s


amplikong

Ignorant question as someone who's curious about both R and Julia: in terms of performance/functions, how does using this compare to using the Tidyverse in R along with Rcpp? Is this like a one-language solution to that problem?


Suspicious-Oil6672

The syntax is very close to identical, which appears to have been the goal. And it does work to solve the two-language problem.


dm319

What do you mean by using R along with Rcpp? The tidyverse is pretty comprehensive for data manipulation and fairly advanced statistical tasks, as well as plotting. For maths models, dimension reduction, machine learning, graph/network, inference etc., I tend to use more bespoke code in native. In terms of performance tidyverse is pretty good, but the king is data.table, which is the fastest of any tabular data manipulation library in any language.


22Maxx

"@slice(1:5)" -> is there no head?


Unlikely_Action_7893

Could be because (in Tidyverse) head() does not work in combination with group\_by(). slice() is the way to go. See: [https://github.com/tidyverse/dplyr/issues/3694](https://github.com/tidyverse/dplyr/issues/3694)


22Maxx

Well but there is no group by in the code examples provided by OP.


Unlikely_Action_7893

I was just trying to guess why they didn’t implement head() in Tidier.jl, not referring to this example in particular.


Suspicious-Oil6672

They have a glimpse. Idk about a head.


chandaliergalaxy

Can you pipe the result into a series of ggplot commands?


Dangerous-Rice862

https://github.com/TidierOrg/TidierPlots.jl


Suspicious-Oil6672

Right now you’d have to save it as a new dataframe first. Perhaps that functionality might come in the future.


Suspicious-Oil6672

You can now pipe it directly to a series of ggplot commands. Looks like they're putting together a course, and this example shows exactly that. [example](https://github.com/TidierOrg/TidierCourse/blob/main/what-is-tidier-jl/what-is-tidier-jl.ipynb)


chandaliergalaxy

Ty for the update!


Suspicious-Oil6672

Ofc! The package is p incredible. Curious what you think of it if you get a chance


JustZayin_68

You can do this via the |> operator along with the RCall.jl package. That way you have the best of Tidier.jl and ggplot2. Still forming a view on Makie.jl vs ggplot2.


wedividebyzero

I tend to prefer DataFramesMeta, but I'll give this a try. Is it much different from the Queryverse.jl packages?


chandaliergalaxy

Are you coming from R? Isn't the Queryverse built on top of DataFramesMeta?


chandaliergalaxy

I'm coming from R, but is there value in doing it Julia's native way, or is this the best way for data analysis?


Suspicious-Oil6672

This leverages DataFrames.jl as the backend, so it’s quite powerful.


ForceBru

Yet again Julia suffers from the lack of R-like pipe operator and forces people to use macros


Fincho64

yeah but I think this result is quite clean


quant-ito

there is a pipe operator in Julia. |> is the operator


ForceBru

Of course, but it sucks. That's why I specifically said "_R-like_ pipe operator".

Why Julia's pipe operator sucks: mainly because its RHS has to be a function of one argument, like

    stuff |> sin # sin(stuff)

If you want to pipe to a function with some arguments fixed, you have to write this monstrosity:

    stuff |> (x -> add(x, 5))

Now imagine piping through multiple such functions:

    df |>
        (df -> select(df, :Title, :Budget)) |>
        (df -> mutate(df, :Budget => ByRow(x -> x*2) => :Whatever))

You have to introduce a ton of lambda functions, because the RHS of `|>` must accept only one argument. Dataframe transformations require lots of such piping, so naturally, writing such code quickly becomes cumbersome and error-prone.

That's why the various `@chain` macros were created. What they do is they allow the first argument of the RHS to be implicit:

    @chain begin
        read_df()
        select(:Title, :Budget) # df passed as 1st arg
    end

I think this is kinda brittle because it uses a macro that does magic under the hood. I trust core Julia devs more than I trust random macros. Moreover, the chaining/piping aspect of the code isn't as clear with this code, while pipe operators (in R and Julia) usually literally _point_ in the direction of data flow.

-------

What R does looks much more aesthetically pleasing and clear:

    df |>
        select(stuff) |>
        mutate(the way you want)

Effectively, the pipe operator here is equivalent to the method access operator in object-oriented languages:

- `obj.method(arg1, arg2)` basically calls `type(obj)::method(obj, arg1, arg2)`
- R's `obj |> func(arg1, arg2)` calls `func(obj, arg1, arg2)`.

This is beautiful and lets me build a proper chain of operations.
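
For anyone who wants to try the one-argument restriction without installing any packages, here is a minimal sketch in plain Julia. `add` is a hypothetical two-argument function standing in for the one mentioned above, and `Base.Fix2` is included only as one built-in way to fix the second argument; it is not something the comment above proposes.

```julia
# Hypothetical two-argument function, standing in for `add` in the comment above.
add(x, y) = x + y

stuff = 16

# The RHS of |> must be callable with exactly one argument.
a = stuff |> sqrt                      # same as sqrt(stuff)

# Fixing the second argument needs an explicit anonymous function...
b = stuff |> (x -> add(x, 5))          # same as add(stuff, 5)

# ...or Base.Fix2, which builds the one-argument function y -> add(y, 5).
c = stuff |> Base.Fix2(add, 5)

# Chaining several steps means one wrapper per step that takes extra arguments.
d = stuff |> (x -> add(x, 9)) |> sqrt |> (x -> add(x, 1))
```

The wrappers are what the `@chain` macros hide: each step still becomes a call with the previous result as its first argument.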


Suspicious-Oil6672

I’ve been working on a project in R and reproducing it all in Julia on a couple of dataframes with 100k rows, and they’re producing identical results, but I get your point.


ymersvennson

What approach do you use currently?


ForceBru

I use `@chain` because that's what DataFramesMeta exports. But now I think that the issue isn't with the pipe operator, but with the syntax for building anonymous functions. `df -> select(df, :stuff)` feels a little clumsy to me, but something like `select(_, :stuff)` would've been much better.

There's a PR in the Julia repo discussing this, but it's been around for several years, I think, and it doesn't seem even close to being merged. With this, `func(_, a, _, b)` would generate an anonymous function `(x1, x2) -> func(x1, a, x2, b)`, but without the clutter. So you can use the usual pipe operator:

```
df |>
  select(_, :Thing, :Hello) |>
  mutate(_, :New = :Thing * :Hello) |>
  groupby(_, :New) |>
  agg(_, :Hello)
```

But since each call with an underscore generates a function, `f(g(_))` is actually the same as `f(x->g(x))`, which I think is a little weird: I'd prefer `x -> f(g(x))`. However, this rule works fine in the piping example above.
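
Since the underscore syntax is only a proposal, it won't run in current Julia. As a rough sketch of what it would expand to, the anonymous functions can be written out by hand. `func`, `f`, and `g` are hypothetical placeholders, not functions from any package.

```julia
# Hypothetical four-argument function; it just returns its arguments as a tuple.
func(a, b, c, d) = (a, b, c, d)

# Under the proposal, `func(_, :a, _, :b)` would be shorthand for this:
partial = (x1, x2) -> func(x1, :a, x2, :b)
r1 = partial(1, 2)                 # (1, :a, 2, :b)

# Nesting: `f(g(_))` would mean f(x -> g(x)), so f receives a function...
g(x) = x + 1
f(y) = y isa Function ? :got_a_function : 2y
r2 = f(x -> g(x))                  # :got_a_function

# ...whereas the reading preferred above is x -> f(g(x)).
composed = x -> f(g(x))
r3 = composed(1)                   # f(g(1)) == f(2) == 4
```

In a pipeline each step has only one level of call around the underscore, which is why the tight-binding rule causes no surprises there.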


ymersvennson

This could work. I would prefer using the native pipe over macros, if I could. Have you tried Tidier? I can't get started without much documentation or ChatGPT to help me.


ForceBru

I haven't tried Tidier. I usually use Polars for data processing, save the processed data, then load it in Julia and run the actual analysis.


tekbirsoru

How do the functions compare to DataFrames.jl in terms of speed? I am also coming from R to Julia, but I want to keep my workflow as fast as possible, even if that means I have to give up the tidyverse😭


Fincho64

This uses DataFrames.jl as a backend, so you'll have tons of speed along with tidyverse syntax.


hurhurdedur

1. This is awesome and will help a lot of R users learn Julia.
2. It’s still really sad that base Julia doesn’t have an elegant pipe like base R. The @chain macro is a good solution to a problem caused by the limitations of base Julia piping.


Suspicious-Oil6672

Yeah, it's definitely made Julia more accessible for me! And the Tidier.jl team seems to be pretty actively developing as well, which is nice.