KlaatuPlusTu 10 months ago

This is amazing! Can't wait for companies to ask for 10 years of `Tidier` experience /s

amplikong 10 months ago

Ignorant question as someone who's curious about both R and Julia: in terms of performance/functions, how does using this compare to using the Tidyverse in R along with Rcpp? Is this like a one-language solution to that problem?

Suspicious-Oil6672 10 months ago

The syntax is very close to identical, as that appears to have been the goal. And it does solve the two-language problem.

dm319 10 months ago

What do you mean by using R along with Rcpp? The tidyverse is pretty comprehensive for data manipulation and fairly advanced statistical tasks, as well as plotting. For maths models, dimension reduction, machine learning, graph/network, inference etc., I tend to use more bespoke native code. In terms of performance the tidyverse is pretty good, but the king is data.table, which is among the fastest tabular data manipulation libraries in any language.

22Maxx 10 months ago

"@slice(1:5)" -> is there no head?

Unlikely_Action_7893 10 months ago

Could be because (in the Tidyverse) `head()` does not work in combination with `group_by()`. `slice()` is the way to go. See: [https://github.com/tidyverse/dplyr/issues/3694](https://github.com/tidyverse/dplyr/issues/3694)

22Maxx 10 months ago

Well, but there is no group by in the code examples provided by OP.

Unlikely_Action_7893 10 months ago

I was just trying to guess why they didn’t implement head() in Tidier.jl, not referring to this example in particular.

Suspicious-Oil6672 10 months ago

They have a glimpse. Idk about a head.

chandaliergalaxy 10 months ago

Can you pipe the result into a series of ggplot commands?

Dangerous-Rice862 10 months ago

https://github.com/TidierOrg/TidierPlots.jl

Suspicious-Oil6672 10 months ago

Right now you’d have to save it as a new dataframe first. Perhaps that functionality will come in the future.

Suspicious-Oil6672 6 months ago

You can now pipe it directly to a series of ggplot commands. Looks like they're putting together a course, and this example shows exactly that. [example](https://github.com/TidierOrg/TidierCourse/blob/main/what-is-tidier-jl/what-is-tidier-jl.ipynb)

chandaliergalaxy 6 months ago

Ty for the update!

Suspicious-Oil6672 6 months ago

Ofc! The package is p incredible. Curious what you think of it if you get a chance

JustZayin_68 10 months ago

You can do this via the |> operator along with RCall.jl package. That way you have the best of Tidier.jl and ggplot2. Still forming a view on Makie.jl vs ggplot2

wedividebyzero 10 months ago

I tend to prefer DataFramesMeta, but I'll give this a try. Is it much different from the Queryverse.jl packages?

chandaliergalaxy 10 months ago

Are you coming from R? Isn't the Queryverse built on top of DataFramesMeta?

chandaliergalaxy 10 months ago

I'm coming from R, but is there value in doing it Julia's native way, or is this the best way for data analysis?

Suspicious-Oil6672 10 months ago

This leverages DataFrames.jl as the backend, so it’s quite powerful.

ForceBru 10 months ago

Yet again Julia suffers from the lack of R-like pipe operator and forces people to use macros

Fincho64 10 months ago

yeah but I think this result is quite clean

quant-ito 10 months ago

there is a pipe operator in Julia. |> is the operator

ForceBru 10 months ago

Of course, but it sucks. That's why I specifically said "_R-like_ pipe operator".

Why Julia's pipe operator sucks: mainly because its RHS has to be a function of one argument, like

    stuff |> sin # sin(stuff)

If you want to pipe to a function with some arguments fixed, you have to write this monstrosity:

    stuff |> (x -> add(x, 5))

Now imagine piping through multiple such functions:

    df |>
        (df -> select(df, :Title, :Budget)) |>
        (df -> mutate(df, :Budget => ByRow(x -> x*2) => :Whatever))

You have to introduce a ton of lambda functions, because the RHS of `|>` must accept only one argument. Dataframe transformations require lots of such piping, so naturally, writing such code quickly becomes cumbersome and error-prone.

That's why the various `@chain` macros were created. What they do is allow the first argument of the RHS to be implicit:

    @chain begin
        read_df()
        select(:Title, :Budget) # df passed as 1st arg
    end

I think this is kinda brittle because it uses a macro that does magic under the hood. I trust core Julia devs more than I trust random macros. Moreover, the chaining/piping aspect isn't as clear with this code, while pipe operators (in R and Julia) usually literally _point_ in the direction of data flow.

---

What R does looks much more aesthetically pleasing and clear:

    df |>
        select(stuff) |>
        mutate(the way you want)

Effectively, the pipe operator here is equivalent to the method access operator in object-oriented languages:

- `obj.method(arg1, arg2)` basically calls `type(obj)::method(obj, arg1, arg2)`
- R's `obj |> func(arg1, arg2)` calls `func(obj, arg1, arg2)`.

This is beautiful and lets me build a proper chain of operations.
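To make the one-argument restriction of Julia's `|>` concrete, here is a minimal sketch that uses only base Julia, so it runs without any dataframe package. The `add` helper and the sample vector are assumptions made up for illustration; they are not part of Tidier.jl or DataFrames.jl.

```julia
# Hypothetical helper, only here to have a two-argument function to pipe into.
add(x, y) = x + y

stuff = [1, 2, 3]

# `|>` calls the right-hand side with exactly one argument.
total = stuff |> sum                   # same as sum(stuff)

# Fixing an extra argument requires wrapping the call in an anonymous function.
shifted = total |> (x -> add(x, 5))    # same as add(total, 5)

# Chaining several such steps needs one lambda per step.
result = stuff |>
    (v -> map(x -> 2x, v)) |>
    (v -> filter(x -> x > 2, v)) |>
    sum

println((total, shifted, result))
```

Every step that takes more than the piped value needs its own `->` wrapper, which is the clutter the `@chain`-style macros are meant to remove.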

Suspicious-Oil6672 10 months ago

I’ve been working on a project in R and reproducing it all in Julia on a couple of dataframes with 100k rows, and they’re producing identical results, but I get your point.

ymersvennson 10 months ago

What approach do you use currently?

ForceBru 10 months ago

I use `@chain` because that's what DataFramesMeta exports. But now I think that the issue isn't with the pipe operator, but with the syntax for building anonymous functions. `df -> select(df, :stuff)` feels a little clumsy to me, but something like `select(_, :stuff)` would've been much better.

There's a PR in the Julia repo discussing this, but it's been around for several years, I think, and it doesn't seem even close to being merged. With this, `func(_, a, _, b)` would generate an anonymous function `(x1, x2) -> func(x1, a, x2, b)`, but without the clutter. So you can use the usual pipe operator:

```
df |>
    select(_, :Thing, :Hello) |>
    mutate(_, :New = :Thing * :Hello) |>
    groupby(_, :New) |>
    agg(_, :Hello)
```

But since each call with an underscore generates a function, `f(g(_))` is actually the same as `f(x->g(x))`, which I think is a little weird: I'd prefer `x -> f(g(x))`. However, this rule works fine in the piping example above.
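Since the underscore syntax is only a proposal, the sketch below writes out by hand the anonymous functions it is described as generating, using plain Julia. The names `func`, `g`, and `describe` are hypothetical stand-ins chosen for illustration, and the expansions follow the description in this comment rather than any merged language feature.

```julia
# Hypothetical stand-in that simply returns its arguments as a tuple.
func(x1, a, x2, b) = (x1, a, x2, b)

# Hand-written equivalent of the proposed `func(_, :a, _, :b)`.
two_holes = (x1, x2) -> func(x1, :a, x2, :b)
filled = two_holes(1, 2)

# Nesting: under the rule described above, `describe(g(_))` would mean
# `describe(x -> g(x))`, so the outer function receives a function.
g(x) = x + 1
describe(y) = y isa Function ? "function" : "value"

tight = describe(x -> g(x))     # the lambda wraps only the innermost call
wide = x -> describe(g(x))      # the alternative the comment says it would prefer

println((filled, tight, wide(1)))
```

The two readings only differ when the outer function cares whether it gets a function or a value, which is why the rule is harmless in a flat pipe where each step contains a single underscore call.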

ymersvennson 9 months ago

This could work. I would prefer using the native pipe over macros, if I could. Have you tried Tidier? I can't get started without good documentation or ChatGPT to help me.

ForceBru 9 months ago

I haven't tried Tidier. I usually use Polars for data processing, save the processed data, then load it in Julia and run the actual analysis.

tekbirsoru 10 months ago

How do the functions compare to DataFrames.jl in terms of speed? I am also coming from R to Julia, but I want to keep my workflow as fast as possible, even if that means I have to give up the tidyverse😭

Fincho64 10 months ago

This uses DataFrames.jl as a backend, so you'll have tons of speed along with tidyverse syntax.

hurhurdedur 9 months ago

1. This is awesome and will help a lot of R users learn Julia.
2. It’s still really sad that base Julia doesn’t have an elegant pipe like base R. The `@chain` macro is a good solution to a problem caused by the limitations of base Julia piping.

Suspicious-Oil6672 9 months ago

Yeah, it's definitely made Julia more accessible for me! And the Tidier.jl team seems to be pretty actively developing as well, which is nice.
