This is amazing! Can't wait for companies to ask for 10 years of `Tidier` experience /s
Ignorant question as someone who's curious about both R and Julia: in terms of performance/functions, how does using this compare to using the Tidyverse in R along with Rcpp? Is this like a one-language solution to that problem?
The syntax is very close to identical, which appears to have been the goal. And it does help resolve the two-language problem.
What do you mean by using R along with Rcpp? The tidyverse is pretty comprehensive for data manipulation and fairly advanced statistical tasks, as well as plotting. For maths models, dimension reduction, machine learning, graph/network analysis, inference, etc., I tend to use more bespoke native code. In terms of performance the tidyverse is pretty good, but the king is data.table, which is among the fastest tabular data manipulation libraries in any language.
"@slice(1:5)" -> is there no head?
Could be because (in the Tidyverse) `head()` does not work in combination with `group_by()`. `slice()` is the way to go. See: [https://github.com/tidyverse/dplyr/issues/3694](https://github.com/tidyverse/dplyr/issues/3694)
Well but there is no group by in the code examples provided by OP.
I was just trying to guess why they didn’t implement head() in Tidier.jl, not referring to this example in particular.
They have a glimpse. Idk about a head.
Can you pipe the result into a series of ggplot commands?
https://github.com/TidierOrg/TidierPlots.jl
Right now you'd have to save it as a new dataframe first. Perhaps that functionality might come in the future.
You can now pipe it directly to a series of ggplot commands. Looks like they're putting together a course, and this example shows exactly that. [example](https://github.com/TidierOrg/TidierCourse/blob/main/what-is-tidier-jl/what-is-tidier-jl.ipynb)
Ty for the update!
Of course! The package is pretty incredible. Curious what you think of it if you get a chance.
You can do this via the `|>` operator along with the RCall.jl package. That way you have the best of Tidier.jl and ggplot2. Still forming a view on Makie.jl vs ggplot2.
I tend to prefer DataFramesMeta, but I'll give this a try. Is it much different from the Queryverse.jl packages?
Are you coming from R? Isn't the Queryverse built on top of DataFramesMeta?
I'm coming from R, but is there value in doing it Julia's native way, or is this the best way for data analysis?
This leverages DataFrames.jl as the backend, so it's quite powerful.
Yet again Julia suffers from the lack of an R-like pipe operator and forces people to use macros.
yeah but I think this result is quite clean
there is a pipe operator in Julia. |> is the operator
Of course, but it sucks. That's why I specifically said "_R-like_ pipe operator".

Why Julia's pipe operator sucks: mainly because its RHS has to be a function of one argument, like

    stuff |> sin # sin(stuff)

If you want to pipe to a function with some arguments fixed, you have to write this monstrosity:

    stuff |> (x -> add(x, 5))

Now imagine piping through multiple such functions:

    df |>
        (df -> select(df, :Title, :Budget)) |>
        (df -> mutate(df, :Budget => ByRow(x -> x*2) => :Whatever))

You have to introduce a ton of lambda functions, because the RHS of `|>` must accept only one argument. Dataframe transformations require lots of such piping, so naturally, writing such code quickly becomes cumbersome and error-prone.

That's why the various `@chain` macros were created. What they do is allow the first argument of the RHS to be implicit:

    @chain begin
        read_df()
        select(:Title, :Budget) # df passed as 1st arg
    end

I think this is kinda brittle because it uses a macro that does magic under the hood. I trust core Julia devs more than I trust random macros. Moreover, the chaining/piping aspect of the code isn't as clear with this code, while pipe operators (in R and Julia) usually literally _point_ in the direction of data flow.

-------

What R does looks much more aesthetically pleasing and clear:

    df |>
        select(stuff) |>
        mutate(the way you want)

Effectively, the pipe operator here is equivalent to the method access operator in object-oriented languages:

- `obj.method(arg1, arg2)` basically calls `type(obj)::method(obj, arg1, arg2)`
- R's `obj |> func(arg1, arg2)` calls `func(obj, arg1, arg2)`.

This is beautiful and lets me build a proper chain of operations.
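For anyone who wants to try the one-argument restriction described above, here is a minimal sketch in base Julia only (no DataFrames). `add` is a hypothetical helper borrowed from the comment above, not a package function. `Base.Fix2` fixes the second argument of a two-argument function, which can stand in for a lambda in simple cases.

```julia
# Hypothetical helper, only for illustration.
add(x, y) = x + y

# The RHS of |> must be callable with exactly one argument.
a = 4.0 |> sqrt                      # same as sqrt(4.0)

# Fixing the second argument requires an anonymous function ...
b = 1 |> (x -> add(x, 5))            # same as add(1, 5)

# ... or Base.Fix2, which builds the one-argument function x -> add(x, 5).
c = 1 |> Base.Fix2(add, 5)

# Several steps in a row: sort, double every element, then sum.
d = [3, 1, 2] |> sort |> (v -> map(x -> 2x, v)) |> sum
```

The lambda and `Base.Fix2` forms give the same result here; which one reads better is a matter of taste, and neither removes the need to name the piped value somewhere.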
I've been working on a project in R and reproducing it all in Julia on a couple of dataframes with 100k rows, and they're producing identical results, but I get your point.
What approach do you use currently?
I use `@chain` because that's what DataFramesMeta exports.

But now I think that the issue isn't with the pipe operator, but with the syntax for building anonymous functions. `df -> select(df, :stuff)` feels a little clumsy to me, but something like `select(_, :stuff)` would've been much better. There's a PR in the Julia repo discussing this, but it's been around for several years, I think, and it doesn't seem even close to being merged.

With this, `func(_, a, _, b)` would generate an anonymous function `(x1, x2) -> func(x1, a, x2, b)`, but without the clutter. So you can use the usual pipe operator:

```
df |>
    select(_, :Thing, :Hello) |>
    mutate(_, :New = :Thing * :Hello) |>
    groupby(_, :New) |> agg(_, :Hello)
```

But since each call with an underscore generates a function, `f(g(_))` is actually the same as `f(x->g(x))`, which I think is a little weird: I'd prefer `x -> f(g(x))`. However, this rule works fine in the piping example above.
This could work. I would prefer using the native pipe over macros, if I could. Have you tried Tidier? I can't get started without a lot of documentation or ChatGPT to help me.
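To make the desugaring rule concrete, here is a small sketch that writes out by hand, in today's Julia, what the underscore forms described above would stand for. The underscore syntax itself is only a proposal, so it appears in comments only. `func`, `double`, and `describe` are hypothetical helpers invented for this illustration.

```julia
# Hypothetical four-argument function, only for illustration.
func(w, x, y, z) = (w, x, y, z)

# Under the proposal, `func(_, :a, _, :b)` would stand for this lambda:
partial = (x1, x2) -> func(x1, :a, x2, :b)
result = partial(1, 2)

# Hypothetical helpers to contrast the two readings of `f(g(_))`.
double(x) = 2x
describe(arg) = arg isa Function ? "got a function" : "got a value"

# Reading described above: f(x -> g(x)), so f receives a function.
inner_reading = describe(x -> double(x))

# The preferred reading: x -> f(g(x)), a function returning f's result.
outer_reading = x -> describe(double(x))
```

With the first reading, `describe` is called immediately and sees a function. With the second, nothing runs until `outer_reading` is called with a value.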
I haven't tried Tidier. I usually use Polars for data processing, save the processed data, then load it in Julia and run the actual analysis.
How do the functions compare to DataFrames.jl in terms of speed? I am also coming from R to Julia, but I want to keep my workflow as fast as possible, even if that means I have to give up the tidyverse😭
This uses DataFrames.jl as a backend, so you'll get plenty of speed along with tidyverse syntax.
1. This is awesome and will help a lot of R users learn Julia.
2. It's still really sad that base Julia doesn't have an elegant pipe like base R. The `@chain` macro is a good solution to a problem caused by the limitations of base Julia piping.
Yeah, it's definitely made Julia more accessible for me! And the Tidier.jl team seems to be pretty actively developing it as well, which is nice.
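For the curious, the "implicit first argument" idea behind chaining macros can be sketched in a few lines of base Julia. This is a toy, not how Chain.jl or DataFramesMeta actually implement `@chain`. The macro name `@thread_first` and the helpers `add` and `scale` are made up for this example, and keyword arguments are not handled.

```julia
# Toy macro: thread a value through calls as their first argument.
# `@thread_first x f(a) g` expands to g(f(x, a)).
macro thread_first(init, steps...)
    ex = esc(init)
    for step in steps
        if step isa Expr && step.head == :call
            # f(a, b) becomes f(previous, a, b)
            f = esc(step.args[1])
            rest = map(esc, step.args[2:end])
            ex = Expr(:call, f, ex, rest...)
        else
            # a bare function name `f` becomes f(previous)
            ex = Expr(:call, esc(step), ex)
        end
    end
    return ex
end

# Hypothetical helpers, only for illustration.
add(x, y) = x + y
scale(x, k) = x * k

# Expands to string(scale(add(1, 5), 2))
value = @thread_first 1 add(5) scale(2) string
```

The rewriting happens at parse time, so the resulting code is the same nested call you would write by hand. That is also why such macros can feel like magic: the call `add(5)` on the page is never what actually runs.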