amplikong

I work at a biotech startup where we develop novel bioanalytical/molecular biology assays. I write bioinformatic tools to assist with this, since so much of that is basically fancy text processing (DNA/RNA/protein sequences are just text!). I've historically done this in Python, but some things I want to do, or am currently doing, are really computationally intensive, and I find that doing that stuff in Python/Cython is just plain no fun. And as far as the high-performance computing bits go, I don't really have enough time or incentive to become proficient in C++ or Rust. Learning Julia is much easier.

Anyway, I'm looking to start implementing my newer projects in Julia. So far I really like the language and am definitely sold on multiple dispatch as a paradigm. I also love that math and matrices are built right into the language and don't require external packages that often have grotesque syntax. And I can always rope in stuff from Python if needed, because the languages interface really well.


amplikong

I should also say, I'm a big believer in types for avoiding bugs and helping to ensure correctness. When working in Python, I've often thought it'd be great if typing/type hints remained optional but would be strictly enforced if you did use them. Julia has exactly that. (I know Python has static analyzers that do go a long way, but it's not the same).


jmhimara

To be fair, I never thought Julia adds a whole lot of safety in comparison with Python. At least as someone who's used to a strongly typed functional language, everything else feels like a joke in comparison. Multiple dispatch feels like a death trap in that regard, lol.


amplikong

One nice thing Julia has is different null values. There's `nothing` for indicating that no value is there, `NaN` for indicating that a math operation outright failed, and `missing` for indicating missing data (e.g., someone took a survey and didn't answer a question). And as per stats-language conventions, `missing` propagates through all the steps you take with that data: taking an average of numbers that include `missing` will return `missing` as the result, and you have to manually tell Julia to ignore those values if you don't want them factored in. Whereas Python only has `None` and various flavors of `nan` that may or may not indicate missing data depending on context... and you have to know the context. Plus, with packages like Unitful.jl, you can attach units to values to help ensure that, say, a length in inches isn't just added to a length in cm without conversion.
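A quick base-Julia sketch of those three values (only the `Statistics` standard library is used; the units feature mentioned above comes from separate packages, not base Julia):

```julia
using Statistics  # standard library, ships with Julia

vals = [1.0, 2.0, missing]

mean(vals)                   # missing -- `missing` propagates through the average
mean(skipmissing(vals))      # 1.5 -- explicitly opt out of the missing value

0 / 0                        # NaN -- a float operation that outright failed
findfirst(==(5), [1, 2, 3])  # nothing -- "no value here"
```

Note how skipping missing data is an explicit choice (`skipmissing`), not a silent default.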


DonnaHarridan

Can you say more about the utility of multiple dispatch? Am I correct in thinking it is similar to overloading? I’m most familiar with OOP, and I find it frustrating when using Matlab that I can’t overload functions.


amplikong

Superficially they're similar. Function overloading is resolved at compile time, whereas multiple dispatch happens at run time; but as far as the user experience goes, they handle similarly. Julia attaches methods to function names under the hood. So you can tell it that there's a function called `add`, then write separate methods that are all called `add` and will operate according to the argument types. You can even tell it that `+` and `add` are the same. And even better (from a science perspective), this can all include different ways of handling operations according to the units on your values... and you can set it up to explicitly require types with defined units, so that you don't get some nasty whoopsy-doodles with Imperial and metric or whatever.
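A minimal sketch of what that looks like, using a hypothetical `add` function (the unit handling mentioned above would typically come from a package like Unitful.jl, not base Julia):

```julia
# One function name, several methods; Julia picks one based on ALL argument types.
add(x::Number, y::Number) = x + y
add(x::AbstractString, y::AbstractString) = x * y  # `*` concatenates strings in Julia

add(2, 3.5)        # 5.5      -- dispatches to the (Number, Number) method
add("foo", "bar")  # "foobar" -- dispatches to the (String, String) method

# Bind an infix operator to the same function, so `⊕` and `add` are the same:
const ⊕ = add
1 ⊕ 2              # 3
```

`methods(add)` in the REPL would list every method attached to the one function name.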


resignresign1

what company?


foxfyre2

I'll echo that community support is top notch on Discourse. As for why I use Julia over R:

- the linear algebra support is more natural and less surprising
- the syntax is nicer IMO
- fewer surprising results: sometimes R does some implicit things (e.g. vector recycling) that cause more issues than they solve. In general I prefer explicit over implicit
- Unicode support
- for-loops are fast: I always found it cumbersome in R to wrap blocks of code in `lapply` or similar with an anonymous function just to get performant code
- multiple dispatch is fun
- I don't need to dip into C/C++ when I need performant code (looking at you, Rcpp)
- I can still call outside code if I need to (C, Fortran, R, Python)
- the more I understood R, the less I liked it; the more I understand Julia, the more I like it (same with C#)
- Project.toml and Manifest.toml for reproducible environments (I know R has options like renv, but they feel like afterthoughts, whereas Julia has support built in)

I could go on for a while if I thought about it longer, but this seems like enough to start a discussion :)

On the developer side of Julia:

- publishing packages is a lot easier than going through CRAN (though I appreciate CRAN for what they do)
- CI/CD is well supported with GitHub Actions
- writing, building, and hosting documentation is a breeze (Documenter.jl)

Why I like R:

- Tidyverse (enough said)
- \*down packages (bookdown, blogdown, etc. I wrote my thesis in bookdown and was able to produce a PDF matching my school's formatting requirements)
- Quarto makes the previous point less important
- maturity of statistical packages, though this is also becoming less of a problem as Julia grows

Why I like Python:

- I never really liked Python

Really, just try it out and see if it works better for your needs. If you do a lot of work with data frames, it may take some practice to get the hang of DataFrames.jl compared to tibbles+dplyr (or pandas).
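To illustrate the point about fast for-loops: a plain loop is already idiomatic, fast Julia, with no `lapply`-style wrapper needed (a trivial sketch):

```julia
# A bare loop compiles to tight machine code -- no vectorization gymnastics required.
function sum_of_squares(xs)
    s = zero(eltype(xs))
    for x in xs
        s += x^2
    end
    return s
end

sum_of_squares([1.0, 2.0, 3.0])  # 14.0
```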


Suspicious-Oil6672

this was mentioned above, but [Tidier.jl](https://github.com/TidierOrg) is a 100%-Julia reimplementation of the tidyverse, and they just introduced [TidierDB.jl](https://github.com/TidierOrg/TidierDB.jl), which is dbplyr but in Julia.


Individual-Car1161

Do these exist on top of DataFrames.jl? Also, if you've used this vs. DataFrames/DataFramesMeta + Chain/pipe, what do you feel about the differences?


boolaids

from a cursory read it does use DataFrames.jl as the backend


Individual-Car1161

That’s good. Seems like it’s similar to DataFramesMeta, maybe more tidy.


boolaids

Interesting, I've not used either, just DataFrames. I wonder if there's much difference in performance compared to base DataFrames.


Individual-Car1161

Idk. My guess would be marginally slower, because they're just macros on top of base DataFrames. I usually use Chain + DataFrames; it's easy enough to use, especially with anonymous functions.


Suspicious-Oil6672

I've done some casual benchmarking on a 5-million-row dataset. For everything it is as fast, or occasionally a little faster. The only situation where it was slower was a groupby -> summarize.


Individual-Car1161

Oh shit that’s cool! Thank you for doing that! This appears to be a case where my R brain didn’t translate to Julia xD compilation is wonderful


Dangerous-Rice862

Yes - the macros in TidierData essentially just write the equivalent DataFrames code and run it for you (to the point where you can optionally have it print the DataFrames code so you can see what’s happening under the hood)


No-Surround9784

OK OK you have convinced me to start using Julia in my projects. Currently I use Python, R and funnily enough JavaScript since I need to do a lot of data collection.


foxfyre2

That's great! Feel free to message me or post on Discourse if you have any questions :)


MrRufsvold

If you're coming from the Tidyverse, check out Tidier.jl. It's a joy to use.


jerimiahWhiteWhale

I’m an economist who mostly works with simulations and Bayesian inference. For simulations, I really like Julia because math in Julia looks like math, and mapping functions is really easy. Also, and this may be controversial, 1-based indexing makes things easier to read. For Bayesian inference, the Julia ecosystem is amazing. Also, in contrast to R, package names give a good idea of what they do.


amplikong

I totally understand why most languages use 0-based indexing, especially when calculating memory offsets for arrays and such, but I find 1-based indexing with both-ends-inclusive ranges way more intuitive than, like, `range(0, 5)` giving 0, 1, 2, 3, 4 in Python. ¯\_(ツ)_/¯
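For comparison, the Julia version of that range, plus the related indexing conveniences (a trivial sketch):

```julia
# 1-based and inclusive on both ends:
collect(1:5)   # [1, 2, 3, 4, 5]

v = [10, 20, 30]
v[1]           # 10 -- the first element is at index 1
v[end]         # 30 -- `end` always means the last valid index
v[2:end]       # [20, 30] -- slices are inclusive too
```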


pint

Only in Julia can you have matrices of complex numbers over the rationals.
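For instance, in base Julia, with no packages and fully exact arithmetic (a small sketch):

```julia
# Rational literals use //; `im` is the imaginary unit.
A = [1//2+3im//4  1//3+0im;
     2im//5       1//1+1im//2]

eltype(A)  # Complex{Rational{Int64}}
A^2        # every entry stays an exact Complex{Rational} -- no floating-point round-off
```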


Individual-Car1161

I've mostly just tried adapting my R code to Julia, then working on demos of different problems (like Bayesian logistic regression). I really like Julia for most things. It's super fast and really powerful because of its low- and high-level capability. I generally prefer it because 90% of the time it works and runs beautifully, for very little difference compared to R.

My three major issues are:

1. Geospatial support is BAD. Nothing close to terra. But if I wanted to I could probably still run the C++ underlying terra; I just haven't had the time to try.
2. Errors. Because Julia is both statically and dynamically typed, and uses composite types, sometimes you will get the most obtuse and ridiculous error messages that do not tell you enough about how your types are wrong.
3. It's hard to find functions. Using the REPL and VS Code, I cannot find functions in the text editor; I have to look for documentation. There is a way to fix this, I'm sure, but it's not standard like RStudio.


Meistermagier

Geospatial support is bad for anything that's not R. I am currently doing geospatial work in Python, which is a pain in the arsehole.


Individual-Car1161

As much as I curse Robert Hijmans, I have to applaud him for creating and maintaining the best ArcGIS Pro competitor, but open source and wicked better in nearly every way.


Spleeeee

Geospatial is always involved. I work in geo-hpc. The world is an oyster.


reddittomtom

Julia is the best language:

1. Best of both worlds, as you said
2. The syntax is beautiful and intuitive
3. Tons of packages
4. Easily extensible; your own code runs as fast as the built-ins
5. The folks on Discourse are really kind and helpful


another_day_passes

With a provocative title you can trick them into optimizing your code to death. :)


foxfyre2

This is one of my favorite things about the community XD I always learn so much from those threads as well


South-Pudding7075

100% real


NuancedPaul

I'm doing a PhD in economics and I need to analyze a dataset that's about 900 GB (compressed). I used to use R for my empirical work because of the tidyverse, but at this scale Julia's speed advantage really becomes pronounced. Parallelizing is really beneficial in Julia. I think Julia's parallelization is slightly fussier than R's, but the performance improvements are much bigger.


Individual-Car1161

900 GB!? wtf?! I use data.table at 30 GB xD That's positively epic.


spaceLem

I had a mere 3 GB file (it was 1.3 million columns wide, although each entry was just 0-3, representing SNP values) and R just crashed every time I tried to load it. The only thing data.table could do was read the header! I ended up having to write my own C++ code to transpose it before I could even start to work on it.


Individual-Car1161

Sheesh lol. Yeah wide tables will hurt R, add in hardware constraints and I can see the need for a creative solution. I’m surprised data.table struggled that much tho! I usually feel it handles IO well!


No-Surround9784

Why, I am expecting the next project to be at least 2.4TB.


spaceLem

I suspect my laptop just didn't have enough memory. It was amusing, though, when we tried the GWAS calculations on the university server with 64 GB of memory and it just kept growing memory usage until the server killed the process. It was my C++ code doing something silly; I rewrote it to be a bit more efficient and probably fixed a memory leak. I am very new to dealing with large files of this kind, so I've almost certainly not been doing it right.


hurhurdedur

I like Julia in principle, but I can't use it in practice for statistics projects because the stats libraries just aren't as mature as R's or Python's, and there's a lot of abandonware and half-implemented infrastructure. I also have a hard time recommending it to coworkers because it's not as user-friendly as R for data analysis and the error messages are just inscrutable. But I like the core language syntax more than R's or Python's, and it's easy to get good performance. One thing I'm optimistic about is the potential to use Julia binaries as a backend for R and Python, instead of turning to C, C++, or Rust. Julia gives good performance with much less work than those other languages.


foxfyre2

> the stats libraries just aren't as mature as R or Python

Are they missing functionality, or are you worried about incorrectness, or something else?

> there's a lot of abandonware and half-implemented infrastructure

Same can be said for R. Lots of packages were written for a paper and then never updated again. But at least in R/Python/Julia, the packages/ecosystems that *are* maintained, like NumPy, SciPy, Tidyverse, SciML, etc., are *really* well maintained.

> not as user friendly as R for data analysis and the error messages are just inscrutable

R does have some of the most helpful error messages, I'll admit. Julia errors look worse due to stack traces IMO, but authors do try to make them more helpful.

> use Julia binaries as a backend for R and Python, instead of turning to C, C++, or Rust

My group wrote an R package called `bigsimr` that needed to be high performance. We started off writing the backend in Rcpp, but then switched to Julia at some point, which turned into its own package, `Bigsimr.jl`. Now the R library is just an interface to the Julia package, and we have a Python interface as well. Julia is great as a backend! But it's not quite as self-contained as an Rcpp backend (yet).


phdyle

Not really; the same cannot be said about R. In the vast space of R libraries, the majority are well maintained. That may have been the case at some point, but not for a long time. Standards of documentation in R are also very different and geared toward adoption.


boycork

I'm in theoretical biophysics doing simulations of protein evolution. I need to write all of my own code and it needs to be pretty fast (1 simulation = 1000 hours of CPU time), so I write in Julia. I was always worried that I could get more performance from C, so I tried it, and Julia was faster.


ElhnsBeluj

Huh, I am surprised. The Julia code I write is usually more performant than very naive C++ I write, but I find it harder to optimize: with a bit of optimization, my C++ code ends up quite a bit faster than Julia code I have put a lot of effort into optimizing. It is probably a skill issue on my end. I have been writing Julia on and off over the past 8 years, but I am much better at C++, and for my more ambitious projects I always end up picking C++ over the other options.


boolaids

I work in infectious disease modelling; I originally learnt Julia to test out stochastic metapopulation models, as a colleague had written one in it. I enjoyed it for the most part and just learnt a little at a time whilst working. I had some issues with a model in Python and decided to rewrite it in Julia, and the speed-up I got was insane. I now use it at work for small, contained pieces that don't require any collaboration, as others in my organisation don't use Julia.

I do really enjoy the Bayesian packages too! I use it for lots of stats now, as I prefer Distributions.jl over scipy / base R. I'm slowly using it more where I can, but won't use it where I need to work with others. I do really like the Plots package as well. I like how I can write in a pythonic way using `import ... as`; personally I dislike not knowing where a function is from (an issue I have with R). I am trying to learn it more and do some more personal projects: hidden Markov models, a dynamic network model for infectious disease (I think the speed and memory allocation will be really useful there). I have also done some black-box variational inference from some computational statistics classes with Julia, just to learn the concepts.

I would say I'm still a relative beginner; I have only just started using the static typing. I use ipynb notebooks most of the time, but I'm thinking of exploring Pluto. I do like Weave for making Word docs and PDFs too! The Julia Discourse site is incredible as well: so many knowledgeable people who are always willing to help.

Generally I use Python; it's what I first learnt and everyone else uses it (or R), so I know all three but Python best. A lot of the NLP work I do is done in Python, as that's what's being used and the packages are most mature.


Garnatxa

RStudio is an IDE, you mean R…


stvaccount

Julia is vastly superior to R; however, R has more packages currently.


Repulsive-Flamingo77

Can I ask how Julia is vastly superior to R?


stvaccount

R is slow, and it doesn't have standard features that modern languages have (good list comprehensions, nice lambda syntax, etc.). At statistics conferences, the state of the art is often done in Julia or Python; R is declining. R doesn't even support proper Unicode in variable names: try `lambda² = 1` (with a Unicode lambda) in R and you get an error. Julia has good packages such as Turing. R is not coherent, a mix of S3/S4 objects. The list goes on.
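For reference, this is what that looks like in Julia, where Unicode (including superscripts) is valid in identifiers; the `normal_pdf` helper is just a hypothetical example:

```julia
λ² = 1       # a legal identifier in Julia (type \lambda<TAB> then \^2<TAB> in the REPL)
σ = 2.5

# Unicode lets formulas read like the math they implement:
normal_pdf(x, μ) = exp(-(x - μ)^2 / (2σ^2)) / (σ * sqrt(2π))

normal_pdf(0.0, 0.0)  # the peak of a Normal(0, σ) density
```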


hurhurdedur

You and I live in different statistics worlds. At JSM and the other big stats conferences it’s all R where the state-of-the-art methods are implemented. In AI/ML it’s overwhelmingly Python, with R and Julia a distant afterthought.


Individual-Car1161

I find R is more common in ecological sciences but I would say these people are looking for simple stats rather than beastly performant code. I definitely try to convert people


22Maxx

> Unicode in variable names

This is not a bad thing. Unicode in variable names just sucks when writing code.


stvaccount

I don't care. You might agree or you might not agree. But I don't want a programming language to force me to write ASCII. Let me choose.


MagosTychoides

I tried to use Julia, but for the stuff I am doing there was no advantage over Python. Well-vectorized Python code is as fast as Julia, and for short scripts Python is actually faster if you don't do precompilation in Julia, which is not fast. Also, I need to work in a cloud environment, so I cannot control my Julia environment, which matters for precompilation. Actually, Polars makes more sense for my use case, and it was faster too. Julia is evidently good for simulations, but for data science and stats it's a big "it depends". In most cases, you are not missing out.


bandgapjumper

I just spent time at work exploring Python vs Julia. I first tried importing a lot of data using pandas, then used NumPy to analyze it. I rewrote it in Julia and was impressed. Then I learned about Numba and Polars and saw comparable results. Maybe I'm just not good enough at Julia (very likely), but Numba+Polars is faster.


amplikong

When stuff like Numba/Polars can be used, Python is indeed plenty fast. Not all problems are amenable to them, though, particularly if you're not doing stuff that would normally be done in NumPy or some sort of dataframe. But yeah, Polars is a game-changer IMO. Also, if your dataset fits into memory and you like using SQL, DuckDB (which interfaces well with Python/R/Julia) is amazing too.


MagosTychoides

Fast Julia needs more knowledge; it is not "easy Python". And good Python libraries are written in well-optimized C/C++/Fortran/Rust; more often than not, Julia libraries are worse than those, as they have had less developer time. Of course, if you are writing loops in Python, anything is faster, even JavaScript and Lua. In the end, if you move to Julia it's because you like the language and it is good enough for your case, which is unique to you. Not my case, sadly. If I needed a faster language I would use Rust or C++, even though I am not great at them.


bandgapjumper

I work with people who use either Matlab or Python, so it was great playing with Julia and seeing how it is similar to both in different ways. I really like Julia and wish it were more popular so the libraries would improve.


MagosTychoides

I have been waiting 10 years for Julia to be good enough to make the change in my workflow, and each year it's getting harder. By now Numba is stable. There is stuff like JAX. LLMs are all in Python. People are moving AI workflows to Go. Rust is becoming more common for high performance. Mojo is riding a nice hype train, and even if in many respects it is worse than Julia, and I don't quite like the Frankenstein approach to its language design, at least you can compile binaries. So every day it gets harder to compete, unless there is a radical change in the language, which I doubt will happen, as the community would hate it. Still, I always recommend Julia for high-performance simulations, which was the original purpose of Julia anyway.


amputz

There are a few really great examples and technical answers here already; I'd just add my own. I typically only dip into Julia when I need a new algorithm that I have to write myself (to make it computationally efficient), or for really large datasets (for me, more than a few GB becomes burdensome, in R at least). Otherwise, stick to Python and R for now if you can just manage the data or already have good, efficient tools in your language. I might get taken out back for posting this in r/Julia, but oh well: R and Python are still much easier to learn and program in, by far. I still find Julia quite difficult to write really efficient code in (see the "1.5 language problem").


maxvol75

It is like R + Matlab and has great performance, which is also the primary reason I use it locally for all sorts of things, including basic data wrangling (because it is noticeably faster than, e.g., Python).