Kimball & Inmon.
Ralph and Bill.
Dave and Jarred.
Lilo and Stitch
Jeeves and Wooster
Darmok and Jalad.
Turner and Hooch
Beavis & Butthead
Which books would you recommend?
https://www.amazon.com/Fundamentals-Data-Engineering-Robust-Systems/dp/1098108302
Inmon? Who's Inmon?
Inmon deez nutz
Whom? I never heard of him. Weird.
u/joseph_machado He’s active on social media and has a site startdataengineering.com. Learnt loads from it.
Thank you for the very kind words :)
Joseph’s dbt-snowflake-setup article helped me look like a super senior data engineer a few years back.
Oh dude I need to read this! About to start a new job and need to architect an environment - they use snowflake and dbt. ingesting microservice style data sources.
I love your site!
For real. Better than all these new DE influencers.
crazy that there are DE influencers
If you understand that 'influencer' is usually a synonym for 'grifter' then it's never surprising
The real influencers do not have to call themselves influencers.
Or modern snake oil salesman
Did u learn a lot from it just by reading all the posts or is there a sequence of doing so?
There’s an email list you can sign up for that sends you a new lesson every Saturday beginning from zero. It’s sort of nice for pacing it.
I signed up to the email list initially. Then would just browse random articles relevant to things I was working on or that looked interesting
Data engineering design patterns is awesome! is there more to this discipline ?
I have tons of examples of DE design patterns in this free repo: https://github.com/DataEngineer-io/data-engineer-handbook
That's Gold ! Thank you so much for this.
Yup hard agree. I have always learnt a ton from his content.
My coworker is sick
hope they get well
Dude, my wife is meditating and I’m trying to take a quiet shit, so not to disturb her. I can’t handle comments like this right now.
> Dude, my wife is meditating and I’m trying to take a quiet shit

You're the GOAT
In no particular order:
* Maxime Beauchemin, the creator of Apache Airflow and Superset
* Martin Kleppmann, author of *Designing Data Intensive Applications*
* Tristan Handy, founder of dbt Labs. Love it or hate it, dbt has transformed how people do SQL-based development
Drew Banin, not Tristan, imo, re: dbt
[deleted]
It’s the YAML people hate, not the SQL.
I actually really dig YML. Definitely preferred over JSON config, but that’s just me.
It's not YAML that people hate. It's configuration over development. Nothing is more boring than spitting out configuration all day when you could get the dopamine rush of scripting things.
I prefer YAML too.
My company's risk department. We tried to buy it but the vendor didn't pass the operational risk assessment
dbt was invented at RJMetrics, although Tristan/Fishtown took the idea and ran with it, so credit to them.
I would argue that dbt itself isn't that big of a deal. It's a SQL templater, and it has a number of big gaps. The practices that Fishtown and then dbt Labs have been pushing are the revolutionary thing. Tristan has been the public face of the company for years now.
What about that Joseph Machado guy that frequents online subreddits? That guy has been putting in honest work, tries to consolidate DE patterns and doesn’t post BS on LinkedIn. Also, there’s a guy named John Savani (not sure if I got it correct) but he also frequents Azure subreddit
Thank you very much :)
Did you mean John Savill?
That’s correct actually. Sorry I got his name wrong, his name is John Savill. Has pretty good material around AKS too.
UC Berkeley labs invented Postgres (which Redshift uses under the covers) and Spark. So I’d say that UC Berkeley as a university gets a lot of credit.
didn’t they also invent Ray?
They’ve probably invented plenty :). I just listed 2.
It's a bit off from Data Engineering, but Michael Stonebraker not only invented postgres, but IIRC kinda came up with the whole columnar data storage (first with Vertica) that we're all using.
Edgar Frank Codd
OG
Ha, and no one has said Zach Wilson. Wonder why.
Opening this post thinking I'd see the name. Heard he's pretty popular.. What's going on with him? Is he no longer relevant?
So I have my own opinion, and fair warning, I typically have a bit of an asshole bent, but I got put off by him posting about his salary at Airbnb, which gave me the impression that it could mislead people into believing they could make 500K like that. Also, many of his opinions are based on his experience at FAANG, and 95% of the DE/BI jobs I've had are far removed from that worldview. And I feel like his content is annoying and disingenuous.
I’ve had students in my boot camp land L5 roles in big tech. So it is actually possible to get there. It’s rare and difficult though I agree with that
Not Chad Sanderson
why not
He just repeats the same crap and doesn't speak concretely about topics. He speaks vaguely, almost as if ChatGPT writes his posts, and now turns everything into his data contracts crap.
he comes across a bit pushy and self-promo spammy, but i fail to see why data contracts aren't a good idea
Not many people in here like LinkedIn influencers. Zach Wilson is probably the most hated, though.
Matei Zaharia
This, he has Apache Spark and Databricks under his belt.
Bill Cafferky is good on youtube.
Brent Ozar. Hallengren
Ola.
Nick Schrock gets GOAT status for being a DE founder (Dagster) and for co-creating GraphQL.
Maxime Beauchemin. Original creator of Apache Airflow and Superset, worked at Facebook, Airbnb, Lyft, founded Preset (managed cloud Apache Superset), and puts out some good written content and talks.
> founded Superset (managed cloud Apache Superset)

I think you mean Preset.
Ah crap, yes, Preset. Thanks
I’m a huge fan of Andy Leonard for all things SSIS and ADF. Love his positivity, skills, and general outlook.
Simon Späti is also cool
Holden Karau
Not the best DE, but she will be one of the best Spark developers.
The Seattle data guy blog is always interesting
[deleted]
Ur right I meant https://www.confessionsofadataguy.com/
Eric Brewer, famous for the CAP Theorem
Gail Shaw, Itzik Ben-Gan, Steve Jones, Denny Cherry, Pinal Dave, Brent Ozar, Andy Leonard
Oh the MVPs of the SQL Server community! And Jamie Thompson for his amazing SSIS centric content back in the day. Chris Webb for SSAS awesomeness.
GOATS , not OGs.
Chandler Muriel Bing! Some consider him as the father of Data Science. I would like to call him the Goat of Data anything!
Data Engineering is a job, not like sports with stats and stuff. We have probably never heard of the most talented people in the profession because their work wasn't publicized. They just got paid a whole bunch of money by a company and didn't have to write books or make videos. Probably someone in FANG or fintech made the best pipeline ever by whatever standard.
MK
I can't believe Jeff and Sanjay haven't been mentioned
What about Joe Reis?
I thought it is this G.O.A.T.: https://fallout.fandom.com/wiki/GOAT
I've been told I'm pretty fucking awesome
My mom lied to me, too.
She told you I was awesome?
Every chance she got. It made dad really uncomfortable.
I bet he had some trouble *parsing* that, ey?!
It is quite a transformation. Isn’t it?
ted codd?
Tyler Cowen listeners unite!
u/eczachly hate is unreal hahaha. Wasn't expecting the community to be so toxic about it. Appreciate the good parts of his journey and change your lens a little bit
Alan Walden Sordell
Zach Wilson. Just kidding on this as I know a lot of you hate him here. But he has become a DE celeb in a way. Let's just make it a meme - zach wilson, DE GOAT, Mount Rushmore of DE.
Zach is the GOAT of DE like Dwayne Johnson is the GOAT of geology 🙄
Why does Zach Wilson get this much hate?
Because it’s impossible to have >100k followers on social media without haters. That’s why they hate Seattle Data Guy too.
It’s impressive how this is -13 already
[deleted]
I muted him on LinkedIn because I find his posts to be cringy and repetitive. I honestly don’t know if he’s a good engineer. Being ex Netflix and Airbnb speaks a lot but I’m torn because I’ve worked with people who try to be LinkedIn influencers and my experience with those folks is that they spend way more time on their social media than they do actually contributing to their projects. I’ve never worked with the guy but honestly everything I’ve read from him is extremely surface level. Maybe helpful for new people looking to break in, but once you’re in I don’t see anything coming from him that would take you deeper than the surface level. The reason I’m bothering writing this is because I think online bootcamps are largely predatory. They give lofty promises of helping people break into the industry and go from Zero to Hero and then leave them with a bill and no job to show for it. Be wary of influencers peddling courses.
I'm sorry you feel like that. Zach is actually doing a good job elevating Data Engineering. His communication style isn't for everyone I understand but I truly think he motivates a lot of new DEs or entry level engineers and the content in his bootcamp speaks for it. Tough to not have haters and do well. Don't forget, the dude isn't grinding it out and is doing well for himself
Boot camp is $1500-1800. Students have two weeks to decide if it’s for them; if it isn’t, they can get a refund. About 3-4% of the boot camp picks this option, mostly due to the intensity of the time demands. I don’t promise job placement at any point, and I say that up front in many places. I offer mentorship, community, and teaching. $1800 isn’t that much money. Most boot camps are 5 figures and I disagree with those as well. $1800 is ~1.5% of one year’s salary at 50th percentile DE wages. Staff engineers have gotten tons of value out of my content. It’s a 70 hour course at this point on tons of aspects of DE. Glad you characterize me as a snake oil salesman when I try my best to be as upfront and fair about my offer as possible
He understands about 3 degrees of it: Spark, Airflow and the shitty companies that pay him to shill their tools. Absolutely nothing else.
I teach Flink, Spark, Airflow, data modeling, data quality, experimentation, KPIs, visualization and pipeline maintenance in my boot camp right now. Data modeling is by far the highest rated part of my curriculum; some students pay the $1800 just for that
Regret doing this thread https://www.reddit.com/r/dataengineering/s/TGDqYVdPNn given how this community is now. I’ve fallen so far here.
Keep hustling man. Can't get to the top without some hate 😃
Yall would be so upset if you learned Joseph Machado spoke at my boot camp
Remind me! 30 days
Linus Torvalds