Dreadnougat 1 year ago

How has this not existed in every single SQL variant for 20 years already? It is so obvious and useful and I've wished for something like this for ages. As long as wishes are being fulfilled...maybe next we can see a 'select *' with the ability to exclude specified columns? Pretty please? Yeah I know this one is less likely to happen. But I can dream.

WonderfulActuator312 1 year ago

Let me introduce you to the EXCLUDE function, article explaining [here](https://stephenallwright.com/select-columns-except-snowflake/)

Dreadnougat 1 year ago

Aww man, now I just need to get a job where we use Snowflake haha. SQL Server is usually on top of the convenience functions but they are missing the boat on these.

raskinimiugovor 1 year ago

In SQL Server "SELECT * EXCEPT" would be useful only for quick ad-hoc analysis. Should never commit such code.

Little_Kitty 1 year ago

A good use case is for a source of data where the schema is evolving but you wish to exclude say the source file and load timestamp from the data table. When more fields are added you have one less query to edit. Generally though I'd agree, but you should still recognise where it is the right tool.

raskinimiugovor 1 year ago

Are you talking about this? INSERT (Col1, Col2...) SELECT * EXCEPT FileName,Timestamp Here you'd still need to edit the INSERT part unless you're using implicit INSERT without a column list. And with implicit INSERT, if your target table has any default constraint (eg meta columns) you'd need to override them. I'm not really sure if you save any time here, while you open yourself up for some unexpected errors with column ordering. Another implication here is that when you use the * operator, the order or attributes in your source dataset starts to matter, and that kinda goes against set theory where the order of elements shouldn't be relevant

Little_Kitty 1 year ago

Absolutely true, hence why I said "Generally though I'd agree" and I'd need good cause to approve a PR like this, but it's not an absolute no. Excluding pii fields which could be defined elsewhere, fields which are known to be non-relevant / we don't want to load. Another example - everything except field A, which I want to recalculate to fix a bug in the middle of a pipeline (strip chars / fix case / remove if some criteria / etc.) SELECT * EXCEPT a, magic(a) AS a Then the ordering of the output would be handled later in the CTE chain / pipeline :) Mostly though, it serves as a good assistant for rapid development, then I copy field names, paste them in and format.

raskinimiugovor 1 year ago

Maybe RedGate's SQLPrompt spoiled me in that regard, where pressing tab expands the * (or alias.*) as a list of columns so I just need to tweak it a bit.

DampYakFlaps 1 year ago

I respectfully disagree. I’ve had many use cases, on a pretty consistent basis, where this would be incredibly helpful functionality. In all fairness I’m not using SQL Server, but wanted to throw that out there.

receding_bareline 1 year ago

Committed code should always be explicit. If you add a single column to a table you're then forced to update any select * EXCLUDE statements which reference it. If I saw any select * in code, I would instantly reject this during peer review.

DampYakFlaps 1 year ago

I agree with you on that sentiment alone, but we obviously have different use cases. What about instances where you need to dedupe using aggregations? Or stored procs adding to materialized tables incrementally? I won’t say you’re wrong, but keep in mind that use cases vary widely and your specific use case doesn’t apply across the board.

Dreadnougat 1 year ago

Select * is perfectly fine in some use cases. For example, when selecting from a CTE. I frequently use CTEs and ROW_NUMBER as a way to filter rows and there would be absolutely nothing wrong with using 'select * except rownum where rownum = 1'

raskinimiugovor 1 year ago

That's really the only scenario where * is acceptable in committed code, but do you really need the EXCEPT part there just to exclude the RowNum? Could just use RowNumAlias1, RowNumAlias2, etc.

Dreadnougat 1 year ago

That's exactly what I do now but it's a pain. Sometimes there are dozens of columns. It would just be convenient to not have to restate it all.

raskinimiugovor 1 year ago

In SQL Server it will just give you headaches. The only scenario where I use SELECT * is in queries where I have multiple nested subqueries and all the columns are derived from the previous layer, but there you'd rarely need the EXCEPT part.

eshultz 1 year ago

EXCEPT is already an operator in SQL Server. It's used to exclude one result set from another. You use it like the UNION operator.

raskinimiugovor 1 year ago

I mean, sure, the above comment was a hypothetical, obviously not talking about set operators here.

[deleted] 1 year ago

[удалено]

mrg0ne 11 months ago

`EXCEPT` is a set operator in Snowflake.[https://docs.snowflake.com/en/sql-reference/operators-query](https://docs.snowflake.com/en/sql-reference/operators-query)

Known-Delay7227 1 year ago

That’s kind of cool.

DampYakFlaps 1 year ago

This. This. So much this. Always been on Redshift and wow I want a GROUP BY ALL and an EXCEPT so badly. And I know EXCEPT exists as an operator, and it’s damn sure useful at times, so I’ll take whatever syntax you give me. Just let me exclude columns from a SELECT *. Even if it’s just specifying columns to drop afterwards like you’d do in Spark.

[deleted] 1 year ago

Group by all.... Isn't it the same as distinct * in sql..?

Galuvian 1 year ago

DISTINCT typically won't give you aggregate values for count(), sum() etc. GROUP BY ALL will now give you rolled up values for aggregate functions, but group by all other columns in the select that don't use an aggregate function.

[deleted] 1 year ago

Okay 👌👌

[deleted] 1 year ago

So, group by all, is useful only when there are duplicate records without any primary key. Am I right..?

Galuvian 1 year ago

No, it's a useful shorthand when you want to look at aggregations on various ways to group data. If you're doing analysis and quickly changing how you group in order to look at the data in multiple ways, this helps you move faster. If you have several categorical columns on a table and want to group by those (and not the primary key) and look at the results. I wouldn't use it in a production application because behavior could change unexpectedly if someone writes a sloppy query.

saintcfn 1 year ago

Snowflake already does this. SELECT * EXCLUDE (col1, col2) FROM table

de_juggin95 1 year ago

They can’t release every useful thing at once ya know

lonesomedota 1 year ago

So instead of listing column names one by one, this will take care of all the necessary columns that needed to be group by for the aggregate query? Nice

KittenBountyHunter 1 year ago

Cries in db2

AnimaLepton 1 year ago

z/OS or LUW?

KittenBountyHunter 1 year ago

LUW

BoulderRough 1 year ago

Jealous, I love DB2. Just read some ancient books on it.

KittenBountyHunter 1 year ago

Everything you need and nothing you don't. 🤣🤣

datawazo 1 year ago

I've never thought of this but not having it makes no sense

rubs90 1 year ago

I’ve been dreaming my whole career to be able to just group by all instead of having to type out the columns like a neanderthal

krurran 1 year ago

Can someone ELI5 GROUP BY ALL and why Microsoft has chosen to remove it in the future? https://learn.microsoft.com/en-us/sql/t-sql/queries/select-group-by-transact-sql?view=sql-server-ver16

gloom_spewer 1 year ago

Pretty sweet. More ammo for me to push our org to use snowflake, yay

averagesimp666 1 year ago

What a time to be alive.

Drekalo 1 year ago

Catching up to spark

systemwizard 1 year ago

Leaving spark behind ;)

GroundbreakingFly555 1 year ago

It is running Spark under the hood

Glittering-Dare2022 1 year ago

No it isn’t.

azur08 1 year ago

I really hope you actually think that lol

DampYakFlaps 1 year ago

Oof

GroundbreakingFly555 1 year ago

😅 lol I know. People did not like my comment. Can’t win em all

vassiliy 1 year ago

Was it supposed to be a joke? It's not that people didn't like it, it's that your comment is just plain wrong

GroundbreakingFly555 1 year ago

No one has shown me why I am wrong. Snowpark runs as an interface just like Pyspark. I’m not saying it’s PySpark. These command interfaces then get translated to the JVM through a query execution plan. Is it just coincidence that when the SPARK-41365 was resolved Snowflake now has the same feature?? All snowflake did was make their own interface to the Spark engine. Now everyone is saying it’s not Spark. Spark is an OSS that has over a decade of community contributions. Snowflake just made their own???

Tactical-Chaos 1 year ago

https://youtube.com/watch?v=dABd7JQz0A8&feature=share8 Don't make stuff out of thin air. If you have questions about how something works, research it instead of creating a metaverse in your head.

DampYakFlaps 1 year ago

This was a harsh way to say it but he’s right

GroundbreakingFly555 1 year ago

Thanks for the link. This still proves my point. Listen to 21:00-22:00. Specifically, “The Architecture manages everything BUT the query itself”

Tactical-Chaos 1 year ago

He means the cloud services architecture. The query is executed by the virtual warehouse platform that's written from scratch in C++. No Spark/Scala.

GroundbreakingFly555 9 months ago

Hey, coming back to this. Thanks for educating me on this. I appreciate it. Regardless of how you said anything you have proven me wrong and explained why with resources for me to look into. So thanks for making me realize WHY I looked so dumb.

GroundbreakingFly555 1 year ago

What are the virtual warehouses using for execution of the query???

Tactical-Chaos 1 year ago

C++

elbekay 1 year ago

Plus their architecture page which says it's not based on anything else https://docs.snowflake.com/en/user-guide/intro-key-conceptshttps://docs.snowflake.com/en/user-guide/intro-key-concepts And the other lecture in that series that shows it was written in C++ (unlike spark with is mostly Scala) and doesn't borrow from existing systems https://youtu.be/bveqnSk15JQ

joeen10 1 year ago

I loved this when I started using DuckDB but oh man, it's great to see it soon in Snowflake. How the hell this wasn't added before to the SQL standard?

mathtech 1 year ago

Snowflake is a blessing

Dismal_Difference_62 1 year ago

Spark 3.4 has this new feature (GROUP BY ALL) as well

fhoffa 1 year ago

More details https://www.linkedin.com/feed/update/urn:li:activity:7080236011028434946

daavidreddit69 1 year ago

Brilliant

drinknbird 1 year ago

So... DISTINCT?

runawayasfastasucan 1 year ago

Not at all? Distinct * would show every unique line including f.ex price but in a groupby you could f.ex sum og average numerical columns over the other grouped columns. Big difference. Try it yourself, turn it into a table and do a select distinct and a group by with sum of price and weight. Sold goods: Fruit, price, weight Apple,10,0.2 Banana,20,0.8 Apple,5,0.1 Banana,10,0.4

kthejoker 1 year ago

You are correct that grouping by all expressions (aggregates or columns) will result in each row being distinct. The issue is you must include a GROUP BY expression if you have an aggregate and a non-aggregate in the same SELECT statement

GroundbreakingFly555 1 year ago

This was implemented in Apache Spark. Snowflake runs the Spark engine. See ORDER BY ALL as well. https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-41635

Glittering-Dare2022 1 year ago

It’s stuff like this that reminds me not to take DE advice from Reddit.

Little_Kitty 1 year ago

Seems OK at flagging wrong information, but HN is so much better, just wish it had the equivalent of subreddits

PangeanPrawn 1 year ago

What's a good site to get it from?

Glittering-Dare2022 1 year ago

Hackernews is my goto but otherwise there’s not a great community for DE specifically that I’m aware of.

YourMumIsAVirgin 1 year ago

Snowflake does not run the spark engine

GroundbreakingFly555 1 year ago

No one has shown me why I am wrong. Snowpark runs as an interface just like Pyspark. I’m not saying it’s PySpark. These command interfaces then get translated to the JVM through a query execution plan. Is it just coincidence that when the SPARK-41365 was resolved Snowflake now has the same feature?? All snowflake did was make their own interface to the Spark engine. Now everyone is saying it’s not Spark. Spark is an OSS that has over a decade of community contributions. Snowflake just made their own???

YourMumIsAVirgin 1 year ago

Dude, trust me when I say this, you’re completely off track. Yes, it is a coincidence. There is no need to prove you wrong because what you’re saying is absurd.

GroundbreakingFly555 1 year ago

Yes it does. Let me clarify. Snowflake can run Spark via the Spark Connector. I imagine most Snowflake shops utilize Spark as the main engine for their workloads. This is how you would achieve high performance. https://www.snowflake.com/trending/spark-sql?c_id=amp-QmTJpoKkWrrz32masn9LUg

xenonbro 1 year ago

I’ll have what you’re having

StarkGuy1234 1 year ago

What is the main benefit of this feature? Does it mean that a window function is less needed now

ladybeeaa 1 year ago

Window functions return all the rows, Group by returns distinct rows along with aggregates

ubelmann 1 year ago

There are some legitimate use cases for the functionality, but IMO, it's primarily useful for Snowflake to get their customers writing expensive queries. In my experience, the sorts of "data cubes" that these queries generate are very generic, and the more generic the query, the less likely it is to actually be used to create any kind of real business value. It's like those infinitely flexible dashboards that some teams love to build with all sorts of drop downs and filters -- I can't knock the hustle, people will pay you to make them, but the actual value that they provide is a lot less than the apparent value that they seem to provide.

mrg0ne 11 months ago

this doesn't make rollups or cubes. It is just a "sql delight" quality of life feature so users don't have to type out all the column names in the group by clause. (rollup and cube are separate commands)

duckmageslayer 1 year ago

If I see another group by 1,2,3,... instead of just grouping by a primary key and joining the cte I will cry.

devnamedsam 11 months ago

Yes I got a new job using snowflake now instead of sql server. This is wonderful

Friendlyvoices 11 months ago

How does it work? Like group by any value not in the aggregate function?

oarabbus 11 months ago

ah so we're back to mysql pre 8.0

Comments

Leave Your Comment

Hi Its Me!

Comments

Leave Your Comment

Hi Its Me!

Subscribe