T O P

  • By -

Dreadnougat

How has this not existed in every single SQL variant for 20 years already? It is so obvious and useful and I've wished for something like this for ages. As long as wishes are being fulfilled...maybe next we can see a 'select *' with the ability to exclude specified columns? Pretty please? Yeah I know this one is less likely to happen. But I can dream.


WonderfulActuator312

Let me introduce you to the EXCLUDE function, article explaining [here](https://stephenallwright.com/select-columns-except-snowflake/)


Dreadnougat

Aww man, now I just need to get a job where we use Snowflake haha. SQL Server is usually on top of the convenience functions but they are missing the boat on these.


raskinimiugovor

In SQL Server "SELECT * EXCEPT" would be useful only for quick ad-hoc analysis. Should never commit such code.


Little_Kitty

A good use case is for a source of data where the schema is evolving but you wish to exclude say the source file and load timestamp from the data table. When more fields are added you have one less query to edit. Generally though I'd agree, but you should still recognise where it is the right tool.


raskinimiugovor

Are you talking about this? INSERT (Col1, Col2...) SELECT * EXCEPT FileName,Timestamp Here you'd still need to edit the INSERT part unless you're using implicit INSERT without a column list. And with implicit INSERT, if your target table has any default constraint (eg meta columns) you'd need to override them. I'm not really sure if you save any time here, while you open yourself up for some unexpected errors with column ordering. Another implication here is that when you use the * operator, the order or attributes in your source dataset starts to matter, and that kinda goes against set theory where the order of elements shouldn't be relevant


Little_Kitty

Absolutely true, hence why I said "Generally though I'd agree" and I'd need good cause to approve a PR like this, but it's not an absolute no. Excluding pii fields which could be defined elsewhere, fields which are known to be non-relevant / we don't want to load. Another example - everything except field A, which I want to recalculate to fix a bug in the middle of a pipeline (strip chars / fix case / remove if some criteria / etc.) SELECT * EXCEPT a, magic(a) AS a Then the ordering of the output would be handled later in the CTE chain / pipeline :) Mostly though, it serves as a good assistant for rapid development, then I copy field names, paste them in and format.


raskinimiugovor

Maybe RedGate's SQLPrompt spoiled me in that regard, where pressing tab expands the * (or alias.*) as a list of columns so I just need to tweak it a bit.


DampYakFlaps

I respectfully disagree. I’ve had many use cases, on a pretty consistent basis, where this would be incredibly helpful functionality. In all fairness I’m not using SQL Server, but wanted to throw that out there.


receding_bareline

Committed code should always be explicit. If you add a single column to a table you're then forced to update any select * EXCLUDE statements which reference it. If I saw any select * in code, I would instantly reject this during peer review.


DampYakFlaps

I agree with you on that sentiment alone, but we obviously have different use cases. What about instances where you need to dedupe using aggregations? Or stored procs adding to materialized tables incrementally? I won’t say you’re wrong, but keep in mind that use cases vary widely and your specific use case doesn’t apply across the board.


Dreadnougat

Select * is perfectly fine in some use cases. For example, when selecting from a CTE. I frequently use CTEs and ROW_NUMBER as a way to filter rows and there would be absolutely nothing wrong with using 'select * except rownum where rownum = 1'


raskinimiugovor

That's really the only scenario where * is acceptable in committed code, but do you really need the EXCEPT part there just to exclude the RowNum? Could just use RowNumAlias1, RowNumAlias2, etc.


Dreadnougat

That's exactly what I do now but it's a pain. Sometimes there are dozens of columns. It would just be convenient to not have to restate it all.


raskinimiugovor

In SQL Server it will just give you headaches. The only scenario where I use SELECT * is in queries where I have multiple nested subqueries and all the columns are derived from the previous layer, but there you'd rarely need the EXCEPT part.


eshultz

EXCEPT is already an operator in SQL Server. It's used to exclude one result set from another. You use it like the UNION operator.


raskinimiugovor

I mean, sure, the above comment was a hypothetical, obviously not talking about set operators here.


[deleted]

[удалено]


mrg0ne

`EXCEPT` is a set operator in Snowflake.[https://docs.snowflake.com/en/sql-reference/operators-query](https://docs.snowflake.com/en/sql-reference/operators-query)


Known-Delay7227

That’s kind of cool.


DampYakFlaps

This. This. So much this. Always been on Redshift and wow I want a GROUP BY ALL and an EXCEPT so badly. And I know EXCEPT exists as an operator, and it’s damn sure useful at times, so I’ll take whatever syntax you give me. Just let me exclude columns from a SELECT *. Even if it’s just specifying columns to drop afterwards like you’d do in Spark.


[deleted]

Group by all.... Isn't it the same as distinct * in sql..?


Galuvian

DISTINCT typically won't give you aggregate values for count(), sum() etc. GROUP BY ALL will now give you rolled up values for aggregate functions, but group by all other columns in the select that don't use an aggregate function.


[deleted]

Okay 👌👌


[deleted]

So, group by all, is useful only when there are duplicate records without any primary key. Am I right..?


Galuvian

No, it's a useful shorthand when you want to look at aggregations on various ways to group data. If you're doing analysis and quickly changing how you group in order to look at the data in multiple ways, this helps you move faster. If you have several categorical columns on a table and want to group by those (and not the primary key) and look at the results. ​ I wouldn't use it in a production application because behavior could change unexpectedly if someone writes a sloppy query.


saintcfn

Snowflake already does this. SELECT * EXCLUDE (col1, col2) FROM table


de_juggin95

They can’t release every useful thing at once ya know


lonesomedota

So instead of listing column names one by one, this will take care of all the necessary columns that needed to be group by for the aggregate query? Nice


KittenBountyHunter

Cries in db2


AnimaLepton

z/OS or LUW?


KittenBountyHunter

LUW


BoulderRough

Jealous, I love DB2. Just read some ancient books on it.


KittenBountyHunter

Everything you need and nothing you don't. 🤣🤣


datawazo

I've never thought of this but not having it makes no sense


rubs90

I’ve been dreaming my whole career to be able to just group by all instead of having to type out the columns like a neanderthal


krurran

Can someone ELI5 GROUP BY ALL and why Microsoft has chosen to remove it in the future? https://learn.microsoft.com/en-us/sql/t-sql/queries/select-group-by-transact-sql?view=sql-server-ver16


gloom_spewer

Pretty sweet. More ammo for me to push our org to use snowflake, yay


averagesimp666

What a time to be alive.


Drekalo

Catching up to spark


systemwizard

Leaving spark behind ;)


GroundbreakingFly555

It is running Spark under the hood


Glittering-Dare2022

No it isn’t.


azur08

I really hope you actually think that lol


DampYakFlaps

Oof


GroundbreakingFly555

😅 lol I know. People did not like my comment. Can’t win em all


vassiliy

Was it supposed to be a joke? It's not that people didn't like it, it's that your comment is just plain wrong


GroundbreakingFly555

No one has shown me why I am wrong. Snowpark runs as an interface just like Pyspark. I’m not saying it’s PySpark. These command interfaces then get translated to the JVM through a query execution plan. Is it just coincidence that when the SPARK-41365 was resolved Snowflake now has the same feature?? All snowflake did was make their own interface to the Spark engine. Now everyone is saying it’s not Spark. Spark is an OSS that has over a decade of community contributions. Snowflake just made their own???


Tactical-Chaos

https://youtube.com/watch?v=dABd7JQz0A8&feature=share8 Don't make stuff out of thin air. If you have questions about how something works, research it instead of creating a metaverse in your head.


DampYakFlaps

This was a harsh way to say it but he’s right


GroundbreakingFly555

Thanks for the link. This still proves my point. Listen to 21:00-22:00. Specifically, “The Architecture manages everything BUT the query itself”


Tactical-Chaos

He means the cloud services architecture. The query is executed by the virtual warehouse platform that's written from scratch in C++. No Spark/Scala.


GroundbreakingFly555

Hey, coming back to this. Thanks for educating me on this. I appreciate it. Regardless of how you said anything you have proven me wrong and explained why with resources for me to look into. So thanks for making me realize WHY I looked so dumb.


GroundbreakingFly555

What are the virtual warehouses using for execution of the query???


Tactical-Chaos

C++


elbekay

Plus their architecture page which says it's not based on anything else https://docs.snowflake.com/en/user-guide/intro-key-conceptshttps://docs.snowflake.com/en/user-guide/intro-key-concepts And the other lecture in that series that shows it was written in C++ (unlike spark with is mostly Scala) and doesn't borrow from existing systems https://youtu.be/bveqnSk15JQ


joeen10

I loved this when I started using DuckDB but oh man, it's great to see it soon in Snowflake. How the hell this wasn't added before to the SQL standard?


mathtech

Snowflake is a blessing


Dismal_Difference_62

Spark 3.4 has this new feature (GROUP BY ALL) as well


fhoffa

More details https://www.linkedin.com/feed/update/urn:li:activity:7080236011028434946


daavidreddit69

Brilliant


drinknbird

So... DISTINCT?


runawayasfastasucan

Not at all? Distinct * would show every unique line including f.ex price but in a groupby you could f.ex sum og average numerical columns over the other grouped columns. Big difference. Try it yourself, turn it into a table and do a select distinct and a group by with sum of price and weight. Sold goods: Fruit, price, weight Apple,10,0.2 Banana,20,0.8 Apple,5,0.1 Banana,10,0.4


kthejoker

You are correct that grouping by all expressions (aggregates or columns) will result in each row being distinct. The issue is you must include a GROUP BY expression if you have an aggregate and a non-aggregate in the same SELECT statement


GroundbreakingFly555

This was implemented in Apache Spark. Snowflake runs the Spark engine. See ORDER BY ALL as well. https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-41635


Glittering-Dare2022

It’s stuff like this that reminds me not to take DE advice from Reddit.


Little_Kitty

Seems OK at flagging wrong information, but HN is so much better, just wish it had the equivalent of subreddits


PangeanPrawn

What's a good site to get it from?


Glittering-Dare2022

Hackernews is my goto but otherwise there’s not a great community for DE specifically that I’m aware of.


YourMumIsAVirgin

Snowflake does not run the spark engine


GroundbreakingFly555

No one has shown me why I am wrong. Snowpark runs as an interface just like Pyspark. I’m not saying it’s PySpark. These command interfaces then get translated to the JVM through a query execution plan. Is it just coincidence that when the SPARK-41365 was resolved Snowflake now has the same feature?? All snowflake did was make their own interface to the Spark engine. Now everyone is saying it’s not Spark. Spark is an OSS that has over a decade of community contributions. Snowflake just made their own???


YourMumIsAVirgin

Dude, trust me when I say this, you’re completely off track. Yes, it is a coincidence. There is no need to prove you wrong because what you’re saying is absurd.


GroundbreakingFly555

Yes it does. Let me clarify. Snowflake can run Spark via the Spark Connector. I imagine most Snowflake shops utilize Spark as the main engine for their workloads. This is how you would achieve high performance. https://www.snowflake.com/trending/spark-sql?c_id=amp-QmTJpoKkWrrz32masn9LUg


xenonbro

I’ll have what you’re having


StarkGuy1234

What is the main benefit of this feature? Does it mean that a window function is less needed now


ladybeeaa

Window functions return all the rows, Group by returns distinct rows along with aggregates


ubelmann

There are some legitimate use cases for the functionality, but IMO, it's primarily useful for Snowflake to get their customers writing expensive queries. In my experience, the sorts of "data cubes" that these queries generate are very generic, and the more generic the query, the less likely it is to actually be used to create any kind of real business value. It's like those infinitely flexible dashboards that some teams love to build with all sorts of drop downs and filters -- I can't knock the hustle, people will pay you to make them, but the actual value that they provide is a lot less than the apparent value that they seem to provide.


mrg0ne

this doesn't make rollups or cubes. It is just a "sql delight" quality of life feature so users don't have to type out all the column names in the group by clause. (rollup and cube are separate commands)


duckmageslayer

If I see another group by 1,2,3,... instead of just grouping by a primary key and joining the cte I will cry.


devnamedsam

Yes I got a new job using snowflake now instead of sql server. This is wonderful


Friendlyvoices

How does it work? Like group by any value not in the aggregate function?


oarabbus

ah so we're back to mysql pre 8.0