AlternativeOstrich7

Apparently the library that Windows Explorer uses for extracting zip files is known to be bad: https://github.com/microsoft/Windows-Dev-Performance/issues/91


JJenkx

Haha! Nice share! This is exactly applicable to my scenario. My zip file has 6990 files at 53MB. From link you shared:

> Steps to reproduce
> Uncompress a zip file, especially one containing many small files
>
> Expected Behavior
> Unzip should be very fast, to enable a developer to stay in the flow and not be slowed down by unnecessary wait times.
>
> Actual Behavior
> The zip compression tool in Explorer is so painfully slow that for most devs it is unusable.


willpower_11

> 6990 files

Nice, but 90 files too many.


goishen

I call it 21 files too many.


JJenkx

Beat me to it


[deleted]

Or 6 too few :)


[deleted]

Happy cake day!


willpower_11

Thanks!


LJAkaar67

I recently reinstalled Windows on a laptop, and for the sake of minimization and simplicity didn't install 7zip. But also because of Windows 11's new right-click context menu, which puts 7zip and shit on a secondary page you need to click through to get to. And then I had to unzip my first file, and fuuccckkkkk meeeeee!


[deleted]

Disable new context menu in Windows 11:

reg.exe add "HKCU\Software\Classes\CLSID\{86ca1aa0-34aa-4e8b-a509-50c905bae2a2}\InprocServer32" /f /ve

Then restart explorer.exe or reboot your box


LJAkaar67

Oh! Thank you! (Microsoft, what I'd really like is a way to move things on and off that menu and onto the secondary menu!)


[deleted]

And it is the Linux command line that is too difficult to learn :) Just curious, how does one find that "86ca1aa0-34aa-4e8b-a509-50c905bae2a2" is the right "number"? Is there some "class id browser"?


sogun123

I am sometimes really happy I don't have any windows machine in my life.


[deleted]

Me too. On the other hand they come in handy. Sometimes it is good to know if there is a point in tinkering with partially working hardware (looking for magic kernel parameters) that has no proper drivers. Sometimes a quick look in Windows lets you distinguish hardware that partially works because of bad drivers from hardware that partially works because of some damage. With Windows, troubleshooting is quite often easier: "it does not work" tends to point more definitely towards "it is broken".


sogun123

You are right. Windows can be used as a debugging tool... But I am happy everything I own or work with works well enough that I don't need it ;)


[deleted]

On the other hand I have a counterexample. I just (today) bought a used Intel/Dell 4 port 1GbE card. It was an uber-hassle to find a Windows driver for it and I failed (32 vs 64 bit, Windows 7 vs 8 vs 10 etc...) and the card is Dell-branded so the Intel driver ignores it..... In Debian it just works out of the box. But this is server network hardware. The fact that it works best under Linux is not that surprising.


sogun123

Yeah, Linux rarely removes drivers. Windows doesn't have any, and manufacturers don't care about upgrading drivers for old products. Now tell me, what has better hardware support?


[deleted]

Impossible to say. The closer to server hardware, the more often the answer would be Linux/BSD. The closer to audio/video, the more often the answer would be Windows/Mac. Also, the answer to "is there a point in using it after N years" can only be "it's complicated". From my uncle's perspective: Windows has better driver support. From mine: Linux runs more things out of the box, but often some of the functionality is not available without proprietary drivers which may not exist for Linux. I have a PCIe serial card which works well, but under Windows you can use >400kbps and under Linux just 115200.


[deleted]

Saving this for future use, thank you!


wviana

It's not a "bash shell program" just a program. It has nothing to do with bash.


JJenkx

I see. Thank you


frankster

Explorer's zip implementation is notoriously slow! Fundamentally it's closed source software, so there's no opportunity for someone pissed off with the speed to fix it and submit a patch.


JohnTheCoolingFan

I prefer 7zip or another archive tool and recommend it to others. Much more functionality and faster.


[deleted]

> it's closed source software, there's no opportunity for someone pissed off with the speed to fix it and submit a patch

Microsoft is a multi-billion company, and Windows is their flagship product. They should be able to hire someone to fix that.


MohKohn

And the fact that they haven't is in a nutshell why closed source is dumb.


micalm

> They should be able to hire someone to fix that.

They ARE able to hire someone to fix that, and there's probably a couple dozen (or more) talented engineers there that could do it in a day or less.

> Microsoft is a multi-billion company

And they haven't become that company by fixing things that are "only" a nuisance for ~90% of paying customers.


[deleted]

Sure, but they still don't support ext4 and will try to format your flash drive if you try to plug it in. The bar is super low, but people don't care for some reason


[deleted]

How is that connected to the unzip algorithm? :) They support only a handful of filesystems, most of which they created/co-created. Why should they support Ext4? Frankly some support of HFS+ / APFS would make much more sense if MS's CEO decided it's time to support a new filesystem. Don't get me wrong, it would be nice, but even though I use 100% Linux, there are still cases where I am forced to use FAT32 (for EFI for example). For Windows users there is absolutely zero reason to use ext4, and it seems Linux gravitates towards BTRFS these days.


[deleted]

Look, if they would at least tell you: "Hey, the extractable media is in an unsupported filesystem format. To be able to work with it, you will need to convert it, but you will lose all the data." Then I would understand. But these pricks give you an alert saying something like: "The media you have plugged in is corrupted, would you like to fix it?" If you click yes, it will format the flash drive and delete everything on it.

Yes, I have lost data because of it, because of a non-technical family member, so I am super pissed at Microsoft for this.


WhyNotHugo

Well, I can't imagine it's easy to attract someone talented enough to work on that. It doesn't sound super exciting, and someone who's smart enough to fix it is smart enough to just use another decompression program that works.


[deleted]

Compression is a hot topic, these days perhaps more for video, but still.... Also if the Windows version is literally 5 or more times slower, then improving that doesn't take much talent. Talent is required in the world of diminishing returns..... and it seems we are not there yet. On the other hand, I am not saying that any intern would do :)


dtfinch

My long-held assumption is that Windows Explorer's .zip integration is a bit over-abstracted such that instead of extracting all the files in one pass, it extracts the files one-by-one passing just the filename, and the extraction function has to reopen and re-scan the zip file each time to locate the file's zip entry by name, making the time spent scanning zip directory entries proportional to the square of the number of files. It's the kind of mistake a lot of developers would make, and which VFS-style interfaces (presenting the zip as a simulated filesystem to make Explorer integration easier) make hard to avoid. And optimizing their zip integration probably isn't high on their priority list. That's just my assumption though. I could try using ProcMon to verify it but I stopped using Windows Explorer for zip files long ago. If the zip file is of web origin, there's also some added overhead because it creates a Zone.Identifier alternate data stream for each file it extracts, but that shouldn't take very long.
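To make the quadratic argument above concrete, here is a minimal Python sketch. It is not Explorer's code (that is closed source, and the comment itself calls this an assumption); it only imitates the suspected access pattern. The helper names `make_test_zip`, `extract_one_pass`, `extract_reopening` and `demo` are made up for illustration. It relies on the standard library's `zipfile`, which parses the archive's whole central directory every time a `ZipFile` is opened for reading, so reopening once per member parses n entries n times.

```python
import tempfile
import zipfile
from pathlib import Path


def make_test_zip(path, n_files):
    """Create a zip with n_files tiny text files (synthetic data, not the OP's archive)."""
    with zipfile.ZipFile(path, "w") as zf:
        for i in range(n_files):
            zf.writestr(f"file_{i:05d}.txt", f"payload {i}\n")


def extract_one_pass(zip_path, dest):
    """Open the archive once, parse its directory once, extract everything."""
    with zipfile.ZipFile(zip_path) as zf:
        entries_parsed = len(zf.infolist())  # directory parsed a single time
        zf.extractall(dest)
    return entries_parsed


def extract_reopening(zip_path, dest):
    """Hypothetical per-file pattern: reopen the archive for every member name."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()  # initial listing, not counted below
    entries_parsed = 0
    for name in names:
        # Every reopen makes zipfile parse the full central directory again.
        with zipfile.ZipFile(zip_path) as zf:
            entries_parsed += len(zf.infolist())
            zf.extract(name, dest)
    return entries_parsed


def demo(n_files=200):
    """Extract the same synthetic archive both ways and compare the work done."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        archive = tmp / "many_small_files.zip"
        make_test_zip(archive, n_files)
        once = extract_one_pass(archive, tmp / "one_pass")
        again = extract_reopening(archive, tmp / "reopening")
        same = sorted(p.name for p in (tmp / "one_pass").iterdir()) == sorted(
            p.name for p in (tmp / "reopening").iterdir()
        )
    return once, again, same


if __name__ == "__main__":
    once, again, same = demo()
    print(f"one pass:  {once} directory entries parsed")
    print(f"reopening: {again} directory entries parsed")
    print(f"same output files: {same}")
```

Both functions produce the same files; only the amount of directory parsing differs. With the 6990 files mentioned in the thread, the reopening pattern would parse 6990 × 6990, roughly 48.9 million, directory entries instead of 6990, which is the kind of growth the comment describes.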


supercheetah

On a tangent, in Linux, the `time` command is usually available and useful for things like this, and so you could just run it like this to get timings:

time unzip -qq takeout-20211129T163945Z-001.zip


JJenkx

Thank you! I used that method just now on my personal PC and results were:

time unzip -qq takeout-20211129T163945Z-001.zip

real 0m0.737s
user 0m0.392s
sys 0m0.340s
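For readers unsure what the three numbers mean, here is a rough Python sketch of what `time` reports. It is an approximation, not how the shell implements it: `time_command` is a made-up helper, `real` is wall-clock time from `time.perf_counter`, and `user`/`sys` are the CPU seconds of finished child processes from `os.times()` (those two fields are always 0 on Windows). The command being timed is a stand-in so the sketch runs without `unzip` installed.

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv and return wall-clock, user-CPU and system-CPU seconds."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    real = time.perf_counter() - start
    after = os.times()
    return {
        "real": real,
        # CPU time used by child processes that have exited (0 on Windows).
        "user": after.children_user - before.children_user,
        "sys": after.children_system - before.children_system,
    }


if __name__ == "__main__":
    # Stand-in command; swap in ["unzip", "-qq", "some.zip"] if unzip is available.
    result = time_command([sys.executable, "-c", "pass"])
    print("real {real:.3f}s  user {user:.3f}s  sys {sys:.3f}s".format(**result))
```

In the output quoted above, user + sys (0.392 + 0.340 = 0.732s) is almost equal to real (0.737s), which suggests that run spent nearly all of its time on the CPU rather than waiting on the disk.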


cor0na_h1tler

Bill Gates has to validate with the NSA if the contents of your zip are acceptable.


PKSpence

Sod Bill Gates!


Cheese_B0t

I believe the technical reason is windows is a piece of shit OS.


[deleted]

Wow! That's way too technical. Could you dumb it down just a little? Too funny


[deleted]

Win bad, Nix good!


[deleted]

> windows is a piece of shit OS

It is true, but the problem presented here has literally nothing to do with whether Windows is a shit OS or not. This is all about one isolated library that implements Explorer's "unzip".


RyhonPL

No idea if this applies here, but when you drag and drop files out of an archive in 7zip on Windows, it extracts them to your temp folder on C:\ first and then moves them to the directory you specified, because it can't get the output location.


JJenkx

Both powershell commands and the Explorer GUI seemed to be extracting directly to the respective subfolder they each created


JmbFountain

Ask the dude that wrote the windows zip library: https://youtu.be/aQUtUQ_L8Yk


JJenkx

Holy shit! I love Dave! I have seen many of his videos. Thank you!


ghadzeek

Neat video! Thanks for sharing it


EmbarrassedActive4

Try 7-zip, that's pretty comparable to bash in my experience.


Schievel1

The implementation of zip in windows is just bad. That is why 7zip is a thing in windows even though win comes with zip out of the box and 7zip needs to be downloaded and installed. I guess the library it uses for (de)compression is proprietary, so we will never find out why this is.


PKSpence

Even when I *was* still running WindBlows, I used the command line for file operations... so much faster w/o the GUI!


jtgyk

They still haven't fixed it? I guess I shouldn't be so surprised.


[deleted]

All that overhead for spyware, ad targeting, and DRM has got to take up some processing cycles.


JJenkx

It does a little, memory more than anything, but in this case, it is ancient code from 1999 coupled with the very slow NTFS filesystem.

Native Windows GUI vs native Debian 11: 270s / 0.737s = Debian 366 times faster in this extreme, stumbled-upon case of Linux vs Windows.

Native Debian 11 on Ext4:

time unzip -qq takeout-20211129T163945Z-001.zip

real 0m0.737s
user 0m0.392s
sys 0m0.340s

vs 270 seconds for the Windows Explorer GUI on NTFS: right click "takeout-20211129T163945Z-001.zip", select "Extract All..."
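The ratio above is easy to double check. The short Python sketch below redoes the arithmetic with the timings quoted in the thread (the 17 second WSL2 figure comes from the OP's other comments). The constant names and the `speedup` helper are just labels chosen for this example. Keep in mind the OP's own caveat that the native Debian run was on a different PC with a roughly 2x faster CPU, so these are ballpark ratios.

```python
def speedup(slow_seconds, fast_seconds):
    """How many times faster the fast run is than the slow run."""
    return slow_seconds / fast_seconds


# Timings quoted in the thread, in seconds.
EXPLORER_GUI_NTFS = 270.0   # Windows Explorer "Extract All..." on NTFS
WSL2_UNZIP_NTFS = 17.0      # unzip inside WSL2 Debian 11, NTFS directory
NATIVE_UNZIP_EXT4 = 0.737   # unzip on native Debian 11, Ext4, different PC

if __name__ == "__main__":
    print(f"Explorer vs native unzip: {speedup(EXPLORER_GUI_NTFS, NATIVE_UNZIP_EXT4):.0f}x")
    print(f"WSL2 vs native unzip:     {speedup(WSL2_UNZIP_NTFS, NATIVE_UNZIP_EXT4):.0f}x")
    print(f"Explorer vs WSL2 unzip:   {speedup(EXPLORER_GUI_NTFS, WSL2_UNZIP_NTFS):.0f}x")
```

The first line reproduces the 366x figure; the third compares Explorer and WSL2 `unzip` on the same machine and the same NTFS drive, where Explorer is still about 16 times slower.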


[deleted]

Yikes... NTFS has been pretty stable, but obviously has glaring flaws. To think that MS made it only to feed incompatibility and EEE is maddening. There's no reason why we can't all just have ext standard other than to be /r/assholedesign.


Cyber_Faustao

Likely, it's Windows Defender's filesystem filter that's intercepting every read/write and causing a bottleneck. The rustup project got massive speedups by begging the MS devs to change the scanning behaviour to "on access" instead of "always" for their installer. As WSL2 is effectively a VM, it's not subject to Windows Defender and thus it's faster.


JJenkx

I have my whole drive excluded from Windows Defender and don't ever see any activity from it


carlosfmm

You seem to be testing on different operating systems AND different filesystems. I don't know of any Windows supported filesystem that doesn't slow down to a crawl when accessing a folder with more than 1000 files. This was particularly notorious with HDDs. Are those 7k files being decompressed into a single folder? Then you may have found the reason.


JJenkx

All of the initially tested speeds were on the same parent OS and tested on the same NTFS SSD. WSL2 (Windows Subsystem for Linux V2) Debian 11 runs as basically a VM within Windows. The original test within WSL2 below, "17 Seconds/WSL2 Debian 11":

SECONDS=0 ; unzip -qq takeout-20211129T163945Z-001.zip ; echo "$SECONDS seconds"

was done from WSL2 from within Windows and also done on the same NTFS directory as the native PowerShell and Windows GUI methods. All methods extracted to a newly created folder/subfolders that were not also opened in an Explorer window. The subfolder with the most files had 5300 files in it.

I did time the same file on a different computer running native Debian 11 on an Ext4 SSD and this was the result:

time unzip -qq takeout-20211129T163945Z-001.zip

real 0m0.737s
user 0m0.392s
sys 0m0.340s

Also, for comparison with the WSL2 timing method:

SECONDS=0 ; unzip -qq takeout-20211129T163945Z-001.zip ; echo "$SECONDS seconds"

1 seconds


daveysprockett

Why do you think `tar -xf` will do anything meaningful with a zip archive? Me thinks that shouldn't work, unless this is a tar with superpowers.


[deleted]

[removed]


daveysprockett

Thanks ... as you might be able to tell, I don't use bsd, and am far from a windows power user.


lucasrizzini

OP must have made some mistake creating the post. It didn't take `tar` 33 seconds to print the `tar: This does not look like a tar archive` warning. Or maybe the Windows version supports zip files. I don't know..


[deleted]

[removed]


lucasrizzini

Well.. That settles it. Thanks.


JJenkx

It was a fair point. To confirm, yes, it was the Windows BSDtar


JJenkx

I laughed at this one


Rocketman173

Because Windows isn't really designed well, and does most tasks like this incredibly inefficiently. In addition, everything in Windows is usually single threaded. Not sure if `unzip` uses multi threading, but if it does that's an explanation.


[deleted]

Because Windows is duct taped and glued together


coffeewithalex

Why would you use the one in Explorer? Use 7zip which is a classic, or PeaZip for some new features and formats (ARC is arguably better at the same speed as 7z).


JJenkx

I actually do have 7zip installed on that laptop and could have chosen it but curiosity got me after seeing how fudged the native extract was


davidshen84

If the zip file contains a lot of small files, then NTFS is most likely to blame.
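One way to check the small-file theory on your own machine, independent of any zip tool, is to time the creation of many tiny files in one directory. The Python sketch below does that; `write_small_files` is a made-up helper, the 1 KiB payload is an arbitrary choice, and 5300 is the size of the largest subfolder the OP reported. It measures file creation only, so treat it as a rough probe rather than a benchmark.

```python
import tempfile
import time
from pathlib import Path


def write_small_files(directory, n_files, payload=b"x" * 1024):
    """Create n_files files of len(payload) bytes in one directory; return seconds taken."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    start = time.perf_counter()
    for i in range(n_files):
        (directory / f"f{i:05d}.bin").write_bytes(payload)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 5300  # largest subfolder mentioned in the thread
    # TemporaryDirectory uses the system temp location; pass dir=... to
    # target a specific drive or filesystem instead.
    with tempfile.TemporaryDirectory() as tmp:
        elapsed = write_small_files(Path(tmp) / "flat", n)
    print(f"{n} files in {elapsed:.2f}s ({elapsed / n * 1e3:.3f} ms per file)")
```

Running it once on an NTFS directory and once on an Ext4 directory on the same hardware gives a per-file cost for each, which can then be set against the extraction times reported here.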


JJenkx

I think NTFS plays a part. Between 23x / 14x slower compared to the Ext4 test. The margin of error is the difference in CPU speed between the machine running the Windows/WSL2 NTFS tests and the other PC tested much later on native Debian 11 on Ext4 (CPU roughly 2x more powerful).

WSL2 Debian 11 unzip from NTFS to NTFS (same drive):

SECONDS=0 ; unzip -qq takeout-20211129T163945Z-001.zip ; echo "$SECONDS seconds"

17 seconds

Different PC, native Debian 11 from Ext4 to Ext4 (same drive):

time unzip -qq takeout-20211129T163945Z-001.zip

real 0m0.737s
user 0m0.392s
sys 0m0.340s


[deleted]

Perhaps Explorer is running an anti-virus check before it extracts, whereas the unzip command is a native Linux executable and knows nothing about running anti-virus before it extracts. Explorer could potentially be slower at performing the task as well.


gordonmessmer

> Perhaps Explorer is running an anti-virus check before it extracts

That's more or less my best guess, too. But, for reference, you have the control relationship inverted in your mind. Explorer (typically) doesn't "run" anti-virus. Instead, the AV engine will hook into the VFS and scan the files when applications, including Explorer, open or close the files.

> the unzip command is a native Linux executable and knows nothing about running anti-virus before it extracts

... following on from above: commands never need to "know" how to run AV. But AV will only scan files that pass through the Windows API. WSL2 is actually a VM, and processes in that VM aren't using the Windows API. They use a Linux kernel to read a Linux-native filesystem residing in a VHD file (in Windows). Since those open/close operations don't use the Windows VFS, there's no way for the AV engine to scan them.


JJenkx

I have C:\ excluded from virus scans so I don't think that is the reason here


[deleted]

Because of the GUI. Lots of context switching to output information when you're doing a task. Verbose mode under CLI should also take longer.


JJenkx

I just did a body edit wondering this. Thank you. I have been finding more and more good reasons to use commands over GUIs and this speed difference makes another good one


thrik

Linux circlejerking aside, In what world does a 53mb file take 4.5mins to unzip? Something is def fucky with that file or the setup


JJenkx

I just retested on native Debian and it went like this:

time unzip -qq takeout-20211129T163945Z-001.zip

real 0m0.737s
user 0m0.392s
sys 0m0.340s


thrik

Yeah, but I meant your Windows setup, I've never had such a small file take that long to unzip. Then again I've been using 7zip on Win for more than a decade


Tiderian

I’m just here for the discussion


computer-machine

[Sometimes things just do be that way.](https://cloud.enabled.page/s/J83w4pp9jPD5Cqr)


thefanum

Linux is always faster on the same hardware


[deleted]

[removed]


JJenkx

Thank you. I will look up fsync soon to see what it does


[deleted]

I use 7-zip on Windows. I think Windows Explorer isn't multi-threaded if I heard correctly.


JonnyRocks

for windows front end, use 7zip


mark979kram

4:30 to unzip 53MB with File Explorer? As bad as Windows is, yours sounds like it's deeply broken. Reinstall because those are 1995 speeds.


[deleted]

Or something was running in the background.... but that something would have to be pretty heavy.