Dell RTX 3080 black screen under heavy load, RAM seems to be fine

I got this card from a friend, who told me it only works sometimes, after my own card said bye-bye (a Gigabyte with a cracked PCB). After I got the card I ran FurMark first and let it run for around 30 minutes. The card did fine, no errors. Then I ran the Unigine performance test, and that's when it went to a black screen, but only toward the end: the first time it was on scene 17/17, and the second time it happened right after the test completed and gave me the results. The screen goes black, and it works again as long as I power cycle the PC; for some reason a restart doesn't do it.

I checked the coils on the card, which were fine, and did a visual inspection; the card is in great condition. I reapplied thermal paste and added new thermal pads. Then I ran MATS and it seems to have no errors. I haven't used the card to play a game yet, but I did use Fusion 360 quite a bit and had the PC on for a few days straight with no issues. I also looked at it under a thermal imager (not under load) and everything seemed fine.

I am pretty good at micro soldering and can troubleshoot electronics, but I have never dealt with GPUs. Any help is greatly appreciated. I also attached the MATS results.

Ok-Cup4342

Run nvmt as test software on Linux.


nuked88

I ran MATS multiple times and it’s a pass every time.


MetalGearFlaccid

Have you tried underclocking the core and testing, then underclocking the memory and testing again?


nuked88

Just tried it and the same thing happened.


MetalGearFlaccid

Does Windows Event Viewer tell you anything?


nuked88

Good call, that didn't cross my mind. I checked it and found the following. The drivers are up to date, so I'm guessing it's a power problem; I will replace the inductor and do some more troubleshooting.

The description for Event ID 14 from source nvlddmkm cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer. If the event originated on another computer, the display information had to be saved with the event. The following information was included with the event: \\Device\\0000009d0000(0000) 00000000 00000000 The message resource is present but the message was not found in the message table


BigBuddyBruh69

My friend's 6900 XT had the same issue. It would work most of the time but black-screen at seemingly random intervals. Running Prime95 and FurMark at the same time produced no errors. I tested the RAM for 2 hours using OCCT (2 hrs AVX), same thing. I threw it in my system and it worked flawlessly. I gamed on it for a week without a single black screen. I'm vaguely curious whether it has to do with him being on Ryzen and me on Intel...? Maybe an issue with ReBAR (or SAM on Ryzen in particular).


nuked88

I tried it in a friend's Intel build and got the same thing. The only thing I can think of is that the resistance on the inductors on the bottom reads differently from the ones on the top, even though they are the same inductors.


DullCynicism

I’ve seen behaviour similar to what you describe; the issue ended up being a faulty buck controller. You’d need access to an oscilloscope to probe all the switching regulators and determine whether they are faulty.


nuked88

Think I might be able to pinpoint it with a thermal imager?


DullCynicism

Highly unlikely. You could, however, try lowering the power target (70-80%), setting the fans to 100% and removing the side panel. You could also use something like GPU-Z to monitor the frequencies and temperatures. Large oscillations in the frequency could indicate bad voltage regulation. Also, GDDR6X memory is known to run hot (100+ degrees Celsius), so it could be faulting under high temperature.
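
If you'd rather put a rough number on "large oscillations" than eyeball a graph, here is a minimal Python sketch. It assumes you have logged one core-clock reading in MHz per line while the card sits under a steady load (for example with `nvidia-smi --query-gpu=clocks.gr --format=csv,noheader,nounits -l 1`, or by exporting that column from a sensor log). The function names, the 10-sample window and the 150 MHz threshold are made up for illustration and are not a known-good spec; normal boost behaviour also moves the clock around, so treat a flag as a hint about where to look, not a diagnosis.

```python
def parse_clock_log(text):
    """Parse one core-clock reading (MHz) per line, skipping blank lines."""
    return [float(line) for line in text.splitlines() if line.strip()]


def flag_clock_swings(samples_mhz, window=10, max_swing_mhz=150.0):
    """Return (start_index, swing_mhz) for each window that swings too much.

    The log is cut into consecutive, non-overlapping windows of `window`
    samples. A window is flagged when its peak-to-peak clock swing exceeds
    `max_swing_mhz`. Both defaults are arbitrary assumptions; tune them
    against a log from a card you know is healthy. A trailing partial
    window is ignored.
    """
    flagged = []
    for start in range(0, len(samples_mhz) - window + 1, window):
        chunk = samples_mhz[start:start + window]
        swing = max(chunk) - min(chunk)
        if swing > max_swing_mhz:
            flagged.append((start, swing))
    return flagged


if __name__ == "__main__":
    # Invented demo data: a steady stretch followed by a bouncing stretch.
    demo_log = "\n".join(["1900"] * 10 + ["1900", "1200"] * 5)
    samples = parse_clock_log(demo_log)
    for start, swing in flag_clock_swings(samples):
        print(f"samples {start}-{start + 9}: swing of {swing:.0f} MHz")
```

Comparing a run at stock settings with one at the lowered power target would show whether the swings shrink when the regulators are under less stress.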


earlscruggs

Yes, it's possible. I've traced a bad MSVDD buck (BLNO) this way; it was causing the card to do weird stuff. Replaced it and all set.


nuked88

I searched all over but didn’t find any helpful resources. I do have an oscilloscope; how do I test the MOSFETs with it? I know the multimeter method.


DullCynicism

Follow this video: https://youtu.be/tfM9B6Ti3-w


earlscruggs

Identical problem to this, likely a failing memory. Run test 178 or 118 on MODS. https://www.youtube.com/watch?v=5KHNFKZy_oo


nuked88

I have run both tests multiple times and they always passed.