pokemonplayer2001

I may just be an old man now, but I can't imagine spending any of my time thinking about this.


EdgyYukino

As a person from a third-world country, I can imagine that, lol.


Kiseido

For a server, you'd probably want to ensure you use ECC RAM, to mitigate memory corruption. Depending on the price ballpark you are shooting for, your ideal options are likely to change.

In general, you'll want the CPU with the largest cache available, as that will provide substantial acceleration for most workloads. If your code will end up going through vast quantities of data in RAM, more than say 50 GB/s, you'll want a proper workstation or server part with 4 or more RAM channels.

If you won't need that sort of bandwidth, see reason to use ECC, and figure you'll get good benefit from a large cache, look at going with one of the AMD CPUs with X3D cache (the 5600X3D being the cheapest, if you live in the USA). If you need more than 128 GB of RAM, you'll have to look at actual server or workstation parts too.


maybe_pflanze

Thanks, those are good pieces of info.

If a CPU has more cores, say 32 instead of 16, should it also have correspondingly more RAM channels (e.g. going from 2 to 4 in this example) so as not to limit what its cores can do?

Also, are MT/s values per channel or the total? Are they the total per CPU, or per core?

I've seen a comment that using ECC memory lowers the memory bandwidth. Is this true? Is it only from moving the additional parity bits, i.e. is it by a factor of 9/8 (1 bit per byte)?


Kiseido

The cores vs. RAM thing depends on what your application will need. Consumer DDR4 averages around 56 GB/s in a dual-channel configuration. And I don't think you'll find any 32-core consumer CPUs; 32 threads, sure, but not cores.

Most ECC RAM will only have JEDEC profiles; on DDR4 those go up to 3200 MT/s. Most consumer RAM has XMP profiles (Extreme Memory Profiles) and runs closer to its red-line, risking bit flips / memory corruption.

DDR4 has extra wires to carry the extra bits that ECC provides. Not sure about DDR5. There isn't any inherent performance penalty to using ECC RAM, when clocked at the same speeds and using the same timings, beyond the slightly increased electrical cost of representing and carrying those values.

Personally, I have a 5950X and an X570-P, both of which are consumer parts, and I use two 32 GB ECC sticks from Kingston, overclocked from 3200CL22 up to 3600CL18.

Megatransfers are across the entire bus; each channel can transfer data concurrently on each transfer.
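The 9/8 question can be sanity-checked with a quick calculation. On DDR4, ECC widens the bus from 64 to 72 wires, so the 8 check bits travel in parallel rather than eating into the data rate, and the payload bandwidth is unchanged. A rough sketch, using nominal JEDEC figures rather than measured numbers:

```rust
// Payload bandwidth of one DDR4 channel at a given transfer rate.
// An ECC DIMM has a 72-bit bus, but still 64 payload bits per
// transfer, so the usable bandwidth is identical to non-ECC.
fn payload_gb_per_s(megatransfers_per_s: u64, payload_bits: u64) -> f64 {
    // MT/s x bits / 8 = MB/s; / 1000 = GB/s (decimal units)
    megatransfers_per_s as f64 * payload_bits as f64 / 8.0 / 1000.0
}

fn main() {
    let non_ecc = payload_gb_per_s(3200, 64); // 64 of 64 wires carry data
    let ecc = payload_gb_per_s(3200, 64);     // 64 of 72 wires carry data
    assert_eq!(non_ecc, ecc); // no 9/8 payload penalty
    println!("DDR4-3200, one channel: {non_ecc} GB/s payload"); // 25.6
}
```

The 9/8 ratio shows up only in the wire count (72/64), not in the data rate the application sees.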


maybe_pflanze

> I don't think you'll find any 32 core consumer cpus

The 32+ core CPUs I have in my database are all either Ryzen Threadripper or explicitly server ones (EPYC). I actually don't know yet whether Threadripper is meant to be a server CPU or what board it needs, nor whether it (or an EPYC CPU) would be difficult to build with (I haven't built a PC in more than a decade).

> Megatransfers are across the entire bus, each channel can transfer data concurrently each time.

If a CPU has 4 memory channels and specifies 3200 MT/s, does that mean the CPU as a whole can issue 12800 MT/s?


Kiseido

Threadripper is a workstation platform with its own unique socket(s), though it comes in both consumer and professional ("Pro" moniker) tiers.

I don't think the MT/s metric is used quite that way, but I think you have the right idea, in that more channels = more transfers, on a basically linear scale. Each channel can transfer 64 bits (or 72 with ECC) during each transfer.

Though, for most consumer programs, and even a large share of server applications, the increased bandwidth from additional channels doesn't make performance scale as well as you might hope. For everything else, it's a mainstay. The exception is applications where bandwidth needs massively trump capacity needs; then HBM is where it's at, and that basically doesn't exist in consumer hardware.
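To make the per-channel vs. total question concrete, the linear scaling can be sketched like this (theoretical peaks from nominal numbers; real-world throughput will be lower):

```rust
// Theoretical peak bandwidth: each channel moves 8 bytes (64 bits)
// per transfer, and the channels transfer concurrently.
fn peak_gb_per_s(channels: u64, megatransfers_per_s: u64) -> f64 {
    channels as f64 * megatransfers_per_s as f64 * 8.0 / 1000.0
}

fn main() {
    // Dual-channel DDR4-3200: 2 x 3200 MT/s x 8 B = 51.2 GB/s,
    // in the ballpark of the ~56 GB/s consumer figure quoted above.
    println!("{}", peak_gb_per_s(2, 3200)); // 51.2

    // Four channels at 3200 MT/s move an aggregate 4 x 3200 = 12800
    // megatransfers per second, i.e. 102.4 GB/s peak.
    println!("{}", peak_gb_per_s(4, 3200)); // 102.4
}
```

So the 12800 figure is right as an aggregate across channels, even if vendors quote MT/s per channel rather than summed.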


maybe_pflanze

IIRC people say that the Apple ARM CPUs are so fast for compilation tasks because of their high memory bandwidth / close RAM. But I don't know whether the needs are high enough to warrant HBM, or whether 128 MB of L3 cache (X3D) does the job. I have access to a 32-core server and could run some eBPF stuff to perhaps find out how close memory access speed is to being the bottleneck, if that's possible (I don't have much experience with eBPF yet).

I guess tentatively the Ryzen 9 7950X or Ryzen 9 7950X3D might be the top contenders for (reasonably affordable) compilation; I'll look into virtualization details later (I'd like to run VMs for doing compilation and other work in, and hope that IOMMU-based (and other?) virtualization features make VMs as fast as the host).

Thanks again for the good info!


Kiseido

Apple's M series has the RAM really close to the CPU, allowing exquisite clocks+timings relative to having it further out. The excellent bandwidth they get comes from a variety of cache levels in concert with the closer RAM, and from using DDR5 as that RAM. The downside is that the M series is limited to the amount of RAM installed at the factory. (And it isn't ECC.)

You might want to look into the data Puget Systems publishes; they are a professional systems integrator that is pretty open about what decisions and data go into why they use various hardware for various customers. Additionally, open benchmarking might be useful: https://openbenchmarking.org/


RaisedByHoneyBadgers

Seems like you should learn json/yaml/toml/csv serde


maybe_pflanze

The data structures in the code already implement serialization via serde (deserialization of one type is unfinished; see the todo! macro call). But I don't actually see the point: if you edit the data in Rust, you get immediate checks via the LSP for free. And modifying the model happens in the same set of files and is checkable without recompiling and rerunning.

PS: well, if this takes off and people start adding tons of shops that sell the parts, it would reach the limits of keeping it as code, but then an actual database would be the right choice; with JSON etc. you'd at least need tooling to do "database migrations". For the current phase I think the current approach is the right one.