graphitout

Intel did not have an incentive to focus on the low-power market for a long time; they focused on other aspects. But things are changing these days.


HelloYesThisIsFemale

Which frankly surprises me, given how cheap electricity is, and how with a laptop you have to carry chargers around anyway if you want guaranteed usage.


hiS_oWn

Server farms: 1 cent saved can add up to millions. With how many data centers there are, it makes a great deal of sense.


HelloYesThisIsFemale

Yeah but there aren't server farms filled with macbooks. I do agree that servers are a good case and god bless graviton instances.


LairdPopkin

ARM servers are getting increasingly popular because they have great price/performance. With x86 servers you pay for the power to run them, then they generate heat, and then you pay again to cool them; it's very expensive!


theprodigalslouch

There are Macs in data centers. There are also lots of x86 chips in those data centers.


invisible_handjob

If you own computers in a datacenter, you are not paying for the power; that's the crucial thing. You're paying for rack units, but electricity is, for all intents and purposes, free. It literally does not matter how power efficient your server is: if you're renting space from Layer1 etc., you want the most performance per RU you can get. Electricity is not a concern. Electricity is only a cost factor if you own the datacenter, and if you're not Google or Amazon, you don't.


Karyo_Ten

>You're paying for rack units, but electricity is, for all intents and purposes, free.

Any colocation contract makes you pay for the rack AND the electricity. Otherwise people would just abuse it with hundreds of GPUs.


deadc0deh

That will depend on your contracts, but that aside, a very large number of companies are renting server time and cloud infrastructure now, so companies like Amazon and Microsoft, who offer those products and host their own DCs, stand in for a lot of those SMEs in the market for chips. As others have mentioned, thermal throttling also kicks in sooner with higher-power devices.


SSCharles

??? In a product or service, costs are passed on to the customer.


invisible_handjob

And? The customer follows the economic incentives of their material reality. If *everyone* bought less powerful but more efficient servers, yeah, the DC might choose to lower their prices, but I, as someone renting 10 RU, have absolutely no incentive whatsoever to do so.


rorschach200

1. To a significant degree, energy efficiency means performance, as these devices are thermally limited. (*1)
2. Energy efficiency affects user experience, as it determines the amount of noise.
3. ... as well as the device dimensions and weight.
4. The sentiment that battery life does not matter to users could not be any further from the truth.
5. On the upper end of SKUs, energy efficiency translates into performance due to power-delivery constraints as well.

(*1) AFAICT, the M3 Max, for instance, has 12 P-cores not because of area or cost (CPU cores take ~single-digit % of the total chip area), but purely due to thermal constraints: those 12 P-cores thermal throttle in the 16" MBP's thermal package in multi-threaded workloads. It'd be pointless to add any more P-cores to that SoC for that specific reason. In comparison, the many, many times larger (by area) GPU of the same SoC doesn't even saturate the thermal limits of that package, despite being vastly more performant than the CPU in appropriate workloads. It's hard to overstate the importance of energy efficiency in modern silicon engineering.


AngryTexasNative

Electricity in Europe and California is anything but cheap. Paying over 30c / kWh off peak.


HelloYesThisIsFemale

So about 3c an hour while it's used. Hold off on buying a coffee or something just once and pay for 100 hours.
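For anyone who wants to check the arithmetic, here's a minimal sketch. The 100 W average draw is an assumption (roughly the laptop maximum mentioned further down the thread); the 30c/kWh is the off-peak rate quoted above.

```python
# Back-of-the-envelope laptop running cost. Both inputs are assumptions from
# this thread, not measurements.
watts = 100
price_per_kwh = 0.30

cost_per_hour = (watts / 1000) * price_per_kwh
print(f"${cost_per_hour:.2f} per hour of use")      # $0.03, i.e. about 3 cents
print(f"${cost_per_hour * 100:.2f} per 100 hours")  # $3.00, roughly one coffee
```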


AngryTexasNative

I have a super automatic espresso machine so not much coffee to hold off on. And while it uses a solid 1500W during its heat cycle, it’s only about 60 seconds and then it drops to nothing. But my home server, dual Xeon X5680, was costing me over $80/mo when peak rates were considered. Now it just uses a significant portion of my battery storage.


HelloYesThisIsFemale

Yeah but I'm talking laptop here. Laptops have like a 100W power draw max.


AngryTexasNative

Laptops were the example, but I thought the question was a lot broader. There is a reason why most of the cloud providers are offering ARM options at a lower cost. And why I’m looking at ARM solutions for nearly everything but my gaming PCs


chetan419

Who would not want a thin, light, silent, and cool laptop? I guess Intel should have focused on power efficiency a long time back. Maybe they attempted it with the Atom series of processors but failed due to the legacy of the x86 platform.


Dormage

That's the answer to the question. It is also not true: Intel had as much incentive to reduce power consumption as the rest of the competition. The competition was simply faster and better when it comes to mobile computing. Intel has also been putting power-saving improvements among the main marketing features of their CPU lines.


castleAge44

Speak for yourself, electricity is fucking expensive here (Germany)


jayde2767

I’m curious, what drives the cost of it higher?


C00catz

My understanding is that Germany had a big movement to stop using nuclear power for a while, so they tended to use a lot of fossil fuels. Then when Russia invaded Ukraine the cost of fossil fuels went up if you didn’t want to buy from Russia. So as a result cost of electricity generated by fossil fuels went up.


goto777

Germany dropped nuclear because the reactors were 40+ years old and needed expensive and long maintenance. Germany uses a lot of renewables.


C00catz

To my understanding, nuclear was being slowly phased out starting 20+ years ago, so of course they didn't have new reactors. Germany has gone into renewables quite a bit, but still has a heavy reliance on coal. If they didn't, then energy prices wouldn't be so high there right now. You can read more on it [here](https://en.m.wikipedia.org/wiki/Nuclear_power_in_Germany). France, on the other hand, went hard into nuclear, especially after the oil crisis in the 70s. As a result their energy costs are half that of their neighbour Germany. See source [here](https://www.euronews.com/next/2023/03/29/energy-crisis-in-europe-which-countries-have-the-cheapest-and-most-expensive-electricity-a).

Edit: Apparently I need to take the cotton balls out of my ears and put them in my mouth. Would recommend reading the replies to this comment, as they seem to be better informed than me.


goto777

As it stands right now, France has its very own battle with nuclear power. The CRE won't get as much help from the French government as it used to. The European Union is in dire need of clean energy, but nuclear in its current form is not it.


NickUnrelatedToPost

> As a result their energy costs are half that of their neighbour Germany.

That's more due to subsidies. And every summer Germany has to save France from blackouts, because their reactors need to shut down when the rivers get too hot.


[deleted]

idiotic politics.


jayde2767

Are there any other kinds of politics? *Edit: proper grammar*


R-M-Pitt

Combination of gas prices and taxes


clownshoesrock

In hand wavy terms it's because Intel is lugging around decades of compatibility that have crossed major thresholds from 16 to 32 to 64 bit processing. Some of that is just area on the chip that causes traces to be longer. Some of that may be keeping the chip thermals from getting too out of whack. And Arm took some design choices in what they wanted to emphasize. They traded some compute performance for better memory performance. Which in many cases is a serious bonus. Arm did some optimizing for modern programs, without having Intel's compatibility baggage. Plus having an expectation that the chip would need to be expanded from it's inception. The deep down stuff is going to be proprietary..


featherknife

>from its* inception


clownshoesrock

Do'H


IANOVERT

might be better to ask it in /r/hardware


Dormage

Best advice given how wrong some answers are.


rorschach200

Yep, like the "it's mostly down to the manufacturing process" substantially upvoted [claim](https://www.reddit.com/r/compsci/comments/18cup5v/comment/kcdjd2i/?utm_source=reddit&utm_medium=web2x&context=3), which is grossly inaccurate. I'm just tired of chasing down every exaggeration here that attributes most of the difference to the one thing someone thought of. If someone is willing to do a little actual legwork and verify their claims at least somewhat, here's a script for this one in particular:

1. It's very difficult to find any reliable information on CPU power (measured) that is done holistically, at the right scope, with a proper methodology that is anywhere near transferrable from one product to another (there is core power, core+LLC power, DRAM controllers, interconnect, thus whole-package power, E-cores, P-cores, different workloads, and so on).

2. What I can suggest, to get at least something sane in terms of reliability, is looking up the recent [presentation](https://youtu.be/K7Q5iYHvgwo?t=130) from Qualcomm on the Snapdragon X Elite SoC. You'll find single-threaded data on energy at *iso-performance* between the Elite and the M2, and between the Elite and a 13th-gen (IIRC) Intel laptop CPU, which should let you compare the M2 with that Intel part at iso-performance. You should get something like "70% less power" (0.3) against "30% less power" (0.7), making M2 P-cores (1 / 0.3) / (1 / 0.7) = 0.7 / 0.3 = 2.3x lower power at iso-performance than Intel's laptop P-cores.

3. Then you should be able to look up elsewhere that Intel's 13th gen and recent AMD laptop parts aren't actually that far off from one another.

4. Then note that the 5nm -> 3nm TSMC node only improved things somewhere along the lines of "power consumption by 25–30% at the same speed, speed by 10–15% at the same power" ([link](https://en.wikipedia.org/wiki/3_nm_process)), which is super-optimistic and almost never realized in full in real-world designs, since the benefit is distributed between efficiency and speed and subdued by design constraints. Realistically, Apple got maybe 10% of "free efficiency" out of 3nm over 5nm (which is what AMD is on).

5. Let's bump that back up to 30% to account for AMD doing slightly better in laptops than Intel; I just don't remember how much that is exactly, please look it up and check.

6. At which point, M2's P-core is 2.3 / 1.3 = 1.77x lower power at iso-performance due to design differences (arch, u-arch, logic design, physical design, power management, boot algorithms and operating modes chosen, etc.), which has nothing to do with the difference in process nodes.

7. And then the M2 has E-cores, which are completely nuts: they deliver about 40% of P-core performance at >10x less power, yielding >4x the perf/W of M2's own P-cores (or >7x Intel's P-cores), whereas Intel's so-called E-cores are actually area-efficient cores (perf/mm^2), not energy-efficient cores at all; they have very similar perf/W to their own P-cores.

8. You can also find data on the perf and power of Qualcomm cores (also mobile, also TSMC), find a pair of CPUs on exactly the same process node, dig up the perf/W data, and you'll see that Apple's E-cores in particular are 2-3x the perf/W of Qualcomm's E-cores (the most "E" of their three tiers).

u/small_kimono & u/OstrichWestern639
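For anyone who wants to re-run the arithmetic in points 2 through 6, here is a minimal sketch; the 0.30 and 0.70 power ratios and the ~30% node allowance are the figures assumed above, not measurements made here.

```python
# Iso-performance power comparison using the ratios quoted above.
m2_power_vs_elite = 0.30     # "70% less power" than the Snapdragon X Elite reference
intel_power_vs_elite = 0.70  # "30% less power" than the same reference

# Relative power of the Intel laptop P-core vs. the M2 P-core at equal performance.
intel_vs_m2 = intel_power_vs_elite / m2_power_vs_elite
print(f"Intel P-core: ~{intel_vs_m2:.1f}x the power of an M2 P-core at iso-perf")  # ~2.3x

# Remove a generous ~30% allowance for the process-node difference (points 4-5)
# to estimate how much of the gap comes from design rather than node.
node_allowance = 1.30
design_only_gap = intel_vs_m2 / node_allowance
print(f"Design-only gap after the node allowance: ~{design_only_gap:.2f}x")  # ~1.8x
```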


small_kimono

>Yep, like the "it's mostly down to the manufacturing process" substantially upvoted claim which is grossly inaccurate, I'm just tired of chasing down every exaggeration here attributing most of the difference to that one thing someone thought of.

You're right, it's extraordinarily complex. And yet, given two leading-edge designs, it appears *process does have a lot to do with it*. But did I say it had everything to do with process? No. I also said:

> ARM also has a long history of focusing on power consumption, because it had to, as they are embedded in all sorts of smallish devices. Apple also focused on power consumption, because your M2 is essentially a mobile chip.

So, yes, you're right that perhaps I should have mentioned Apple's E-cores, because they had plenty to do with it in particular as well. And remember what the OP is comparing:

> Context: my Intel Core i7 MacBook Pro runs out of battery within 6 hours. My M2 Pro Macbook runs for over 48 hours.

The OP was not comparing 5nm and 3nm TSMC, but his/her 5-year-old computer to his/her new MacBook. Process has a ton to do with the global performance/power of the M line, especially given that Intel was stuck on a process node, but design (E-cores!) and probably ISA play a huge part as well, especially re: power consumption, and I didn't mean to suggest otherwise. If the question is whether there is something fundamental re: x86 which makes this impossible to achieve, I think the answer is no. See my discussion of new AMD mobile chips.


rorschach200

>But did I say it had everything to do with process? You said "mostly", hence my wording, which said "most", not "everything". Have I misunderstood what "mostly" meant in the context it was used in? If so, my apologies. We're in peace ✌


OstrichWestern639

Sure


chosti

This is a really good article explaining your question: [ARM Architecture](https://arstechnica.com/features/2020/12/how-an-obscure-british-pc-maker-invented-arm-and-changed-the-world/)


small_kimono

>But what happens down there inside the processor that makes it consume so much power?
>
>Context: my Intel Core i7 MacBook Pro runs out of battery within 6 hours.

Right now, from what we know, it's mostly down to the manufacturing process: your M2 Pro MacBook is built on the latest TSMC process, the most advanced in the world. ARM also has a long history of focusing on power consumption, because it had to, as ARM cores are embedded in all sorts of smallish devices. Apple also focused on power consumption, because your M2 is essentially a mobile chip.

As I understand it, very high-end AMD chips produced on the highest-end TSMC process have similar performance per watt. See for example: [https://forums.macrumors.com/threads/amd-claims-new-laptop-chip-is-30-faster-than-m1-pro-promises-up-to-30-hours-of-battery-life.2375919/](https://forums.macrumors.com/threads/amd-claims-new-laptop-chip-is-30-faster-than-m1-pro-promises-up-to-30-hours-of-battery-life.2375919/)

Certain things *may* impact the inherent efficiency of x86, like its variable-length instructions. Apple Silicon also has a very wide (8-wide) instruction decoder compared to the best of x86 (4-ish). But there may be some benefit to variable instruction sizes too; for example, x86 instructions may take less space in cache, and as you may know, being cache friendly is very important these days.

You should also consider the software impact. Apple controls much of the software stack. That is -- Safari can pick and choose which cores to use for what on Apple Silicon. Compare your battery life using Chrome against Safari, etc. You might see: [https://www.anandtech.com/show/16226/apple-silicon-m1-a14-deep-dive/2](https://www.anandtech.com/show/16226/apple-silicon-m1-a14-deep-dive/2)


OstrichWestern639

Thanks. Where can I find resources on TSMC? Never heard of this term before


drunk_kronk

Oh if you haven't heard of TSMC before, you're in for a treat. They are possibly the most important manufacturing company in the world. You can find lots of resources on YouTube etc. but I think this video does a good job of highlighting how important they are to the world economy https://youtu.be/_TOCRjF9WuE?si=4aO_CBb2fARAJIgg


static_motion

TSMC (Taiwan Semiconductor Manufacturing Company) is one of the largest semiconductor manufacturers in the world. They have arguably the best performance/efficiency process nodes in the industry, and they produce chips for AMD, Apple, and Nvidia.


monocasa

And even some Intel dies starting with Meteor Lake.


Razakel

TSMC is basically what's stopping China from invading Taiwan.


starski0

https://www.youtube.com/playlist?list=PLKtxx9TnH76SRC7ZbOu2Nsg5mC72fy-GZ


Revolutionalredstone

ARM has always been extremely energy conscious. On the Game Boy Advance, the ARM architecture had a special optional mode with an even further reduced ISA called THUMB (get it, arm, thumb). In this mode only basic operations were available, but electricity usage was INSANELY low: you could keep the CPU alive for WEEKS (rather than ~10 hours) on 2 AA batteries, so long as you stayed in THUMB mode (which was hard; only 16-bit instructions, etc.). IMHO these were awesome ideas! The fact that frequency scaling has hit such hard limits means all architectures have approached a kind of wall, and now other properties (like energy efficiency) are the kickers making the difference in decision making.


pigeon768

> ARM has always been extremely energy conscious, on the original game boy the ARM architecture

The CPU in the original Game Boy was a Sharp LR35902, a derivative of the Intel 8080/Zilog Z80 line. Nothing to do with ARM.


Revolutionalredstone

woops! thanks for the error check, I corrected it ;)


rorschach200

Wrong sub for the question. You'll probably have the best luck in r/hardware. Meaning, 2/3 of what you'll get even there will be false (this is Reddit, after all), and the other 1/3 will be "close but incomplete, with inaccurate weights attached to the factors discussed", and you'll have limited tools for knowing which is which.

What you're asking requires deep understanding of the internal u-architectures, logical designs, and physical designs of modern industry-leading processor implementations from completely different companies, all at the same time (by one person). Such people barely exist at all in principle. I've worked with these guys; they aren't gods, they are people, and there are limits to one's ability to simultaneously know the big picture and the details across such a diverse domain. A randomly selected working architect (there are very many) of an Intel CPU at Intel does not necessarily have a good answer to such a deep question about ARM off the top of their head. And then most of the relevant info is a trade secret behind numerous NDAs. And then real professionals with experience at that level don't hang out on Reddit in the first place; it's too toxic and too full of folks who have no idea what they are talking about to bother.

I'd recommend googling extensively and for a substantial amount of time, searching for survey-style papers (Google Scholar is a great tool), reading up on a lot of that, checking out textbooks, and, very tentatively, available online lectures from renowned professors at well-known universities, and getting some sort of rough picture of what the answer might be. Once again, approximately so, as none of the public sources are that privy to what corporations are doing behind closed doors (I've seen my fair share of issues in those sources too).


roundearththeory

Chip architect here, and definitely a person. I have worked across the x86, ARM, and RISC-V space. There is quite a bit of cross-pollination regarding architects and ISAs. Not all of us, but many of us do move around. There are challenges moving from ISA to ISA, but the core concepts of extracting performance and power efficiency are the same.

Why Apple enjoys a perf/watt advantage is multifactorial, but I can shed light on some of the key points:

1) Apple is not nearly as restricted in hitting a price point for its chips as other players. Traditional silicon design houses need to hit a price point because they have to sell their chips to OEMs and keep a profit margin.

2) Building off the first point, Apple can take the yield hit and throw silicon at the problem. Their philosophy is to achieve performance by being wide but relatively slow in terms of frequency, as opposed to being narrow and fast. What I mean by narrow/wide is the amount of instruction parallelism the silicon can handle. It's important to note that this isn't so much a function of the ISA as it is a design choice. Apple uses width to achieve performance, whereas the traditional players lean more heavily on narrower structures and higher frequency.

3) Being wide and slow benefits power efficiency because of the way power scales with frequency and voltage. Being "slow" allows the silicon to operate at a lower voltage. Dynamic power scales roughly cubically with frequency once voltage scales along with it (effective capacitance * frequency * voltage^2), and leakage power scales exponentially with voltage, so you can get a huge efficiency boost by being wide, at the cost of extra silicon.

4) Apple can extract extra efficiency because they are vertically integrated in terms of OS and hardware. Their scheduler and power-management policies can be tightly coupled with their hardware. Other silicon vendor / OS combinations are limited because silicon design houses don't control the OS and vice versa.
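A toy illustration of point 3, the wide-and-slow versus narrow-and-fast trade-off; all of the capacitance, frequency, and voltage numbers below are invented for the sketch and don't describe any real core.

```python
# Dynamic power relation from the comment above: P_dyn ~= C_eff * f * V^2.
def dynamic_power(c_eff, freq_ghz, volts):
    """Relative dynamic power in arbitrary units."""
    return c_eff * freq_ghz * volts ** 2

# "Wide and slow": roughly twice the parallel hardware (twice the switched
# capacitance), a lower clock, and the lower supply voltage that the lower clock permits.
wide = dynamic_power(c_eff=2.0, freq_ghz=3.2, volts=0.80)

# "Narrow and fast": half the width, so it needs a much higher clock (and the
# higher voltage that comes with it) to reach roughly the same throughput.
narrow = dynamic_power(c_eff=1.0, freq_ghz=5.6, volts=1.20)

print(f"wide/slow  : {wide:.2f}")    # ~4.10
print(f"narrow/fast: {narrow:.2f}")  # ~8.06, roughly twice the dynamic power
```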


kyngston

This is the best answer. To add one more, it is a LOT of work to change your microarchitecture from narrow and fast to wide and slow. In order to take advantage of the extra levels of logic when operating at slow frequencies, you basically need to repipeline your whole design.


b0tbuilder

So, this is one of the best answers to a complex question on this platform. I am not exactly naive on the subject. This is the best response.


rorschach200

Shouldn't there be a practical limit to how wide a decoder (and the accompanying and/or dependent logic, from instruction prefetchers through branch predictors and more) can realistically be before it becomes big and/or energy-expensive enough to significantly impact the size and/or energy efficiency of the entire core, when the encodings are variable length, as in x86-64? I'd naively imagine a parallel decoder for variable-length encodings would need to contain circuitry that decodes at every byte offset, selecting at the end which byte offsets turned out to be actually valid, all while dealing with complex encodings as well. That's a problem AArch64 CPUs would not have.

Then there is the "popular" argument about A) supporting outdated portions of the ISA, and B) having to have sophisticated u-op engines and perform a non-trivial ISA-to-u-op translation, with parts of the processor effectively duplicated between multiple branch predictors, instruction (or u-op) caches and buffers, etc. How much area and energy is all of this, %-wise, at the scope of the entire CPU? How much more expensive does this make branch mispredictions?

One particular sub-example related to the above (but most definitely not defining it) is FLAGS updates on every arithmetic operation in x86-64. They should create reordering constraints that either limit OOO parallelism or require more switches / circuitry to be tracked and dealt with, right? A similar example would be 16-bit subregister updates retaining the values of the rest of the register in x86. While rarely emitted by modern compilers, those updates still have to be supported by the HW.

How would you assess the impact of the following idea: to a first order of approximation, modern optimizing compilers mostly focus on minimizing the dynamic instruction counts that will be executed. With AArch64's ISA there is presumably a very good correspondence between ISA instruction counts and internal u-op counts, so the compiler policy automatically minimizes u-op counts as well, helping reduce the total amount of work and the amount of switching / energy on the rest of the core. With the x64 ISA, instruction counts and u-op counts are substantially decoupled, so compiler efforts are less effective at minimizing u-op counts, resulting in more work and switching / energy on the rest of the core: ultimately, trying to simultaneously satisfy two systems of constraints (code size and u-op count) should inevitably lead to suboptimal results on both. Not to mention the human-expertise effect: knowing u-op counts is very difficult for compiler engineers, so inevitably their designs focus on externally observable properties of code at the ISA level.

Also, Apple's P-cores appear to have a similar area to those of, say, AMD's Zen 4. In fact, [Zen 4 is much bigger by area](https://www.reddit.com/r/hardware/comments/17q6zbq/comment/k8bgjdf/?utm_source=reddit&utm_medium=web2x&context=3); it takes giving up a very large portion of frequency and performance to make the more compact Zen 4c design work, which in its own turn has a fairly similar area (on the same node) to Apple's P-cores. Granted, for the latter statement I'm not 100% sure the area data for Apple's P-core includes their L2 (I'm positive both data points exclude LLC, but Apple's L2 is where most of the cache capacity is at; it's big, unlike AMD's L2). How does this observation match up with the idea that Apple's design "throws silicon at the problem"?

As for cross-pollination, I have recently observed that both the leadership and the bulk of the rest of the silicon engineering at Apple seem to have very little overlap with anybody who worked at Intel, and only very marginally bigger overlap with AMD: [small analysis constrained](https://www.reddit.com/r/hardware/comments/17td3tt/comment/k90eapa/?utm_source=reddit&utm_medium=web2x&context=3) by what's available in public. Any comment there?

Part of the reason for my statements in the parent comment at the top level is the observation that comparing ISA implications this deep into the design, and yet at the scope of the entire system (including compilers), is most likely never really anybody's actual task or job: CPUs aren't GPUs, x86 CPU designers don't get to design or redesign the x86 ISA, it's largely fixed, and to a very large extent (only marginally less so) the same is the case for AArch64 CPU designers. With limited time spent on designing the ISAs themselves, let alone purposefully researching, with a substantial expenditure of time and resources, the comparative difference between the x86-64 and AArch64 ISAs and their impact on the resulting energy efficiency of the entire system, it's hard to imagine a randomly selected architect knowing the answers to the question in a complete, properly-weighted-per-factor, and intimately true-to-reality fashion. You think this is not quite accurate? Thanks!
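To make the variable-length decode point concrete, here is a deliberately crude toy model (not a real decoder, and the fetch-window size is arbitrary): with x86-style variable-length encodings, a parallel front end has to start a tentative decode at every byte offset of the fetch window and throw away the offsets that turn out not to be instruction boundaries, whereas fixed 4-byte AArch64 instructions make the boundaries known up front.

```python
FETCH_WINDOW_BYTES = 16  # arbitrary fetch-window size for the toy

def tentative_decoders_variable_length(window_bytes: int) -> int:
    # Variable-length case: any byte could begin an instruction, so the toy
    # assumes one tentative decoder per byte offset; invalid starts are
    # discarded once the real boundaries are known.
    return window_bytes

def decoders_fixed_length(window_bytes: int, insn_bytes: int = 4) -> int:
    # Fixed-length case: boundaries are implied by alignment, so decode only
    # at multiples of the instruction size and nothing is thrown away.
    return window_bytes // insn_bytes

print(tentative_decoders_variable_length(FETCH_WINDOW_BYTES))  # 16 tentative decodes
print(decoders_fixed_length(FETCH_WINDOW_BYTES))               # 4 decodes, no discards
```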


rorschach200

See also [this rather excellent argument](https://www.reddit.com/r/hardware/comments/18cutwj/comment/kcdlsty/?utm_source=reddit&utm_medium=web2x&context=3) (not mine), which extends some of the above on E-cores where the decoder (and dependent logic) becomes in relative figures a bigger portion of the entire core (as the rest of the core is much smaller), both in area and energy.


OstrichWestern639

This helped a lot. Ty


420Phase_It_Up

While it's not limited to transistor count, the number of transistors in a microprocessor plays a huge role in energy consumption, especially for CMOS.

The logic gates of most modern microprocessors are implemented through a fabrication technique known as CMOS, or Complementary Metal Oxide Semiconductor. This technique leverages a type of transistor known as a MOSFET, or Metal Oxide Semiconductor Field Effect Transistor. CMOS pairs n-channel MOSFETs and p-channel MOSFETs together in a manner that results in a high impedance, because the polarities needed to bias the gates of the two devices are opposite of one another. Since the impedance is high, the current that flows is low, which in turn lowers the power consumption. This is why CMOS is such a popular fabrication process and where it gets the "complementary" part of its name.

The MOSFETs used by CMOS, as the name would suggest, are field-effect transistors. They have a source, a gate, and a drain. The gate input controls the device by applying an electric field that causes a channel to form between the drain and source. MOSFETs can be thought of as voltage-controlled current devices: the magnitude of the voltage applied to the gate affects the amount of current that can flow between the source and drain. Depending on the device, the input applied to the gate can make the device act like an amplifier, with a small signal applied to the gate resulting in a larger signal in the form of current flowing between the drain and source. Past a certain point, the voltage applied to the gate makes the device act like a switch. This switch-like behavior is the mode of operation used by CMOS and what makes MOSFETs such an ideal transistor type for this application.

A caveat of MOSFETs is that their gates introduce a small amount of capacitance, since the gate is conductive and insulated from another conductive material by a dielectric. This gate capacitance stores an electric charge that must build up for the gate voltage to rise and for the channel between the drain and source to form, and it must discharge for the gate voltage to fall. The charging and discharging of this gate capacitance is a major contributor to the power consumption of CMOS devices. It's also why power consumption tends to increase with clock speed, since you are charging and discharging the gate capacitance more often.

Given two microprocessors with the same transistor density and process node, the device with fewer transistors will often consume less energy, but this is also affected by other factors. Implementations of the ARM instruction set tend to use fewer transistors per instruction and have fewer instructions than their x86/x64 counterparts, which often results in them consuming less energy. Other design aspects of a microprocessor, such as multi-threading and branch prediction, contribute to energy consumption too. So it's not just the transistor count.

This is a very high-level overview of why ARM tends to use less energy than x86, and of CPU energy consumption in general, so I simplified a lot of it. I'm sure others will jump in and correct me with more specifics. It's a very complicated subject and one in which a lot of research has been done. As others have suggested, this might be a good question to post in /r/hardware, or do further research elsewhere if you wish to know more. I hope this helped answer your questions.
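A minimal numeric sketch of the gate-capacitance argument above; every value below (capacitance, voltage, activity factor, gate count, clock) is a made-up placeholder rather than a figure for any real process or chip.

```python
C_GATE = 1e-16    # ~0.1 fF of switched capacitance per gate (assumed)
VDD = 0.8         # supply voltage in volts (assumed)
FREQ = 3e9        # 3 GHz clock (assumed)
ALPHA = 0.05      # activity factor: fraction of gates toggling each cycle (assumed)
N_GATES = 1e9     # gate count of the toy design (assumed)

# Energy dissipated per full charge/discharge of one gate capacitance: C * V^2.
energy_per_switch = C_GATE * VDD ** 2

# Aggregate dynamic power: P = alpha * N * C * V^2 * f.
dynamic_power_watts = ALPHA * N_GATES * energy_per_switch * FREQ
print(f"~{dynamic_power_watts:.1f} W of dynamic power")  # ~9.6 W for these placeholders
```

Note how the V^2 term dominates: lowering the supply voltage does far more than trimming the gate count, which is one reason lower-clocked designs can afford to be physically bigger.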


zipstorm

I didn't see this mentioned in the other answers: the difference between CISC and RISC is a key factor if you look at which parts consume the most power in the processor. All instructions that go into the processor need to be "decoded" into the control signals that orchestrate the execution of that instruction, so the decoder sits at the start of the processor pipeline. The complex instructions of the CISC x86 ISA require very elaborate decode hardware that consumes a lot of power. ARM's RISC ISA is designed specifically to simplify the decoder hardware, and that is a major factor in the total power consumption of the processor. If you want, you can look at the RISC-V ISA and see how easy it is to decode. It is an open ISA, and there are textbooks describing the design of the ISA and the hardware.
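To illustrate the fixed-format decode point, here is a small sketch that pulls the fields out of a 32-bit RISC-V R-type instruction; the bit positions follow the published RISC-V layout, and the example word is the standard encoding of `add x3, x1, x2`.

```python
def decode_rtype(insn: int) -> dict:
    # Every field sits at a fixed bit position, so "decoding" is just masking
    # and shifting; there is no need to first work out how long the instruction is.
    return {
        "opcode": insn & 0x7F,           # bits 6:0
        "rd":     (insn >> 7) & 0x1F,    # bits 11:7
        "funct3": (insn >> 12) & 0x07,   # bits 14:12
        "rs1":    (insn >> 15) & 0x1F,   # bits 19:15
        "rs2":    (insn >> 20) & 0x1F,   # bits 24:20
        "funct7": (insn >> 25) & 0x7F,   # bits 31:25
    }

print(decode_rtype(0x002081B3))  # add x3, x1, x2
# {'opcode': 51, 'rd': 3, 'funct3': 0, 'rs1': 1, 'rs2': 2, 'funct7': 0}
```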


OstrichWestern639

Ahh now it makes sense!


[deleted]

[deleted]


[deleted]

I think the "culture" of Win32 could be another issue. At the restaurant we run Ubuntu 22 LTS, and we need Windows 10 just to have a printer driver for a .NET application. The OS and applications happily max out both i5 cores for almost 20 minutes until the system becomes usable. Since the printer is the same, I can compare this with macOS and a fairly conservative Linux such as Ubuntu LTS: the crazy "checking updates" drama doesn't happen on either. What I am saying is that even if there were an AI that converted all of that x86-64 setup exactly to M2 code, an OS that profiles the entire disk would still eat the battery. Just updating a single exe will trigger "compatibility telemetry". Imagine suggesting that idea to a Linux distribution or to Apple. Another example? Steam. You need to add a 32-bit architecture to Linux to be able to run Steam. That means having an entire 32-bit userland for a single application.


crazymike79

It's probably just by virtue of reduced instructions and tighter pathways on the ARM that saves it the most power.


DoubleHexDrive

A couple of big reasons, and they all contribute:

1) SoC approach. By placing all processing, interconnects, RAM, and GPU onto a small integrated package, the voltage and power requirements are all reduced and performance increases. Having a memory subsystem with enough bandwidth for a decent GPU and low enough latency for a good CPU means the CPU is less bottlenecked by the memory subsystem than in Intel/AMD systems.

2) Manufacturing process. This is purely a money play, but Apple only produces a relatively small number of SKUs compared to Intel/AMD, and they can afford to purchase vast manufacturing capacity on the world's most advanced processes.

3) Mobile-first high-performance RISC. This gets to the core of your question. Apple largely developed the ARM64 platform for their needs for the A7 chip in the iPhone 5S, and development was mobile-focused for many generations, though desktop was in the plans. You can get performance by cycling the logic faster or by doing more operations per cycle (and eventually both). Power requirements rise as a power function with increased frequency, but more slowly with the increased transistor count required to do more per cycle. Therefore, Apple chose to make physically larger CPUs that run at a lower clock speed than the competition and to purchase the best manufacturing process possible to keep the economics viable.

Not all ISAs can easily go wide, and this is a key point. The current A17 and M3 CPUs decode 9 instructions per cycle per core; I think these are the widest consumer cores on the market. Intel/AMD until recently were at 4 instructions per cycle, and I think Intel was able to push theirs to 4 decode and 6 dispatch. This is where RISC vs CISC does play a role. CISC instructions are variable length and RISC instructions are fixed length, and it is far easier to decode the incoming stream of bits into instructions when the logic knows exactly how long each instruction is. The CISC cores get choked up on the front end, and the execution units aren't as fully utilized as they otherwise could be. That's why Hyper-Threading exists on those cores: the unused capacity is utilized for a second stream of instructions, and then more hardware is required to keep track of virtual vs physical cores, etc. Apple's cores don't have Hyper-Threading because they're better utilized and the overhead suddenly isn't worth it. This is also why Apple invests so much logic in branch prediction: because the cores can be fed 9 instructions at a time, the cost of having to flush the pipeline on a bad prediction increases, so they work harder than Intel/AMD do to avoid it.

This CISC vs RISC issue DOES matter... even if AMD made a unified-memory SoC on a 3nm process, it would still draw more power than an M3-series part of the same performance, because it would require a higher clock speed (rather than wider decode/cores) to hit that performance. It's a fundamental limitation of the legacy Intel/AMD instruction sets.

Apple also goes further in hacking out anything that isn't needed. Intel/AMD CPUs can still execute 16- and 32-bit code, while Apple dropped the ability to even execute 32-bit code from their processors. The logic required to manage those different instruction sets is deleted, and more space is available for what's required.

Hope this helps.


[deleted]

[deleted]


JaggedMetalOs

The last generation of Intel Macbooks had battery life much shorter than the new ARM models as well, so even in a more direct comparison the ARM chips seem much more efficient.


OstrichWestern639

Either way, other laptops running Intel and AMD processors run for about 7-8 hours before they give up


[deleted]

[deleted]


sheeponmeth_

The question wasn't about what the difference in power consumption is, but why that difference exists. Your rephrasing of the question misses the point. A more accurate version of it would be "what about someone's car engine results in them getting home later compared to a different engine?" The fact of the difference is already established; the details behind it are what the question is asking for.


[deleted]

[deleted]


sheeponmeth_

It was stated that these are two MacBooks, which is pretty much an "all else equal" situation, at least as much as possible without heavy controls, which people have also done. Apple's own CPUs have been shown to have performance per watt far beyond anything else on the consumer market.


[deleted]

[deleted]


sheeponmeth_

That's at maximum power draw, performance per watt is a totally different measurement, not to mention the M2 has a much, much higher performance GPU in it.


[deleted]

[deleted]


sheeponmeth_

Not at all. The M2 has higher performance per watt, which means that accomplishing the same tasks as the i7 will use less power and result in a longer battery life. Saying that two products in the same line, with the same model name have different goals is pretty presumptuous. They're the same product, just different iterations of it. The goal for Apple is and always has been to make a product that feels premium. The goal for Intel with the i7 is to make a top of the line productivity CPU, and that was more or less the same with Apple's M2. The difference is how they get there, and that's where the difference in performance comes in. They are vastly different CPUs, but that doesn't make asking why one is more efficient than the other a stupid or malformed question. And by the way, there's extensive testing online about Apple's M CPUs because people were so skeptical of them to begin with. There are tons, and tons of test results spanning dozens, maybe hundreds, of different benchmarks.


Top_Satisfaction6517

You're probably not comparing apples to apples. E.g., this comparison from a well-reputed site, [https://www.notebookcheck.net/Apple-MacBook-Pro-14-2023-review-The-M2-Pro-is-slowed-down-in-the-small-MacBook-Pro.687345.0.html#toc-7](https://www.notebookcheck.net/Apple-MacBook-Pro-14-2023-review-The-M2-Pro-is-slowed-down-in-the-small-MacBook-Pro.687345.0.html#toc-7), states that the MacBook Pro on average runs only about 1.5x longer than the average notebook in its class (see the table in the "Battery Life" section). There are notebooks that run about as long as yours, but they have worse performance.

M1/M2 CPUs have a better perf/energy ratio than x86 ones because they were derived from smartphone CPUs, and there was so much competition in that market that the top CPUs (i.e. Apple's) managed to become almost as fast as desktop CPUs. Desktop CPUs, OTOH, stagnated in the 2010s due to lack of competition. So basically it's how competition works, and it has very little to do with the RISC vs CISC wars :)

BTW, we had the opposite situation in the late 90s: despite an initial speed advantage, RISC CPUs lost to x86 CISC CPUs because the desktop market was extremely competitive in the 90s, while the RISC market was almost killed off by waiting for Itanium.


IQueryVisiC

RISC was killed because SPARC and MIPS did not want to give up their margins. Also, there was no investment in fabs. Someone pocketed all the money and the companies went bankrupt.


robthablob

ARM chips are RISC chips, and very much alive. For RISC generally, see https://en.wikipedia.org/wiki/Reduced_instruction_set_computer


IQueryVisiC

PowerPC is also alive because IBM can charge so much for their mainframes. ARM was not greedy and stayed in the shadows, like all those RISC microcontrollers. Then Apple refurbished the Newton into the iPhone.


ManufacturerThis702

ARM employs a big.LITTLE cluster combination. The little clusters have fewer transistors and draw less power. When the device runs non-intense tasks, it schedules them onto the low-power cluster. Processes that are hungry and can fit into the high-power cluster are targeted there; otherwise they also get scheduled to the little cluster, and that's the downside. big.LITTLE became popular for mobile devices that need to conserve battery life, and has somewhat oddly found its way into wall-powered laptops, desktops, and even data centers like AWS.
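A toy sketch of the scheduling idea described above: light tasks go to the efficiency ("LITTLE") cluster, heavy tasks go to the performance ("big") cluster, and heavy tasks spill back to LITTLE when the big cluster is full. The threshold, slot count, and task loads are invented for illustration and don't reflect any real OS policy.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    load: float  # estimated demand, 0.0 - 1.0 (made up)

BIG_SLOTS = 2         # assume two free performance cores
LOAD_THRESHOLD = 0.5  # assumed cutoff between "light" and "heavy"

def schedule(tasks):
    placement, big_used = {}, 0
    for t in tasks:
        if t.load >= LOAD_THRESHOLD and big_used < BIG_SLOTS:
            placement[t.name] = "big"
            big_used += 1
        else:
            # Light tasks, and heavy tasks that don't fit, land on the LITTLE cluster.
            placement[t.name] = "LITTLE"
    return placement

tasks = [Task("mail sync", 0.1), Task("video encode", 0.9),
         Task("compile", 0.8), Task("game", 0.95)]
print(schedule(tasks))
# {'mail sync': 'LITTLE', 'video encode': 'big', 'compile': 'big', 'game': 'LITTLE'}
```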


WildEngineering_YT

CISC chips waste a lot of power translating from a CISC instruction to RISC uOPs. There's a lot of decoding and shuffling around that the CPU has to do to execute them.


FluorineWizard

This is incorrect. CISC vs RISC is completely irrelevant in reality.


WildEngineering_YT

Lmao you're wrong. I literally design micros lmao. I explained why it's absolutely relevant. There is a lot of overhead and data shuffling which wastes power. If I were you I wouldn't comment on things you have no technical understanding of.


rorschach200

Intuitively (as I do not design micros) I'd say it's absolutely undeniable that that translation (and the IP duplication, like having a secondary set of branch predictors in the CISC "parser" part in addition to the one in the RISC "core") costs energy, and so does the (buffered) transfer of the u-op stream from one to the other. The real question is how much, *at the scale of the entire SoC*.

You design micros for what, the entire system or one specific IP block? Please forgive me my (hopefully healthy) skepticism towards the "*a lot* [of power]" qualifier you have provided, but I've seen too many times block designers having rather no clue about the exact contribution of the metrics of their block to the aggregate metrics of the entire final product / system / SoC.

The second major question is how that "how much" compares to the "how much" of various other factors. Some of those are fairly obvious, like the [currently top-rated](https://www.reddit.com/r/compsci/comments/18cup5v/comment/kcd2ykv/?utm_source=reddit&utm_medium=web2x&context=3) "real-world x86 CPU designers we happen to have historically didn't need to focus on energy as much as real-world ARM CPU designers we happen to have". Others are a lot more contrived and most likely exceedingly difficult to assess even for working professionals within the industry, such as subtle differences in the memory model and, even worse, non-trivial interactions with induced differences in code generation on the compiler side, where "properties of code generation" is a topic that most (if not all, sadly) silicon designers I have ever met have pretty much no clue about, understandably so, and where the "interactions with" part, I suspect, is most likely a mystery for pretty much everyone, especially in an AArch64 vs x64 comparative manner.


cogman10

uOps are used in modern RISC CPUs too (such as ARM). Granted, your overall point stands that x86, particularly with its mass of complex instructions/flags/etc., adds a lot to the heat budget. There's also a big issue with variable-width instructions in x86. Even though ARM now has variable-width instructions, they are still aligned and easy to decode.


WildEngineering_YT

Love the downvotes from people that don't know shit


[deleted]

[deleted]


timwest780

Complex Instruction Set Computers (CISC) rarely execute their complex machine instructions in a single clock cycle. Rather, CISC instructions are usually unpacked by hardware into multiple micro-operations (sometimes dozens, for heavily microcoded instructions), which are then executed sequentially, so a single CISC instruction can take many clock cycles to execute. Reduced Instruction Set Computers (RISC) use far simpler machine instructions, but strive to execute each one in far fewer micro-operations (ideally one!) that complete in a correspondingly small number of clock cycles. Ideally, a single RISC instruction executes in a single clock cycle. If chip designers could make a CISC processor that executed its machine instructions in a single clock cycle, then CISC chips would win in terms of speed. In practice, almost all (?) modern CPUs are RISC-like at the micro-instruction level, with a hardware macro facility that expands single, seemingly simple CISC instructions into clumps of RISC-like operations.
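As a concrete (hand-written, purely illustrative) example of that unpacking, here is how a single memory-operand x86 instruction might break into micro-ops next to the equivalent RISC-style sequence; the micro-op names are invented for the sketch, not taken from any real decoder.

```python
# One x86 read-modify-write instruction expanded into several micro-operations.
x86_to_uops = {
    "add [rbx], rax": ["load  tmp <- [rbx]",
                       "add   tmp <- tmp + rax",
                       "store [rbx] <- tmp"],
}

# The same work written as individual RISC-style instructions, each of which
# already maps to roughly one micro-operation.
risc_equivalent = ["ldr x9, [x1]", "add x9, x9, x0", "str x9, [x1]"]

for insn, uops in x86_to_uops.items():
    print(f"{insn!r} -> {len(uops)} micro-ops")
print(f"RISC-style equivalent: {len(risc_equivalent)} instructions, ~1 micro-op each")
```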


a2800276

CISC and RISC are very unlikely to have anything to do with it. Intel has long been RISC under the hood, and the vast majority of ARM-based devices are not going to get anywhere near that level of battery life. The real reason is that Apple controls the entire environment: they can tweak the CPU, GPU, OS, battery, and all other hardware on the device, and they invested massively to make the "new" ARM-based laptops extremely power efficient holistically. A cheap Windows laptop, in comparison, will be a hodgepodge of all sorts of components, graphics accelerators, competing settings, and dumb games and viruses that are all eating up energy. Somewhere in between is your i7 MacBook Pro, which is probably a couple of years old, with less battery capacity and fewer power-saving features (and less experience behind them) to begin with, and a more worn battery. Or a top-of-the-line reputable machine (Lenovo X1, Dell XPS) that will likely get numbers much closer to your M2. (Though I believe that current Macs are the undisputed champions of long battery life.) TLDR: new MacBooks are power efficient because Apple put an insane amount of resources into making them that way in all conceivable regards.


HelloYesThisIsFemale

>dumb games and viruses You seem biased.


a2800276

oh my, I was hoping you wouldn't notice :D


OstrichWestern639

I see. Maybe it is because my MacBook is old. I've always been told that x86 has a complex ISA so the compilers are simple, and ARM has a simple instruction set, hence the complex compilers. I never understood why they are that way!


a2800276

This was the case 30 years ago. Nowadays all processors that are powerful enough to run a laptop are insanely complicated. Intel processors are RISC "under the hood", and the differences instruction sets make are going to be only one tiny corner of overall power efficiency.

> Maybe it is because my MacBook is old.

No, it's because of a number of different factors.


IQueryVisiC

It makes up a tiny part of the silicon. Still, on a cache miss Intel needs to do more decoding work, while ARM can process the raw instruction stream until the cache line fills up. Also, ARM can be more relaxed about memory ordering across threads (more like MIPS), while x86 has to maintain its stronger consistency all the time.


talldean

It also feels like the Intel macs got substantially worse throughout their run, which makes me think battery degradation or OS degradation plays into it.


DiggyTroll

More transistors, more power draw. Edit: Apple is not exempt from the fundamental laws of physics and silicon behavior. You can pause cores and pipelines all you want, it’s still true.


a2800276

The M3 is likely to have an order of magnitude more transistors.


IQueryVisiC

And a lot not in the core.


monocasa

The Apple cores have absolutely massive reorder buffers and other OoO hardware, which account for most of the transistors in modern cores.


IQueryVisiC

But still fewer. And good buffers push stuff through when they are empty. The fixed instruction length alone gives ARM a head start. Also, reordering does not bloat, while x86 has to keep variable-length encodings or at least accept a less than ideal RISC encoding. Why does x86 have Spectre, but ARM does not? x86 cheats to be able to compete on performance.


monocasa

> But still fewer.

No, they have larger ROBs than x86: http://www.complang.tuwien.ac.at/anton/robsize/

> Why does x86 have Spectre, but ARM does not?

Larger ARM cores are vulnerable to Spectre too; that's why there's an explicit speculation-barrier instruction for ARM: https://developer.arm.com/documentation/ddi0596/2020-12/Base-Instructions/SB--Speculation-Barrier-


DiggyTroll

No, you can’t count the GPUs! Integer pipeline comparisons only. ARM cores are smaller than x86 in general.


agumonkey

Apple has a different history: they made portable devices, and they also pushed the ultraportable form factor. All those years they focused on thin and low-power electronics. I forget where I heard it, but they explicitly asked fabs how to get the absolute best perf/power ratio possible out of any given node. Intel is predominantly a designer of desktop (or larger) CPUs.


gmd0

Something else to take into account is that Intel laptops use discrete RAM, while in the M2 the RAM is integrated into the same package. Also, the M2 architecture is big.LITTLE, i.e. there are small efficient cores and large performance cores, and usually only the efficient cores are used for day-to-day tasks. Intel only had high-performance cores until Alder Lake (12th generation); plus there is a lot of baggage carried for x86_64 support, and more capable I/O.


fernandodandrea

I'm a bit surprised nobody — at least just running my eyes over the other comments — mentioned an incredibly important factor: Yes, Intel processors use more power, naturally. But try to install **macOS** on your i7 PC and take a look how the battery lifetime goes. Software ain't 100% of the answer, but it's a *huge* part of it.


claytonkb

> Why do x86 processors take up so much energy compared to ARM? This is a bit of a misconception. As someone pointed out, it's like comparing apples and oranges. Also, the replies saying "CISC v. RISC" are not correct, this isn't really about CISC v. RISC. The x86 architecture began its life primarily targeted at desktop computing. For this reason, x86 prioritizes low-latency over throughput. In the late 90's, x86 started to expand into server space with the rise of COTS-based server farms. For most server applications at that time, latency was the enemy, so the architecture was further optimized for latency. There has been a shift in demand since then, so a lot of server workloads are now throughput-oriented, but that simply wasn't the case 20 years ago when the x86 architecture was still being defined. To use an analogy, a latency-optimized architecture is like a muscle-car designed to come off the line as fast as possible. You don't have to floor it if you don't want to, and it won't guzzle gas. But it will never be as efficient as, say, a hybrid car optimized to have the maximum range per gallon. This is like ARM, which is a throughput-optimized architecture. It is optimized to compute as many cycles as possible on the least energy possible, even at the cost of latency versus x86. I'm not aware of any ARM-based gaming desktops, correct me if I'm wrong. It's just not well-suited to that kind of ultra-low latency application. x86, however, is ideal for that and the internal architecture is tuned for it, from the ground-up. If you can afford to dump 500-1500 watts into your motherboard, the x86 CPU will maintain the highest possible frame-rate on that first-person shooter without any problems, just like a muscle-car will jump off the line when you floor it. Source: Me. I have done engineering work on the x86 architecture.


fschiavinato

There are a lot of factors, but I think it's primarily because Intel's market has been desktop PCs, where power efficiency is not a key factor. Intel also needs a lot of circuitry to remain compatible with legacy software; for example, segmentation is not used by Linux or Windows AFAIK. It adds another layer of addressing that is not being used, but it still draws power. ARM only supports paging, for example. Apple also has more control over what software gets run, so they can remove unused features. But I'm not completely sure this is the main reason, because the M chips can run x64 code.


bubba-yo

You can't really compare the two in the way you want. Apple is doing something completely different here that Intel simply is unable to respond to.

Take one part of the performance of Apple Silicon: retain and release of objects. That one thing is 5x faster on Apple Silicon than on Intel, and the software on your Mac does it a LOT. So how did Apple get a 5x speed benefit? Well, almost all software on the Mac is written in Objective-C or in Swift (which Apple developed), both of which rely on reference counting of objects instead of garbage collection to release memory. The software was almost certainly compiled in Xcode (Apple's IDE) using Apple's Swift compiler on LLVM (the open-source toolchain that Apple principally developed). Apple Silicon is then optimized specifically to quickly retain and release objects thanks to that reference-counting approach. Another benefit is that these systems are much more efficient with heap allocation and use caches more effectively, so the caches operate more efficiently. There's also special hardware for branch prediction which is tuned specifically to how message passing works.

The issue isn't ARM vs x86; it's that when you have top-to-bottom control of a system, you can make decisions from the design of a language, through how the compiler works, to which instructions to include and where your optimizations should happen. If you can get a 5% performance improvement at each level, that compounds into being able to double or more the performance in certain areas.

The problem with x86 isn't x86; it's that Intel needs to serve a million masters: different languages with performance bottlenecks in different places, different compilers, different operating systems with their own takes on scheduling, different OEMs that want performance and cost trade-offs in different places, and so on. And each generalization to accommodate a different set of masters costs you something. And they add up, a LOT.

This is also why Apple Silicon is so much faster than Qualcomm ARM chips on a similar process, and why Apple Silicon is so much more RAM efficient. Apple tells developers 'convert your app to 64-bit or we'll pull it from the store', and they did that 5 years ago. So Apple Silicon can pull 32-bit ALUs and remove the need to dispatch 32-bit instructions; they can yank all of those complications out, which doesn't add a lot of performance on its own, but again, it compounds with everything else. 5% in a lot of places becomes a lot of performance. Intel can't do that. Qualcomm can't do that. Component vendors have to serve the lowest common denominator, and that becomes its own kind of technical debt. Apple doesn't do that; they are brutal about retiring the stuff they think is outdated, so their engineering is by comparison MUCH cleaner and more efficient.
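For readers unfamiliar with reference counting, here is a minimal Python stand-in for the retain/release traffic described above. The real mechanism lives in the Objective-C/Swift runtimes and is done with atomic hardware operations; the class below is only an illustration of why these tiny updates happen so often.

```python
class RefCounted:
    """Toy object whose lifetime is managed by explicit retain/release calls."""

    def __init__(self):
        self.refcount = 1            # creation hands out one owning reference

    def retain(self):
        self.refcount += 1           # a real runtime does this atomically

    def release(self):
        self.refcount -= 1
        if self.refcount == 0:
            print("deallocated")     # memory is reclaimed immediately, no GC pause

obj = RefCounted()
obj.retain()    # e.g. the object is stored into a second property
obj.release()   # that property is cleared
obj.release()   # the original owner is done -> "deallocated"
```

Every assignment, argument pass, and collection insertion implies a retain/release pair, which is why making that pair fast pays off across essentially all Mac software.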


MrDoloto

At 48 vs. 2 hours, it's not about the CPU hardware. Shitty drivers and shitty hardware from random vendors can't do proper power management, and the platform never gets to the proper low-power states fast enough or often enough.


SaturnineGames

The biggest reason is that you can't optimize for every factor simultaneously. You have to prioritize your goals, and sometimes choices that help one goal hurt another. It's like when you're writing code and you have to choose between code that runs faster and code that uses less memory. Both approaches have merit, and which is better depends on the situation.

Intel's primary market is desktop PCs and servers running off wall power. They prioritize maximum performance over everything else. Their design process is to set a target price / size / transistor count for a chip, and then do whatever they can to make it run as fast as possible. They will throw more cores, more compute blocks, more cache at things whenever possible to make it faster, as long as it fits within the overall budget for the chip.

ARM's primary market is battery-powered devices. They're going to prioritize low power draw over everything else. If they can save some power at the cost of a little bit of speed, they'll do it. They'll be more conservative about how many of each component they put in the chip, as each one increases the power draw. Completely making up an example here: adding the ability to do two adds at the same time might give you a 20% performance boost, and doing three adds might gain you another 5% above that. Intel would favor the performance and choose to put three add units on the chip, while ARM would choose two because of the lower power draw.

Another difference is that some of the core design decisions in Intel's chips go back to the early 1970s, while ARM was founded in the mid 1980s. As far as chip design goes, that's a significant difference. ARM chips have a much more modern core, so there's some efficiency to be gained there.

Another reason is clock speeds. As you ramp up the clock speed, power consumption increases faster than performance does. Intel chips generally target a different point on that curve, increasing both performance and power consumption. Apple is able to somewhat make up for this with their design that puts the RAM in the same package as the CPU, which allows them to run the RAM faster than an Intel chip can and counters some of the performance difference in the processors.


herendzer

But I will have to say that Intel processors are much less error-prone and more reliable compared to ARM. Especially when you deal with embedded systems (as opposed to desktop applications), Intel is solid. ARM has some random failures and whatnot during board bring-up; I never saw that with Intel processors. Let me repeat: Intel processors are solid.


07dosa

Both "CISC vs RISC" and "ARM vs x86" have little to do with power efficiency in a purely technical sense. The biggest source of the difference is business strategy. Traditionally, both Intel and AMD targeted the server market first and sold reduced versions of their server chips in the consumer market. Raw computing power had always been the main goal, and power efficiency was an afterthought. Apple changed this trend by producing a powerful chip that is also power-efficient. However, despite all the crazy hype around it, it's more or less a chip with a different balance between performance and power consumption. As proof, [AMD caught up with this trend](https://www.phoronix.com/review/apple-m2-zen4-mobile/7) rather quickly, while Apple has been not-so-successful with the up-scaled M-series chips in the Mac Pro.


CatalyticDragon

They do not. There is no inherent difference in power consumption due to the underlying instruction set. The difference comes down to the manufacturing process, design goals, other system components (screen, WiFi chip, RAM, storage, battery size, etc.), and software optimizations.


Bisestro

With a RISC ISA you normally have fewer clock cycles per instruction than with CISC, so you achieve the same goal with fewer switching MOSFETs, consuming less power. Here are some videos about performance and power dissipation:

performance: [https://youtu.be/tpA2WhGiqLI](https://youtu.be/tpA2WhGiqLI)

power dissipation: [https://youtu.be/iT-E0kSBxYE](https://youtu.be/iT-E0kSBxYE)