Apple's demise

By Paul Hsieh

Who says Apple is Doomed? I DO!

Macintrash


In the News

Cupertino -- we have a problem

05/06/06 Apple is now using good CPUs to run their machines, and apparently in the latest Mac OS X they are now accelerating 2D graphics more fully (bringing them on par with where PCs were 12 years ago.) So everything ought to be hunky dory, right? Well, not quite. It appears that the system developers at Apple are so isolated from reality that they have completely missed important OS architecture trends.

Jasjeet Sekhon has run some benchmarks of the statistical programming language R, and things do not look very good for the Mac. Click on the link below for more information:

Declaring victory in defeat

01/17/06 So at this past MacWorld Expo, Jobs unveiled the first x86 based Mac product, which basically should be shipping right now. Apparently this is early relative to their publicly announced roadmap or something (the Mac rumors retards were thinking that Apple was going to become a television vendor -- I'm not even going to guess at their reasoning for that.) Jobs also showed some iPod software, a simple webpage authoring app, etc. The point is that he ran the demos on the x86 Mac. Leo Laporte from the "This Week in Tech" podcast made the important point that all the applications running on the x86 based Mac seemed to run ridiculously faster than on even his dual PowerPC G5 tower. Now here's the thing -- on Windows it's extremely hard to notice CPU upgrade performance from running mere desktop applications (you usually only notice when running video games, compressing video, or similar kinds of applications). The reason is that the CPU has been delivering "essentially infinite" performance relative to desktop application needs on x86 for a long, long time now. But for a Mac person this is probably the first time they have ever seen this kind of performance or anything like it. The deluded Macaholics don't have the functioning neurons necessary to fill in the rest of the reasoning themselves, so let me just fill them in: the PowerPC is, and has always been, pure silicon diarrhea. You were being LIED TO, repeatedly and over a long period of time. They were very, very bad CPUs. Whatever. But like the typical battered wife with no self esteem, the Mac faithful will go back to Jobs, forgive his lies, and just scoop up whatever he says from now on.

Continuing with the "This Week in Tech" podcast (TWiT #38), which discussed the MacWorld Expo, John C. Dvorak put out the following prediction: "Within 2 years, Apple will switch to AMD". I'll take even money on the other side of that bet. The AMD processors are currently objectively faster than Intel's -- so that's the source of his reasoning. But what he's missing is that this is precisely why Apple won't do it. Let me explain. Intel is in desperate trouble with their CPUs. AMD is basically faster, cheaper, and at least as reliable as Intel across the board. In fact, Intel has been losing a noticeable amount of marketshare to AMD as a result.

So one way for Intel to deal with this is to seek to expand their market. But what would be the point if they just end up ceding a large portion of that share to AMD? Apple did not just buy a CPU from Intel. They bought a motherboard design, a chipset and (drum roll please) a compiler. Apple is concerned about making a complete system and probably wants a more consistent roadmap than IBM gave them. Intel is looking for another exclusive partner (besides Dell) to fend off AMD. While the details of the deal are obviously a closed-door affair, it's fairly obvious to me that Intel and Apple probably worked out an exclusive arrangement.

Checkmate!

06/11/05 That's it -- Apple's given up on the PowerPC. And for those of you lamenting its inevitable passing (if we ignore the Xbox 360), let me assure you -- it was never a very good processor to begin with.

The theories being bandied about as to why are amusing to say the least, but I think for once in his life, Steve Jobs chose not to lie. Or at least not to disguise it much. It's simple, folks -- IBM and Freescale/Motorola failed to deliver. And it's not just performance per watt (liquid cooling, anyone?); it's performance and power. At the keynote of the recent Apple developer conference Jobs made sure the dig at IBM's failure to deliver did not go unnoticed, even if he avoided mentioning them by name. No 3.0Ghz (after it was promised) and no G5 in the PowerBooks. All this at a time when Mac sales apparently are showing slight signs of pickup (indeed, all sorts of people are coming out of the woodwork and buying Macs!). So much for thinking different.

Now, of course, we can discuss the real reason they went with Intel. First note that they did not go with x86 generically, but rather with Intel; that is to say, they are not going with AMD. Why is this? Simple; their problem is (1) delivery of (2) high performance and (3) low power microprocessors. IBM failed on #2 and #3. AMD rules on #2 and #3, but they own only one fab, so they can't provide the guarantees Apple is looking for on #1 (though AMD has had very consistent delivery for the past 7 years, that record is based on a smaller marketshare, and follows a horrible delivery record for the K5s and initial K6s.) Intel is not so good on #3, and lags a bit on #2; however, in both cases the problems are not as extreme as they are with IBM -- and of course they have more than proven their ability to deal with #1. And Intel comes with a secret bonus -- they make their own compiler, which kicks the crap out of gcc and basically every other compiler in general. It's the same reason Microsoft went with Intel for the original Xbox CPU, and not with AMD. This part is way over the heads of your typical Mac weenies. Choosing AMD would seem more in line with Apple's "Think Different" mantra -- but it would still be a risk for Apple, and risk is the primary thing they are trying to avoid right now.

Another problem Apple has had to contend with is that IBM and Motorola never faced direct competition, even with each other, for Apple's business. They always found themselves conveniently trading off market segments. With no direct competition, there was never any market -> product pressure which Apple could leverage to ensure they had the best possible processors. Even if Apple commits to an Intel-only solution, they benefit from competition from AMD, since Intel has to deal with that competitive pressure no matter what (because Intel sells its processors to vendors other than Apple). See how that works? Marketplace competition leading to a better product? After they destroyed Exponential, there simply was no competition for the Mac CPU. Basically, at long last, they got it.

I have a confession to make. I never thought this would happen. Not because I thought it was the wrong thing for Apple -- au contraire, they should have done this a long time ago. But it's precisely because it was the right thing for them to do that I never thought they would do it. Apple's slogan has always been "Think Different" -- a slogan they used to hide behind to avoid direct technical comparisons with cheaper, faster, more compatible, and more scalable PCs. And now they are stuck. They can't win any rigged benchmarks anymore -- in fact, they are going to start losing them all, indisputably, to AMD based machines. Worse yet, they have a legacy PowerPC -> x86 transition period that they are going to have to deal with. Their "Rosetta" story sounds nice, but my suspicion is that this is going to be a lot more painful than they are suggesting.

So what did I miss in my thinking? Simple: the internal pressure for Apple to switch must have been enormous. And Apple being such a closed company, that was something hard to gauge from the outside. This is why so few serious Apple pundits saw this move coming. Yet a few people like Dvorak did see it. It's not hard to see how a few leaks happened; Microsoft, Adobe, ATI, nVidia, and a few other key 3rd party software developers clearly have known this was going to happen for at least 6 months -- and the fact that Darwin was always compilable on x86 has been known in the developer community for even longer. I also always assumed that Steve Jobs' reality distortion field extended to the inside of Apple.

Here are a few comments for the stoners who can't read this situation properly:

Anyhow, welcome to the dark side of the Force, you guys. :)

"Why I returned my iMac"

04/21/04 Here's what one user thought of a recent iMac purchase.

Serious problem with Apple iPods ...

11/30/03 Not that I pay attention to their non-Mac products, but it appears the iPod has a pretty serious design flaw regarding its battery:

Apple forced to withdraw misleading ad from U.K.

11/27/03 It's about time someone called them on their distortions of reality.

Some more G5 benchmarks

08/27/03 The guys at Bare Feats have been benching the Mac versus the PC for some time. This time around it looks like they are the earliest independent guys to post benchmarks:

Alternative View on iTunes

08/26/03 Here's an alternative look at Apple's whole iTunes idea:

Some more G5 dinging

08/20/03 Apple has continued to issue its false advertising ... so I'll continue to ding them. Apple has also claimed that they are the first to ship a 64-bit desktop machine. While the Alpha PC claim is a bit of an oddity, the Opteron based BOXX that shipped on June 4th of this year is not. The guys at digitalvideoediting.com called them on it:

Apple does not seem to be in compliance with Spec's trademark license:

From spec.org
Fair Use Guidelines

When any organization or individual makes public claims using SPEC benchmark results, SPEC requires that the following guidelines be observed:

  1. Reference is made to the SPEC trademark. Such reference may be included in a notes section with other trademark references (see http://www.spec.org/spec/trademarks.html for all SPEC trademarks and service marks).
  2. The SPEC web site (http://www.spec.org) or a suitable sub page is noted as the source for more information.
  3. If competitive comparisons are made the following rules apply:
    1. the results compared must utilize SPEC metrics and be compliant with that SPEC benchmark's run and reporting rules,
    2. the basis for comparison must be stated,
    3. the source of the competitive data must be stated, and the licensee (tester) must be identified or be clearly identifiable from the source,
    4. the date competitive data was retrieved must be stated,
    5. all data used in comparisons must be publicly available (from SPEC or elsewhere)
  4. Comparisons with or between non-compliant test results can only be made within academic or research documents where the deviations from the rules for any non-compliant results have been disclosed.
Please see the specific Fair Use rules for each group (GPC, HPG, and OSG) for any additional and specific requirements

When Steve Jobs made his original G5 presentation, he mentioned and showed graphs with SPEC, SPECfp, SPECint and SPECrate. But he did not establish a basis for comparison. He simply asserted the results. This is a violation of rule 3-2. He also made no mention of Veritest during his presentation (it was later discovered through an asterisk on Apple's website), which is a violation of rule 3-3.

Whether or not there is a violation of rule 4 is unknown. Since Veritest has not submitted the results to Spec, we cannot say for sure whether they would survive scrutiny. (In prior discussions with Spec committee members, I have been told that results which appear hostile towards a vendor would be rejected -- Veritest's results for the Dell machine look woefully unacceptable, and remember that since Apple was making the comparison against the results on the Dell, those figures must be in compliance as well.)

Something I missed in my tables below:

SPECrate Int

    Platform                 Veritest results    Vendor results
    Apple G5 2.0GHz          16.9                unknown
    Dell P4 Xeon 3.06GHz     16.7                21.5
    Dell P4 3GHz             10.3                N/A
    AMD Opteron 246 2GHz     28.8

SPECrate fp

    Platform                 Veritest results    Vendor results
    Apple G5 2.0GHz          15.8                unknown
    Dell P4 Xeon 3.06GHz     11.1                16.7
    Dell P4 3GHz             8.1                 N/A
    AMD Opteron 246 2GHz     28.1

Notice that Dell does not have SPECrate results available for the non-Xeon machines. It's not because they are afraid of the results, but rather because those machines are not sold in multiprocessor configurations. Hyperthreading is not meant to be measured as providing the performance of an additional processor (especially since there isn't an extra processor being used ...). Hyperthreading provides a virtual extra processor whose purpose is to slightly increase performance by rebalancing multithreaded applications (so stalls while waiting for resources can be mitigated.) I.e., you are not supposed to run SPECrate on non-Xeon machines, since it doesn't make any sense -- as the sketch below illustrates.
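
Here is a sketch of the effect (my own illustration; the kernel and iteration counts are arbitrary assumptions, and Linux/pthreads is assumed). It runs one copy of a throughput-bound kernel, then two copies in parallel. On a true 2-way SMP box the wall time barely moves when the second copy is added; on a single hyperthreaded core the two copies contend for the same execution units and the time roughly doubles -- which is exactly why SPECrate's "one copy per processor" arithmetic breaks down on hyperthreaded machines:

    /* Hypothetical sketch: why a hyperthread is not a second processor.
     * Build (Linux): gcc -O2 htdemo.c -lpthread -lrt */
    #include <stdio.h>
    #include <pthread.h>
    #include <time.h>

    static void *kernel(void *arg) {
        /* four independent multiply chains: enough parallelism to keep a
         * core's execution units busy all by itself (throughput bound) */
        unsigned long x0 = 1, x1 = 2, x2 = 3, x3 = 4;
        long i;
        for (i = 0; i < 50000000L; i++) {
            x0 = x0 * 3 + 1; x1 = x1 * 3 + 1;
            x2 = x2 * 3 + 1; x3 = x3 * 3 + 1;
        }
        *(unsigned long *)arg = x0 ^ x1 ^ x2 ^ x3;  /* keep results live */
        return NULL;
    }

    static double run_copies(int n) {
        pthread_t th[2];
        unsigned long out[2];
        struct timespec t0, t1;
        int i;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < n; i++) pthread_create(&th[i], NULL, kernel, &out[i]);
        for (i = 0; i < n; i++) pthread_join(th[i], NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    }

    int main(void) {
        printf("1 copy:   %.2fs\n", run_copies(1));
        printf("2 copies: %.2fs  (flat = real 2nd CPU; ~2x = shared core)\n",
               run_copies(2));
        return 0;
    }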

Incorrect analysis on osnews.com

07/28/03 The guys over at OS News usually do a lot of good work by creating articles from a perspective outside the narrow view of Windows. But unfortunately, in the absence of a mechanism for deep analysis, they have found themselves susceptible to Macaholic psychobabble. Their article entitled Analysis: x86 Vs PPC is the one causing me distress. I'm just going to go straight into debunking mode.
    Both the Athlon and Pentium 4 use longer pipelines (long and thin) with simple stages [...]
The Athlon pipeline is actually very wide. The Athlon is capable of simultaneously decoding three loads, three stores and three ALU operations per clock (it can actually only issue two of the memory operations per clock, thus limiting it to about five operations per clock.) While the Pentium 4 is certainly thinner than the Athlon (limited to 3 "micro-ops" per clock), it's not fair to call it thin. For example, the PPC 970 and Pentium 4 have the same integer execution bandwidth (2 ops per clock.)
    "Each component of a computer system contributes delay to the system If you make a single component of the system infinitely fast... ...system throughput will still exhibit the combined delays of the other components." [...] The reason for the lack of scaling is the fact that memory performance has not scaled with the CPU so the CPU is sitting doing nothing for much of it's time
This is a general observation that has been pointed out by folks like Dr. John D. McCalpin; however, to use this argument against the Pentium 4 is completely unfounded, since the Pentium 4 is paired with Rambus memory (which is very expensive, but nonetheless rather amazing in terms of raw bandwidth.) Another thing to notice is that although DDR SDRAM is technically slower than RDRAM, benchmarks of real world applications show that there simply isn't a significant difference between the two. I.e., it is fair to say that Intel and AMD (since they have managed to keep up with Intel's performance in general) have made appropriate trade-offs in their respective architectures to make sure that they aren't completely bottlenecked by memory bandwidth.
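
To see what "memory limited" means in practice, here is a minimal sketch in the spirit of McCalpin's STREAM "triad" (my illustration, not the official STREAM benchmark; the array size and repeat count are arbitrary assumptions). The loop does almost no arithmetic per byte moved, so its speed is set almost entirely by the memory subsystem, not the CPU core -- exactly the regime where raw bandwidth matters:

    /* STREAM-style triad sketch: a[i] = b[i] + s*c[i]. With arrays far
     * larger than the caches, the CPU mostly waits on memory, so the MB/s
     * figure printed is a crude measure of sustainable memory bandwidth. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (4 * 1024 * 1024)   /* ~32MB per array: well past any cache */

    int main(void) {
        double *a, *b, *c, secs;
        clock_t t0, t1;
        long i;
        int r;

        a = malloc(N * sizeof *a);
        b = malloc(N * sizeof *b);
        c = malloc(N * sizeof *c);
        if (!a || !b || !c) return 1;
        for (i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

        t0 = clock();
        for (r = 0; r < 10; r++)
            for (i = 0; i < N; i++)
                a[i] = b[i] + 3.0 * c[i];   /* 2 flops vs 24 bytes moved */
        t1 = clock();

        secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
        printf("triad: ~%.0f MB/s\n", 10 * 3.0 * N * sizeof(double) / secs / 1e6);
        return a[N / 2] > 0.0 ? 0 : 1;      /* keep the result live */
    }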
    Since AMD began competing effectively with Intel in the late 1990s both Intel and AMD have been aggressively developing new faster x86 CPUs. This has lead them to becoming competitive with and sometimes even exceeding the performance of RISC CPUs (If you believe the benchmarks, see below). However RISC vendors are now becoming aware of this threat and are responding by making faster CPUs.
The premise is correct, the conclusion is utter nonsense. There is no more Alpha; HP will be phasing it out. In addition, PA-RISC will be phased out in favor of Itanium (a non-RISC VLIW architecture from Intel.) MIPS is dead. UltraSparc has tail-ender performance -- ironically, the only reasonable modern processor that it soundly beats is the Motorola G4. HAL/Fujitsu hasn't really proven anything recently with their Sparc clone. Motorola is a complete joke. There's only one credible RISC vendor left who can challenge x86 on the desktop, and that's IBM with its PPC 970 and Power4 processors.
    Both x86 and PowerPC have added extensions to support Vector instructions. x86 started with MMX, MMX2 then SSE and SSE2. These have 8 128 bit registers but operations cannot generally be executed at the same time as floating point instructions.
There is no MMX2. Floating point and SSE/SSE2 can be executed simultaneously with no restrictions, and on the Athlon, the FPU's reordering mechanism is so powerful that it can actually take MMX and FPU instructions which are separated in software only and execute them simultaneously anyway (it's possible the Pentium 4 can also do this; however, I am not as familiar with its pipeline limitations). That said, MMX covers integer vector operations (best for memory sensitive audio or video processing), which generally are not overlapped with floating point operations. This is just misrepresentation.
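
Here is a sketch of the kind of independent integer-SIMD and floating point streams in question (my own illustration; it assumes a compiler with SSE2 intrinsics support, e.g. gcc -O2 -msse2). Nothing in the instruction set forces the two chains below to serialize; an out-of-order x86 core is free to issue them side by side:

    /* Sketch: independent integer-SIMD and FP work interleaved. */
    #include <emmintrin.h>   /* SSE2 intrinsics */
    #include <stdio.h>

    int main(void) {
        __m128i vi = _mm_set1_epi8(1);       /* integer vector chain  */
        __m128  vf = _mm_set1_ps(1.0f);      /* float vector chain    */
        __m128i inc = _mm_set1_epi8(3);
        __m128  scale = _mm_set1_ps(1.00001f);
        float out[4];
        unsigned char ib[16];
        int i;

        for (i = 0; i < 1000000; i++) {
            vi = _mm_add_epi8(vi, inc);      /* packed byte adds ...       */
            vf = _mm_mul_ps(vf, scale);      /* ... overlap packed FP muls */
        }

        _mm_storeu_ps(out, vf);
        _mm_storeu_si128((__m128i *)ib, vi);
        printf("%f %d\n", out[0], ib[0]);    /* keep both chains live */
        return 0;
    }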
    However the x86 floating point unit is notoriously weak [...]
What?!?! As compared to what? x86 floating point has been consistently crushing PPCs on floating point performance for nearly half a decade now. The x86 floating point unit uses a convoluted instruction architecture, but modern x86 micro-architectures have been up to the task of working around these complications. Other RISC processors (with the exception of the PPCs from Motorola) have beaten x86s on Spec CPU FP mostly because of over-the-top (and very expensive) multi-bank memory architectures (that's correct -- Spec FP is a fairly memory limited benchmark). As Intel and AMD have worked hard on their memory infrastructures, and have now both introduced multi-bank memory architectures of their own, they have both been dominating Spec FP lately (Intel more so, as they have concentrated on memory bandwidth somewhat more.)
    Decoding the x86 instruction set is not going to be a simple operation, especially if you want to do it fast. How for instance does a CPU know where the next instruction is if the instructions are different lengths? It could be found by decoding the first instruction and getting it's length but this takes time and imposes a performance bottleneck. It could of course be done in parallel, guess where the instructions might be and get all possibilities, once the first is decoded you pick the right one and drop the incorrect ones.
Don't quit your day job, dude -- both Intel and AMD use cached predecoding mechanisms. I.e., the first time they see instructions, they decode them (slowly) and remember enough about them so that the next time they can decode them in parallel with very high performance. The Athlon stores and uses predecoded instruction-boundary and branch bits, while the Pentium 4 uses a full trace cache (i.e., the instructions are decoded into fixed width micro-ops which are cached instead of raw x86 opcodes.)
    Once you have the instructions in simpler "RISC like" format they should run just as fast - or should they? Remember that the x86 only has 8 registers, this makes life complicated for the execution core in an x86 CPU. x86 execution cores use the same techniques as RISC CPUs but the limited number of registers will prove problematic.
The x86 => macro-op/micro-op mechanisms translate straight to rename registers. I.e., false dependencies created by reuse of the 8 registers are automatically removed. For example, the Athlon has 88 internal floating point registers, with 36 destinations; in general, this gives Athlon macro-ops the equivalent freedom of 36 registers. Modern x86s can internally unroll loops across multiple iterations and will create, essentially, "cloned registers" so code executes as if more registers were being used. The sketch below shows the software-visible analogue of this.
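
In this sketch (my own illustration, not from the article), the first loop is bound by a *real* dependency that no amount of renaming can break, while the second splits the work into independent accumulators -- the same trick the hardware's rename registers pull off automatically when the dependencies are merely false (name) reuse of the 8 architectural registers:

    /* Sketch: false vs. real dependencies and "cloned" accumulators. */
    #include <stddef.h>

    double sum_serial(const double *x, size_t n) {
        double s = 0.0;
        size_t i;
        for (i = 0; i < n; i++)
            s += x[i];                 /* every add waits on the last one */
        return s;
    }

    double sum_parallel(const double *x, size_t n) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        size_t i;
        for (i = 0; i + 4 <= n; i += 4) {
            s0 += x[i];                /* four independent chains that an */
            s1 += x[i + 1];            /* out-of-order core can keep in   */
            s2 += x[i + 2];            /* flight simultaneously           */
            s3 += x[i + 3];
        }
        for (; i < n; i++) s0 += x[i];
        return (s0 + s1) + (s2 + s3);
    }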

This narrow point also ignores the fact that x86s have far more advanced addressing modes than most RISC processors, which allow an x86 instruction to fetch a memory operand in the same instruction as an ALU operation. Because of the sophisticated, fully pipelined, out-of-order mechanisms inside modern x86 architectures, this allows arbitrary access of the L1 cache for read-only parameters with virtually no penalty. This dramatically reduces the need for registers versus comparable RISC instruction sets.

    However when the x86 includes this kind of hardware the 8 registers becomes a problem. In order to perform OOO execution, program flow has to be tracked ahead to find instructions which can be executed differently from their normal order without messing up the logic of the program. In x86 this means the 8 registers may need to be renamed many times and this requires complex tracking logic. RISC wins out here again because of it's larger number of registers. Less renaming will be necessary because of the larger number of registers so less hardware is required to do register usage tracking.
As complicated as it is, it is present and accounted for in both leading x86 architectures. In fact, the Athlon uses an always-rename policy with an implicit naming algorithm which imposes absolutely no overhead. The Pentium 4 uses a single rename stage, but otherwise has comparable rename capabilities (Intel unifies both FP and integer rename registers, while AMD splits them.)

Let us also note that the PPC 970 has very heavy renaming mechanisms as well. This is necessary because of its two-cycle minimum latency on integer instructions -- the CPU needs to find parallelism where the compiler cannot. I cannot remember the details, but my recollection is that its renaming capabilities are comparable to the x86s' (in contrast to the G4, which has pathetic rename capabilities). So there's nothing to win here: renaming costs the x86s nothing, and the PPC 970 doesn't win anyway, since it is forced to use a similar mechanism.

  • AMDs x86-64 instruction set extensions give the architecture additional registers and an additional addressing mode but at the same time remove some of the older modes and instructions. This should simplify things a bit and increase performance but the compatibility with the x86 instruction set will still hold back it's potential performance.
The author has not established any basis for this comment. The reality is that the Opteron is simply an amazing architecture that does not appear to be held back by anything. The Ace's Hardware FLOPS test is NOT a comprehensive test of compilers comparable to Spec CPU. FLOPS is a benchmark which measures floating point performance, and whose simplicity gives the compiler the best opportunity to map the algorithms to the best possible sequence of instructions for the CPU. The test is too narrow to give a complete assessment of a compiler's capabilities. It was originally used to test Intel's claim that "compilers in the future would demonstrate the Pentium 4's superior floating point performance".

Numerous other benchmarks on the web which do purport to be more comprehensive measures of general compiler performance indicate that Intel's compiler is truly superior to any other x86 compiler by a fairly robust margin (typically between 10 to 25% on average versus the next best compiler.)

  • "Intel's chips perform disproportionately well on SPEC's tests because Intel has optimised its compiler for such tests"[13] - Peter Glaskowsky, editor-in-chief of Microprocessor Report.
This is a very old comment based on old versions of the Intel compiler. The reality is that Intel has invested quite heavily in their compiler technology for the purpose of building truly world class compilers, plain and simple. As a result, Intel will do very well on Spec benchmarks, and just about any other benchmark which can be recompiled with their compiler because their compiler is just an excellent piece of engineering.

For other comments about Apple's recent benchmark claims with the release of the G5, just read the story below, and the one below that.

G5/PPC970 redux

07/18/03 Ok, now that the Apple people themselves have gotten into trying to FUD the anti-FUD going on, it looks like I need to get into this myself. Here is the official Veritest report, which Apple distorted even further in their false presentation of the G5 as the fastest system.

Veritest was formerly the infamous ZD Labs, who are well known for their broken and long since discredited CPUMark (whose results derive substantially from one single x86 instruction), FPUMark (whose results have more to do with branch prediction performance than FPU performance), and many other useless benchmarks (all of which have superior contemporary replacements). When the Athlon CPU was first released, dozens of independent websites and review organizations were able to quickly and prominently determine that it had a far superior floating point unit to the contemporary Pentium. ZD Labs was one of the few organizations that was unable to determine this. In a bit of back and forth I had myself with one of their developers, challenging them on FPUMark, they gave very unsatisfactory "we stand by what we did" kinds of answers, and they did not concede that FPUMark was a very bad benchmark from its very inception (they instead said it was inappropriate as a modern benchmark, whereas I claim it always was a bad benchmark.)

If one were to consider the reputation of the actors involved, I would say that Apple and Veritest fall in the same group: absolutely no credibility whatsoever on claims of performance.

Let's start with the basic facts about the Spec CPU claims:

  1. Veritest was hired by Apple to run the Spec CPU benchmarks for Apple. Dell did not participate in any way (and was likely unaware of the test taking place); Apple, however, was in a very tight loop, and worked with Veritest on compiler switches and other system modifications. Notice in the Testing Methodology section of the test report that they clearly did a major amount of tweaking of the Apple system (using the CHUD utility to change the caching policy, setting prefetch mechanisms, etc.), while doing very little to the Dell system (besides turning off X, which would have little to no impact).
  2. Veritest's conclusions do not support the claim that Apple's box was the fastest, as it still lost on the Spec CPU Int test. Note that these results are far lower than the officially endorsed Spec committee results.

    Spec CPU 2K Int

        Platform                 Veritest results    Vendor results
        Apple G5 2.0GHz          800                 unknown
        Dell P4 Xeon 3.06GHz     836                 1014
        Dell P4 3GHz             889                 1151
        AMD Opteron 246 2GHz     1248

  3. Although the G5 won on Spec FP, these results show the largest discrepancy from the official Spec endorsed vendor results (Dell's official figures are up to 90%(!) higher than what Veritest measured.) I.e., the test where reality has been distorted the most is the result that Apple emphasizes the most.

    Spec CPU 2K FP

        Platform                 Veritest results    Vendor results
        Apple G5 2.0GHz          840                 unknown
        Dell P4 Xeon 3.06GHz     693                 1173
        Dell P4 3GHz             646                 1229
        AMD Opteron 246 2GHz     1209

  4. We have the additional problem of the rather unusual NAGWare Fortran 90 compiler being used. For a comprehensive analysis of Fortran compilers you can go to the Polyhedron Software website. They show that NAGWare's compiler is in fact among the worst Fortran compilers available for x86, with the Intel compiler being nearly 50% faster on most tests (Lahey's compiler is also significantly faster, so it's not just some isolated Intel miracle compiler we're talking about here).
  5. The compiler used for the test is gcc 3.3. This goes against the intent of the Spec rules, which is to pick the most suitable compiler for your system. Veritest's claim is that picking this one compiler for both systems makes the platforms even, but even this claim is false.
    • Veritest installed a single threaded, speed tweaked malloc library for the Apple system while not doing the same for the Dell system. This makes a significant difference, as all the official x86 Spec results use the MicroQuill SmartHeap tool to do the same thing. (It only takes a minute or so to see this discrepancy in the Veritest report, and similarly one can see the use of MicroQuill in the x86 Spec CPU Int reports -- certainly you couldn't miss this if you'd spent a week scrutinizing it, as some Mac apologists have claimed.) If the heap accelerator they were using couldn't be used with the x86 version of gcc, then they could have used MicroQuill's SmartHeap itself, which does work with Red Hat Linux for the PC. (A sketch of how such a drop-in allocator works follows this list.)
    • On the G5 they used the -freorder-block-and-partition flag, which is used for feedback directed optimization. For some reason this flag was not used on the Dell system.
  6. The OS used for the Dell in the Veritest report is Red Hat Linux. However, they used Mac OS X for the Apple G5! (There is a PowerPC port of Linux.) While this may seem acceptable, or irrelevant, tests in the past have shown that Linux's memory management induces additional overhead for very large applications versus something more common, like Windows 2000 or Windows XP. We know that in the last year Linus Torvalds himself has been involved in trying to fix these problems; however, I don't know how those fixes turned out. A look at the Polyhedron website given above shows that there is still something in Linux that is holding it back somehow.
  7. Apple has not submitted these results to the Spec committee for some reason. Given the work they put into them, and the fact that in theory they could submit these results, why don't they? The results from all other vendors have been submitted and are readily available at the spec web site. It seems quite possible that these results would be rejected for numerous violations of the Spec run rules.
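
For the curious, here is a sketch of how such a drop-in allocator works (a hypothetical bump allocator of my own construction; it is NOT MicroQuill SmartHeap or the library Veritest used). It trades free() and thread safety for raw allocation speed, which a single threaded Spec CPU run can get away with, and it gets swapped in on Linux without recompiling anything via LD_PRELOAD:

    /* Hypothetical "speed tweaked" single threaded allocator sketch.
     * Build: gcc -O2 -shared -fPIC -o fastmalloc.so fastmalloc.c
     * Use:   LD_PRELOAD=./fastmalloc.so ./benchmark_binary
     * Pool size is an arbitrary assumption; no overflow checks (sketch). */
    #include <stddef.h>
    #include <string.h>

    #define POOL_SIZE (256UL * 1024 * 1024)
    static char pool[POOL_SIZE];
    static size_t top = 0;

    void *malloc(size_t size) {
        void *p;
        size = (size + 15) & ~(size_t)15;   /* 16-byte alignment */
        if (top + size > POOL_SIZE) return 0;
        p = pool + top;
        top += size;
        return p;                 /* no locks, no headers, no free-list search */
    }

    void free(void *p) { (void)p; }  /* never reclaim: fine for one short run */

    void *calloc(size_t n, size_t size) {
        void *p = malloc(n * size);
        if (p) memset(p, 0, n * size);
        return p;
    }

    void *realloc(void *p, size_t size) {
        void *q = malloc(size);
        if (p && q) memcpy(q, p, size);  /* may copy past the old block's end;
                                            acceptable only in a sketch */
        return q;
    }

The point of the exercise: a general-purpose, thread-safe malloc pays for locking and free-list management on every call, and a benchmark that allocates heavily (like gcc in Spec CPU Int) rewards anyone who quietly removes those costs on one side of a "fair" comparison.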
So in summary: 1) This test was not equal, 2) This test was not representative and 3) Apple did not even represent the results of this test honestly.

If I can ever stop laughing ...

07/11/03 Just click on the link below. I still can't stop laughing.

Possible censorship at AppleInsider.com

07/01/03 According to my server logs, there was a link from the forums at appleinsider.com back to this page that got posted there very recently. Not exactly a slashdotting, but noticeable nevertheless. The link:

    http://forums.appleinsider.com/showthread.php?s=&threadid=26240

appears to have been removed (very atypical of other forum links to this page, which happen from time to time.) So it was extremely short-lived, if anything. Just out of curiosity, I wonder if anyone knows what that thread was about. Is it a case of the Apple fanatics censoring their own message boards in an effort to keep their community from the truth?

Apple releases the G5

06/24/03 At one of the typical Mac shows, Apple introduced the new PowerMac G5. It's based on the IBM 970 PowerPC, it's 64-bit enabled, and it runs between 1.6Ghz and 2.0Ghz. Steve Jobs announced that it was the world's fastest desktop computer and the first to be 64-bit enabled. The second claim is disputable (in the mid 90s DEC created something called the "Alpha PC", which just used a slower version of their 64-bit Alpha processor, and sold an ill-received PC based on it. Also, AMD's Opteron, which was released some time ago, is currently only being sold as a server part, but obviously one could use it in a desktop, if one so desired.) But the first is an absolute lie.

Now of course, if you've been to this page before, you know it's very typical for me to call out Apple/Jobs when he overtly lies like this. And yes, I could easily pick his statements apart; I could go to the raw data and show that his claims are a complete crock, as usual.

But someone beat me to it, and did quite an excellent job. So let me just provide the link here:

Now of course, I should point out that these new G5 Macs are clearly far superior to the Motorola crap that they were using up until this point. Far, far, far superior. In fact it puts them within shooting distance of x86 performance. But many of us who watch the microprocessor industry have been anticipating this for more than a year. We also anticipated the performance level, and knew very well that Athlons, Opterons and P4s would still outperform the PPC 970 once launched. The only surprise was the 2.0Ghz release clock rate (we all thought it would be 1.8Ghz), but it's not enough for the G5 to catch the x86 vendors.

So how can it be that this clean "RISC" based PowerPC, manufactured by IBM using the world's best process technology, is unable to beat the pants off of the x86 vendors? (And how could I and others have known this would be the case beforehand?) The G5 is a very wide issue, deep pipeline, deeply dynamic/speculative out-of-order execution CPU, just like the x86s. But rather bizarrely, they designed in an interesting quirk: all instructions execute with a minimum of 2 cycles of latency. Compare this to the Athlon, whose minimum is 1 cycle (and the Athlon does an excellent job of getting most non-memory oriented instructions down to this), and the P4, whose minimum is 0.5 cycles (though few instructions get below 1 cycle in real life.) The reason for IBM to do this is to give the instruction scheduling mechanisms enough time to scan their rather large 200+ entry instruction buffers. The problem, however, is that these 2 cycle bubbles will show up in your general software's bottom line. IBM's design trade-off puts enormous pressure on the compiler to find parallelism in code to cover these 2 cycle delays (the CPU can still start instructions on every clock, but it has to find parallel instructions to pull this off.) The Spec CPU benchmark is a notoriously over-analyzed code base in which this parallelism is easy to find in many of the sub-tests. But not *all* of the sub-tests. The sketch below makes the latency-versus-parallelism point concrete.
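
Here is my illustration of how that latency shows up (iteration counts are arbitrary assumptions). Both loops do the same number of multiply-adds. The first is one serial dependency chain, so it runs at roughly latency times iterations; the second interleaves four independent chains, giving the scheduler the parallelism it needs to approach the issue rate instead. The bigger the gap between latency and issue rate -- a 2 cycle minimum on the 970 -- the more a CPU depends on someone finding that parallelism for it:

    /* Sketch: instruction latency vs. throughput. Timing is crude
     * (clock()); the interesting number is the ratio of the two loops. */
    #include <stdio.h>
    #include <time.h>

    #define ITERS 200000000UL

    int main(void) {
        unsigned long x0 = 1, x1 = 2, x2 = 3, x3 = 4, i;
        clock_t t0, t1, t2;

        t0 = clock();
        for (i = 0; i < ITERS; i++)
            x0 = x0 * 3 + 1;             /* one serial chain: latency bound */
        t1 = clock();
        for (i = 0; i < ITERS / 4; i++) {
            x0 = x0 * 3 + 1;             /* four independent chains: the    */
            x1 = x1 * 3 + 1;             /* out-of-order core overlaps them */
            x2 = x2 * 3 + 1;             /* and approaches the issue rate   */
            x3 = x3 * 3 + 1;             /* instead of the latency          */
        }
        t2 = clock();

        printf("serial:      %.2fs\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
        printf("interleaved: %.2fs\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
        return (int)(x0 ^ x1 ^ x2 ^ x3) & 1;   /* keep results live */
    }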

This trade-off that IBM made would have made sense if they had used it to reach dramatically higher clock rates (since it does relax the instruction scheduling stage, which you would assume was done for the purpose of decreasing the *overall* clock time), but as we can see they only came out at 2.0Ghz. AMD is at 2.2Ghz with their aging Athlon architecture, while Intel is just about to release a 3.2Ghz part (though their architecture is admittedly geared almost exclusively towards maximizing clock rate.) Remember that IBM has the world's state of the art manufacturing process technology, so they have no excuse for not being able to keep up with Intel and/or AMD. Somewhere along the way IBM seems to have made trade-offs in their architecture that cut them short of being equal to the x86 guys. It's impossible to know exactly what they did without more information (the kind of information an outsider just can't get access to), but it likely has a lot to do with the chip's Power4 heritage. Just as Motorola was really only selling their PPCs to Apple as an augmentation to selling them as embedded processors, IBM's core motivation has been to sell their high margin Power4 workstations. It could just be a simple matter of the 970 being intrinsically a far more conservatively designed CPU (which server and workstation people care more about) that's being tweaked just a little bit in order to also sell to Apple as a high performance desktop chip.

Well, that's probably enough microprocessor architecture discussion for today.

MacNET faces reality

05/11/03 Looks like even the faithful are starting to face reality.

Mac vs. PC III: Mac Slaughtered Again

11/19/02 A typical headline from me, right? Well, it's not mine. The site www.digitalvideoediting.com has been running benchmarks comparing the Mac and PC on, you guessed it, digital video editing operations. This is where Mac users have been clinging to a claim of some kind of advantage. In their test, apparently the PC beats the stuffing out of the Mac. Well, d'uh.

One for the archives

11/12/02 Just to give you an idea of how deluded the Mac fans are, and how willing they are to make up anything to support their cause, take a look at this vintage story from over a year ago. In particular, note the date on the story. Now The Register is not exactly the most reliable source of information, but this appeared from various other "sources" as yet another carrot to dangle in front of Mac fans, and as something to repeat to all the PC naysayers around them. Athlons are currently shipping at well over 2Ghz (they are using a model numbering scheme and I'm too lazy to look up the exact number right now), and the Motorola G5? Nowhere in sight.

And I'll bet dollars to donuts that the G5 will never ship. Ever. IBM's PPC 970 -- which will barely bring Apple back to a level that almost seems competitive (it clocks at 1.6Ghz, but more important are its architectural features, which will allow it to post non-laughable Spec CPU performance numbers) -- has been announced, and will definitely ship next year. Basically this means that unless Motorola is keeping everything totally quiet for some reason, they will not have a competitive offering for Apple. And as IBM cranks the clock rate, Motorola will find that their inability to keep up with Moore's law simply puts them in no position to be making desktop CPUs at all.

But of course, this is actually fatal for Apple. The whole premise behind using the PowerPC architecture was that IBM and Motorola would *compete* with each other to make CPUs for Apple. This was supposed to ensure that Apple would always have a competitive CPU in their systems. Well, we see what happened when IBM decided to basically ignore the desktop PPC market for several years -- Motorola became lethargic, and essentially had a default monopoly position on PPC shipments to Apple. Intel really wasn't their competition, since Apple couldn't switch to them, and Motorola just didn't give a rat's ass about high performance CPUs. What this means, of course, is that they don't have the expertise (at least not anymore.) IBM suddenly shows up with a CPU which is roughly back on the Moore's Law curve (they can do this because IBM has always been able to attract talented computer designers, and has a serious commitment to leading edge technological research), and we can see very visibly just what a sham the Motorola CPUs have been for the past 3 years. So is the competition between IBM and Motorola supposed to heat up again? Nope. I think Motorola is going to bow out of this one. They are happy to sell PPCs just in the embedded space, and are unwilling to invest in the technology required to make a serious desktop processor. But this just means that IBM will eventually become lethargic and slow, just as Motorola did before it.

More condemning benchmarks

09/03/02 Gateway commissioned eTesting Labs to perform head to head benchmark testing between their "Profile 4" PC and a comparable iMac (flat panel, home machines.) The results are as expected.

Parody of Apple switch ad

07/27/02 I just had to point out this hilarious parody of the recent Apple "Switch" ad campaign.

Apple fakes Xserve benchmarks

06/15/02 A discussion on comp.arch has indicated that Apple's Xserve benchmarking is suspicious. What a surprise:

> Rupert Pigott wrote:
> > "JF Mezei" wrote in message
> > news:3D2DEF2A.77468C7C@videotron.ca...
> > > re: benchmarks
> > >
> > > http://www.apple.com/xserve/performance.html
> > >
> > > has some "refreshing" benchmarks because they compare real applications,
> > > not some SPEC benchmarks.
> >
> > However I don't really get much of a picture of what's going on from
> > their "BLAST" benchmark which appears to compare two different code
> > bases. Note the codebase they used for the non-Apple platforms was
> > optimised for searches on word length "11", and if you look at the
> > graph you will note that the benchmark supports that. Meanwhile the
> > Apple version seems to scale pretty linearly with word length. Clearly
> > these codebases are very different. SPEC at least tries to compare
> > apples with apples (haha). Something that also stands out like a
> > sore thumb in these results is that Apple compared DDR G4s vs
> > SDRAM P4s... HMMMMMMMMMMMMMMMM.
> >
> > Again the Xinet benchmark seems to run into a wall, but this time
> > it's the Apple stuff which hits the wall, and bizarrely the Dell
> > box seems to trail off slightly... RIPs are quite complex pieces
> > of software which also stress the I/O subsystem (in particular
> > file write performance). If you don't have enough RAM they can
> > kick the shit out of your VM too. I also note that the link to
> > the benchmark details is convieniently broken. On inspection of
> > the xinet site itself I note that the configuration info available
> > for the systems is inadequate.
> >
> > As for Webench, fuck knows, but SPEC have web benchmarks too, you
> > should go look at them. Maybe Apple didn't like the scores they
> > got with SPEC's web benchmark suite and so decided to find a more
> > favourable one. :)
> >
> > The Bonnie benchmark was conducted in a dubious manner too. Again
> > we have inadequate configuration information available so we are
> > unable to tell if they pulled a fast one. Linux supports lots of
> > filesystems out of the box, plus you have the sync or async
> > mounting options. I don't think they adequately explained their
> > choice of filesize either, 2Gb looks like one of those "special
> > numbers" to me.
> >
> > > This is not only CPU, but also NETWORK as well as file server stuff. To
> > > me, these tests are far more meaningful than those SPECmarks because
> > > they really compare total systems for various types of workloads.
> >
> > It's also complete shite because they don't compare the same
> > codebases and they don't provide the source so you can verify this.
> > If that wasn't bad enough they don't give you enough configuration
> > info either. Sorry, but I'll take SPEC's benchmarks over Apple's
> > any day of the week. What's more SPEC do more than CPU/Memory/Compiler
> > benchmarks too. :)
> >
> > Cheers,
> > Rupert
>
> Apple has provided AGBLAST source code [1]. I believe that these benchmark
> results do indeed compare two different codebases, which isn't a very
> meaningful comparison in my opinion.
>
> First, AGBLAST includes a patch which vectorizes lookup table generation.
> Lookup table generation does not consume much cpu time to begin with, so
> it's not surprising that commenting out the altivec code did not show me a
> measurable slowdown on the G4 I tested.
>
> Secondly, a stride optimization to an io loop is responsible for the
> scaling seen for large word sizes. This optimization is portable but does
> not appear to have been included when they ran it on other platforms.
>
> Thirdly, using blastn with a word size of 40 is neither a realistic nor
> reccomended use of the blastn algorithm; one should use megablast for such
> searches.
>
> I was unable to repeat the results from Apple's press releases [2] [3].
>
> [1] http://developer.apple.com/hardware/ve/acgresearch.html
> [2] http://www.apple.com/pr/library/2002/feb/07blast.html
> [3] http://www.genomeweb.com/articles/view-article.asp?Article=200213094535
>
> -George Coulouris
> firstname at lastname dot org

High end iMacs running Mac OS X noticeably slower than eMachines

04/19/02 According to a Wired story, Mac OS X is biting at the ankles of Windows in terms of performance. Desktop graphics rendering is considered a "solved problem" in the PC world, to the point where people don't even bother benchmarking it any more. However, in head to head tests against the new iMacs running Mac OS X, the PC was noticeably faster. Even the grandmaster of speedy web browsing technology, Opera, is unable to make the sluggish OS perform well.

PC vs Mac benchmark collection

03/28/02 I found this link to a collection of benchmarks comparing the PC and the Mac. By this point, I think the picture should be clear -- Macs are just very slow machines.

Spec 2K performance numbers for Apple

03/02/02 The German computer magazine c't has managed to run the Spec CPU 2000 benchmark on the latest Apple iMac. For whatever reason, they ran the latest and greatest 1Ghz iMac (the fastest CPU available for the iMac at this time) against the ancient and crusty 1Ghz P-!!! (machines are currently available from AMD running at 1.66Ghz and from Intel at 2.2Ghz, both of which far outperform the old 1Ghz P-!!!). The P-!!! wins, but that's not really important. The G4 results: 306 SpecInt base and 187 SpecFP base.

Let's do a comparison:

Spec CPU 2K
For other system results see http://www.spec.org/osg/cpu2000/results/cpu2000.html

    Platform                               SpecInt base    SpecFP base
    Apple G4 1GHz (c't)                    306             187
    Intel Pentium-!!! 1GHz (c't/spec.org)  309             297
    Athlon 1.667GHz (spec.org)             697             596
    Intel Pentium4 2.2GHz (spec.org)       771             766

A quick survey of other Spec CPU results shows these G4 results to be among the slowest for any modern CPU shipping today. There is a discussion thread on this on USENET: click here

Apple lies about G4's performance

08/11/01 "Will" wrote me with the following:

Paul,

Surely you can come up with a retort to this on your Apple computer opinion page...

http://a1872.g.akamai.net/7/1872/51/fc3f3a53a0c596/www.apple.com/powermac/pdf/PowerPC-G4velocityengine.pdf

which is linked to from here:

http://www.apple.com/powermac/processor.html

Keep up the nice work on your pages.

-Will

It's a little dated (Oct. 2000), and I certainly wish I had seen this earlier so that I could have refuted it earlier. But what the heck, I can't resist:

  • What is a supercomputer?
    Many definitions of supercomputer have come and gone. Some favorites of the past are (a) any computer costing more than ten million dollars, (b) any computer whose performance is limited by input/output (I/O) rather than by the CPU, (c) any computer that is "only one" generation behind what you really need, and so on. I prefer the definition that a supercomputer performs "above normal computing," in the sense that a superstar is -- whether in sports or in the media -- somehow "above" the other stars.

Actually, none of these is anything close to what I or most other people think of as a supercomputer. A supercomputer is quite simply a machine built primarily for computational performance. For PCs and Macs the highest concern is always making the box affordable for consumers -- being cheap to build is the primary design criterion for desktop range PCs. Even workstations are at the very least built with high volume production goals within bounded price limits. Supercomputers are usually built to be optimized for performance per volume of area, or performance per single CPU, or performance per single operating system; not performance per dollar, or performance per watt, or anything like that. The G4 is not a supercomputer (neither is a Pentium, or an Athlon for that matter.)

For example, Federal Express uses (or at least used to use) a Cray supercomputer to track all its parcel routing. The reason they picked a Cray was the tremendous volume of routing computation required, with a necessarily less-than-24-hour turnaround time on those computations. Sandia National Labs has used an 8000+ processor Pentium Pro supercluster for nuclear explosion and weather simulation. The reason they chose this was the sheer total gigaflops that could be delivered by clustering up so many Intel parts.

It should also be pointed out that while both Intel and AMD based supercluster machines are listed among the top 500 supercomputers in the world right now, none use the Motorola PowerPC (though some use the IBM Power3; that is a specially modified core that is not suitable for something like an iMac.)

  • ... Because almost all modern personal computers perform in the tens or hundreds of megaflops (more details later on such specifications),
At the time this was written, the K6-500, Athlon 500, and Pentium III 500 were all capable of nearly 2 gigaflops of sustained execution. However, those are all single precision computations (using SIMD, as AltiVec does.) The 64 (actually 80) bit FPU of the Athlon was only capable of a measly 1 gigaflop (the P-III could do 0.5 gigaflops). The G4's 64 bit FPU (the G4 does not have an 80 bit FPU) is comparable to the P-III's, clock for clock. The arithmetic behind these peak figures is sketched below.
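
The arithmetic is just clock rate times floating point results per cycle. Here is a sketch (the per-clock figures are the commonly cited peaks for each unit -- my assumptions, not measurements):

    /* Back-of-the-envelope peak FLOPS = MHz x results per clock. */
    #include <stdio.h>

    int main(void) {
        struct { const char *cpu; double mhz; double flops_per_clk; } peaks[] = {
            { "K6-2 350MHz (3DNow!, single)",   350.0, 4.0 }, /* 2 insns x 2 floats */
            { "K6-2 500MHz (3DNow!, single)",   500.0, 4.0 },
            { "Athlon 500MHz (3DNow!, single)", 500.0, 4.0 },
            { "P-III 500MHz (SSE, single)",     500.0, 4.0 }, /* 2 adds + 2 muls    */
            { "Athlon 500MHz (x87, double)",    500.0, 2.0 }, /* 1 add + 1 mul      */
            { "P-III 500MHz (x87, double)",     500.0, 1.0 },
        };
        int i, n = (int)(sizeof peaks / sizeof peaks[0]);
        for (i = 0; i < n; i++)
            printf("%-32s ~%.2f GFLOPS peak\n",
                   peaks[i].cpu, peaks[i].mhz * peaks[i].flops_per_clk / 1000.0);
        return 0;
    }

Note how the numbers line up with the claims in the text: 500MHz x 4 single precision results per clock is the "nearly 2 gigaflops" figure, and 350MHz x 4 is how a K6-2 350 crosses 1 gigaflop.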
  • What this means is that you can do things at a few gigaflops that you cannot conveniently do at 100 megaflops. That is, you can do things in a convenient time span or sitting, or most important, you can do in real time certain tasks that were heretofore undoable in real time on PCs. In this sense, the Power Mac G4 is a personal supercomputer: obviously personal (the latest ergonomic and efficient designs for G4 products amplify this) and, due to their ability to calculate in the gigaflops region, clearly above the pack of competing PCs.

Laugh! The first PC to run at 1 gigaflop was the AMD K6-2 at 350Mhz, which was available in 1998. PCs were able to decode DVDs in real time in software (at about 400Mhz or so) long before it was possible to do so on a G4 (the software wasn't available until long after the G4 500 was out, and I don't know how far down the G4 line you can go and still reasonably play DVDs.)

  • ... and yet these modern PCs cannot solve certain problems sufficiently rapidly, it is reasonable to define the supercomputing region, for today, as the gigaflops region. ...

    Of course, the state-of-the-art supercomputers of today reach into the hundreds of gigaflops, yet surprisingly, they are typically built from a host of gigaflop-level devices. A system like the Cray T3E ranges from about 7 gigaflops to the teraflop (1,000 gigaflops) region, depending on the number of individual processors. One way to envision the power of a desktop Power Mac G4 is that the low-end Cray T3E at 7.2 gigaflops has the equivalent speed of two G4 processors, each calculating at 3.6 gigaflops (an actual achieved benchmark for certain calculations). By the way, the advertised U.S. list price for the bottom-end Cray T3E is $630,000 ([sic] http://www.cray.com/products/systems/t3e/index.html; October 11, 2000). On this basis, one can conclude that the Power Mac G4 is “between” the supercomputers of the 1980s and the very highest-end supercomputers of year 2000, and in fact is about as fast as the low-end versions of the latter, most modern supercomputers.

Oh god. Straight from the mouth of Steve Jobs. Look, nobody would buy a Cray T3E because of its ability to execute 7.2 gigaflops. Computers like that have enormous memory bandwidth; i.e., they can execute 7.2 gigaflops *ALL THE TIME*. PCs and Macs can only reach into the gigaflop range of execution on very specific problems that are not memory limited. In any event, it shows you the extent to which Apple is trying to twist the definition of a supercomputer. The in-core gigaflops of the CPU alone are not the only measure of a supercomputer. Supercomputers are about real world computations. For some industrial applications 7.2 gigaflops may be sufficient, so long as it is always guaranteed, which commodity CPUs do not and cannot offer.

  • V-Factor (Pentium III) = about 1 to 1.5

    This result can be obtained from publicly available performance figures on Intel's website (www.intel.com). For example, using Intel's signal-processing library code, a length-1024 complex FFT on a 600-megahertz Pentium III runs at about 850 megaflops (giving a V-Factor of about 1.4), while Intel’s 6-by-6 fast matrix multiply performs at about 800 megaflops ( V-Factor about 1.3). (Note that one might be able to best these Pentium III figures; I cite here just one of the publicly available code library results. These results do exploit the Pentium III processor’s own MMX/SSE vector machinery, and I caution the reader that these rules of thumb are approximations.)

As is typically the case with Apple, they chose numbers which are convenient for their purposes. A straightforward reading of the SSE or 3DNow! specifications shows that this so-called "V-Factor" is as high as 4 (they can sustain up to 4 results per clock), and if you are below 2 while using one of these SIMD instruction sets, then you are doing something wrong. So 2 to 4 would be a more representative range. If Intel is showing worse results for some tests, it's because they are probably beating some other (more relevant) competitor with a "good enough" solution that they couldn't or didn't vectorize.

  • Cryptography and "big arithmetic"
    Again on the subject of vectorized integer processing, there is the burgeoning world of cryptography. It is well known these days that cryptography involves large numbers, like prime numbers. It is not an exaggeration to say that most people have used prime numbers, at least indirectly, in web activities such as mail and credit card transactions. For example, when you order garden tools over the web, your credit card transaction may well involve a pair of prime numbers somewhere in the ordering chain.

    There is also a whole field of inquiry, called computational number theory, that is a wonderful academic field, attracting mathematicians, engineers, and hobbyists of all ages. Whether the interest is academic or commercial, it turns out that the G4 is quite good at this “big arithmetic.” Here is an example: When timed against any of the public arithmetic packages for Pentium (there are a dozen or so good packages), the G4 beats every one of them at sheer multiplication. In case you are familiar with timing for big arithmetic, a 500-megahertz G4 will multiply two numbers each of 256 bits, for a 512-bit product, in well under a microsecond. And here is a truly impressive ratio: In doing 1024-bit cryptography, as many crypto packages now do, the G4 is eight times faster than the scalar G3. That huge speedup ratio is due to many advantages of the vector machinery.

    But there is more to this happy tale: The G4 can perform arbitrary vector shifts of long registers, and these too figure into cryptography and computational number theory. There are some macro vector operations that run even 15 times faster than their scalar counterparts, because of the vector shifts. In such an extreme case of vector speedup, it is not just the vector architecture but also the unfortunate hoops through which the scalar programmer must sometimes jump — hoops that can happily evaporate in some G4 implementations. (Technically, arbitrary shifting on scalar engines must typically use a logical-or-and-mask procedure on singleton words, which is a painful process.)

This is quite simply an outright lie. It is exactly the opposite -- for big number arithmetic the G4 is an unusually slow processor, in particular on big multiplications and shifts. Just take a look at the GMP benchmark for a comparison (the performance scales with clock rate, and the code cannot be improved using AltiVec -- thus G3 and G4 performance is identical per clock.) The P6 beats it, but the Athlon really hands it its head.
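
If you want to check this yourself, the measurement is easy to reproduce with GMP, the library behind the benchmark mentioned above. A sketch (the timing harness and iteration count are my own arbitrary choices):

    /* Sketch: timing the 256x256 -> 512 bit multiply discussed above.
     * Build: gcc -O2 bigmul.c -lgmp */
    #include <stdio.h>
    #include <time.h>
    #include <gmp.h>

    int main(void) {
        gmp_randstate_t rs;
        mpz_t a, b, p;
        clock_t t0, t1;
        long i;
        const long iters = 1000000;

        gmp_randinit_default(rs);
        mpz_init(a); mpz_init(b); mpz_init(p);
        mpz_urandomb(a, rs, 256);           /* two random 256-bit operands */
        mpz_urandomb(b, rs, 256);

        t0 = clock();
        for (i = 0; i < iters; i++)
            mpz_mul(p, a, b);               /* 512-bit product */
        t1 = clock();

        printf("%.1f ns per 256-bit multiply\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC / iters * 1e9);
        mpz_clear(a); mpz_clear(b); mpz_clear(p);
        gmp_randclear(rs);
        return 0;
    }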

Apple threatens users with "Look and Feel" suits

06/16/01 People who are writing themes and theme editing tools for Mac OS X are getting cease and desist letters from Apple's lawyers. Ooh, nice one Apple -- piss off the *developers*. That'll sure endear the programming community to Mac OS X. Thanks to "Mac haxor" Peter Perlsø for the links.

Reaction to OS X from the Apple faithful

06/04/01 It seems that OS X is being less well received than Apple might have hoped. An article on Macworld gives a long list of Mac-faithful complaints about the new OS. I've also listed a somewhat even-handed review of OS X by "NK Guy". Thanks to "Mac haxor" Peter Perlsø for the links.

Benchmarks, benchmarks, and more benchmarks

12/22/00 While looking around the net for other benchmarks I stumbled across the NBench benchmark. It takes a subset of the ByteMark tests (presumably avoiding some of the obviously "rigged for certain compilers" subtests, like the bitfield test) and runs them on Linux/Unix workstations using the provided gcc compiler.

NBench Performance
Fastest results: see http://www.tux.org/~mayer/linux/results2.1.html

    Platform                   MEM      INT      FP
    Motorola G3 400MHz         1.985    2.665    4.560
    Intel Pentium-!!! 1GHz     4.485    3.669    9.044
    Athlon 1GHz                3.463    4.623    8.232
    Alpha EV6 750MHz           5.249    4.847    14.497

More Benchmarks

12/21/00 A German website has benchmarked the infamous LaTeX technical formatting language on a number of processors and found the G3 processors at about half the performance of comparably clocked x86 processors. Given their Mhz disadvantage these days, that would make them probably about one quarter the performance (comparing the fastest G4s to the fastest Athlon or Pentium !!!/4). (The Sparc results are pretty hilarious as well.)

BeOS revisited

12/03/00 This is not really news, but I had not seen it before. www.macspeedzone.com appears to have some inside information about the NeXT vs Be vs Copland battle that ultimately led to Apple buying NeXT for some $400 million.

The title is almost certainly an exaggeration. It seems doubtful to me that Apple would have truly gone with Windows NT (and doomed their platform to blatantly obvious second-class citizenship, by virtue of less support from MS and publicly losing every performance benchmark in sight.)

The reason I bring this up is just to comment on how ridiculously this was handled by Amelio. Obviously Steve completely played Amelio for the fool that he is. But there was more to it than that. Amelio let Apple's desperate situation lead them into some truly bad errors in judgement:

  • While Be OS may have clearly needed work, the ridiculous Mac OS X delays should put that all into perspective. This is the sign of management that does not understand technology development. A more honest assessment of their capabilities still puts Be OS ahead -- with Apple's resources, there's no telling where Be OS could be today.

  • NeXT cost $400 million. Be was asking for $200 million; Apple was only willing to go as high as $120 million. Was NeXT really worth more than 3x what Be was worth? There was probably a reason that the NeXT OS was not selling very well.

  • Did Be know that they were competing with the NeXT OS? Had Gassée known the stakes, and what the competition was, perhaps he could have convinced his backers to accept a lower price. Apple may have been under time pressure, but for a decision as momentous as this, that artificial Jan. 7, 1997 deadline could have been pushed or ignored. What Apple needed to do was put Gassée in his place, and force him to accept a fair price and not blow them off when they demanded an updated demo. And of course they needed to do the same with NeXT. Then they could have made a more rational decision.

  • Being a portable, realtime, pervasively multi-threaded, multiprocessor-capable OS is real technology. Using a non-standard language made up just for your OS (Objective C) and concatenating buzzwords (WebObjects) is not. I wonder if Apple had the technical know-how to properly evaluate these two technologies.
Older Apple News


Apple Links

