ClearSpeed CSX600
October 7, 2004, 5:41 AM CST by chairmansteve
New product announcements are coming out of the Fall Processor Forum 2004. ClearSpeed revealed the CSX600, a 96-core math co-processor capable of 50 GFLOPS at just 5 watts of power consumption. And it achieves that 50 GFLOPS running at just 250 MHz. The plan is to put the chips on PCI-X (and perhaps later PCI Express) cards and offload mathematically intensive codes from the main processor (Intel/AMD).

ClearSpeed Technology today announces details of its first commercial microprocessor, the CSX600. In a presentation at the Fall Processor Forum in San Jose, California, one of the world’s key semiconductor industry conferences, the Company will be presenting performance data showing the new processor to be the highest performing product of its kind. The new chip, which delivers up to 50 GFLOPS for just 5 watts power consumption, is expected to become available by the end of Q1 2005.

...

Technical Summary
50 GFLOPS 32/64-bit, 25 GMACS 16-bit fixed point
96 Gbytes/s bandwidth to on-chip memory
11 Gbyte/s off-chip bandwidth
64-bit flat address space, 48-bit physical
Gluelessly daisy-chain multiple devices for higher performance
Programmed in C with a familiar, simple programming model
5W typical power

http://www.clearspeed.com/news.php?page=pr&pr=25
The CSX600 also has 128 KB of SRAM and a DDR2 external memory interface.

The CSX600 has 96 Processing Elements (PEs), 128 Kbytes of on-chip scratchpad SRAM, DDR2 DRAM interface and I/O all interconnected by the ClearConnect® on-chip network. Each PE has an integer MAC, a dual 64-bit FPU and 6 Kbytes of local memory.

http://www.clearspeed.com/products.php?page=si
250 MHz x 2 FLOPs per cycle x 96 PEs = 48,000 MFLOPS = 48 GFLOPS, or roughly 50 GFLOPS
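The arithmetic above can be checked with a short Python sketch. The clock, PE count, and 5 W figure come from the quoted material; the 2 FLOPs per PE per cycle is the assumption used in this post, not a figure from ClearSpeed, and the variable names are purely illustrative.

```python
# Back-of-the-envelope peak throughput for the CSX600.
# CLOCK_MHZ, NUM_PES and TYPICAL_POWER_W are quoted in the thread.
# FLOPS_PER_PE_PER_CYCLE = 2 is an assumption made in this post.
CLOCK_MHZ = 250
NUM_PES = 96
FLOPS_PER_PE_PER_CYCLE = 2
TYPICAL_POWER_W = 5


def peak_mflops(clock_mhz, flops_per_cycle, num_pes):
    """Peak MFLOPS = clock (MHz) x FLOPs per cycle per PE x number of PEs."""
    return clock_mhz * flops_per_cycle * num_pes


peak = peak_mflops(CLOCK_MHZ, FLOPS_PER_PE_PER_CYCLE, NUM_PES)
peak_gflops = peak / 1000
gflops_per_watt = peak_gflops / TYPICAL_POWER_W

print(f"peak: {peak} MFLOPS ({peak_gflops} GFLOPS)")
print(f"efficiency: {gflops_per_watt:.1f} GFLOPS per watt at {TYPICAL_POWER_W} W")
```

Under that assumption the peak comes out at 48 GFLOPS, so the advertised 50 GFLOPS looks like a rounded figure.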
June 21, 2005, 7:09 AM CST by chairmansteve
ClearSpeed is giving a public demonstration of its CSX600 today at the International Supercomputer Conference (ISC) in Heidelberg. I also learned that the CSX600 contains 128 million transistors, the current chips are built on a 130 nm process, and the PCI-X boards hold up to 4 GB of DDR2 SDRAM in two memory slots.

The ClearSpeed demonstration systems include an IBM IntelliStation with AMD dual-Opteron processors and a second box with dual-Xeon Intel processors. One system is running two dual co-processor CSX600 boards, and the second system includes one CSX600 board. At 100 GFLOPS (sustained with two boards) and 50 GFLOPS (sustained with one board), these systems are believed to demonstrate the highest ratio of performance per watt ever achieved in workstations.

...

Given its compatibility with PCI-X, ClearSpeed boards can be quickly installed in existing workstations and servers to provide significant increases in performance. Such system upgrades, while offering up to a 10x increase in performance, can be achieved without additional power or cooling due to the efficiency of ClearSpeed's architecture.
http://www.clearspeed.com/news/pr.php?pr=28
http://www.clearspeed.com/downloads/overview_csx600.pdf
http://www.clearspeed.com/downloads/overview_csx600_board.pdf
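
The demonstration figures quoted above can be broken down per chip with a rough Python sketch. It assumes that "dual co-processor" means two CSX600 chips per board, and it reuses the 48 GFLOPS per-chip peak estimated earlier in this thread (250 MHz x 2 x 96); neither number is an official per-chip sustained rating.

```python
# Rough per-chip breakdown of the sustained figures in the press release.
# Assumption: each "dual co-processor" board carries two CSX600 chips.
# Assumption: per-chip peak is the thread's own estimate of 48 GFLOPS.
SUSTAINED_GFLOPS_TWO_BOARDS = 100
BOARDS = 2
CHIPS_PER_BOARD = 2
ESTIMATED_PEAK_GFLOPS_PER_CHIP = 48

sustained_per_board = SUSTAINED_GFLOPS_TWO_BOARDS / BOARDS
sustained_per_chip = sustained_per_board / CHIPS_PER_BOARD
fraction_of_peak = sustained_per_chip / ESTIMATED_PEAK_GFLOPS_PER_CHIP

print(f"sustained per board: {sustained_per_board} GFLOPS")
print(f"sustained per chip:  {sustained_per_chip} GFLOPS")
print(f"fraction of estimated peak: {fraction_of_peak:.2f}")
```

If those assumptions hold, each chip sustains about half of its estimated peak in the demo.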

In a sense, the CSX600 is like a PPU (physics processing unit), but for professional applications.

Features
250 MHz clock
96 high-performance processing elements
576 Kbytes PE memory
128 Kbytes on-chip scratchpad memory
64-bit DDR2 DRAM interface with ECC
ClearConnect bus provides on-chip and inter-chip data transfer
Host interface & debug port
64-bit virtual, 48-bit physical addressing
Instruction and data caches
On-chip DMA controller
That 576 KB of PE memory is split into 6 KB per core (576 / 96). The 128 KB scratchpad is shared.
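A small Python sketch of the memory split, using only the numbers from the feature list and the technical summary. The per-PE bandwidth line rests on an assumption that is not stated in the source: that the quoted 96 Gbytes/s of on-chip bandwidth is the aggregate across all 96 PE memories, with a Gbyte taken as 10^9 bytes.

```python
# On-chip memory budget of the CSX600, from the quoted feature list.
PE_MEMORY_KB = 576    # total PE memory
SCRATCHPAD_KB = 128   # shared on-chip scratchpad
NUM_PES = 96
CLOCK_HZ = 250_000_000

kb_per_pe = PE_MEMORY_KB // NUM_PES            # even split across PEs
leftover_kb = PE_MEMORY_KB % NUM_PES           # 0 means the split is exact
total_on_chip_kb = PE_MEMORY_KB + SCRATCHPAD_KB

# Assumption: 96 Gbytes/s is the aggregate bandwidth of all PE memories.
ON_CHIP_BANDWIDTH_BYTES_PER_S = 96 * 10**9
bytes_per_pe_per_cycle = ON_CHIP_BANDWIDTH_BYTES_PER_S / NUM_PES / CLOCK_HZ

print(f"{kb_per_pe} KB per PE, {leftover_kb} KB left over")
print(f"{total_on_chip_kb} KB of PE memory plus scratchpad in total")
print(f"{bytes_per_pe_per_cycle} bytes per PE per cycle (under the assumption above)")
```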
June 22, 2005, 5:39 AM CST by pjbliverpool to chairmansteve
To be honest, I would rather just see them beef up the floating-point capabilities of current CPUs. I expect one of these things would fit nicely in place of a third CPU core. Given that it would be running at a much higher clock speed, it could use far less die space to achieve the same performance.
© 2000-2005 pcvsconsole.com