IBM Blue Gene
The world's most advanced network supercomputer from International Business Machines will tackle Grand Challenge problems
An Article by your Guide Bradley Mitchell

Beyond Performance - Reliability, Availability, and Serviceability (RAS)

Supercomputers are notoriously unreliable. They contain so many more processors, memory chips, disks, and cables than an ordinary computer that the "law of averages" alone dictates that failures will occur much more frequently. In addition, some supercomputing hardware and software components are products of cutting-edge research and have not yet been fully "debugged." Finally, supercomputers usually run heavy computing and communications workloads 24 hours a day.
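The "law of averages" argument can be made concrete with a little arithmetic. The Python sketch below is only an illustration under simplifying assumptions: every component is identical, failures are independent, and the failure probability and component counts are hypothetical numbers, not figures for Blue Gene or any real machine.

```python
def prob_any_failure(component_failure_prob: float, component_count: int) -> float:
    """Chance that at least one of N independent components fails
    during some fixed time window (for example, one day)."""
    # Probability that every component survives, subtracted from 1.
    return 1.0 - (1.0 - component_failure_prob) ** component_count


def system_mtbf_hours(component_mtbf_hours: float, component_count: int) -> float:
    """Approximate mean time between failures for the whole system,
    assuming identical components with constant failure rates."""
    # Failure rates add up, so the combined MTBF shrinks by a factor of N.
    return component_mtbf_hours / component_count


if __name__ == "__main__":
    # Hypothetical numbers: each part has a 0.1% chance of failing per day.
    for count in (1, 100, 10_000):
        print(count, "components:", round(prob_any_failure(0.001, count), 4))
    # Hypothetical numbers: parts rated at 1,000,000 hours, 10,000 of them.
    print("system MTBF (hours):", system_mtbf_hours(1_000_000, 10_000))
```

Under these assumptions, a part that almost never fails on its own becomes a near-daily source of trouble once thousands of copies run side by side.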

Supercomputer failures and "downtime" are extremely costly, due to the importance of their "computational mission" and the scarcity of available systems. To reduce downtime, modern supercomputers are built with a focus on three key aspects of operation:

  • Reliability - how rarely failures occur in the running system (the lower the likelihood of a failure, the higher the reliability)
  • Availability - "uptime" of the system and/or its resources, even in the presence of a failure
  • Serviceability - ability to quickly detect, isolate, and recover from failures that occur
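One common way to tie these three aspects together is the standard steady-state availability formula, MTBF / (MTBF + MTTR), where MTBF (mean time between failures) reflects reliability and MTTR (mean time to repair) reflects serviceability. The article itself does not give this formula; the Python sketch below is an illustration, and the hour values in it are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(avail: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical system: fails every 99 hours on average.
    slow_repair = availability(mtbf_hours=99, mttr_hours=1)
    # Same reliability, but better serviceability: repairs take 6 minutes.
    fast_repair = availability(mtbf_hours=99, mttr_hours=0.1)
    print("slow repair:", slow_repair, annual_downtime_hours(slow_repair))
    print("fast repair:", fast_repair, annual_downtime_hours(fast_repair))
```

The comparison shows why serviceability matters: with reliability held fixed, shortening repair time alone raises availability.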

Some supercomputer designs incorporate additional hardware and software to support system RAS. Redundant power supplies and cooling systems are common, for example. A single failure in a redundant component typically does not cause downtime (that is, it does not affect the availability of the system). Hot-swappable components like disks and power supplies are also commonly employed to improve system serviceability.
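The benefit of redundancy can be estimated with a short calculation. The Python sketch below assumes the redundant units fail independently and that any single working unit is enough to keep the system running; the 99% figure is a hypothetical value chosen for illustration, not a measured one.

```python
def redundant_availability(unit_availability: float, unit_count: int) -> float:
    """Availability of a group of redundant units where one working unit
    is enough, assuming the units fail independently of each other."""
    # The group is down only when every unit is down at the same time.
    all_down = (1.0 - unit_availability) ** unit_count
    return 1.0 - all_down


if __name__ == "__main__":
    # Hypothetical power supply that is up 99% of the time.
    for count in (1, 2, 3):
        print(count, "supplies:", redundant_availability(0.99, count))
```

In practice, failures are not always independent (a shared power feed or cooling fault can take out several units at once), so figures like these are best read as an optimistic upper bound.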

The ASCI Red system features an Ethernet network, separate from the primary interconnect, that is used to detect and recover from failures, improving serviceability and manageability.

SMASH RAS

The IBM Blue Gene system adopts an architectural approach called SMASH (Simple, Many and Self-Healing). SMASH attempts to build features into the system that minimize failures and downtime.

Conclusion

Supercomputing is currently the most advanced of all high-performance computing (HPC) approaches. HPC alternatives like clustering and grid computing hold great promise for achieving high-performance computing and communications at a low cost, but for now, supercomputers remain the world's fastest computing systems and the ones best suited for solving many critical scientific problems.

Supercomputers remain expensive and highly specialized. They run a limited set of applications and require non-stop "care and feeding" to keep running smoothly.

The IBM Blue Gene system represents the next generation of advanced supercomputing technology. When complete, it will operate at speeds up to 100 times greater than those of today's systems. In addition to solving computational problems, Blue Gene should influence the way in which mainstream computers of the future are built.



©2009 About.com, a part of The New York Times Company.

All rights reserved.