Design and Evaluation of Efficient Collective Communications on Modern Interconnects and Multi-core Clusters
Release | : 2001 |
ISBN-10 | : OCLC:680286939 |
Book excerpt: Two driving forces behind high-performance clusters are the availability of modern interconnects and the advent of multi-core systems. As multi-core clusters become commonplace, with each core running at least one process that holds multiple intra-node and inter-node connections to several other processes, there will be immense pressure on the interconnection network and its communication system software. Many parallel scientific applications use Message Passing Interface (MPI) collective communications intensively. Therefore, an efficient and scalable implementation of MPI collective operations is critical to the performance of applications running on clusters. In this dissertation, I propose and evaluate a number of efficient collective communication algorithms that utilize the modern features of the Quadrics and InfiniBand interconnects as well as the availability of multiple cores on emerging clusters. Using multiple independent networks, known as multi-rail networks, is a very promising way to overcome bandwidth limitations and to enhance fault tolerance. The Quadrics multi-rail QsNetII network is constructed using multiple network interface cards (NICs) per node, where each NIC is connected to a rail. I design and evaluate a number of Remote Direct Memory Access (RDMA)-based multi-port collective operations on the multi-rail QsNetII network. I also extend the gather and allgather algorithms to be shared-memory aware for small to medium messages. The algorithms prove to be much more efficient than the native Quadrics MPI implementation. ConnectX is the newest generation of InfiniBand host channel adapters from Mellanox Technologies. I provide evidence that ConnectX achieves scalable performance for simultaneous communication over multiple connections. Utilizing this ability of ConnectX cards, I propose a number of RDMA-based multi-connection and multi-core aware allgather algorithms at the MPI level.
My algorithms are devised to target different message sizes, and the performance result ...
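
As a rough illustration of what a shared-memory or multi-core aware allgather can look like, the sketch below simulates a generic leader-based scheme in plain Python: ranks on the same node first combine their data locally, one leader per node exchanges the node blocks over a ring, and each leader then shares the assembled buffer with its local ranks. This is not the dissertation's algorithm, which the excerpt does not describe in detail. The function name `hierarchical_allgather` and the parameter `cores_per_node` are hypothetical, ranks are assumed to be numbered consecutively within a node, and no MPI or RDMA calls are made.

```python
def hierarchical_allgather(contributions, cores_per_node):
    """Simulate a leader-based (node-aware) allgather.

    contributions[r] is the item owned by rank r. Returns a list whose
    entry r is the full buffer that rank r holds afterwards.
    Assumption: ranks are numbered consecutively within each node.
    """
    n = len(contributions)
    if cores_per_node <= 0 or n % cores_per_node != 0:
        raise ValueError("rank count must be a multiple of cores_per_node")
    num_nodes = n // cores_per_node

    # Stage 1: each node leader collects the items of its local ranks
    # (on a real cluster this would go through shared memory).
    node_blocks = [
        contributions[node * cores_per_node:(node + 1) * cores_per_node]
        for node in range(num_nodes)
    ]

    # Stage 2: leaders run a ring allgather over the network.
    # leader_bufs[i][j] holds node j's block once leader i has it.
    leader_bufs = [[None] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        leader_bufs[i][i] = node_blocks[i]
    for step in range(num_nodes - 1):
        # Leader i forwards the block it obtained `step` steps ago
        # to its right-hand neighbour; all sends happen "at once".
        sends = []
        for i in range(num_nodes):
            block_id = (i - step) % num_nodes
            sends.append(((i + 1) % num_nodes, block_id, leader_bufs[i][block_id]))
        for dest, block_id, block in sends:
            leader_bufs[dest][block_id] = block

    # Stage 3: each leader shares the assembled buffer with its local ranks.
    result = []
    for node in range(num_nodes):
        full = [item for block in leader_bufs[node] for item in block]
        result.extend(list(full) for _ in range(cores_per_node))
    return result
```

Calling `hierarchical_allgather(list("abcdef"), 2)` models six ranks on three dual-core nodes. In this scheme only the three leaders take part in the inter-node ring, which is the kind of traffic reduction a node-aware collective aims for.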