Embedding the optimal all-to-all personalized exchange on multistage interconnection networks +

Authors: Roselin Petagon  Department of Computer Science, Faculty of Science, King Mongkut’s Institute of Technology, Ladkrabang, Bangkok 10520, Thailand
Jeeraporn Werapun Department of Computer Science, Faculty of Science, King Mongkut’s Institute of Technology, Ladkrabang, Bangkok 10520, Thailand

Published in:
· Journal
Journal of Parallel and Distributed Computing archive
Volume 88 Issue C, February 2016
Pages 16-30
Academic Press, Inc. Orlando, FL, USA
table of contents doi>10.1016/j.jpdc.2015.10.005

Abstract
All-to-all personalized exchange (ATAPE) is an inspired process to speedup the parallel and distributed computing. Recently, ATAPE algorithms were successfully applied on multistage interconnection networks (MINs), including baseline and butterfly networks. However, routing of those algorithms on MINs relies on switch-patterns for stage-control from sources (S), which is a half-routing solution since they cannot perform a full self-routing with the (S, D) protocol for all MINs. In this paper, first we propose a full-routing solution of the realizing ATAPE on a class of d-nary-switch MINs + (i.e., baseline +, butterfly +, etc.). Our ATAPE can be embedded on-chip effectively for not only (S, D) self-routing but also stage-/switch-control routing. Two embedded ATAPE functions incorporate with multi-stage pipelining are proposed in optimal O(N + log 2 N) : 1. a (default) static function $D = S \oplus (C + \text{order}) \mod N$ and 2. an (optional) f-in-1 dynamic function $D = (S + C + \text{order}) \mod N$ with the incrementing counter $C = 0$ to $N - 1$. Second, we introduce a crossbar of MINs + with fewer delay-stages to achieve the ultimate ATAPE embedding. Finally, experimental results of applying ATAPE on such MINs + are confirmed fruitfully, including the ATAPE-based N x N -matrix transposition in O(N + log 2 N), which yields the significant speedup.

Embedding the optimal all-to-all personalized exchange (ATAPE) on MINs + is proposed for the fast on-chip process. Static ATAPE ($D = S \oplus (C + \text{order}) \mod N$) and f-in-1 dynamic ATAPE ($D = (S + C + \text{order}) \mod N$) are ultimately embedded for full (S, D)-routing on MINs +. Existing ATAPE approaches rely on switch-/stage-control routing on MINs by either S- or D-routing. Correctness of the proposed ATAPE functions on d-nary-switch MINs + and a crossbar of MIN + are proven. Experimental results are confirmed fruitfully for N = 64 to 16384 processors with the significant speedup.