# A New Efficient Routing Algorithm for Network-on-Chip With Best Input and Output Selection Techniques

Ebrahim Behrouzian Nezjad, Ahmad Khadem Zadeh, Amin Javadi Nasab, Ehsan Behrouzian Nezjad, Ali Shaneh Sazan, Dept. of Electrical and Computer Engineering Azad University, Shushtar branch, Iran

Abstract - The performance of a Network-on-Chip (NoC) largely depends on the underlying routing technique, which has two constituents: output selection and input selection. Previous research on routing techniques for NoC has focused on improving output selection. In this paper, we improve both output selection and input selection. We present a novel routing algorithm (BIOS) which combines the advantages of both deterministic and adaptive routing algorithms. More precisely, we envision a new routing technique which judiciously switches between deterministic and adaptive routing based on the network's congestion conditions. When multiple input channels compete for the same output channel, our input selection technique decides which input channel obtains access depending on the contention level of the upstream switches, which in turn removes possible network congestion. Simulation results with different traffic patterns show that BIOS achieves significantly better performance than the other deterministic and adaptive routing algorithms.

**Keywords**: system-on-chip, network-on-chip, routing algorithm, input selection, output selection.

## 1. Introduction

Future generations of systems-on-chip (SoC) will consist of hundreds of pre-designed IPs<sup>1</sup> assembled together to form large chips with very complex functionality. As technology scales and chip integration grows, on-chip communication is playing an increasingly dominant role in System-on-Chip (SoC) design [1,2]. To deal with the increasingly difficult problem of on-chip communication, it has recently been proposed to connect the IPs using a Network-on-Chip (NoC) architecture. Each core is connected to a switch by a network interface. Cores communicate with each other by sending packets via a path consisting of a series of switches and inter-switch links [3,4]. Defining communication protocols for these NoCs is not an easy matter, since the resources used in traditional networks are not available on-chip. The performance of a NoC largely depends on the underlying routing technique, which chooses a path for a packet and decides the routing behaviour of the switches [5]. Routing algorithms can generally be classified into two types: deterministic and adaptive. In deterministic routing, the path is completely determined by the source and the destination address. On the other hand, a routing technique is called adaptive if, given a source and a destination address, the path taken by a particular packet depends on dynamic network conditions (e.g. congested links due to traffic variability) [6,7].

## 2. Related Work

The idea of NoC is derived from large-scale computer networks and distributed computing [8, 9]. However, the routing techniques for NoC have some unique design considerations besides low latency and high throughput. Due to tight constraints on memory and computing resources, the routing techniques for NoC should be reasonably simple [10]. Several switch architectures have been developed for NoC [11-13], employing XY output selection and wormhole routing. In [14], a deflective routing technique is proposed to avoid network congestion by spreading the traffic over a larger area. It performs output selection based on the number of packets being handled in the neighboring switches. Packets are forwarded to switches with less traffic load. The routing technique proposed in [10] is similar to [14] in that it acquires information from the neighboring switches to avoid network congestion, but it uses the buffer levels of the downstream switches to perform the output selection. A routing scheme which combines deterministic and adaptive routing is proposed in [7], where the switch works in deterministic mode when the network is not congested, and switches to adaptive mode when the network becomes congested. The routing techniques in [7,10] focus on output selection. The motivation of this paper is to improve both input selection and output selection in order to develop a simple yet effective routing algorithm. Two input selection techniques have been used in NoC: First-Come-First-Served (FCFS) and Round-Robin. In FCFS, the priority of accessing the output channel is granted to the input channel which made its request earliest. Round-Robin assigns priority to each input channel in equal portions on a rotating basis. FCFS and Round-Robin are fair to all channels but do not consider the actual traffic condition.

<sup>1</sup> Intellectual property

## 3. Proposed Routing Algorithm

We present a novel routing algorithm which combines the advantages of both deterministic and adaptive routing schemes. More precisely, we envision a new routing technique which judiciously switches between deterministic and adaptive routing based on the network's congestion conditions. Each router in the network continuously monitors its local network load and makes decisions based on this information. When the network is not congested, the router works in deterministic mode and thus enjoys the low routing latency enabled by deterministic routing. In contrast, when the network becomes congested, the router switches to the adaptive routing mode and thus avoids the congested links by exploiting other routing paths; this leads to higher network throughput (the output selection technique). We also present a novel input selection technique for NoC that improves the routing efficiency. The input selection considers the contention level of the upstream switches. By granting busier input channels higher priority to access the output channel, input selection keeps the traffic in busy paths flowing and therefore removes possible network congestion. The platform under consideration is composed of an n×n array of tiles which are interconnected by a 2D mesh network. We choose the 2D mesh network mainly for two reasons: it naturally fits the tile-based architecture, and it has been frequently discussed in other NoC work. However, we emphasize that our algorithm can easily be extended to other topologies. Fig. 1 shows an abstract view of a NoC in this architecture. As shown, each tile is composed of a resource (R) and a switch or router (S).

In our experiments, the XY routing scheme is chosen as the representative deterministic routing scheme because of its simplicity and wide popularity. In short, for 2D mesh networks, XY routing first routes packets along the X-axis. Once the packets reach the column in which the destination tile lies, they are then routed along the Y-axis. Chiu proposed the odd-even turn model in [8], which restricts the locations where some types of turns can take place such that the algorithm remains deadlock-free. More precisely, odd-even routing prohibits the east-north and east-south turns at any tile located in an even column.



Fig.1. The typical structure of a 4\*4 NOC

Odd-even routing also prohibits the north-west and south-west turns at any tile located in an odd column. Compared to other adaptive routing algorithms without virtual channel support, the degree of adaptiveness provided by odd-even routing is distributed more evenly across the network. Thus, in this paper, we choose minimal odd-even routing as the adaptive routing scheme for on-chip routers. The use of minimal routing helps not only to reduce the energy consumption of communication, but also to keep the network free from livelock.
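The two baseline schemes described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the function names, the `(x, y)` coordinate order, the convention that north means increasing `y`, and the representation of a turn as a pair (current heading, new heading) are all assumptions made for the example.

```python
def xy_next_hop(cur, dst):
    """Next direction under XY routing; coordinates are assumed to be (x, y)."""
    cx, cy = cur
    dx, dy = dst
    if cx != dx:                       # first travel along the X-axis
        return "east" if dx > cx else "west"
    if cy != dy:                       # then travel along the Y-axis
        return "north" if dy > cy else "south"
    return "local"                     # packet has reached its destination tile


def odd_even_turn_allowed(column, heading, new_heading):
    """Check a turn against the odd-even restrictions stated in the text.

    heading is the direction the packet is currently travelling in,
    new_heading the direction it would take after the turn.
    """
    turn = (heading, new_heading)
    if column % 2 == 0:
        # even column: east-north and east-south turns are prohibited
        return turn not in {("east", "north"), ("east", "south")}
    # odd column: north-west and south-west turns are prohibited
    return turn not in {("north", "west"), ("south", "west")}
```

For example, a packet travelling east may turn north in column 3 but not in column 2, whereas XY routing never makes such a choice because the path is fixed by source and destination.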

Next, we present the actual router design, which implements the concept of the BIOS routing algorithm.



Fig.2. BIOS router architecture

Combining odd-even and XY to form a router may lead to deadlock. Thus, we develop a new routing scheme, called DOE, as the deterministic routing mode in the new algorithm. DOE is a deterministic version of odd-even, obtained by removing the adaptiveness of odd-even. Fig. 2 illustrates the detailed architecture of a switch for our new routing algorithm (BIOS). Each input port in Fig. 2 has a separate FIFO which buffers the input packets before delivering them to the output ports.

### 3.1. Output Selection

When a new header flit is received, the output selector (OS) processes that flit and determines which output port the packet should be delivered to. When the router works in the odd-even mode (adaptive mode), there can be more than one output direction in which to route packets. In this case, the output selector chooses the direction in which the corresponding downstream router has more empty slots in its input FIFO. For instance, suppose the router located at position (1; 1) in Fig. 3 receives a packet from router (1; 0) with the destination address (3; 3). Under odd-even mode, the packet can be routed either to the north or to the east in this router. In this case, the output selector compares the occupancy of the south input FIFO at router (2; 1) with that of the west input FIFO at router (1; 2). If the former has more flits in its FIFO, then the packet is routed to the east to router (1; 2); otherwise, it is routed to the north to router (2; 1). Once the router has decided which direction to route in, the output selector sends the connection request to the input selector (IS) in order to set up a path to the corresponding output port.
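The adaptive-mode choice in this paragraph can be illustrated with a minimal sketch. The function name `select_output` and the dictionary of free slots are hypothetical; the tie rule (the first listed candidate wins) is chosen to match the "otherwise, route north" wording of the example and is otherwise an assumption.

```python
def select_output(candidates, free_slots):
    """Pick the direction whose downstream input FIFO has the most empty slots.

    candidates: directions permitted by the routing function, in preference order.
    free_slots: maps each direction to the number of empty slots in the
                input FIFO of the downstream router in that direction.
    """
    # max() returns the first maximal element, so a tie goes to the
    # candidate listed first
    return max(candidates, key=lambda direction: free_slots[direction])


# Router (1; 1) may forward north to (2; 1) or east to (1; 2).
# The northern FIFO is fuller (fewer free slots), so the packet goes east.
choice = select_output(["north", "east"], {"north": 1, "east": 3})
```

With equal occupancy in both FIFOs the sketch returns `"north"`, because north is listed first.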



Fig.3. Output selection with BIOS in a typical NOC

Except for the local output selector, each output selector also monitors its FIFO occupation ratio. If the ratio is below the specified congestion threshold, the corresponding congestion flag is set to 0. If the ratio reaches the threshold, the flag is set to 1, and if the FIFO is full, the flag is set to 2. Intuitively, a value of 1 in the congestion flag indicates to the upstream router that the downstream router is congested, and that it is better to use adaptive routing in order to avoid possibly congested links. On the other hand, a value of 0 tells the upstream router that congestion is not an issue and that it should send the packets out as fast as possible for minimum latency. A value of 2 tells the upstream router that congestion in the downstream router is very high and that this router is not able to accept packets at this time. The mode controller continuously monitors the congestion of its neighbors to determine whether the deterministic or the adaptive routing mode should be used. If any congestion flag from its neighboring routers is set to 1 or 2, the mode controller commands all the output selectors to work in the adaptive (odd-even) mode; otherwise, it switches the output selectors to deterministic (DOE) mode. If all congestion flags from its neighboring routers are set to 2, the packet is sent back to the upstream switch it came from (backtracking).
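The flag encoding and the mode decision can be summarized in a short sketch. The function names and the string labels for the modes are hypothetical. The default threshold of 60% is the value used later in the experiments, and "reaches the threshold" is read here as "greater than or equal to", which is an assumption.

```python
def congestion_flag(occupancy, capacity, threshold_percent=60):
    """Return 0 (not congested), 1 (threshold reached) or 2 (FIFO full)."""
    if occupancy >= capacity:
        return 2
    # integer arithmetic avoids floating-point rounding at the threshold
    if occupancy * 100 >= threshold_percent * capacity:
        return 1
    return 0


def select_mode(neighbor_flags):
    """Decide the routing mode from the neighbors' congestion flags."""
    flags = list(neighbor_flags)
    if flags and all(flag == 2 for flag in flags):
        return "backtrack"        # every neighbor is full: return the packet
    if any(flag in (1, 2) for flag in flags):
        return "adaptive"         # odd-even mode
    return "deterministic"        # DOE mode


# With a 5-flit FIFO and a 60% threshold, the flag becomes 1 at 3 flits.
flags = [congestion_flag(n, 5) for n in range(6)]
```

In this sketch a single full neighbor only triggers adaptive mode; backtracking is reserved for the case in which every neighboring flag equals 2, as stated above.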

### 3.2. Input Selection

Multiple input channels may simultaneously request access to the same output channel, e.g., packets p0 of input\_0 and p1 of input\_1 can request output\_0 at the same time. The input selection chooses one of the multiple input channels to get access. This section presents an input selection technique that decides more intelligently by considering the actual traffic condition, leading to higher routing efficiency.

In this paper we consider NoCs with 2D mesh topology. Wormhole switching [9, 13] is employed because of its low latency and low buffer requirement. The basic idea is to give the input channels different priorities for accessing the output channels. The priorities are decided dynamically at run-time, based on the actual traffic conditions of the upstream switches. More precisely, each output channel within a switch observes the contention level (CL), i.e. the number of requests from the input channels, and sends this contention level to the input channel of the downstream switch, where it is then used in the input selection. When multiple input channels request the same output channel, access is granted to the input channel which has the highest contention level acquired from its upstream switch. This input selection removes possible network congestion by keeping the traffic flowing even in the paths with heavy traffic load, which in turn improves routing performance.

To show the influence of input selection on the routing efficiency, consider the example of Fig. 4, which shows a network of switches (cores are ignored for simplicity). The grey scale of the switches indicates the number of packets waiting at the switches. The white switches have a low number of waiting packets, the grey switches have a higher number of waiting packets, and the black switch at (2, 2) has the highest number of waiting packets. Packets p1 at (3, 2) and p2 at (4, 3) both want to travel through (3, 3). In this case, a good choice is to let p1 take priority in accessing (3, 3), because the switch at (3, 2) has more waiting packets than the switch at (4, 3). Such an input selection helps reduce the number of waiting packets in congested areas. This removes possible network congestion and leads to better NoC performance. Based on this observation, a new input selection is developed.

For the input channels connected to the cores, there are no upstream switches transmitting CL to them. The CL value is set to 0 for these input channels. Therefore, the packets already in the network have higher priority than the packets waiting to be injected into the network.



Fig.4. Impact of input selection

A closer look at the above input selection technique reveals an important problem. If an input channel with a lower CL continuously competes with channels that have a higher CL, it will lose every time. The packets in this channel will not be able to obtain their required output channel and will starve, which decreases network efficiency. Thus, starvation is possible in this input selection technique, because it performs input selection based only on the highest contention level (CL), and channels with low CL have little chance of winning. We therefore define the priority in such a way that input channels with low CL also have the opportunity to win. In addition to CL, another parameter named AGE is maintained for every input channel, and the priority is the sum CL+AGE. The initial value of AGE for every channel is zero. When several input channels compete with each other for a specific output channel, only one channel succeeds. The AGE of the winning channel is then reset to zero, and the AGE of every other channel that took part in the competition is increased by one. With this new criterion (CL+AGE), each time an input channel competes with other input channels for a specific output channel and loses, its AGE increases by one. This raises its priority in the following competitions and increases its chance of success, so that the channel eventually gains its desired output channel.

In a competition, the following conditions may occur:

- a) If the priority (CL+AGE) of an input channel is higher than that of the other input channels, the desired output channel is granted to it. Its AGE is then reset to zero and the AGE of all other competing input channels is increased by one.
- b) If multiple input channels have equal priority, the output channel is granted to the input channel which has the higher AGE. Its AGE is then reset to zero and the AGE of the other competing input channels is increased by one.
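The CL+AGE arbitration described above can be sketched as follows. The class name `InputArbiter` and its interface are hypothetical. Two points are assumptions, since the text does not fix them: only the channels that actually take part in a competition have their AGE updated, and a tie in both priority and AGE is resolved in favor of the lowest channel index.

```python
class InputArbiter:
    """Arbiter for one output channel using priority = CL + AGE."""

    def __init__(self, num_inputs):
        self.age = [0] * num_inputs   # AGE starts at zero for every input channel

    def grant(self, requests):
        """Grant the output to one requester.

        requests: dict mapping input channel index -> CL received from the
                  upstream switch (0 for channels connected to a core).
        Returns the winning channel index, or None if nobody requested.
        """
        if not requests:
            return None
        # Highest CL + AGE wins (case a); on equal priority the higher AGE
        # wins (case b). Iterating in sorted order makes max() return the
        # lowest index on a complete tie.
        winner = max(
            sorted(requests),
            key=lambda ch: (requests[ch] + self.age[ch], self.age[ch]),
        )
        for ch in requests:
            self.age[ch] = 0 if ch == winner else self.age[ch] + 1
        return winner


# Channel 0 always has CL = 3, channel 1 always has CL = 0.
arbiter = InputArbiter(2)
winners = [arbiter.grant({0: 3, 1: 0}) for _ in range(5)]
```

In this run channel 1 loses three times, reaches AGE 3, ties with channel 0 on priority and wins the fourth round on its higher AGE, which is the starvation-avoidance behavior the text describes.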

In this way, the starvation problem is removed. Therefore, the BIOS routing algorithm is deadlock-free and starvation-free, has a high degree of adaptiveness, and makes decisions according to the real traffic conditions of the network.

## 4. Experimental Results

To evaluate the performance gains that can be achieved with BIOS, we developed a C++ simulator based on the Standard Template Library (STL). Simulations are carried out on a 6×6 mesh NoC. We simulate several square mesh networks with different routing schemes and design parameters under different traffic patterns. Under each load and configuration, three types of mesh networks are simulated, which use the XY, odd-even and BIOS algorithms respectively. For a given packet injection rate (i.e., the number of packets injected into the network per cycle), a simulation is conducted to evaluate the average packet latency. The efficiency of each type of routing is evaluated through latency-throughput curves.

Similar to other work in the literature, we assume that the packet latency spans from the instant when the first flit of the packet is created to the time when its last flit is ejected at the destination node, including the queuing time at the source. We also assume that the packets are consumed immediately once they reach their destination nodes. Each simulation is run for a warm-up period of 2000 cycles. Thereafter, performance data are collected after 20,000 packets are sent. Since the network performance is greatly influenced by the traffic pattern, we consider three synthetic traffic patterns in this set of experiments: uniform, transpose, and hot spot [15]. In the uniform traffic pattern, a core sends a packet to any other core with equal probability. In the transpose traffic pattern, a core at (i, j) only sends packets to the core at (5-j, 5-i). In the hot spot traffic pattern, the core at (3, 3) is designated as the hot spot, which receives 10% more traffic in addition to the regular uniform traffic. The network size during simulation is fixed at 6×6 tiles. All of the input ports have a FIFO size of 5 flits, with the congestion threshold set at 60% of the total FIFO capacity.
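The three destination rules can be sketched as simple generator functions for the 6×6 mesh. All names are hypothetical. In particular, the hot spot rule is one possible reading of "10% more traffic": each packet from another core is redirected to the hot spot with probability 0.10 and otherwise follows the uniform rule. The text does not specify the exact mechanism, so this is an assumption.

```python
import random

N = 6                 # mesh is N x N tiles
HOT_SPOT = (3, 3)     # hot spot core used in the experiments


def uniform_dest(src, rng):
    """Pick any core other than src with equal probability."""
    while True:
        dst = (rng.randrange(N), rng.randrange(N))
        if dst != src:
            return dst


def transpose_dest(src):
    """Core (i, j) sends only to core (N-1-j, N-1-i), i.e. (5-j, 5-i) for N = 6."""
    i, j = src
    return (N - 1 - j, N - 1 - i)


def hot_spot_dest(src, rng, extra=0.10):
    """Uniform traffic plus an assumed extra share directed at the hot spot."""
    if src != HOT_SPOT and rng.random() < extra:
        return HOT_SPOT
    return uniform_dest(src, rng)


rng = random.Random(0)            # fixed seed only to make runs repeatable
sample = [uniform_dest((2, 2), rng) for _ in range(100)]
```

Note that under the transpose rule a core with i + j = 5 maps to itself, so in this sketch such cores would have no remote destination.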

As shown in Fig. 5, XY routing performs better than both the odd-even and BIOS routing algorithms under uniform traffic load. This result is consistent with other results reported in the literature. The reason why XY performs best under uniform traffic is that it embodies global, long-term information about this traffic pattern. From a global, long-term point of view, the uniform traffic pattern starts with message traffic spread evenly across the mesh, and the XY routing strategy then maintains that evenness. On the other hand, the adaptive algorithms select the routing paths based on local, short-term information. Each decision benefits only the packets in the immediate future, which tend to interfere with other packets. Thus, the evenness of uniform traffic is not necessarily maintained in the long run. However, in most real-world applications, each node communicates with some nodes much more than with others. XY routing has a serious deficiency in dealing with such non-uniform traffic patterns because of its determinism. More precisely, XY routing blindly maintains the unevenness of non-uniform traffic, just as it maintains the evenness of uniform traffic. Accordingly, Fig. 6 and Fig. 7 show that XY routing is clearly outperformed by the odd-even and BIOS routing algorithms. Taking the results for transpose traffic as an example (Fig. 6), the network using XY saturates at an injection rate of 0.0175 packets/cycle. On the other hand, the odd-even and BIOS routing algorithms are able to achieve a throughput of 0.0256 packets/cycle and 0.333 packets/cycle, respectively. This gives a 57% and 70% improvement in terms of sustainable throughput.

The effectiveness of the BIOS routing algorithm is confirmed by the fact that it consistently outperforms odd-even in terms of sustainable throughput in all these experiments. In fact, for the same traffic pattern and injection rate, the BIOS routing algorithm achieves shorter average packet latency than odd-even throughout the experiments. Another interesting fact to observe is that the BIOS routing algorithm keeps the advantage of deterministic routing when the network is not congested. As shown in Fig. 5, Fig. 6 and Fig. 7, the BIOS routing algorithm has the same average packet latency when the network is not congested.



Fig.5. Routing performance under uniform traffic



Fig.6. Routing performance under transpose traffic



Fig.7. Routing performance under hot spot traffic

On the other hand, the average latency a packet experiences in odd-even is 18% higher than that in BIOS when the network is lightly loaded. It appears that implementing a BIOS router requires an almost negligible additional cost compared to an odd-even router. To justify our claim, we implemented all three versions of the router (XY, odd-even and BIOS) using a 0.16µm technology, with a clock rate of 333MHz. In our design, FIFOs are implemented using registers in order to achieve better performance/power efficiency. Their respective areas (in gates) are shown in Table 1 for the same FIFO capacity. In all the designs, each input port has a fixed link width of 32 bits. The flit size is set to 32 bits as well. As shown in Table 1, the overhead of implementing the extra logic for the BIOS routing algorithm is indeed negligible compared with the odd-even implementation. For instance, for odd-column routers with a FIFO size of 8 flits, the BIOS router requires 27,856 gates, while the odd-even router requires 25,971 gates. As shown in Table 1, the overhead compared to odd-even is below 7%.

Table 1. Complexity of routers (gate count) for a FIFO size of 8 flits

| Algorithm  | Odd-even | XY    | BIOS  |
|------------|----------|-------|-------|
| Gate count | 25971    | 24983 | 27856 |

## 5. Conclusion

In this paper, we presented and evaluated a novel routing scheme called BIOS which combines the advantages of both deterministic and adaptive routing schemes. More precisely, BIOS judiciously switches between deterministic and adaptive routing based on the network's congestion conditions. The simulation results show the effectiveness of BIOS by comparing it with purely deterministic and adaptive routing schemes under different traffic patterns. We improved both output selection and input selection and showed the importance of both for routing efficiency. By granting busier input channels higher priority to access the output channel, the BIOS algorithm keeps the traffic in busy paths flowing and therefore removes possible network congestion. Moreover, a prototype router based on the BIOS idea has been designed and evaluated. Compared to purely adaptive routers, the overhead of implementing BIOS is negligible (less than 7%), while the performance is consistently better.

## References

- [1] S. Kumar, A. Jantsch, J.-P. Soininen, M. Forsell, M. Millberg, J. Oberg, et al., "A network on chip architecture and design methodology," ISVLSI, pp. 117-24, USA, 2002.
- [2] L. Benini and G. De Micheli, "Networks on chips: A new SoC paradigm," Computer, vol. 35, pp. 70-78, 2002.
- [3] J. Henkel, W. Wolf, and S. Chakradhar, "On-chip networks: A scalable, communication-centric embedded system design paradigm," VLSI Design, pp. 845-851, India, 2004.
- [4] D. Bertozzi and L. Benini, "Xpipes: A network-on-chip architecture for gigascale systems-on-chip," IEEE Circuits and Systems Magazine, vol. 4, pp. 18-31, 2004.
- [5] W. J. Dally and B. Towles, "Route packets, not wires: On-chip interconnection networks," DAC, pp. 684-689, USA, 2001.
- [6] J. Duato, S. Yalamanchili, L. Ni, *Interconnection Networks: an Engineering Approach*. IEEE Comp. Society Press, 1997.
- [7] J. Hu and R. Marculescu, "DyAD Smart routing for networks-on-chip," DAC, pp. 260-263, USA, 2004.
- [8] G.-M. Chiu, "The odd-even turn model for adaptive routing," IEEE Transactions on Parallel and Distributed Systems, vol. 11, pp. 729-38, 2000.
- [9] L. M. Ni and P. K. McKinley, "A survey of wormhole routing techniques in direct networks," Computer, vol. 26, pp. 62-76, 1993.
- [10] T. T. Ye, L. Benini, and G. De Micheli, "Packetization and routing analysis of on-chip multiprocessor networks," Journal of Systems Architecture, vol. 50, pp. 81-104, 2004.
- [11] C. A. Zeferino, M. E. Kreutz, and A. A. Susin, "RASoC: A router soft-core for Networks-on-Chip," Designers Forum DATE, pp. 198-203, France, 2004.
- [12] N. Kavaldjiev, G. J. M. Smit, and P. G. Jansen, "A virtual channel router for on-chip networks," IEEE International SOC Conference, pp. 289-93, USA, 2004.
- [13] E. Rijpkema, K. Goossens, A. Radulescu, J. Dielissen, J. Van Meerbergen, P. Wielage, and E. Waterlander, "Trade-offs in the design of a router with both guaranteed and best-effort services for networks on chip," IEE Proceedings: Computers and Digital Techniques, vol. 150, pp. 294-302, 2003.
- [14] E. Nilsson, M. Millberg, J. Oberg, and A. Jantsch, "Load distribution with the proximity congestion awareness in a network on chip," DATE, pp. 1126-7, Germany, 2003.
- [15] G. Varatkar and R. Marculescu, "On-chip traffic modeling and synthesis for MPEG-2 video application," IEEE Trans. on VLSI, 12(1), Jan. 2004.