Parallelization techniques

Automatic program parallelization means transforming a program automatically: an original serial program is rewritten so that it makes better use of the concurrency provided by the hardware. The process usually has three steps:

1. Test the original program to determine whether parallelization is possible. Not every program can be transformed automatically. Tests must establish that a particular change can be applied to the code to increase its potential for parallel execution. This step deals with data dependence formulation and data dependence testing.

2. Decide which of the possible code transformations should be applied. Many techniques exist, and each is useful under certain conditions, on certain architectures, or for certain target languages in which the application program will be interpreted or compiled and linked. This step deals with transformation techniques.

3. Generate the parallel code. The final code of the parallel program depends on the memory model of the machine, the nature of the problem, and other restrictions.

Informal data dependence definition

    S1: A = 1.0;
    S2: B = A + 3.14
    S3: A = 1/3 * ( C - D )
    ...
    S4: A = ( B * 3.8 ) / 2.71

Let us consider statement S2. The value of A used here must be the value assigned to A by S1. Interchanging S1 and S2 would result in S2 accessing the old value of A. This kind of dependence is called a true dependence from S1 to S2, also known as a flow dependence between S1 and S2.

Now consider statements S2 and S3. Interchanging S2 and S3 would result in S2 accessing the wrong value of A (the value assigned in S3 rather than in S1). This is called an anti dependence from S2 to S3.

Finally, the relation between S3 and S4 is called an output dependence. The textual order must be followed to guarantee that the value of A used after S4 is the right one.

Practical data dependence definition

Computing the exact dependence relations can be very time consuming or even impossible. To solve that problem we can approximate the data dependence relations using IN/OUT sets and execution order.

The IN set (input set) of a statement is the set of items whose values are fetched in the statement. The OUT set (output set) of a statement is the set of items whose values are changed in the statement. Execution order is denoted by the symbol "$\prec$": $S_i \prec S_j$ means that statement $S_i$ can be executed before statement $S_j$. Then we can define:

Flow dependence: if $S_1 \mathrel{f} S_2$ then $S_1 \prec S_2$ and $\mathrm{OUT}(S_1) \cap \mathrm{IN}(S_2) \neq \emptyset$.
Anti dependence: if $S_1 \mathrel{f_a} S_2$ then $S_1 \prec S_2$ and $\mathrm{IN}(S_1) \cap \mathrm{OUT}(S_2) \neq \emptyset$.
Output dependence: if $S_1 \mathrel{f_o} S_2$ then $S_1 \prec S_2$ and $\mathrm{OUT}(S_1) \cap \mathrm{OUT}(S_2) \neq \emptyset$.
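The IN/OUT approximation can be sketched in a few lines of Python. This is only an illustration: the function names `classify` and `deps` and the `stmts` table are hypothetical, and the IN/OUT sets are written by hand for the four statements of the informal example.

```python
def classify(in1, out1, in2, out2):
    """Dependence kinds from S1 to S2, assuming S1 can execute before S2."""
    kinds = set()
    if out1 & in2:       # OUT(S1) and IN(S2) share an item
        kinds.add("flow")
    if in1 & out2:       # IN(S1) and OUT(S2) share an item
        kinds.add("anti")
    if out1 & out2:      # OUT(S1) and OUT(S2) share an item
        kinds.add("output")
    return kinds

# Hand-written (IN set, OUT set) for each statement of the informal example.
stmts = {
    "S1": (set(), {"A"}),        # A = 1.0
    "S2": ({"A"}, {"B"}),        # B = A + 3.14
    "S3": ({"C", "D"}, {"A"}),   # A = 1/3 * (C - D)
    "S4": ({"B"}, {"A"}),        # A = (B * 3.8) / 2.71
}

def deps(first, second):
    """Classify the dependence between two named statements, first before second."""
    in1, out1 = stmts[first]
    in2, out2 = stmts[second]
    return classify(in1, out1, in2, out2)
```

Because only set intersections are checked, the result is conservative: it may report a dependence that a more precise analysis would rule out.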

Control dependences

A control dependence determines the ordering of an instruction i with respect to a branch instruction, so that i is executed in correct program order and only when it should be. Let us consider the following code segment:

    if p1 { S1; }
    if p2 { S2; }

S1 is control dependent on p1, and S2 is control dependent on p2 but not on p1.

Limitations of dependence analysis

Two limitations affect our ability to do accurate dependence analysis for large programs:

1. Limitations of the analysis algorithms: analysis of pointers is essentially impossible for programs that use pointers in an arbitrary fashion, for example by doing arithmetic on pointers (limited by the lack of applicability).
2. Limitations in analyzing behavior across procedure boundaries: interprocedural analysis makes the analysis much more difficult.

Removing an output dependence

An output dependence is removed in the same way as an antidependence, by renaming variables. An output dependence arises when two successive statements change the value of the same variable and the previous value of that variable is not needed to compute the second one.

Before:

    a = b + c;
    a = x + y;

After:

    a = b + c;
    aa = x + y;

Removing an antidependence

An antidependence occurs in a program when a statement changes the value of a variable without using its previous value. The antidependence is removed by giving the variable a new name, and therefore a new memory location. The example below shows how to remove the antidependence on variable a.

Before:

    b = a + c;
    a = x + y;
    ...
    d = a + z;

After:

    b = a + c;
    aa = x + y;
    ...
    d = aa + z;

All later references to a must be modified accordingly.
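A small Python sketch can make the effect of renaming concrete. The two function names are hypothetical, and the elided statements of the slide are simply left out. After renaming, the write to `aa` no longer has to wait for the read of `a`, so the first two statements may be reordered (or run in parallel) without changing the result.

```python
def original(a, c, x, y, z):
    b = a + c
    a = x + y      # antidependence: overwrites a after the statement above read it
    d = a + z
    return b, d

def renamed_reordered(a, c, x, y, z):
    aa = x + y     # the write goes to a fresh name, so it can safely run first
    b = a + c      # still reads the original value of a
    d = aa + z     # later references use the new name
    return b, d
```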

Removing a true dependence

A true dependence cannot be removed completely, so the program transformations applied aim to reduce the number of such dependences. In many cases, removing true dependences comes down to reducing the height of the dependence graph (an operation known as tree height reduction). A dependence graph is a directed graph whose nodes represent program statements and whose arcs represent the dependences between them. Reducing its height makes partial parallelization of the computation possible. The transformations most often used for this purpose are introducing additional variables, renaming variables, and reordering computations.

Removing a true dependence: example

Original program:

    a = t[1];
    b = a + t[2];
    c = b + t[3];
    d = c + t[4];

Modified program:

    b = t[1] + t[2];
    c = t[3] + t[4];
    d = b + c;

[Figure: dependence graphs of the original program (a, b, c, d) and of the modified program (b, c, d)]

Successive statements of the original program depend on one another. The first two statements of the modified program, however, can be executed in parallel, because the dependence between them has been removed.

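Dependence elimination

Original loop:

    for (i=1; i<=100; i=i+1) {
        y[i] = x[i] / c;
        x[i] = x[i] + c;
        z[i] = y[i] + c;
        y[i] = c - y[i];
    }

After renaming:

    for (i=1; i<=100; i=i+1) {
        t[i] = x[i] / c;
        x1[i] = x[i] + c;
        z[i] = t[i] + c;
        y[i] = c - t[i];
    }

In the original loop, the two assignments to y[i] form an output dependence, and the write to x[i] after it has been read forms an antidependence. Renaming y to t and x to x1 removes both.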

Tree height reduction

Let us consider the following example:

    SUM := 0;
    DO 10 J := 1,8;
    10: SUM := SUM + A(J);

[Figure: "Calculation Tree" (the additions of A1 ... A8 chained one after another) and "Reduced Tree" (the same additions arranged as a balanced tree)]
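The difference between the two trees can be illustrated with a short Python sketch. The function names are hypothetical, the input is assumed to be non-empty, and `steps` counts levels of the tree, that is, the number of sequential steps needed if all additions within one level run in parallel.

```python
def sequential_sum(values):
    """Chained sum: every addition depends on the previous one."""
    total, steps = 0, 0
    for v in values:
        total += v
        steps += 1
    return total, steps

def tree_sum(values):
    """Pairwise sum: the additions within one level are independent."""
    level, steps = list(values), 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:           # odd element is carried to the next level
            nxt.append(level[-1])
        level, steps = nxt, steps + 1
    return level[0], steps
```

For eight elements the chained version needs eight dependent steps, while the pairwise version needs three levels.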

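Recurrences I

Consider the expression for a vector inner product, $p = A \cdot B$, where $A = (a_1, a_2, \ldots, a_n)$ and $B = (b_1, b_2, \ldots, b_n)$. This can be converted to the linear recurrence:

\[ p_0 = 0, \qquad p_i = p_{i-1} + a_i b_i, \quad 0 < i < n+1 \]

The general form of an m-th order linear recurrence of size n (known as R<n,m>) is:

\[ x_i = \begin{cases} 0 & \text{if } i < 1 \\ c_i + \sum_{j=i-m}^{i-1} a_{ij}\, x_j & \text{if } 0 < i \le n \end{cases} \]

This form can easily be expanded and put into the matrix form $X = AX + C$, where $X = (x_1, x_2, \ldots, x_n)$, $C = (c_1, c_2, \ldots, c_n)$ and

\[ A = \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
a_{21} & 0 & 0 & \cdots & 0 & 0 & 0 \\
a_{31} & a_{32} & 0 & \cdots & 0 & 0 & 0 \\
\vdots & & & \ddots & & & \vdots \\
0 & 0 & a_{n,n-m} & \cdots & \cdots & a_{n,n-1} & 0
\end{pmatrix} \]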

Recurrences II

Consider, for example, the relatively simple recurrence system R<4,3>:

\[ A = \begin{pmatrix}
0 & 0 & 0 & 0 \\
a_{21} & 0 & 0 & 0 \\
a_{31} & a_{32} & 0 & 0 \\
a_{41} & a_{42} & a_{43} & 0
\end{pmatrix} \]

The straightforward method to convert this to an expression is simply to carry out the matrix arithmetic:

\[ \begin{aligned}
x_1 &= c_1 \\
x_2 &= c_2 + a_{21} x_1 \\
x_3 &= c_3 + a_{31} x_1 + a_{32} x_2 \\
x_4 &= c_4 + a_{41} x_1 + a_{42} x_2 + a_{43} x_3
\end{aligned} \]

Recurrences III

By back substitution for the $x_i$ these become:

\[ \begin{aligned}
x_1 &= c_1 \\
x_2 &= c_2 + a_{21} c_1 \\
x_3 &= c_3 + a_{31} c_1 + a_{32} c_2 + a_{32} a_{21} c_1 \\
x_4 &= c_4 + a_{41} c_1 + a_{42} c_2 + a_{42} a_{21} c_1 + a_{43} c_3 + a_{43} a_{31} c_1 + a_{43} a_{32} c_2 + a_{43} a_{32} a_{21} c_1
\end{aligned} \]

EXERCISE: To calculate the recurrence some 14 operations are required, but when we use four processors it can be calculated in only ? steps.
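As a sanity check, the sequential recurrence and the back-substituted form can be compared numerically. The following Python sketch uses hypothetical function names and 0-based list indices, so `a[1][0]` stands for a21. In the expanded form every x_i depends only on the c values, so the four expressions are independent of one another.

```python
def solve_sequential(a, c):
    """Evaluate x_i = c_i + sum_j a[i][j] * x_j in order; each x_i needs earlier x_j."""
    x = []
    for i in range(len(c)):
        x.append(c[i] + sum(a[i][j] * x[j] for j in range(i)))
    return x

def solve_expanded(a, c):
    """Back-substituted form for R<4,3>: no x_i appears on the right-hand side."""
    c1, c2, c3, c4 = c
    a21 = a[1][0]
    a31, a32 = a[2][0], a[2][1]
    a41, a42, a43 = a[3][0], a[3][1], a[3][2]
    x1 = c1
    x2 = c2 + a21 * c1
    x3 = c3 + a31 * c1 + a32 * c2 + a32 * a21 * c1
    x4 = (c4 + a41 * c1 + a42 * c2 + a42 * a21 * c1 + a43 * c3
          + a43 * a31 * c1 + a43 * a32 * c2 + a43 * a32 * a21 * c1)
    return [x1, x2, x3, x4]
```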

Detecting parallelism

The first step of automatic parallelization is a dependence test that detects the potential for parallelization. There are two kinds of such tests, distinguished by the result they give. When the algorithm detects either data dependence or data independence, we say that the result is definite. Tests that give a definite result are exact tests. If, on the contrary, the algorithm can establish neither data dependence nor data independence, the result is said to be indefinite. Tests of this second kind are known as inexact tests.

Loop-level parallelism I

The analysis of loop-level parallelism focuses on determining whether data accesses in later iterations depend on data values produced in earlier iterations. Such a dependence is called a loop-carried dependence. Let us consider the following loop:

    for (i=1000; i>0; i=i-1)
        x[i] = x[i] + s;

There is a dependence between the two uses of x[i], but it is not loop-carried. There is a loop-carried dependence between successive uses of i in different iterations, but it involves an induction variable.

Loop-level parallelism II

Consider a loop like this one:

    for (i=1; i<=100; i=i+1) {
        A[i+1] = A[i] + C[i];      /* S1 */
        B[i+1] = B[i] + A[i+1];    /* S2 */
    }

Assume that A, B and C are distinct, nonoverlapping arrays. There are two different dependences in the example above:

1. S1 uses a value computed by S1 in an earlier iteration (iteration i computes A[i+1], which is read in iteration i+1), and the same holds for S2 and B. This dependence is loop-carried.
2. S2 uses the value A[i+1] computed by S1 in the same iteration. This is a data dependence within a single iteration.
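One informal way to see whether a dependence is loop-carried is to execute the iterations in a different order and compare the results. The Python sketch below does this for the update x[i] = x[i] + s and for statement S1 only. The function names and the small test arrays are hypothetical, and list indices are 0-based.

```python
def loop_without_carried_dep(x, s, order):
    """x[i] = x[i] + s: each iteration touches only its own element."""
    x = list(x)
    for i in order:
        x[i] = x[i] + s
    return x

def loop_with_carried_dep(a, c, order):
    """S1: a[i+1] = a[i] + c[i], which reads the value written one iteration earlier."""
    a = list(a)
    for i in order:
        a[i + 1] = a[i] + c[i]
    return a

forward = list(range(4))
backward = forward[::-1]

# Independent iterations: any order gives the same result.
same = loop_without_carried_dep([1, 2, 3, 4], 10, forward) == \
       loop_without_carried_dep([1, 2, 3, 4], 10, backward)

# Loop-carried dependence: reordering the iterations changes the result.
in_order = loop_with_carried_dep([1, 0, 0, 0, 0], [1, 1, 1, 1], forward)
reordered = loop_with_carried_dep([1, 0, 0, 0, 0], [1, 1, 1, 1], backward)
```

Matching results for one reordering do not prove independence. The experiment only shows that a loop-carried dependence makes the iteration order observable.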

Dependence identification

Is there a dependence between two references to an array inside a loop? Assume that the array indices are affine (a one-dimensional array index is affine if it can be written as a*i+b, where a and b are constants and i is the loop index). The question then reduces to checking whether two affine functions can take the same value for different indices within the loop bounds.

Dependence identification

Consider the following simple example. Suppose we store a value into the array element with index a*i+b and load from the same array the element with index c*i+d, where i is the loop index running from m to n. A dependence exists if there are two iteration indices j and k within the loop range, that is m <= j <= n and m <= k <= n, such that the element with index a*j+b is stored first, the element with index c*k+d is loaded afterwards, and a*j+b = c*k+d.

Dependence identification

How can we solve the equation a*j+b = c*k+d? In the general case the equation cannot be solved, because the values of a, b, c and d are not known at compile time. In our case, however, there is a test that checks whether a dependence can exist: the so-called GCD test (greatest common divisor test). The test is based on the observation that if a loop-carried dependence exists, then GCD(c,a) must divide (d-b), meaning that dividing one integer by the other leaves no remainder.

Dependence identification

Let us apply the GCD test to the following simple example:

    for (i=1; i<=100; i=i+1) {
        x[2*i+3] = x[2*i] * 5.0;
    }

The values are a = 2, b = 3, c = 2 and d = 0. Hence GCD(2,2) = 2 and d-b = -3. Since 2 does not divide -3, no dependence is possible.
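The test is easy to express in code. The following is a minimal Python sketch, assuming constant integer coefficients. The function name `gcd_test` is hypothetical, and `math.gcd` is the standard library function. A result of `True` only means that a dependence cannot be ruled out, because the test ignores the loop bounds.

```python
from math import gcd

def gcd_test(a, b, c, d):
    """GCD test for a write to X[a*i+b] and a read from X[c*i+d].

    Returns False when a dependence is impossible, True when it cannot be ruled out.
    """
    g = gcd(a, c)
    if g == 0:
        # Both strides are zero: the indices are the constants b and d.
        return b == d
    return (d - b) % g == 0

no_dependence = gcd_test(2, 3, 2, 0)   # the example loop: x[2*i+3] = x[2*i] * 5.0
```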

An example

Let us consider the following example, where the GCD test is used to find out whether there is a dependence between the two references to variable A in S1 and S2.

    for (i = ...)
        for (j = ...) {
            A(2*i + 2*j + 101) = ...     /* S1 */
            ... = A(2*i + 2*j)           /* S2 */
        }

The Diophantine equation obtained from the example is:

\[ 2 i_1 + 2 j_1 - 2 i_2 - 2 j_2 + 101 = 0 \]

The GCD of its coefficients is 2, and 2 does not divide 101. The equation has no solutions, so there is no data dependence between the two references.

The GCD test considers not only the iteration space defined by the loop indexes, but the whole linear space defined by each of the equations. In some examples, the GCD test may therefore find a dependence that lies outside the iteration space of the loops.

Diophantine Analysis & Exact Test I

A dependence relation can be considered a special case of an equation. We can match this equation with a linear Diophantine equation, because there is a method for deciding whether this kind of equation has solutions. A linear Diophantine equation has the form:

\[ \sum_{i=1}^{n} a_i x_i = c \]

where $n \ge 1$, $c$ and the $a_i$ are integers for all $i$, not all $a_i$ are equal to 0, and the $x_i$ are integer variables.

Diophantine Analysis & Exact Test II

A Diophantine equation has a solution if and only if

\[ \mathrm{GCD}(a_1, \ldots, a_n) \mid c \]

where GCD refers to the greatest common divisor of the $a_i$, and $\mid$ means that the number on the left divides the number on the right.

Solutions of two-variable Diophantine equations are especially useful for finding dependences in doubly nested loops, where each axis of the solution space corresponds to the iterations along one dimension of a 2-dimensional iteration space.
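The solvability condition generalizes the two-coefficient GCD test to any number of variables. Here is a minimal Python sketch, assuming that not all coefficients are zero (as the definition of a linear Diophantine equation requires). The function name `has_integer_solution` is hypothetical.

```python
from functools import reduce
from math import gcd

def has_integer_solution(coeffs, c):
    """True if sum(a_i * x_i) == c has an integer solution (loop bounds ignored)."""
    g = reduce(gcd, coeffs)   # math.gcd returns a non-negative value for any signs
    return c % g == 0

# Nested-loop example: 2*i1 + 2*j1 - 2*i2 - 2*j2 + 101 = 0, i.e. right-hand side -101.
nested_loop_dependent = has_integer_solution([2, 2, -2, -2], -101)
```

As with the two-coefficient test, a `True` result says only that a solution exists somewhere in the integers, not that it lies inside the iteration space.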

An example I

Let us consider the following example:

    for (i = 1; i <= 101; i++) {
        A(2*i) = ...
        ... = A(3*i + 198)
    }

A two-statement loop is being tested for data dependence. We would like to know whether there exist values of x and y for which the two references to A access the same element, inside the iteration space imposed by the index limits of the loop. From a mathematical point of view, we want to solve the equation:

\[ 2x = 3y + 198 \quad \text{where } 1 \le x \le 101 \text{ and } 1 \le y \le 101 \]

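An example II

The general solution of the equation $ax + by = c$ is given by:

\[ x = x_0 - \frac{b}{\mathrm{GCD}(a,b)}\, t, \qquad y = y_0 + \frac{a}{\mathrm{GCD}(a,b)}\, t \]

where $t$ ranges over all integers and $(x_0, y_0)$ is a particular solution whose components are multiples of $c / \mathrm{GCD}(a,b)$.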

An example III

In our case (the equation $2x - 3y = 198$), the general solution and the loop bounds give:

\[ \begin{aligned}
x &= 3t + 396, & 1 \le 3t + 396 \le 101 &\;\Rightarrow\; -131 \le t \le -99 \\
y &= 2t + 198, & 1 \le 2t + 198 \le 101 &\;\Rightarrow\; -98 \le t \le -49
\end{aligned} \]

In these restrictions, $t \le -99$ contradicts $t \ge -98$, so the Diophantine equation has no solution that satisfies the given constraints. In terms of data dependence analysis, this means that the pair of references is not data dependent.

If a Diophantine equation has two variables, its general solution is easy to find, because the solution has only one parameter. This exact test can therefore be performed on a pair of single-dimensional array references.
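The whole procedure (particular solution, one-parameter family, intersection of the bound intervals) can be sketched in Python. All function names here are hypothetical, and the sketch assumes integer coefficients with a and b not both zero and the same bounds lo..hi for both variables. It uses the extended Euclidean algorithm to find a particular solution, so its parameter k may be shifted relative to the t used above, but the verdict is the same.

```python
def ext_gcd(a, b):
    """Extended Euclid: return (g, s, t) with a*s + b*t == g (g may be negative)."""
    old_r, r, old_s, s, old_t, t = a, b, 1, 0, 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

def k_range(v0, step, lo, hi):
    """Integers k with lo <= v0 + step*k <= hi as (kmin, kmax), or None if empty."""
    if step == 0:
        return (float("-inf"), float("inf")) if lo <= v0 <= hi else None
    if step < 0:  # negate the inequality so that the step becomes positive
        v0, step, lo, hi = -v0, -step, -hi, -lo
    kmin = -((v0 - lo) // step)   # ceil((lo - v0) / step)
    kmax = (hi - v0) // step      # floor((hi - v0) / step)
    return (kmin, kmax) if kmin <= kmax else None

def exact_test(a, b, c, lo, hi):
    """True if a*x + b*y == c has an integer solution with lo <= x, y <= hi."""
    if a == 0 and b == 0:
        raise ValueError("a and b must not both be zero")
    g, s, t = ext_gcd(a, b)
    if c % g != 0:
        return False                       # no integer solution at all
    x0, y0 = s * (c // g), t * (c // g)    # particular solution
    rx = k_range(x0, -(b // g), lo, hi)    # x = x0 - (b/g) * k
    ry = k_range(y0, a // g, lo, hi)       # y = y0 + (a/g) * k
    if rx is None or ry is None:
        return False
    return max(rx[0], ry[0]) <= min(rx[1], ry[1])

# The example above: 2x - 3y = 198 with 1 <= x, y <= 101.
dependent = exact_test(2, -3, 198, 1, 101)
```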