ECTS credits: 4.5
ECTS hours (Rules/Memories): Student's work: 74.2; Hours of tutorials: 2.25; Expository classes: 18; Interactive classes: 18; Total: 112.45
Languages of instruction: English
Type: Ordinary Degree Subject RD 1393/2007 - 822/2021
Center Higher Technical Engineering School
Call: First Semester
Teaching: No teaching (being phased out)
Enrolment: Not open for enrolment (repeating students only)
The main objective of this course is to introduce students to the architecture of parallel systems, ranging from supercomputers to servers and clusters, and including multicore processors. A key aim is to present the conceptual and technological levels needed to understand the architecture of current and future systems. Other objectives are i) to show the interface with the software, and ii) to present some fundamentals of the performance and efficiency of this type of system.
PROGRAM
For scheduling the classes we have estimated 9 hours of theory classes, 10 hours for the resolution and discussion of problems and exercises, and 18 hours of lab classes.
THEORY
The first four parts describe different aspects of parallel systems hardware, and all four are equally important. The last part deals with the performance and efficiency of parallel systems, which is less central to the objectives of the course.
1. Technology Context
This part justifies the need for parallel and distributed systems and shows the different levels at which this kind of system is studied. We put the concepts studied in this course in context by means of a brief review of the history of high-performance computing systems and the data provided by the Top500 list (a list of the 500 most powerful supercomputers in the world). We introduce a simple model to predict the evolution of computational nodes in the multicore era, and end this part by reviewing some popular microprocessors for high-performance computing.
- Complex problem solving with multiprocessor systems: examples.
- Brief history of the high-performance computing systems: supercomputers.
- Evolution of parallel systems: Top500 list.
- Architecture of the computation nodes: evolution and perspectives.
- Commercial microprocessors for high-performance computing.
Class distribution: 2 hours.
2. Interconnection Topologies for Multiprocessor Systems
In this part we begin the study of the network level, that is, the interconnection of the computational nodes. Specifically, we deal with network topologies for high-performance systems and their figures of merit. Network topology is a fundamental issue in this kind of system.
- Bus
- Switch-based networks
- Examples of switch-based networks with unidirectional interconnections.
- Examples of switch-based networks with bidirectional interconnections and injection/extraction nodes.
- Evaluation criteria for comparison of topologies.
- Technological considerations of the interconnection channels.
Class distribution: 2 hours.
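As an illustration of how figures of merit can be used to compare topologies, the following minimal sketch computes two common ones, the diameter (largest hop count between any two nodes) and the average hop count, for a ring and a 2D torus. The choice of these two metrics and of these two topologies is an assumption made for the example; the course may use other criteria. All function names are hypothetical.

```python
from collections import deque


def ring(n):
    """Adjacency list of a bidirectional ring with n nodes."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def torus_2d(k):
    """Adjacency list of a k x k 2D torus (mesh with wrap-around links)."""
    adj = {}
    for x in range(k):
        for y in range(k):
            adj[(x, y)] = [
                ((x + 1) % k, y),
                ((x - 1) % k, y),
                (x, (y + 1) % k),
                (x, (y - 1) % k),
            ]
    return adj


def hop_counts(adj, src):
    """Breadth-first search: minimum hop count from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter_and_average(adj):
    """Return (diameter, average hop count over all ordered node pairs)."""
    worst, total, pairs = 0, 0, 0
    for src in adj:
        dist = hop_counts(adj, src)
        worst = max(worst, max(dist.values()))
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return worst, total / pairs


if __name__ == "__main__":
    print("ring, 16 nodes:  ", diameter_and_average(ring(16)))
    print("torus, 4x4 nodes:", diameter_and_average(torus_2d(4)))
```

With the same 16 nodes, the torus halves the diameter of the ring (4 hops instead of 8) at the cost of four links per node instead of two, which is the kind of trade-off the evaluation criteria are meant to expose.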
3. Message Passing Architectures
This part completes the study of the interconnection network of the system by dealing with routing algorithms and flow control, the deadlock problem and its avoidance, and the implementation of all of these schemes in a state-of-the-art switch for a high-performance network. Another important issue studied here is the network interface with the computational node, which is needed to obtain high-performance communications. We complete the view of message-passing multiprocessor systems by showing the fundamentals of their programming model and an example of a programming API (MPI). Examples of cluster networks are studied by means of a project.
- Programming model.
- Routing algorithms: deterministic, oblivious and adaptive.
- Routing implementation: path tables and algorithmic.
- Flow control: without buffers (packet discarding, circuit switching), with packet buffers (store-and-forward, cut-through), with flit buffers (wormhole, virtual channels).
- Architecture of a switch with flow control by virtual channels.
- Deadlock: avoidance in deterministic and adaptive routing. Duato's theorem.
- Infinite latency messages.
Class distribution: 2 hours.
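To make the difference between store-and-forward and cut-through flow control concrete, the sketch below compares their zero-load latency using the usual textbook approximations: with store-and-forward the whole packet is serialized at every hop, whereas with cut-through the serialization time is paid only once. The model ignores contention and header length, and the parameter values are illustrative assumptions, not measurements.

```python
def store_and_forward_latency(hops, packet_bits, bandwidth_bps, router_delay_s):
    """Zero-load latency when each switch receives the full packet before forwarding it."""
    return hops * (router_delay_s + packet_bits / bandwidth_bps)


def cut_through_latency(hops, packet_bits, bandwidth_bps, router_delay_s):
    """Zero-load latency when each switch forwards the packet as soon as the header is routed."""
    return hops * router_delay_s + packet_bits / bandwidth_bps


if __name__ == "__main__":
    # Illustrative values (assumptions): 4 hops, 1024-bit packet,
    # 1 Gbit/s channels, 20 ns routing delay per switch.
    args = (4, 1024, 1e9, 20e-9)
    print("store-and-forward:", store_and_forward_latency(*args), "s")
    print("cut-through:      ", cut_through_latency(*args), "s")
```

Under these assumptions the store-and-forward latency grows with the product of hop count and packet length, while the cut-through latency grows with their sum, which is why the gap widens on networks with a large diameter.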
4. Shared Memory Multiprocessors
In this part we study the hardware and software elements that must be added to multiprocessor systems to provide a single memory map (memory shared among all processors). We look in more depth at the implementation details of the cache coherence algorithms (already introduced in the Computer Architecture course) and at the issues of synchronization. We also show the memory consistency model implemented in OpenMP.
- Review of cache coherence algorithms: the MESI protocol.
- Implementation details of the flavors MESIF and MOESI.
- NUMA Systems with point-to-point interconnect.
- Examples of industrial implementations.
- Synchronization.
- Coherence in heterogeneous systems and mobiles.
Class distribution: 2 hours.
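As a reminder of how the MESI protocol behaves, the following minimal sketch tracks the state of a single cache line in several snooping caches. It is a simplified model written for this illustration: it only records state transitions (no data, no write-back traffic, no timing) and does not cover the MESIF or MOESI variants. The class and method names are hypothetical.

```python
class MesiLine:
    """State of one memory line in each of n snooping caches (M, E, S or I)."""

    def __init__(self, n_caches):
        self.state = ["I"] * n_caches

    def read(self, cpu):
        """Processor `cpu` reads the line."""
        if self.state[cpu] != "I":
            return  # read hit in M, E or S: no state change
        holders = [i for i, s in enumerate(self.state) if i != cpu and s != "I"]
        for i in holders:
            # A remote copy in M is written back and downgraded;
            # copies in E or S become (or stay) shared.
            self.state[i] = "S"
        # With no other copy the line is loaded Exclusive, otherwise Shared.
        self.state[cpu] = "S" if holders else "E"

    def write(self, cpu):
        """Processor `cpu` writes the line: all other copies are invalidated."""
        for i in range(len(self.state)):
            if i != cpu:
                self.state[i] = "I"
        self.state[cpu] = "M"


if __name__ == "__main__":
    line = MesiLine(3)
    line.read(0)
    print(line.state)  # cache 0 holds the only copy
    line.read(1)
    print(line.state)  # caches 0 and 1 share the line
    line.write(1)
    print(line.state)  # cache 1 owns a modified copy
```

A write hit on a line in state E moves it to M without invalidating anything, since no other cache holds a copy; in the sketch this falls out of the same `write` method.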
5. Additional Relevant Concepts on Multiprocessor Systems
This part introduces the theoretical foundations for studying the performance and efficiency of a parallel system. These concepts can be useful when making design decisions in the configuration of large parallel computing systems.
- Speedup
- Efficiency
- Amdahl's law. Gustafson-Barsis law.
- Isoefficiency. Models to predict speedup-vs-number of processors graphs.
Class distribution: 1 hour.
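The speedup laws listed above can be sketched in a few lines. Here `p` denotes the parallelizable fraction and `n` the number of processors; for Amdahl's law `p` is a fraction of the sequential execution time (fixed problem size), while for the Gustafson-Barsis law it is a fraction of the parallel execution time (problem size scaled with `n`). The function names and the sample values are assumptions made for the example.

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup for a fixed problem size."""
    return 1.0 / ((1.0 - p) + p / n)


def gustafson_speedup(p, n):
    """Gustafson-Barsis law: scaled speedup when the problem grows with n."""
    return (1.0 - p) + p * n


def efficiency(speedup, n):
    """Efficiency: speedup obtained per processor."""
    return speedup / n


if __name__ == "__main__":
    p = 0.9  # illustrative parallel fraction
    for n in (2, 10, 100, 1000):
        sa = amdahl_speedup(p, n)
        sg = gustafson_speedup(p, n)
        print(f"n={n:5d}  Amdahl={sa:8.2f} (eff {efficiency(sa, n):.3f})  "
              f"Gustafson={sg:8.2f} (eff {efficiency(sg, n):.3f})")
```

With `p = 0.9`, Amdahl's speedup can never exceed `1 / (1 - p) = 10` however many processors are added, whereas the scaled speedup keeps growing linearly with `n`.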
LABS
Lab Exercise 1: design of a set of benchmark programs to evaluate the performance of a supercomputer network (Finisterrae Supercomputer installed at CESGA) and of a network of workstations (the systems where students actually perform the labs). We will take as a reference the set of MPI benchmarks by Intel. The programs should be written in C using the MPICH MPI library.
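As an example of how benchmark results of this kind might be interpreted, the sketch below fits the simple linear model `time = latency + size / bandwidth` to one-way message times by least squares. The model, the function name and the data are assumptions made for illustration; the data are synthetic, generated from the model itself, and are not measurements of any real system. The benchmarks themselves are written in C with MPI, and this sketch only covers the post-processing of their output.

```python
def fit_latency_bandwidth(sizes_bytes, one_way_times_s):
    """Least-squares fit of time = latency + size / bandwidth.

    Returns (latency in seconds, bandwidth in bytes per second).
    """
    n = len(sizes_bytes)
    mean_x = sum(sizes_bytes) / n
    mean_y = sum(one_way_times_s) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes_bytes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(sizes_bytes, one_way_times_s))
    slope = sxy / sxx                    # seconds per byte
    latency = mean_y - slope * mean_x    # time for a zero-length message
    return latency, 1.0 / slope


if __name__ == "__main__":
    # Synthetic data (assumption): 2 microseconds latency, 1 GB/s bandwidth.
    sizes = [1024, 4096, 16384, 65536]
    times = [2e-6 + s / 1e9 for s in sizes]
    latency, bandwidth = fit_latency_bandwidth(sizes, times)
    print(f"latency = {latency * 1e6:.2f} us, bandwidth = {bandwidth / 1e9:.2f} GB/s")
```

In a ping-pong style measurement the one-way time is usually taken as half of the measured round-trip time, averaged over many repetitions.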
Lab Exercise 2: simulation of interconnection networks for multiprocessors. Different configurations will be simulated using a free network simulator, trying different network topologies, routing algorithms and flow control parameters. Students should obtain the bandwidth and the average latency of the packets. They must reach their own conclusions from the experiments performed, extracting information from the data that is as structured as possible in order to characterize this type of network.
Hours of lab classes: Lab 1: 12 hours. Lab 2: 6 hours.
Projects (teams of two or three students): design of a cluster for parallel computing, with an analysis of computational performance, cost, footprint, power and software, aimed at a specific design objective (high performance at any cost, trade-off between cost and performance, trade-off between cost and power, etc.). The Internet will be used to get the data for the different components. One of the networks to analyze must be a high-performance, low-latency network.
Basic:
1- E. Antelo, “Arquitectura de los Sistemas Paralelos”. 2019. An electronic version will be available to the students.
This book was written by the professor for this course, and it will be the main source of contents.
2- www.top500.org.
Reference website with abundant information about parallel systems and supercomputing. The site maintains a list of the fastest supercomputers at any given time.
Complementary:
3- W. J. Dally and B. Towles, "Principles and Practices of Interconnection Networks", Morgan Kaufmann, 2004.
This is the fundamental reference for the study of high-performance interconnection networks. This book complements reference 1 and provides much more content for parts 2 and 3.
4- J. L. Hennessy and D. A. Patterson, "Computer Architecture: A Quantitative Approach", 6th Edition, Morgan Kaufmann, 2017.
Classical reference for the Computer Architecture course, this book is a useful complement to reference 1 for the contents of parts 1, 4 and 5. The most interesting part is chapter 4, which deals with multiprocessors and multithreading.
5- H. El-Rewini et al., "Advanced Computer Architecture and Parallel Processing", Wiley, 2005.
This book covers some aspects of parts 2, 3, 4 and 5, and it can be of interest to complement the contents of reference 1.
To contribute to achieving the competences described in the document "Grao de Enxeñería Informática na USC" (CG4, CG6, CG9, CG11, TR1, TR2, TR3, FB5, RI1, RI9, RI14, TI2, TI5).
Specifically:
CG4. Ability to define, evaluate and select software and hardware platforms for the development and implementation of systems, services and computer applications, in accordance with the knowledge acquired as established in the "Acuerdo del Consejo de Universidades del 03/03/2009 para los títulos oficiales en el ámbito de la Ingeniería Técnica Informática".
CG6. Ability to design and develop centralized or distributed computer systems or architectures integrating hardware, software and networks according to the knowledge contained in section 5 of the "Acuerdo del Consejo de Universidades del 03/03/2009 para los títulos oficiales en el ámbito de la Ingeniería Técnica Informática".
CG9. Ability to solve problems with initiative, decision making, autonomy and creativity. Ability to communicate and transmit the knowledge and skills of the profession of Computer Engineer.
CG11. Ability to analyze and evaluate the social and environmental impact of technical solutions, which includes the ethical and professional responsibility of the IT Engineer.
TR1. Instrumental: Ability to analyze and synthesize. Organization and planning. Oral and written communication in Galician, Spanish and English. Ability to manage information. Problem solving. Decision making.
TR2. Personal: Teamwork. Working in a multidisciplinary and multilingual team. Skills in interpersonal relationships. Critical thinking. Ethical commitment.
TR3. Systemic: Autonomous learning. Adapting to new situations. Creativity. Initiative and entrepreneurship. Motivation for quality. Sensitivity to environmental issues.
FB5. Knowledge of the structure, organization, operation and interconnection of computer systems, the fundamentals of their programming, and their application to solving problems specific to Engineering.
RI1. Ability to design, develop, select and evaluate applications and systems, ensuring reliability, safety and quality, in accordance with ethical principles and laws and regulations.
RI9. Ability to learn, understand and evaluate the structure and architecture of computers, as well as the basic components that compose them.
RI14. Knowledge and application of the fundamental principles and basic techniques of parallel, concurrent, distributed and real-time programming.
TI2. Ability to select, design, deploy, integrate, evaluate, build, manage, operate and maintain hardware, software and network technologies, within the parameters of cost and quality.
TI5. Ability to select, deploy, integrate and manage information systems to meet the needs of the organization, with identified cost and quality criteria.
Competences associated to the Computer Engineering module:
- Knowledge of the architecture of parallel and distributed systems, from both the hardware and implementation point of view and the programming point of view.
The objective of our teaching methodology is to achieve a high degree of student participation during the course, seeking a high level of interaction with the professor and with the other students. To reach this objective we propose the following types of classes.
- Regular classes: in these classes the professor introduces the main concepts of the corresponding part of the course, guiding the students in order to facilitate their understanding. Depending on the part, one or two hours will be devoted (see the contents section). Documentation: reference 1 of the bibliography will provide all the necessary material for each part. The students will know the calendar of regular classes in advance so that they can do a first reading of the material.
- Classes for test discussion: each part requires answering a test of short questions, with the aim of motivating the students to work on that part. The students should bring the test solved (fully or partially) to the beginning of this kind of class, in any available medium (paper or electronic). During these classes the most relevant questions of the test will be discussed, and a high degree of student participation is expected. The test will be available electronically at the USC Virtual Campus (including any extra material required, such as an article).
- Classes for problem discussion: for each part several problems will be proposed. The students should bring the problems solved (fully or partially) to the beginning of this type of class. The most relevant problems will be discussed in class, and a high degree of student participation is expected. The list of problems will be available electronically at the USC Virtual Campus (including any extra material required, such as an article).
- Lab classes: the students will carry out the proposed labs in the number of sessions indicated. They should work autonomously and individually, interacting with the professor and with other students to resolve specific doubts and questions and to verify results. The very first sessions will be devoted to an introduction to MPI, which is necessary for lab exercise 1. These sessions will combine brief explanations of the most important MPI routines with simple exercises (run on the supercomputer or on the network of workstations of the lab). The material for this part of the labs, available electronically, consists of a presentation explaining the MPI routines and a tutorial explaining the relevant practical aspects of running the programs. For lab exercise 1, the students will have available, electronically, the description of the exercise, the estimated time for its completion and the evaluation criteria. The additional material required will also be available electronically: a document from Intel describing the methodology used in its benchmarks. For lab exercise 1, two lab hours will be devoted to a brief presentation and discussion of the results. For lab exercise 2, the description of the exercise, the estimated time for its completion and the evaluation criteria will also be available electronically, as will the user manual for the network simulator.
Since we expect highly active student participation, students should know the calendar for each type of class precisely. Students must have enough time between studying a part and the classes where they present the test results and problems. As a reference, a minimum of two weeks will be available between the regular classes and the classes for test discussion, and three weeks in the case of the classes for problem discussion.
The office hours ("tutorías" in Spanish) may be held in the professor's office (schedule in agreement with the students or determined by the School), by e-mail or through the Virtual Campus.
Development of competences:
CG4 and CG6: students work on these competences in all teaching activities, specifically in what concerns the hardware architecture of parallel systems, with emphasis on high-performance interconnection networks.
CG9: students work on this competence by solving the proposed problems, which are later pooled so that the results are shared and communicated. The project also exercises this competence extensively, since its approach is quite open and admits many possible solutions. Presenting the project to classmates and to the teacher exercises the communication and transmission of knowledge.
CG11: in the theory classes we place special emphasis on the power consumption of large parallel computing systems. We also discuss the applications of these large systems, which sometimes lead to developments of questionable social utility and at other times to scientific advances that create social and economic wealth.
TR1:
Capacity to analyze and synthesize: solving problems and questions exercises the capacity for analysis. The project mainly exercises the capacity for synthesis. The first lab exercises both analysis and synthesis, since students have to design a set of programs to perform various experiments on a high-performance interconnection network and then interpret and analyze the results. In the second lab, which consists of using a simulator of high-performance networks, students analyze how the performance of the network changes as the different parameters are varied.
Ability of organization and planning: the assigned tasks have strict deadlines, so students must organize themselves to meet the required schedule, taking into account their many commitments in other subjects.
Oral and written communication: students give short presentations on the latest high-performance microprocessors coming to market, present problems and present the project. For the project they have to write a detailed report, and ideas are exchanged on how to improve the ability to communicate results in writing. The subject is taught in English so that students have the opportunity to practice this language in all facets of their communication.
Information management capacity: a lot of information needs to be managed in the project, since students must obtain all the specifications for the design of a high-performance computing cluster from the Internet and must work through the information given by the manufacturers, which is not always simple to handle.
Problem solving: students must solve problems associated with the objectives of the subject. The project also applies this competence as part of the design process.
Decision making: the project is very open and students must make justified decisions. Discussion with the teacher while the project is being carried out helps develop this competence.
TR2:
Teamwork, skills in interpersonal relationships: the project can be carried out as a team. The teacher gives some guidelines to ensure that the teams are efficient in the development of the work.
Critical reasoning: during the project presentations students are asked to critique each project presented, highlighting its strengths and weaknesses with total objectivity. The teacher will moderate the process so that students improve their critical thinking skills.
Ethical Commitment: Emphasis will be placed on the issue of plagiarism.
TR3:
Autonomous learning: theory classes only highlight the most important aspects of the subject, so students must study the material in more depth on their own to complete the problems and the project. The labs also require independent learning; for example, to design the set of test programs for lab 1, students have to study on their own the specification of the methodology provided by a manufacturer.
Adapting to new situations: in some exercises the student is asked to investigate future technological trends and adapt the acquired knowledge to these hypothetical situations.
Creativity, motivation for quality and sensitivity for environmental issues: the project is the best tool to exploit this facet. Students must be creative in the solutions provided, which in turn must be of quality, which is reflected in the evaluation criteria. Students are asked to consider environmental aspects in the design of the computer cluster.
FB5, RI1, RI9: these competences are worked on in all aspects of teaching in this subject, as regards high-performance parallel computing systems.
RI14: the architecture of parallel systems has a fundamental impact on the model of parallel and distributed programming. In the theory classes, special emphasis is placed on the repercussions of the different aspects of the architecture on the programming of such systems and on their influence on performance. In lab 1, the test programs make it possible to evaluate the communication overhead in high-performance parallel systems and show how fundamental this information is for the programmer.
TI2, TI5: all teaching activities cover these competences, especially the project and its correction process.
ENXCOMP: "Competences associated to the module of Computer Engineering within the Degree:
Knowledge of the architecture of parallel and distributed systems both from the hardware and implementation point of view and from the point of view of their programming."
This competence is worked on intrinsically in all the teaching aspects of the subject.
Regular:
Contributions to the final grade and evaluation criteria (for a maximum of 10 points):
- Project: the evaluation criteria are the degree of elaboration of the project (options compared, selection criteria), the objectives achieved and the presentation of results, in proportion to the number of members of the team (3 maximum). Contribution: 4 points. It allows the evaluation of the following competences: CG4, CG6, CG9, TR1, TR2, TR3, FB5, RI1, RI9, TI2, TI5, ENXCOMP.
- Lab exercise 1: the evaluation criteria will take into account the number of benchmarks performed, the degree of elaboration and the conclusions of the experiments, and the presentation of results. Contribution: 2 points. It allows the evaluation of the following competences: CG4, CG6, TR1, TR2, TR3, FB5, RI1, RI9, RI14, TI2, TI5, ENXCOMP.
- Lab exercise 2: the evaluation criteria will take into account the number of simulations performed, the quality and elaboration of the conclusions of the different simulations and the presentation of results. Contribution: 2 points. It allows the evaluation of the following competences: CG4, CG6, TR1, TR2, TR3, FB5, RI1, RI9, TI2, TI5, ENXCOMP.
- Short assignments and problems: the evaluation criteria will take into account the number of test questions answered correctly and the number of problems solved correctly. Contribution: 2 points. It allows the evaluation of the following competences: CG4, CG6, CG9, CG11, TR1, TR2, TR3, FB5, RI1, RI9, RI14, TI2, TI5, ENXCOMP.
To pass the course an average of 5 or greater is necessary.
Condition for the qualification "no presentado" (as if you were not actually enrolled in the course): not submitting any lab or the project.
Students who were enrolled in this course in previous years do not keep the grades previously achieved in the different parts.
Extraordinary evaluation:
The same evaluation criteria as in the regular evaluation.
Condition for the qualification "no presentado" (as if you were not actually enrolled in the course): not submitting any lab or the project.
The interaction during office hours, the resolution of exercises and practical cases in class, and the labs will allow the professor to gauge informally the degree to which students assimilate the contents during the course.
For 4.5 ECTS credits, the personal work of a student should be about 67.5 hours, distributed in the following way:
- Autonomous work: 22 hours, devoted to the assimilation of the theoretical contents of the course.
- Writing exercises, conclusions and other assignments: about 20 hours, basically devoted to solving problems and exercises and to the project.
- Programming/experimentation: 10 hours, devoted to the resolution of labs and preparation for presentation of results.
- Evaluation activities: 15 hours, devoted to activities related to the evaluation of labs, exercises and project.
The scheduling of tasks will try to achieve a uniform distribution of the workload during the course. However, this distribution may vary at some peak times, such as when presenting the results of the labs or of the problems and exercises.
It is highly recommended that the student has passed (or at least taken) the courses Computer Architecture and Computer Engineering, and has knowledge of programming in C.
Completing the proposed exercises (tests and problems) in parallel with the development of the different parts of the course is decisive for a good assimilation of the concepts studied.
The course materials and the electronic communications with students will be carried out through the framework of the Virtual Campus of the USC.
The course Multicore and Multiprocessor Programming is the perfect complement for this course.
English will be used for some topics and class notes.