CACDS in the classroom

Classroom Presentations on High Performance Computing and Data Science

The computational scientists at CACDS are available as guest presenters for classes with a computational component. Some presentations are available as lectures, while others involve hands-on workshops in which students work through some programming tasks.

While presentations have been prepared in advance, a guest lecturer can usually make reasonable adjustments to the material to better fit the background and needs of a particular class. Generally, the presentation will include slides and a number of sample programs, which can be inspected by the class instructor and made available to the students.

Most lectures fit into one class period, that is, either a 50-minute or a 75-minute time slot, so they are fairly easy to schedule as a regular class session in the usual classroom. Hands-on workshops usually require more time (typically a 2- or 4-hour slot), possibly a different meeting place, and often two CACDS staff members to deal with the inevitable individual problems and questions. For workshops, students should bring their own laptop computers; alternatively, access to CACDS computers can be made available.

The following lectures and workshops are offered, subject to the availability of lecturers. Additional topics can also be offered, based on the wide range of expertise of the CACDS computational team. To inquire about arranging a classroom lecture or workshop, contact Amit Amritkar.

If your class has a substantial visualization component, then you might be interested in the CACDS visualization lectures listed below, in which the instructor presents a customized workshop on visualization, tailored to the level, interests, and needs of the class.

Data Science with Python

roughly 4 hour workshop

This class will cover the basics of using the Python Pandas library for your data science problems. You will also learn how to parallelize Python code using the ‘multiprocessing’ module. We will use Jupyter notebooks for the hands-on examples. Python version 2.7.12 for Anaconda (version 2.3.0) is recommended as an installation to follow along in class. Topics include file reads; creating, accessing, and modifying dataframes; formatting and dealing with missing data; Filter/Apply/Map, Group, and Query for data selection and manipulation; and the statistical functions in Pandas.
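
As a rough illustration of the ‘multiprocessing’ idea mentioned above, the sketch below splits a list into chunks and computes the mean of each chunk in a separate worker process. It is not taken from the workshop material: the function names, the chunk size, the worker count, and the sample data are all assumptions made for this example.

```python
from multiprocessing import Pool


def split_into_chunks(values, chunk_size):
    """Return consecutive slices of `values`, each at most `chunk_size` long."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def mean_of_chunk(chunk):
    """Work done by one process: the arithmetic mean of one chunk."""
    return sum(chunk) / float(len(chunk))


def parallel_chunk_means(values, chunk_size, workers=2):
    """Compute the mean of each chunk, one task per chunk, across `workers` processes."""
    chunks = split_into_chunks(values, chunk_size)
    pool = Pool(processes=workers)
    try:
        # map() hands each chunk to a worker and returns results in input order.
        return pool.map(mean_of_chunk, chunks)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this section on
    # platforms that start workers by re-importing the script.
    data = [float(i) for i in range(12)]  # made-up sample data
    print(parallel_chunk_means(data, chunk_size=4))
```

The worker function is defined at module level so that it can be sent to the worker processes. Whether the parallel version is actually faster depends on how much work each chunk represents compared with the cost of starting processes and moving data between them.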

CUDA Test Drive

one class period

CUDA is an extension of C that allows programmers to take advantage of the enormous parallel potential of graphics processing units (GPUs). In this lecture, several simple programming tasks are presented as C programs. Each program is then transformed into a CUDA program and executed on a CACDS cluster computer. Explanations are provided for the code transformations and the necessary batch files. Some suggestions are made about the underlying complexity of the CUDA thread and block model. Students should be familiar with C.

Debugging and Profiling with GDB and GPROF

one class period

A programmer constantly struggles to ensure that a program is correct and efficient. When a program misbehaves, a programmer’s simplest tools involve inserting print statements and modifying bits of the code. But often, the cause or location of the problem eludes such an approach. The GNU compiler family includes GDB, a powerful tool that allows the user to diagnose run-time errors, to step through program execution and inspect or modify data, and GPROF, a program that can monitor the execution of a program and identify hot spots (frequently executed areas of code) and the amount of time spent in various functions. Demonstrations will be made of how these utilities work and what they tell the programmer.
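
GPROF works on compiled programs, but the notion of a hot spot can be sketched in any language. The example below is only an analogy: it uses Python's standard cProfile and pstats modules rather than GPROF itself, and the functions being profiled are hypothetical ones invented for the illustration.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately loop-heavy: the intended hot spot."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def cheap_setup():
    """Inexpensive helper, included for contrast in the report."""
    return list(range(10))


def workload():
    """The program being examined: one cheap call, one expensive call."""
    cheap_setup()
    return slow_sum_of_squares(200000)


def profile_report(func):
    """Run `func` under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func()
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call paths appear first.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    value, report = profile_report(workload)
    print(value)
    print(report)
```

The report lists each function with its call count and the time spent in it, which is the same kind of information a profiler such as GPROF gives for a compiled program.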

Faster Code for Free: Accelerated Linear Algebra Libraries

one class period

Many programs rely on linear algebra operations such as matrix multiplication or factorization. Modern libraries such as MKL and ATLAS are optimized for given CPU architectures, enabling user programs that are faster to write and execute. This class gives an overview of these libraries, and a demonstration of them in action. Students should be familiar with C, and with common linear algebra procedures.

Introduction to CUDA Programming

one class period

GPUs have become a powerful and popular accelerator in large-scale computing. Nvidia’s CUDA programming language allows developers to take advantage of this massively parallel computing hardware. Topics covered include a general discussion of SIMD (Single Instruction Multiple Data) devices, an overview of CUDA architecture and capabilities, a basic CUDA program example, and a review of other higher-level libraries and languages that use CUDA. GPUs are available on the UH clusters.

Hands-on GPU-R

2 hour workshop

This workshop introduces the ways in which graphics processing units (GPUs) can accelerate computations in familiar programming languages such as R. We will explain the basic ideas behind a GPU, show how a programming language can, with a few extensions, transfer work to a GPU and pull results back, and then run some demonstration programs. A natural followup to this class would be the classes on CUDA programming, for optimal use of the GPU.

Introduction to CACDS

one class period

This lecture outlines the services offered by the Center for Advanced Computing and Data Science (CACDS), and gives an overview for those who are just starting to work on CACDS systems, or who are considering getting an account. Topics include the application/allocation process, the Unix environment, file transfer, module commands, batch processing, parallel processing options, the problem reporting system, and differences between the various clusters.

Data Visualization with Paraview

roughly 2 hour workshop

This session will begin with a presentation of visualization concepts, followed by a hands-on session using the Paraview software package.

Interactive Data Visualization with Python

roughly 2 hour workshop

This session will begin with a presentation of visualization concepts, followed by a hands-on session using the Plotly library for interactive visualization. A basic knowledge of Python is essential for this class.

Basic Data Visualization with R

roughly 2 hour workshop

This session will begin with a presentation of visualization concepts, followed by a hands-on session using the ggplot2 library and customization of plots. A basic knowledge of R is essential for this class.

Interactive Data Visualization with R

roughly 2 hour workshop

This session will begin with a presentation of interactive visualization concepts, followed by a hands-on session using the Plotly library and Shiny for interactive visualization. A basic knowledge of R is essential for this class.

MPI Programming: an Introduction

one class period

MPI is introduced as a way to carry out parallel computations in a distributed memory environment. We present the most basic MPI functions: init, size, rank, broadcast, send, recv, finalize, and show how useful programs can be constructed with just these tools. We show how MPI programs can be timed, and analyzed for scalability. Students should be familiar with C, since all programming examples will be in that language.
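
Scalability analysis of the kind mentioned above usually rests on two quantities: the speedup S(p) = T(1) / T(p) and the parallel efficiency E(p) = S(p) / p, where T(p) is the wall-clock time on p processes. The sketch below is written in Python only to show the arithmetic. In an MPI program written in C the times would typically be measured with MPI_Wtime, and the numbers used here are invented placeholders, not measurements from any CACDS system.

```python
def speedup(t_serial, t_parallel):
    """S(p) = T(1) / T(p)."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processes):
    """E(p) = S(p) / p; a value of 1.0 means perfect scaling."""
    return speedup(t_serial, t_parallel) / processes


def scalability_table(timings):
    """Build (p, time, speedup, efficiency) rows.

    `timings` maps process count -> wall-clock seconds and must contain key 1.
    """
    t1 = timings[1]
    rows = []
    for p in sorted(timings):
        tp = timings[p]
        rows.append((p, tp, speedup(t1, tp), efficiency(t1, tp, p)))
    return rows


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds, invented for illustration.
    sample = {1: 80.0, 2: 42.0, 4: 23.0, 8: 14.0}
    for p, t, s, e in scalability_table(sample):
        print("p=%d  time=%.1f s  speedup=%.2f  efficiency=%.2f" % (p, t, s, e))
```

An efficiency that drops steadily as p grows is the usual sign that communication or serial sections are starting to dominate the run time.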

MPI Test Drive

one class period

In many cases, the advantages of MPI’s parallel processing capabilities are built into a standalone research program, such as ABAQUS, ANSYS, LAMMPS, or VASP. This means that instead of worrying about writing MPI code, a user can concentrate on the more routine issues of getting the input data set up, writing an appropriate batch file, and doing timings to analyze performance. This lecture goes through the steps of running a LAMMPS problem on a CACDS cluster machine.

Numerical Computing in Julia

one class period

Julia is a high-level, high-performance dynamic programming language for technical computing, with a familiar syntax. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library. Julia’s just-in-time compiler, combined with the language’s design, allows it to approach and often match the performance of C. This short course will provide an overview of the language, including comparisons with Matlab, R, and Python.

OpenMP Programming: an Introduction

one class period

OpenMP is introduced as a way to carry out some kinds of loop operations in parallel on a shared memory system. Rules for parallelizable loops are discussed, the syntax for marking up parallel loops is shown, and saxpy, Jacobi, image processing, and quadrature examples are presented. Students should be familiar with C, since all programming examples will be in that language.
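
The central rule for a parallelizable loop is that its iterations must not depend on one another. The sketch below is a Python stand-in, not OpenMP code: it mimics threads running iterations out of sequence by visiting the loop indices in reverse order. The helper names and the `order` parameter are assumptions made for this illustration.

```python
def saxpy(a, x, y, order=None):
    """Compute z[i] = a * x[i] + y[i], visiting indices in the given order."""
    n = len(x)
    z = [0.0] * n
    for i in (order if order is not None else range(n)):
        z[i] = a * x[i] + y[i]  # touches only element i: iterations are independent
    return z


def running_sum(x, order=None):
    """Compute s[i] = s[i-1] + x[i]; each iteration reads the previous result."""
    n = len(x)
    s = [0.0] * n
    for i in (order if order is not None else range(n)):
        previous = s[i - 1] if i > 0 else 0.0
        s[i] = previous + x[i]  # loop-carried dependence on s[i-1]
    return s


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = [10.0, 20.0, 30.0, 40.0]
    backwards = list(reversed(range(len(x))))

    # Independent iterations: any visiting order yields the same result.
    print(saxpy(2.0, x, y) == saxpy(2.0, x, y, order=backwards))  # True
    # Dependent iterations: reordering changes the answer.
    print(running_sum(x) == running_sum(x, order=backwards))  # False
```

A loop like the saxpy one is a natural candidate for an OpenMP parallel loop, while the running sum would need to be restructured before it could be parallelized safely.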

Parallel MATLAB: an Introduction

one class period

The Mathworks has added a number of parallel features to MATLAB. The most accessible is the “parfor” statement, which replaces the sequential execution of a for loop by a parallel procedure. This class provides examples of how to use parfor, how to evaluate the resulting speedup, and issues that may impede or forbid the use of parfor on certain loops. Students should be familiar with MATLAB.

R Programming: an Introduction

two class periods or 3 hours

The R programming language is widely used for statistical programming and data analysis in social sciences, bioinformatics, and data analytics. Because of its powerful ability to read and manipulate data, it has become a common tool for data analysis and machine learning. This lecture will outline how to read data into R, how to write R functions, how to access the many specialized R packages, and some tips for improving efficiency. A natural followup to this class is the “Parallel R” class.

Parallel R

one class period

R is a statistical programming language that has become increasingly popular in data analysis and statistical applications. This tutorial will describe how an R user can leverage the parallel computing capabilities of modern supercomputers to speed up large computations or run large numbers of similar operations, such as Monte Carlo simulations, at the same time. Packages covered will include snow, Rmpi, and pbdR (“Programming with Big Data in R”).

Cluster Computing

one class period

A scientific program can run on a cluster as easily as on a desktop; it’s the user who has the harder transition. This class is a step-by-step discussion of the new ways of thinking that make remote computing natural and productive, including: how a cluster is different from a desktop; why clusters are necessary, and how they are organized and administered; the Linux commands necessary to get basic tasks done; how to log in remotely and transfer files; and how computations are organized into a batch file, submitted to a queue, and managed by the job scheduler.

Scientific Python

roughly 4 hour workshop

The Python language has exploded in popularity, and the scientific community has developed numerous utilities and libraries, particularly in areas such as big data and machine learning. This workshop will show why Python is both easy to learn and powerful to use. We explain the basic syntax and simple commands. We then explore the libraries NumPy and SciPy for numerics, and Matplotlib for graphics. We show how a Python program can access functions written in C.
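
There are several ways for a Python program to call C functions. The sketch below uses the standard ctypes module to call `sqrt` from the system C math library. It is one possible approach and not necessarily the one used in the workshop. It assumes a Unix-like system where that library can be located, and the helper names are invented for this example.

```python
import ctypes
import ctypes.util


def load_c_math_library():
    """Locate and load the system C math library (assumes a Unix-like system)."""
    path = ctypes.util.find_library("m")
    # If the lookup fails, CDLL(None) exposes symbols already loaded into the
    # running Python process, which on Linux normally include the math library.
    return ctypes.CDLL(path) if path else ctypes.CDLL(None)


def make_c_sqrt(lib):
    """Wrap C's `double sqrt(double)` with explicit argument and return types."""
    c_sqrt = lib.sqrt
    c_sqrt.argtypes = [ctypes.c_double]
    # Without this, ctypes would treat the return value as a C int.
    c_sqrt.restype = ctypes.c_double
    return c_sqrt


if __name__ == "__main__":
    c_sqrt = make_c_sqrt(load_c_math_library())
    print(c_sqrt(9.0))
```

Declaring `argtypes` and `restype` is the step most often forgotten: the call would still run without them, but the C function would receive or return values under the wrong type.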

Keras for Deep Learning

roughly 2 hour workshop

Keras is a scalable framework for distributed machine learning applications. In this class, we will cover some of the basics of deep learning and use a Jupyter notebook to perform classification in Keras using the TensorFlow backend. Knowledge of Python is essential for this class.

Version Control with GitHub

one class period

Every computer user has faced the problem of keeping track of multiple versions of a document, program, or project. This problem becomes serious when a team of users must share, edit and test information cooperatively. Version control is a way to manage the orderly evolution of projects, so that information is not lost, multiple versions can be maintained and smoothly integrated, and older versions can be retrieved if an updated version turns out unsatisfactory. This lecture will introduce git, a popular and powerful version control system for managing software projects. We will see how to set up a project, enter, edit, merge and recover information, and how to organize collaborative software projects.