Many machine learning problems reduce to mathematical optimization problems which, in most non-linear and non-convex cases, require iterative methods. In the parallel case, each iteration of block coordinate descent requires communication.
CA-Krylov methods unroll vector recurrences and rearrange the sequence of computation in a way that defers communication for s iterations, where s is a tunable parameter. Block coordinate descent is an iterative algorithm which at each iteration samples a small subset of rows or columns of the input matrix, solves a subproblem using just the chosen rows or columns, and obtains a partial solution.
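The block coordinate descent loop described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes the L2-regularized least-squares objective 0.5*||Xw - y||^2 + 0.5*lam*||w||^2, samples columns uniformly without replacement, and solves each block subproblem exactly. The function name `bcd_ridge` and all of its parameters are hypothetical.

```python
import numpy as np

def bcd_ridge(X, y, lam, block_size, iters, seed=0):
    """Block coordinate descent for 0.5*||Xw - y||^2 + 0.5*lam*||w||^2 (assumed objective)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    r = -y.astype(float)  # residual Xw - y, with w = 0 initially
    for _ in range(iters):
        # Sample a small subset of columns of the input matrix.
        idx = rng.choice(d, size=block_size, replace=False)
        Xb = X[:, idx]
        # Subproblem built from the chosen columns only:
        # block Hessian and block gradient.
        G = Xb.T @ Xb + lam * np.eye(block_size)
        g = Xb.T @ r + lam * w[idx]
        delta = np.linalg.solve(G, -g)
        # Partial solution: only the sampled coordinates change.
        w[idx] += delta
        r += Xb @ delta
    return w
```

In a distributed setting where the rows of X are spread across processors, the products `Xb.T @ Xb` and `Xb.T @ r` would each need a reduction, which is why every iteration communicates.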
Our experimental results illustrate that our new, communication-avoiding methods can obtain speedups of up to 6. This thesis focuses on deriving communication-avoiding variants of the block coordinate descent method, a first-order method that has strong convergence rates for many optimization problems.
This thesis adapts well-known techniques from existing work on communication-avoiding (CA) Krylov and s-step Krylov methods. For CA-kernel methods the computational and bandwidth costs do not increase. On parallel machines the cost of moving data from one processor to another over an interconnection network is the most expensive operation.
Our communication-avoiding variants reduce the latency cost by a tunable factor of s at the expense of a factor of s increase in computational and bandwidth costs for the L2 and L1 least-squares and SVM problems. This is because the CA-variants of kernel methods can reuse elements of the kernel matrix already computed and therefore do not need to compute and communicate additional elements of the kernel matrix.
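The trade-off stated above can be made concrete with a rough, leading-order cost model. The sketch below is an assumption-laden illustration rather than the thesis's analysis: it assumes the rows of the input matrix are spread over P processors, that forming a Gram matrix of the sampled columns dominates the work, and that each communication round is a tree-based all-reduce costing log2(P) messages. Constants are dropped, and the function name `bcd_costs` and its parameters are hypothetical.

```python
import math

def bcd_costs(H, b, n, P, s=1):
    """Leading-order costs of H block updates of size b (s=1 is the classical method)."""
    rounds = H / s   # communication rounds: one per s block updates
    k = s * b        # columns whose Gram matrix is formed in each round
    return {
        "flops": rounds * k * k * n / P,          # local Gram-matrix work
        "words": rounds * k * k * math.log2(P),   # all-reduce volume (bandwidth)
        "messages": rounds * math.log2(P),        # all-reduce count (latency)
    }

classical = bcd_costs(H=1000, b=4, n=10**6, P=1024)
avoiding = bcd_costs(H=1000, b=4, n=10**6, P=1024, s=8)
# Under these assumptions latency drops by s while flops and words grow by s.
print(classical["messages"] / avoiding["messages"])
print(avoiding["flops"] / classical["flops"])
print(avoiding["words"] / classical["words"])
```

Under this model the s-step variant is a win whenever the per-message latency is large enough to outweigh the extra arithmetic and data volume, which is what makes s a tuning parameter.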
With this technique we have achieved speedups of up to 6. Therefore, avoiding communication is key to attaining high performance.
This solution is then iteratively refined until the optimal solution is reached or until convergence criteria are met. For CA-kernel methods we show modeled speedups of 26x, x, and x for MPI on a predicted Exascale system, Spark on a predicted Exascale system, and Spark on a cloud system, respectively.
Finally, we also present an adaptive batch size technique which reduces the latency cost of training convolutional neural networks (CNNs).
Furthermore, we were able to train the ResNet network on the ImageNet dataset with a batch size of up to [...], which would allow neural network training to attain a higher fraction of peak GPU performance than training with smaller batch sizes.
We apply a similar recurrence unrolling technique to block coordinate descent in order to obtain communication-avoiding variants which solve the L2-regularized least-squares, L1-regularized least-squares, support vector machine (SVM), and kernel problems. We also confirm experimentally that our algorithms are numerically stable for large values of s.
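One way to picture the recurrence unrolling is the sketch below, written for the L2-regularized least-squares case only. It is an illustration under stated assumptions, not the thesis's algorithm: the objective is taken to be 0.5*||Xw - y||^2 + 0.5*lam*||w||^2, the solution vector w is assumed cheap enough to update locally, and the names `bcd_ridge` and `ca_bcd_ridge` are hypothetical. The s blocks are sampled up front, one Gram matrix over all s*b sampled columns is formed, and the s block updates are then computed from that matrix without touching the residual until the end of the round.

```python
import numpy as np

def bcd_ridge(X, y, lam, b, iters, seed=0):
    """Classical reference: one residual update per block update."""
    rng = np.random.default_rng(seed)
    w, r = np.zeros(X.shape[1]), -y.astype(float)
    for _ in range(iters):
        idx = rng.choice(X.shape[1], size=b, replace=False)
        Xb = X[:, idx]
        g = Xb.T @ r + lam * w[idx]
        delta = np.linalg.solve(Xb.T @ Xb + lam * np.eye(b), -g)
        w[idx] += delta
        r += Xb @ delta
    return w

def ca_bcd_ridge(X, y, lam, b, iters, s, seed=0):
    """s-step sketch: iters must be a multiple of s."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    w, r = np.zeros(d), -y.astype(float)
    for _ in range(iters // s):
        blocks = [rng.choice(d, size=b, replace=False) for _ in range(s)]
        Y = X[:, np.concatenate(blocks)]
        # The only products involving the long dimension n; in parallel these
        # would be the single communication step of the round.
        G = Y.T @ Y
        Yr = Y.T @ r
        deltas = np.zeros(s * b)
        for j, idx in enumerate(blocks):
            sl = slice(j * b, (j + 1) * b)
            # Unrolled recurrence: X_j^T r_{j-1} rebuilt from G and earlier deltas.
            g = Yr[sl] + G[sl, :j * b] @ deltas[:j * b] + lam * w[idx]
            delta = np.linalg.solve(G[sl, sl] + lam * np.eye(b), -g)
            deltas[sl] = delta
            w[idx] += delta
        r += Y @ deltas  # deferred residual update, once per s block updates
    return w
```

With the same seed both functions visit the same blocks, so in exact arithmetic they return the same iterate; the s-step version only changes when the residual is touched. The larger Gram matrix is where the extra computation and bandwidth come from.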
The CA-variants for these problems require additional computation and bandwidth in order to update the residual vector. On modern computer architectures, the cost of moving data (communication) from main memory to caches in a single machine is orders of magnitude more expensive than the cost of performing floating-point operations (computation).
In addition to hardware improvements, algorithm redesign is also an important direction to further reduce running times. While hardware improvements have facilitated the development of machine learning models on a single machine, the analysis of large amounts of data still requires parallel computing, either to obtain shorter running times or because the dataset cannot be stored on a single machine.
The large gap between computation and communication suggests that algorithm redesign should be driven by the goal of avoiding communication and, if necessary, decreasing communication at the expense of additional computation. For CA-Krylov methods the reduction in communication cost comes at the expense of numerical instability for large values of s.