Setting Up A Cluster of Tiny PCs For Parallel Computing – A Note To Myself

[This article was first published on r on Everyday Is A School Day, and kindly contributed to R-bloggers.]

Enjoyed learning the process of setting up a cluster of tiny PCs for parallel computing. A note to myself on installing Ubuntu, passwordless SSH, automating package installation across nodes, distributing R simulations, and comparing CV5 vs CV10 performance. Fun project!

Motivations

Part of what I want to learn this year is getting a little more into parallel computing: how we can distribute simulation computations across different devices. Lately, we have more reasons to do this because quite a few of our simulations require long-running computation, and leaving my laptop running overnight or for several days is just not a good use of it. We have also tried cloud computing, but without knowing how those distributed cores are, well, distributed, it’s hard for me to conceptualize how this is done and what else we could optimize. Hence, what better way to learn than doing it on our own! Sit tight, this is going to be a bumpy one. Let’s go!

Objectives

- Which PCs to get?
- Install Ubuntu
- Align and fix IPs
- Passwordless ssh
- Send multiple commands via ssh
- Install R
- Create A Template R script For Simulation
- Install Packages On All Nodes
- Upload Rscript to Nodes
- Run Rscript
- Extract data
- Compare time
- Opportunities for improvement
- Lessons learnt

Which PCs to Get?

Preferably something functional and cheap! Something like a used Lenovo M715q Tiny PC or something similar.

Install Ubuntu

- Download Ubuntu Server.
- Create a bootable USB using balenaEtcher.
- When starting the Lenovo up, press F12 continuously until it shows an option to boot from USB. If F12 does not work, reboot and press F1 to enter the BIOS. Go to the Startup tab and change CSM Support to Enabled. Then set Primary Boot Priority to USB by moving it to first place. Then press F10 to save the configuration and exit.
It will then reboot to USB.
- Make sure it’s connected to the internet via LAN for a smoother installation.
- Follow the instructions to install Ubuntu, setting the username, password, etc. Then reboot.
- Make sure to remove the USB drive; if you don’t, it’ll remind you. Et voila!

The installations were very quick compared to the other OSes I’ve installed in the past. Very smooth as well. I thoroughly enjoyed setting these up.

Align and Fix IPs

For organizational purposes, go to your router settings and assign your cluster nodes convenient IPs such as 192.168.1.101, 192.168.1.102, 192.168.1.103, etc. You may have to reboot the nodes after changing this on your router.

Passwordless SSH

Next, you want to set up passwordless SSH. This is crucial for R to work across the nodes, since scripted SSH commands cannot stop to ask for a password.

1. Create a key

    ssh-keygen -t ed25519

2. Send Copy of Key To Your Node

    ssh-copy-id -i .ssh/my_key.pub username1@192.168.1.101

Here .ssh/my_key.pub is the public key created in step 1; adjust the path if you saved the key under a different name. It will prompt you to enter your password; after that you won’t need a password to ssh in.

Passwordless Sudo

This is optional. But if you’re like me and don’t want to repeat lots of typing during installation, and want to see if you can use bash or R to install packages, you’d need this.

    ssh -t username2@192.168.1.102 'echo "$(whoami) ALL=(ALL) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/$(whoami)'

It will prompt you to enter your password. You have to do this for all your nodes.

Send Multiple Commands Via SSH

Install R

    for host in username1@192.168.1.101 username2@192.168.1.102 username3@192.168.1.103; do
      ssh -t $host 'sudo apt update && sudo apt install -y r-base r-base-dev'
    done

This basically installs R on all of our nodes, one after another.

Create A Template R script For Simulation

Why do we do this? We want to take advantage of the multiple cores on each node, as opposed to using clusters in the future package, because the network overhead may add to the run time and make the optimization less efficient. Instead, we will send a script to each node so that it can fork its own cores to run the simulation.
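
Since every later step depends on logging in to each node without a prompt, it may be worth confirming that key-based SSH works everywhere before sending any scripts. Below is a minimal sketch, assuming the three example addresses used above. `check_node` and `check_all` are hypothetical helper names made up for illustration, while `BatchMode` and `ConnectTimeout` are standard OpenSSH client options.

```shell
#!/usr/bin/env bash
# Sketch: confirm key-based (passwordless) SSH login works on every node.
# The hosts are the article's example addresses; check_node and check_all
# are hypothetical helper names, not part of any tool.

NODES="username1@192.168.1.101 username2@192.168.1.102 username3@192.168.1.103"

# Returns 0 if we can log in to host $1 without a password prompt.
# BatchMode=yes makes ssh fail instead of asking for a password;
# ConnectTimeout=5 gives up on an unreachable node after 5 seconds.
check_node() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true 2>/dev/null
}

# Prints one status line per host passed as an argument.
check_all() {
  for host in "$@"; do
    if check_node "$host"; then
      echo "OK   $host"
    else
      echo "FAIL $host"
    fi
  done
}
```

Calling `check_all $NODES` prints one line per node. Any node reported as FAIL probably needs the `ssh-copy-id` step repeated, or may simply be switched off or on a different IP.
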
Also, if we specify the packages in our script, we can automate the process of installing these packages on our nodes.

    library(future)
    library(future.apply)
    library(dplyr)
    library(SuperLearner)
    library(ranger)
    library(xgboost)
    library(glmnet)
    plan(multicore, workers = 4)
    set.seed(1)
    n