R data.table installation on Apple M1 (Big Sur) with OpenMP multithreading support
Introduction
The official release of R 4.1.0 has brought long-awaited native support for Apple M1 to R/Mac users. While the Rosetta engine provided by Apple does its job very well, we are always eager for more performance and less time spent on data wrangling. Unfortunately, information about working in the new environment is still scarce. A basic installation of data.table on macOS produces the following notification:
**********
This installation of data.table has not detected OpenMP support.
It should still work but in single-threaded mode.
If this is a Mac, please ensure you are using R>=3.4.0 and have followed our Mac instructions here:
https://github.com/Rdatatable/data.table/wiki/Installation.
This warning message should not occur on Windows or Linux. If it does, please file a GitHub issue.
**********
The notification points to the data.table installation guide, which describes the steps required to enable multithreading. Apple's own toolchain does not ship with OpenMP support, so data.table has to be compiled from source on the machine with the following command:
install.packages("data.table", type = "source",
repos = "https://Rdatatable.gitlab.io/data.table")
The guide offers several options for preparing macOS for installation from source. For my Rosetta installation I prefer the "GCC (Official GNU Fortran) ver" option, mainly because it needs less disk space. Unfortunately, that option turned out to be useless for a native aarch64 installation. After several attempts with the different options in the guide, I concluded that none of them works as described for aarch64. Moreover, searching the internet turned up nothing on this case.
Solution
Use the solution below at your own risk; it should be considered experimental. I highly recommend keeping the previous R installation alongside the new one in case compiling other packages causes unexpected problems, such as the stringi issue, or trouble with Rcpp and the like.
The LLVM option will be used. The steps follow the data.table guide, except for Step 0.
Step 0 (prepare RStudio)
First of all, you need to install the preview release of RStudio, which supports Apple Silicon (aarch64).
Step 1 (exactly as in the guide)
Now, ensure that you have the command line tools installed. Do NOT skip this step. It is essential. See https://github.com/Rdatatable/data.table/issues/1692. From the terminal, type:
xcode-select --install
If you get the error message xcode-select: error: command line tools are already installed, use "Software Update" to install updates, then you already have the command line tools and can proceed to the next step. Otherwise, follow the onscreen instructions and install them first.
Step 2 (exactly as in the guide)
Then install Homebrew if you have not already. After that, install the OpenMP-enabled clang from the terminal by typing:
# update: seems like this installs clang with openmp support,
# as pointed out by @botanize in #1817
brew update && brew install llvm
Note that Homebrew installs the arm versions of packages in a separate location, /opt/homebrew. The build environment therefore has to be reconfigured accordingly.
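Before touching the build configuration, it may help to confirm that the running R session is the native build and that Homebrew's LLVM is where the configuration expects it. The following is a minimal sketch in R; `check_build_env()` is a hypothetical helper name, and its default path is the arm Homebrew location used in this article.

```r
# Hypothetical helper: reports the architecture R was built for and whether
# a clang binary exists under the given LLVM prefix.
check_build_env <- function(llvm_loc = "/opt/homebrew/opt/llvm") {
  clang <- file.path(llvm_loc, "bin", "clang")
  list(
    r_arch       = R.version$arch,           # "aarch64" for a native Apple Silicon build
    machine      = Sys.info()[["machine"]],  # hardware name as seen by this R process
    clang_path   = clang,
    clang_exists = file.exists(clang)
  )
}

str(check_build_env())
```

If `clang_exists` is FALSE, the `brew install llvm` step probably installed LLVM elsewhere, and the path used in the configuration needs to be adjusted to match.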
Step 3 (modified from the guide)
Add the following lines to the file ~/.R/Makevars using your favourite text editor. You will probably need to create the .R directory and the Makevars file in it if they do not already exist.
# if you downloaded llvm manually above, replace with your chosen NEW_PATH/clang
LLVM_LOC = /opt/homebrew/opt/llvm
CC=$(LLVM_LOC)/bin/clang -fopenmp
CXX=$(LLVM_LOC)/bin/clang++ -fopenmp
# -O3 should be faster than -O2 (default) level optimisation ..
CFLAGS=-g -O3 -Wall -pedantic -std=gnu99 -mtune=native -pipe
CXXFLAGS=-g -O3 -Wall -pedantic -std=c++11 -mtune=native -pipe
LDFLAGS=-L/opt/homebrew/opt/gettext/lib -L$(LLVM_LOC)/lib -Wl,-rpath,$(LLVM_LOC)/lib
CPPFLAGS=-I/opt/homebrew/opt/gettext/include -I$(LLVM_LOC)/include
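For those who prefer not to edit the file by hand, the same lines could be written from R. This is only a sketch: `write_makevars()` is a hypothetical helper, not part of R or data.table, and it refuses to overwrite an existing file unless told to. The example call writes into a temporary directory so the result can be inspected before the real `~/.R/Makevars` is touched.

```r
# Hypothetical helper: writes the Makevars lines from this article to `path`.
write_makevars <- function(path = "~/.R/Makevars",
                           llvm_loc = "/opt/homebrew/opt/llvm",
                           overwrite = FALSE) {
  path <- path.expand(path)
  if (file.exists(path) && !overwrite) {
    stop("Makevars already exists at ", path,
         "; pass overwrite = TRUE to replace it")
  }
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  lines <- c(
    paste0("LLVM_LOC = ", llvm_loc),
    "CC=$(LLVM_LOC)/bin/clang -fopenmp",
    "CXX=$(LLVM_LOC)/bin/clang++ -fopenmp",
    "CFLAGS=-g -O3 -Wall -pedantic -std=gnu99 -mtune=native -pipe",
    "CXXFLAGS=-g -O3 -Wall -pedantic -std=c++11 -mtune=native -pipe",
    "LDFLAGS=-L/opt/homebrew/opt/gettext/lib -L$(LLVM_LOC)/lib -Wl,-rpath,$(LLVM_LOC)/lib",
    "CPPFLAGS=-I/opt/homebrew/opt/gettext/include -I$(LLVM_LOC)/include"
  )
  writeLines(lines, path)
  invisible(path)
}

# Dry run into a temporary directory; inspect the output before using the default path
tmp <- write_makevars(file.path(tempdir(), "R-demo", "Makevars"), overwrite = TRUE)
cat(readLines(tmp), sep = "\n")
```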
The only difference between the configuration above and the original one is that the compiler paths point to /opt/homebrew/.... After that, all the necessary configuration is done and the package is ready to be installed from source. Use the following command: install.packages("data.table", type = "source")
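Once the source installation finishes, a quick check along these lines may show whether OpenMP was picked up. `getDTthreads()` is part of data.table; the exact wording of its verbose output depends on the installed version, so none is reproduced here.

```r
library(data.table)

# Prints the OpenMP-related settings that data.table detected at load time
getDTthreads(verbose = TRUE)

# Number of threads data.table will currently use
n_threads <- getDTthreads()
n_threads
```

A value of 1 does not by itself prove that OpenMP is missing, since the thread count can also be limited by settings; the verbose output is the more informative of the two. The absence of the startup notification quoted in the introduction is another sign that the build succeeded.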
Performance tests
Is it worth migrating from Rosetta to native support? While there are difficulties with installation and managing several lib directories, the benefits appear to be valuable. Benchmark results are provided below.
Hardware
- MacBook Air (M1, 2020) | Memory 16 GB | Big Sur 11.4
- Core(TM) i7-7700 CPU | Memory 44 GB | Ubuntu 20.04.2 LTS
Ubuntu is running as a virtualized instance on a remote server. Both systems use fast SSDs and were tested with the 4-way multithreading option activated. All tests were conducted in a fresh session (see also sessionInfo()).
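The article does not say how the 4-way multithreading was switched on. One way to do it, shown here as an assumption rather than the author's actual setup, is data.table's `setDTthreads()`, which returns the previous setting so it can be restored afterwards.

```r
library(data.table)

old <- setDTthreads(4L)   # request 4 threads; the previous setting is returned invisibly
now <- getDTthreads()     # threads actually in use; may be lower than 4 on smaller machines
now

setDTthreads(old)         # restore the earlier setting when done
```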
Data generation
I believe that performing simple mathematical calculations using base R and incorporating the most useful functions from the data.table package would be an appropriate profile for emulating real tasks. While this approach may not be comprehensive and entirely objective, it provides a practical assessment based on your specific needs.
library(bench)
library(data.table)
# n is not defined in the original post; 1e8 is an assumption, chosen because it is
# roughly consistent with the ~381 MB that uniqueN() allocates in the results below
n <- 1e8
smpl0 <- rexp(n = 5e6, rate = 3) # for R base iteration
smpl1 <- data.table(fctr = sample(letters, n, replace = TRUE), num1 = rnorm(n, 3, 4), num2 = sample(1:100, n, replace = TRUE), num3 = runif(n, 0, 100))
smpl2 <- data.table(fctr = sample(letters, 26), num1 = rnorm(26, 3, 4), num2 = sample(1:100, 26, replace = TRUE), num3 = runif(26, 0, 100))
# base R simple benchmarking
mark(min_time = .1, min_iterations = 50,
lapply(smpl0, log),
purrr::map(smpl0, log), # some additional time needed to attach function
as.list(log(smpl0)))
# data.table common function usage
mark(min_time = .1, min_iterations = 50, check = FALSE,
smpl1[, lapply(.SD, mean), .SDcols = is.numeric],
smpl1[smpl2, on = "fctr"],
uniqueN(smpl1, by = c("fctr", "num2")))
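Before committing to the full-size run, a scaled-down variant can confirm that the benchmark code works on a given installation and show how to pull the timings out of the result. The sketch below is illustrative only: `n_small`, `dt`, and the expression names `agg` and `uniq` are made up here, and the data is far smaller than the article's.

```r
library(bench)
library(data.table)

n_small <- 1e5  # illustrative size, much smaller than the data used in the article
dt <- data.table(fctr = sample(letters, n_small, replace = TRUE),
                 num1 = rnorm(n_small, 3, 4))

# check = FALSE because the two expressions return different results
res <- mark(min_iterations = 5, check = FALSE,
            agg  = dt[, lapply(.SD, mean), .SDcols = is.numeric],
            uniq = uniqueN(dt, by = "fctr"))

# The result is a tibble; keep only the columns discussed in this article
res[c("expression", "median", "itr/sec")]

# Median times as plain numbers, in seconds
median_sec <- as.numeric(res$median)
median_sec
```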
Iteration with base R and purrr package
Intel-comparable hardware running Ubuntu
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>
1 lapply(smpl, log) 1.2s 1.44s 0.554 38.1MB 0.853 50 77 1.5m
2 purrr::map(smpl, log) 4.89s 6.72s 0.141 38.1MB 0.761 50 270 5.92m
3 as.list(log(smpl)) 124.93ms 215.6ms 2.78 76.3MB 1.06 50 19 18s
Rosetta emulation mode
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>
1 lapply(smpl0, log) 978.31ms 1.33s 0.569 38.1MB 0.762 50 67 1.47m
2 purrr::map(smpl0, log) 3.84s 5.23s 0.169 38.1MB 0.790 50 234 4.94m
3 as.list(log(smpl0)) 95.75ms 132.74ms 3.23 76.3MB 0.968 50 15 15.5s
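Rosetta emulation is not particularly impressive here: on the lapply test its median time is only about 8% better than on the Intel-comparable platform, although the gap widens for the other two expressions.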
Native arm mode
# A tibble: 3 x 9
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>
1 lapply(smpl0, log) 699.88ms 763.53ms 1.15 38.1MB 1.15 50 50 43.44s
2 purrr::map(smpl0, log) 2.56s 3.68s 0.239 38.4MB 1.08 50 226 3.49m
3 as.list(log(smpl0)) 67.98ms 76.13ms 6.23 76.3MB 2.24 50 18 8.02s
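Native support shows more solid results. It is roughly 1.4 to 2 times faster than Rosetta mode, depending on the expression and on whether median time or iterations per second is compared. Very promising!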
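The comparison between the three platforms can be made explicit with a little arithmetic on the median column. The numbers below are copied from the three tables above and converted to seconds; the object names `median_sec` and `speedup_vs_native` are just labels chosen for this sketch.

```r
# Median times in seconds, taken from the three base R tables above
median_sec <- rbind(
  intel   = c(lapply = 1.44,    map = 6.72, as_list = 0.2156),
  rosetta = c(lapply = 1.33,    map = 5.23, as_list = 0.13274),
  native  = c(lapply = 0.76353, map = 3.68, as_list = 0.07613)
)

# Divide every column by the native-mode value: a ratio above 1 means
# native mode is that many times faster than the platform in that row
speedup_vs_native <- sweep(median_sec, 2, median_sec["native", ], "/")
round(speedup_vs_native, 2)
```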
Some wrangling with data.table
Intel-comparable hardware running Ubuntu
# A tibble: 3 x 9
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>
1 smpl1[, lapply(.SD, mean), .SDcols = is.numeric] 485.05ms 487.4ms 2.02 1.8MB 0 50 0 24.74s
2 smpl1[smpl2, on = "fctr"] 4.58s 5.46s 0.181 5.96GB 0.184 50 51 4.61m
3 uniqueN(smpl1, by = c("fctr", "num2")) 720.81ms 1.33s 0.779 381.5MB 0.0623 50 4 1.07m
Rosetta emulation mode
# A tibble: 3 x 9
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>
1 smpl1[, lapply(.SD, mean), .SDcols = is.numeric] 25.29s 25.85s 0.0365 81.02KB 0 50 0 22.86m
2 smpl1[smpl2, on = "fctr"] 4.57s 4.96s 0.199 5.96GB 0.199 50 50 4.18m
3 uniqueN(smpl1, by = c("fctr", "num2")) 570.96ms 596.29ms 1.49 381.48MB 0.119 50 4 33.62s
data.table functions show contradictory results:
- The aggregation function reveals a significant gap between the Intel-based system and Rosetta mode. Rosetta appears to be slower by approximately 50 times. This is an intriguing result and is not easily explained. To validate this observation, I reset the R session and repeated the test several times, yet the results remained consistently similar.
- The left join is faster on Rosetta, as in the base R tests, by about 5-10%.
- Calculating unique observations is more than two times faster on Rosetta than on the Intel-based system.
Native arm mode
# A tibble: 3 x 9
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>
1 smpl[, lapply(.SD, mean), .SDcols = is.numeric] 462.24ms 464ms 2.12 81.02KB 0 50 0 23.63s
2 smpl1[smpl2, on = "fctr"] 4.61s 4.9s 0.205 5.96GB 0.410 50 100 4.07m
3 uniqueN(smpl1, by = c("fctr", "num2")) 518.12ms 577.3ms 1.51 381.48MB 0.121 50 4 33.19s
Interestingly, native mode shows only a slight advantage over Rosetta mode for the left join and the unique count. Aggregation performance, however, improves dramatically and even shows a small advantage over the Intel-based system (approximately 5%).
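The percentages quoted for the data.table tests can be reproduced from the median columns of the tables above. In this sketch `pct_faster()` is a hypothetical helper defined only for illustration, and the values are the reported medians converted to seconds.

```r
# Hypothetical helper: percentage by which `new` is faster than `old`
pct_faster <- function(old, new) round(100 * (old - new) / old, 1)

# Median times in seconds from the three data.table tables above
agg  <- c(intel = 0.4874, rosetta = 25.85,   native = 0.464)
join <- c(intel = 5.46,   rosetta = 4.96,    native = 4.9)
uniq <- c(intel = 1.33,   rosetta = 0.59629, native = 0.5773)

pct_faster(agg[["intel"]],    agg[["native"]])   # native vs Intel, aggregation
pct_faster(join[["rosetta"]], join[["native"]])  # native vs Rosetta, left join
pct_faster(uniq[["rosetta"]], uniq[["native"]])  # native vs Rosetta, uniqueN
agg[["rosetta"]] / agg[["native"]]               # how many times slower Rosetta is on aggregation
```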
sessionInfo()
Intel-comparable hardware running Ubuntu
R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=ru_RU.UTF-8 LC_NUMERIC=C LC_TIME=ru_RU.UTF-8 LC_COLLATE=ru_RU.UTF-8 LC_MONETARY=ru_RU.UTF-8 LC_MESSAGES=ru_RU.UTF-8
[7] LC_PAPER=ru_RU.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=ru_RU.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.6 purrr_0.3.4 readr_1.4.0 tidyr_1.1.3 tibble_3.1.2 ggplot2_3.3.3 tidyverse_1.3.1
[10] data.table_1.14.0 bench_1.1.1
Rosetta emulation mode
R version 4.0.5 (2021-03-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.4
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.14.1 bench_1.1.1 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.5 purrr_0.3.4 readr_1.4.0 tidyr_1.1.3 tibble_3.1.1
[10] ggplot2_3.3.3 tidyverse_1.3.1
Native arm mode
R version 4.1.0 (2021-05-18)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Big Sur 11.4
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-aarch64/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.6 purrr_0.3.4 readr_1.4.0 tidyr_1.1.3 tibble_3.1.2 ggplot2_3.3.3 tidyverse_1.3.1
[10] data.table_1.14.1 bench_1.1.1
Conclusions
- Running R and data.table in native mode is still experimental, and practitioners often lack the information needed to adapt their macOS environment to the new requirements.
- Despite being experimental, native mode provides a substantial performance improvement over Rosetta mode for base R, with calculations running up to about twice as fast.
- The surprising, drastic performance drop observed for the data.table aggregation function in Rosetta mode disappears in native mode.
Recommendations
- Be prepared for potential issues, since running R and data.table in native mode is experimental.
- Keep an older R version as a backup.
- Consider using RSwitch for seamless switching between the old and new R versions.