100 Languages Speedrun: Episode 51: R

R is a language for "statistical" computing. I'm not generally a fan of the category, and think you'd be much better off using a general purpose language like Python with some "statistical" packages, but let's take a look.

Hello, World!

R is normally used in an interactive environment like Jupyter Notebook ("Jupyter" is named after Julia, Python, and R, even though in practice it's mostly Python, Python, and Python).

You can also run R from the command line. It starts super spammy unless you pass the -q flag:

$ R -q
> print("Hello, World!")
[1] "Hello, World!"
>
Save workspace image? [y/n/c]: n

And finally, you can also write a standalone script and run it with the Rscript binary:

#!/usr/bin/env Rscript

print("Hello, World!")

$ ./hello.r
[1] "Hello, World!"

What is going on here with this output? What's the [1]?

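R is extremely array-oriented, so much so that it treats everything as an array. So "Hello, World!" is really a 1-element array (a "vector" in R's own terminology) with "Hello, World!" as its first and only element, and the [1] is the index of the first element printed on that line.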

You can see this if you create an array of all values from 200 to 300. It's going to be printed like this:

> seq(200,300)
  [1] 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217
 [19] 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235
 [37] 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253
 [55] 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271
 [73] 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289
 [91] 290 291 292 293 294 295 296 297 298 299 300
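
If you want to poke at this yourself, here's a minimal sketch (the variable names greeting and values are just mine, for illustration) showing that a lone string has length 1, and that the bracketed prefix lines up with element positions:

```r
# A "scalar" string is just a character vector of length 1.
greeting <- "Hello, World!"
length(greeting)       # 1
is.vector(greeting)    # TRUE
greeting[1]            # the first and only element

# Indexing starts at 1, and the [n] prefix in the printout is the
# index of the first element shown on that line.
values <- seq(200, 300)
length(values)         # 101
values[19]             # 218, the first number on the " [19]" line
```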

Anyway, this short demonstration aside, this is the actual Hello, World! program:

#!/usr/bin/env Rscript

cat("Hello, World!\n")

$ ./hello2.r
Hello, World!

It's called cat because it concatenates its inputs, similar to the Unix cat command. In both cases you can use it with just a single input, in which case the name is fairly confusing.
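
To make the "concatenates" part concrete, here's a small sketch comparing cat with print. I'm using capture.output only to grab the printed text as strings so it can be inspected; the variable names are made up for this example:

```r
# cat() joins its arguments with sep (a single space by default) and
# writes them without quotes or the [1] index prefix.
cat("Hello,", "World!", "\n")

# With sep = "" nothing is inserted between the pieces.
cat("fib(", 10, ") = ", 55, "\n", sep = "")

# capture.output() collects what would have been printed, one string per line.
cat_line   <- capture.output(cat("fib(", 10, ") = ", 55, "\n", sep = ""))
print_line <- capture.output(print("Hello, World!"))

cat_line     # "fib(10) = 55"
print_line   # "[1] \"Hello, World!\""
```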

FizzBuzz

We can do the classic FizzBuzz:

#!/usr/bin/env Rscript

for (i in seq(1, 100)) {
  if (i %% 15 == 0) {
    cat("FizzBuzz\n")
  } else if (i %% 3 == 0) {
    cat("Fizz\n")
  } else if (i %% 5 == 0) {
    cat("Buzz\n")
  } else {
    cat(i)
    cat("\n")
  }
}

That, however, would completely miss the point of R. R is array-oriented, and here we're operating one element at a time.

Let's give it another try:

#!/usr/bin/env Rscript

i = seq(1, 100)
x = i
x[i %% 3 == 0] = "Fizz"
x[i %% 5 == 0] = "Buzz"
x[i %% 15 == 0] = "FizzBuzz"
cat(x, sep="\n")

What's going on here?

  • seq(1, 100) is an array of integers from 1 to 100.
  • i = seq(1, 100) assigns that to i
  • x = i might be a bit of a surprise: it behaves as a copy of i, not as a second reference to the same array, so modifying x later leaves i untouched
  • i %% 3 is an array of remainders of i divided by 3, so it goes in a cycle 1 2 0 1 2 0 and so on.
  • i %% 3 == 0 is an array of boolean values, so it goes in a cycle FALSE FALSE TRUE FALSE FALSE TRUE and so on
  • x[i %% 3 == 0] = "Fizz" assigns "Fizz" to those elements of x where the corresponding i %% 3 == 0 is TRUE (and since an array holds only one type, the remaining numbers in x become strings too)
  • and analogously for Buzz and FizzBuzz
  • and finally we concatenate the results, using newline as a separator - it's called "separator" but it's also used after the final element
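
The steps above can be replayed on a shorter range to see each intermediate value. This is just a sketch; mask is a name I introduced for the boolean array:

```r
i <- seq(1, 15)
x <- i                 # behaves as a copy of i

i %% 3                 # 1 2 0 1 2 0 ...
mask <- i %% 3 == 0    # FALSE FALSE TRUE FALSE FALSE TRUE ...
which(mask)            # positions where mask is TRUE: 3 6 9 12 15

x[mask] <- "Fizz"

# Assigning a string converts all of x to character, so the other
# entries are now "1", "2", "4", ... rather than numbers.
is.character(x)        # TRUE
x[1:3]                 # "1" "2" "Fizz"

# i was not affected by the assignment to x.
is.numeric(i)          # TRUE
```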

Fibonacci

Let's first write a function as if R was a regular language:

#!/usr/bin/env Rscript

fib = function(n) {
  if (n <= 2) {
    1
  } else {
    fib(n - 1) + fib(n - 2)
  }
}

for (i in seq(1, 20)) {
  cat("fib(", i, ") = ", fib(i), "\n", sep="")
}

$ ./fib.r
fib(1) = 1
fib(2) = 1
fib(3) = 2
fib(4) = 3
fib(5) = 5
fib(6) = 8
fib(7) = 13
fib(8) = 21
fib(9) = 34
fib(10) = 55
fib(11) = 89
fib(12) = 144
fib(13) = 233
fib(14) = 377
fib(15) = 610
fib(16) = 987
fib(17) = 1597
fib(18) = 2584
fib(19) = 4181
fib(20) = 6765
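
The printing loop is still one element at a time. As a rough sketch of a more R-flavored way to drive the same function, sapply can apply fib to the whole range, and paste0 can build all the output lines in one go (the names values and lines are mine):

```r
fib <- function(n) {
  if (n <= 2) 1 else fib(n - 1) + fib(n - 2)
}

# Apply fib to every element of 1..20 and collect the results in one array.
values <- sapply(seq(1, 20), fib)

# paste0 is vectorized, so this builds all 20 lines at once.
lines <- paste0("fib(", seq(1, 20), ") = ", values)
writeLines(lines)
```

The recursion itself is unchanged; only the looping and formatting moved to whole-array operations.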

Fibonacci with matrices

As R is supposed to be an array-oriented language, it's reasonable to expect it to have full support for matrices, like Octave, Julia and so on. However, it does not.

Matrices have super painful syntax, and the usual operators don't do matrix math: if you try to multiply two matrices with *, it will just do element-wise multiplication. There's %*% for matrix multiplication, but there's no matrix exponentiation in base R.

Even Ruby has Matrix[[1,1],[1,0]] ** 10 in standard library, and that's not exactly a "scientific" language.
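
For reference, here's what the two kinds of multiplication look like, plus a hand-rolled power function built only from %*%. To be clear, mat_pow is a hypothetical helper I wrote for this sketch, not something R ships with, and it only handles square matrices with non-negative integer exponents:

```r
m <- matrix(c(1, 1, 1, 0), ncol = 2)

m * m      # element-wise: every entry multiplied by itself
m %*% m    # actual matrix product

# Hypothetical helper: matrix power by repeated squaring.
mat_pow <- function(m, n) {
  result <- diag(nrow(m))       # identity matrix of matching size
  while (n > 0) {
    if (n %% 2 == 1) result <- result %*% m
    m <- m %*% m
    n <- n %/% 2
  }
  result
}

mat_pow(m, 10)[1, 2]   # 55, the 10th Fibonacci number
```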

All right, let's do install.packages("matrixcalc") from the R repl. Annoyingly, that asks me which of the 84 servers on the list I want to use to download a few MBs, like it's the early 1990s and any of that matters.

#!/usr/bin/env Rscript

require(matrixcalc)

fib = function(n) {
  m = matrix(c(1,1,1,0), ncol=2)
  matrix.power(m, n)[1,2]
}

for (i in seq(1, 20)) {
  cat("fib(", i, ") = ", fib(i), "\n", sep="")
}

Not amazing, but let's give it a go:

$ ./fib2.r
Loading required package: matrixcalc
fib(1) = 1
fib(2) = 1
fib(3) = 2
fib(4) = 3
fib(5) = 5
fib(6) = 8
fib(7) = 13
fib(8) = 21
fib(9) = 34
fib(10) = 55
fib(11) = 89
fib(12) = 144
fib(13) = 233
fib(14) = 377
fib(15) = 610
fib(16) = 987
fib(17) = 1597
fib(18) = 2584
fib(19) = 4181
fib(20) = 6765

We reached another baffling thing. Why the hell did R think it's reasonable to inform me that a script loaded some package? Imagine if JavaScript was doing that, and starting an app dumped 1000 entries to the console.

We need to do something silly to get rid of that message:

#!/usr/bin/env Rscript

suppressPackageStartupMessages(require(matrixcalc))

fib = function(n) {
  m = matrix(c(1,1,1,0), ncol=2)
  matrix.power(m, n)[1,2]
}

for (i in seq(1, 20)) {
  cat("fib(", i, ") = ", fib(i), "\n", sep="")
}
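
As far as I can tell, this works because the loading notice is a special kind of message that can be muffled on its own. Here's a sketch of the mechanism that doesn't need any package installed; noisy_load is a made-up stand-in for a package that announces itself:

```r
# Hypothetical stand-in for something that emits a startup message.
noisy_load <- function() {
  packageStartupMessage("Loading required package: example")
  invisible(TRUE)
}

noisy_load()                                   # message is shown
suppressPackageStartupMessages(noisy_load())   # same call, nothing shown

# Count how many messages escape the wrapper (none should).
escaped <- 0
withCallingHandlers(
  suppressPackageStartupMessages(noisy_load()),
  message = function(m) {
    escaped <<- escaped + 1
    invokeRestart("muffleMessage")
  }
)
escaped   # 0
```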

Fetch some JSON

Let's get slightly out of R's comfort zone, and try to fetch some JSON data, and iterate somewhere within it.

First we need install.packages("httr"):

#!/usr/bin/env Rscript

suppressPackageStartupMessages(require(httr))

# JSON looks like this:
# {
#   "temperature": "+8 °C",
#   "wind": "17 km/h",
#   "description": "Partly cloudy",
#   "forecast": [
#     {
#       "day": "1",
#       "temperature": "+7 °C",
#       "wind": "17 km/h"
#     },
#     {
#       "day": "2",
#       "temperature": "+7 °C",
#       "wind": "9 km/h"
#     },
#     {
#       "day": "3",
#       "temperature": "+8 °C",
#       "wind": "9 km/h"
#     }
#   ]
# }

url = "https://goweather.herokuapp.com/weather/London"
data = content(GET(url))

for (day in data$forecast) {
  cat("Forecast for", day$day, "is", day$temperature, "\n")
}

$ ./weather.r
Forecast for 1 is +7 °C
Forecast for 2 is +7 °C
Forecast for 3 is +8 °C

R doesn't have dictionaries, but its lists can have names associated with their elements, which is close enough for this. data$forecast is like data["forecast"] in a more usual language. httr detects JSON and converts it appropriately, which is always nice.
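
Here's a quick sketch of that name-based access, using a hand-built structure instead of a live request. The values are copied from the sample JSON above, trimmed down, and the structure is my assumption of what the parsed response looks like:

```r
# Hand-built stand-in for the parsed response.
data <- list(
  temperature = "+8 °C",
  forecast = list(
    list(day = "1", temperature = "+7 °C"),
    list(day = "2", temperature = "+7 °C")
  )
)

names(data)                    # "temperature" "forecast"
data$temperature               # same as data[["temperature"]]
data[["forecast"]][[2]]$day    # "2"

# Single brackets keep the wrapper: data["temperature"] is a
# one-element named list, not the string itself.
is.list(data["temperature"])   # TRUE
```

So within R itself, the double-bracket form data[["forecast"]] is the closer equivalent of the $ access.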

If you try to print data, it looks like a disaster (multiple empty lines preserved), but if you know the structure and you're just reading, it works well enough:

> data
$temperature
[1] "+8 °C"

$wind
[1] "17 km/h"

$description
[1] "Partly cloudy"

$forecast
$forecast[[1]]
$forecast[[1]]$day
[1] "1"

$forecast[[1]]$temperature
[1] "+7 °C"

$forecast[[1]]$wind
[1] "17 km/h"


$forecast[[2]]
$forecast[[2]]$day
[1] "2"

$forecast[[2]]$temperature
[1] "+7 °C"

$forecast[[2]]$wind
[1] "9 km/h"


$forecast[[3]]
$forecast[[3]]$day
[1] "3"

$forecast[[3]]$temperature
[1] "+8 °C"

$forecast[[3]]$wind
[1] "9 km/h"



>
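
If the default printout is too much, str gives a more compact, indented summary of a nested structure. A small sketch on a hand-built stand-in (again trimmed from the sample JSON, not fetched live):

```r
# Hand-built stand-in for the parsed response.
data <- list(
  description = "Partly cloudy",
  forecast = list(
    list(day = "1", wind = "17 km/h"),
    list(day = "2", wind = "9 km/h")
  )
)

# One line per element, indented by nesting depth.
str(data)

# The same summary captured as strings, in case you want to log it.
summary_lines <- capture.output(str(data))
```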

Should you use R?

I'd advise against it. You're much better off with Python or Julia.

R is only designed for a very specific style of computing, and if you step outside that style, it starts struggling and being awkward real fast. And you'll do that a lot in any real project. Even in data science, all the boring stuff like fetching data, parsing it, cleaning it up, and formatting the results generally consumes more of the project than the analysis itself, and Python and Julia simply handle such parts better.

I haven't done much in-depth research on that, but from a quick look it doesn't seem like R has any ecosystem advantage over Python or Julia. The statistical packages you'd expect are there for all of them, and for the non-statistical ones, R is quite behind.

If you're a developer, all this should be clear enough, and R - or for that matter other scientific languages - has very limited appeal to developers.

If you're a data scientist or a researcher, R might tempt you, but I'd strongly recommend learning a general purpose language like Python (or Julia, which is close enough to general purpose). It might be a bit more complicated, but it will give you a lot more power and flexibility than learning an overly specialized language like R.

Code

All code examples for the series will be in this repository.

Code for the R episode is available here.