Thundering Herd: November 2018

I spoke at the AKL Rust Meetup last month (slides) about my side project doing data mining in Rust. There were a number of engineers from Movio there who use Go, and I've been keen for a while to learn Go and compare it with Rust and Python for my data mining side projects, so that inspired me to knuckle down and learn Go.

Go is super simple. I was able to learn the important points in a couple of evenings by reading GoByExample, and I very quickly had an implementation of the FPGrowth algorithm in Go up and running. For reference, I also have implementations of FPGrowth in Rust, Python, Java and C++.

As a language, Go is very simple. The language lacks many of the higher level constructs of other modern languages, but the lack of these make it very easy to learn, straightforward to use, and easy to read and understand. It feels similar to Python. There's little hidden functionality; you can't overload operators for example, and there's no generics or macros, so the implementation for everything has to be rewritten for every type. This gets tedious, but it does at least mean the implementation for everything is simple and explicit, the code right in front of you.

I also really miss the functional constructs that are built into many other languages, like mapping a function over a sequence, filter, any, all, etc. With Go, you need to reimplement these yourself, and because there's no generics (yet), you need to do it for every type you want to use these on. The lack of generics is also painful when writing custom containers.

Not being able to key a map with a struct containing a slice was a nuisance for my problem domain; I ended up having to write a custom tree-set data structure due to this; though it was very easy to write thanks to in built maps. Whereas Rust, or even Java, has traits/functions you can implement to ensure things can be hashed.

The package management for Go feels a bit tacked on; requiring all Go projects to be in a GO_PATH seems a consequence of not having a tool the equal of Rust's Cargo coupled with something like crates.io.

And Go's design decision to use the case of a symbol's first letter to express whether that symbol is public or private is annoying. I have a long standing habit of using foo as the name for a single instance of type Foo, but that pattern doesn't work in Go. The consequence of this design choice is it leads programmers to using lots of non-descriptive names for things. Like single letter variable names. Or the dreaded myFoo.

The memory model of Go is simple, and again I think the simplicity is a strength of the language. Go uses escape analysis to determine whether a value escapes outside of a scope, and moves such values to the heap if so. Go also dynamically grows goroutines' stacks, so there's no stack overflow. Go is garbage collected, so you don't have to worry about deallocating things.

I found that thinking of values as being on the heap or stack wasn't a helpful mental model with Go. Once I started to think of variables as references to values and values being shared when I took the address (via the & operator), the memory model clicked.

I think Go's simple memory model and syntax make it a good candidate as a language to teach to beginner programmers, more so than Rust.

The build times are impressively fast, particularly on an incremental build. After the initial build of my project, I was getting build times to fast to perceive on my 2015 13" MBP, which is impressive. Rust has vastly slower build time.

The error messages produced by the Go compiler were very spartan. The Rust compiler produces very helpful error messages, and in general I think Rust is leading here.

Go has a very easy to use profile package which you can embed in your Go program. Combined with GraphViz, it produces simple CPU utilization graphs like this one:

CPU profile graph produced by Go's "profile" package and GraphViz.

Having an easy to use profiler bundled with your app is a huge plus. As we've seen with Firefox, this makes it easy for your users to send you profiles of their workloads on their own hardware. The graph visualization is also very simple to understand.

The fact that Go lacks the ability to mark variables/parameters as immutable is mind-boggling to me. Given the language designers came from C, I'm surprised by this. I've written enough multi-threaded and large system code to know the value of restricting what can mess with your state.

Goroutines are pretty lightweight and neat. You can also use them to make a simple "generator" object; spawn a goroutine to do your stateful computation, and yield each result by pushing it into a channel. The consumer can block on receiving the next value by receiving on the channel, and the producer will block when it pushes into a channel that's not yet been received on. Note you could do this with Rust too, but you'd have to spawn an OS thread to do this, which is more heavy weight than a goroutine, which are basically userspace threads.

Rust's Rayon parallelism crate is simply awesome, and using that I was able to easily and effectively parallelize my Rust FPGrowth implementation using Rayon's parallel-iterators. As best as I can tell, Go doesn't have anything on par with Rayon for parallelism. Go's goroutines are great for lightweight concurrency, but they don't make it as easy as using's Rayon's par_iter() to trivially parallelize a loop. Note, parallelism is not concurrency.

All of my attempts to parallelize my Go FPGrowth implementation as naively as I'd parallelized my Rust+Rayon implementation resulted in a slower Go program. In order to parallelize FPGrowth in Go, I'd have to do something complicated, though I'm sure channels and goroutines would make that easier than in a traditional language like Java or C++.

Go would really benefit from something like Rayon, but unfortunately due to Go's lack of immutability and a borrow checker, it's not safe to naively parallelize arbitrary loops like it is in Rust. So Rust wins on parallelism. Both languages are strong on concurrency, but Rust pulls ahead due to its safety features and Rayon.

Comparing Rust to Go is inevitable... Go to me feels like the spiritual successor to C, whereas Rust is the successor to C++.

I feel that Rust has a learning curve, and before you're over the hump, it can be hard to appreciate the benefits of the constraints Rust enforces. For Go, you get over that hump a lot sooner. Whereas with Rust, you get over that hump a lot later, but the heights you reach after are much higher.

Overall, I think Rust is superior, but if I'd learned Go first I'd probably be quite happy with Go.

Thundering Herd

Saturday 3 November 2018

On learning Go and a comparison with Rust