This weekend, the Go community on StackOverflow passed fifty thousand posted questions. This blog post is about my favorite kind of Go question: the ones about creating and handling goroutines and pipelines.
A quick digression first. I love going over StackOverflow questions, especially the ones about the Go programming language. It's a good place to see what people are trying to do with Go, with all kinds of approaches and usages. My favorite questions to answer are the ones on the usage of goroutines and pipelines. Sometimes I even go through the answered ones, to see if it is something I already know how to do and whether I can provide a different solution. There are a lot of opportunities to learn something new, even when I provide wrong answers (it has happened a couple of times).
Why goroutines and pipelines
This post is based on a feeling I got while going through questions on goroutines, channels, and pipelines in Go for quite some time now. Even though there are excellent resources out there on these topics, there are still a lot of questions that boil down to these:
- How to run and sync multiple goroutines and get the results from all of them?
- How can I prevent a deadlock when running multiple goroutines?
In this post, I'm not going through what goroutines and channels are. Also, there is an excellent blog post already on pipelines in Go that is a good starting point for most beginners. This post is about a few tips on what to pay attention to when constructing a pipeline with goroutines in Go.
What to think about?
Let's start with a simple example.
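package main

import (
    "fmt"
    "math/rand"
    "time"
)

func main() {
    results := []int{}
    dataCh := make(chan int)

    // Start 20 producers that each send one value.
    for i := 0; i < 20; i++ {
        go getData(dataCh)
    }

    // Collect values from the channel in the main goroutine.
    for item := range dataCh {
        results = append(results, item)
    }
    fmt.Println(results)
}

// getData sleeps for a random duration and then sends one value.
func getData(out chan<- int) {
    n := rand.Intn(100)
    time.Sleep(time.Duration(n) * time.Millisecond)
    out <- n
}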
Seems innocent enough. It has all the elements. Goroutines are created, and once they write to the channel, they finish their work. We range over the channel to get the results. Still, this code produces a deadlock when you run it. Even though all the data is received through the dataCh channel, the deadlock happens because nothing ever closes dataCh, so the range loop in the main goroutine blocks forever waiting for more values. So, here is the first tip:
Try to stay away from reading from the resulting channel in the main goroutine if you don't have a way to close that channel. This, as always, depends on the situation and what you are trying to build. I'm not saying it is not possible to read from dataCh in the main goroutine, but you need to be able to close that channel once every goroutine has finished its work.
The first option that quickly springs to mind is to read from the resulting channel in a separate goroutine. Let's try that.
Not out of the woods just yet
package main

import (
    "fmt"
    "math/rand"
    "time"
)

func main() {
    results := []int{}
    dataCh := make(chan int)

    // Start 20 producers that each send one value.
    for i := 0; i < 20; i++ {
        go getData(dataCh)
    }

    // Collect values in a separate goroutine this time.
    go func() {
        for item := range dataCh {
            results = append(results, item)
        }
    }()

    fmt.Println(results)
}

// getData sleeps for a random duration and then sends one value.
func getData(out chan<- int) {
    n := rand.Intn(100)
    time.Sleep(time.Duration(n) * time.Millisecond)
    out <- n
}
Here we created a separate goroutine to read from the dataCh channel. But when we run this code, we get an empty slice most of the time. The reason is that the main goroutine prints the slice and exits before the other goroutines have finished their work. And this is where tip number two comes in:
Sync mechanisms are your friends. Combine them with properly closing all blocking channels. Depending on the use case and what you are trying to accomplish, you will probably have to use some form of sync mechanism in your pipeline. A sync mechanism is needed to wait for all producers (goroutines that produce the data) to finish executing, so all data can be read.
In our example, we will use sync.WaitGroup to sync all producer goroutines and close dataCh properly.
package main

import (
    "fmt"
    "math/rand"
    "sync"
    "time"
)

func main() {
    results := []int{}
    dataCh := make(chan int)

    // One WaitGroup count per producer goroutine.
    var wg sync.WaitGroup
    wg.Add(20)
    for i := 0; i < 20; i++ {
        go getData(&wg, dataCh)
    }

    // Collect values in a separate goroutine.
    go func() {
        for item := range dataCh {
            results = append(results, item)
        }
    }()

    // Wait for all producers, then close the channel and print.
    wg.Wait()
    close(dataCh)
    fmt.Println(results)
}

// getData sends one value and signals the WaitGroup when it returns.
func getData(wg *sync.WaitGroup, out chan<- int) {
    defer wg.Done()
    n := rand.Intn(100)
    time.Sleep(time.Duration(n) * time.Millisecond)
    out <- n
}
A couple of things are different in this example. Main takeaways:
- We initialized the sync.WaitGroup and, since we know the number of goroutines we are creating, we add that number.
- The getData function now also receives a pointer to the sync.WaitGroup, so it can signal when it's done with execution.
- In the main goroutine, we use wg.Wait() to wait for all goroutines to execute wg.Done(). Afterward, we close the dataCh channel so the listener goroutine can finish its execution.
Now you run this code and get... a slice with 19 elements in it. Not the result we expected.
Are we there yet?
One more iteration, I promise :).
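How did we get 19 elements? Shouldn't there be 20, since we created 20 goroutines?
Well, yes, but the results slice gets printed before the last element is added to it. wg.Wait() returns as soon as the last producer has sent its value, and the main goroutine prints while the reader goroutine may still be appending. Depending on scheduling you may occasionally see all 20, but either way this is a data race on results. Hence, the final tip: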
Be wary of the execution flow when constructing your pipeline with goroutines. It is a good approach to develop a mental model of the execution flow when constructing your pipeline. Try to figure out when each goroutine should start and finish execution, and how their execution flow is related to the execution flow of the main goroutine.
The final version should look like this:
package main

import (
    "fmt"
    "math/rand"
    "sync"
    "time"
)

func main() {
    results := []int{}
    dataCh := make(chan int)

    // One WaitGroup count per producer goroutine.
    var wg sync.WaitGroup
    wg.Add(20)
    for i := 0; i < 20; i++ {
        go getData(&wg, dataCh)
    }

    // Closer goroutine: wait for all producers, then close the channel.
    go func(wg *sync.WaitGroup, data chan int) {
        wg.Wait()
        close(data)
    }(&wg, dataCh)

    // The range loop ends only after the channel has been closed.
    for item := range dataCh {
        results = append(results, item)
    }
    fmt.Println(results)
}

// getData sends one value and signals the WaitGroup when it returns.
func getData(wg *sync.WaitGroup, out chan<- int) {
    defer wg.Done()
    n := rand.Intn(100)
    time.Sleep(time.Duration(n) * time.Millisecond)
    out <- n
}
With this version, there are two major differences:
- Reading the dataCh channel is now in the main goroutine.
- Waiting on all producer goroutines and closing the dataCh channel is moved to a separate goroutine.
With this version of our code, we switched where the aforementioned operations are executed. Now, since the data is read from the dataCh channel in the main goroutine, the range loop only ends once the channel has been closed and drained, so every element is added to the results slice before it gets printed.
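If you would rather keep the reader in its own goroutine, as in the previous iteration, one possible fix is to have the reader signal when it is finished. The sketch below is an alternative to the final version, not a replacement for it. The collect function and the done channel are hypothetical names introduced for illustration; getData is the same producer as above.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// getData mirrors the producer from the post: sleep briefly, send one value.
func getData(wg *sync.WaitGroup, out chan<- int) {
	defer wg.Done()
	n := rand.Intn(100)
	time.Sleep(time.Duration(n) * time.Millisecond)
	out <- n
}

// collect is a hypothetical helper: it starts the given number of producers
// and returns everything they send.
func collect(producers int) []int {
	results := []int{}
	dataCh := make(chan int)
	done := make(chan struct{}) // closed by the reader once dataCh is drained

	var wg sync.WaitGroup
	wg.Add(producers)
	for i := 0; i < producers; i++ {
		go getData(&wg, dataCh)
	}

	// Reader goroutine: the only code touching results until done is closed.
	go func() {
		defer close(done)
		for item := range dataCh {
			results = append(results, item)
		}
	}()

	wg.Wait()     // every producer has sent its value
	close(dataCh) // lets the reader's range loop finish
	<-done        // wait for the reader before using results

	return results
}

func main() {
	results := collect(20)
	fmt.Println(len(results), results)
}
```

The extra receive on done is what was missing before: the main goroutine now waits for the reader as well as for the producers, so results is complete by the time it is used.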
Conclusion
Let's go over the tips.
- Don't read from the resulting channel in the main goroutine if you don't have a way to close that channel. This will probably end with a deadlock.
- Sync mechanisms are your friends. Combine them with properly closing all blocking channels. Good questions to ask yourself here are: how can I know when all goroutines are done with execution, and when should I close the output (resulting) channel?
- Be wary of the execution flow when constructing your pipeline with goroutines. When constructing a pipeline, think about when each goroutine should be executed, including the main goroutine. Better yet, sketch it out. It helps greatly to go over the execution flow at your own pace to check that everything is executed in the proper order. Either that, or do a lot of testing and console-logging to validate that your pipeline works correctly.
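The first tip leaves room for one case where nothing has to be closed at all: when the number of values is known up front, the main goroutine can receive exactly that many and stop. The sketch below assumes every producer sends exactly one value, as getData does in this post. The collectN helper is a hypothetical name used only for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// getData mirrors the first producer from the post: sleep, then send one value.
func getData(out chan<- int) {
	n := rand.Intn(100)
	time.Sleep(time.Duration(n) * time.Millisecond)
	out <- n
}

// collectN is a hypothetical helper: it starts the given number of producers
// and receives exactly one value per producer, so no close is needed.
func collectN(producers int) []int {
	dataCh := make(chan int)
	for i := 0; i < producers; i++ {
		go getData(dataCh)
	}

	results := make([]int, 0, producers)
	for i := 0; i < producers; i++ {
		// Blocks until some producer sends; runs exactly producers times.
		results = append(results, <-dataCh)
	}
	return results
}

func main() {
	results := collectN(20)
	fmt.Println(len(results), results)
}
```

This only works while the one-value-per-producer assumption holds. If a producer can send zero or several values, the counting loop either blocks forever or leaves senders stuck, and the sync.WaitGroup plus close approach is the safer choice.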
Lastly, my advice to you, when learning about goroutines and pipelines, is to just try things out. Try a lot of examples and try to do them in more than one way. Don't be afraid to be wrong or to ask for help. There is a great learning experience in doing so.
Stay safe and happy coding!