Go: Multi-threaded writing to a CSV file
As part of a Go script I’ve been working on, I wanted to write to a CSV file from multiple goroutines, but realised that the built-in CSV writer (csv.Writer from encoding/csv) isn’t safe for concurrent use.
My first attempt at writing to the CSV file looked like this:
package main

import (
	"encoding/csv"
	"log"
	"os"
	"strconv"
)

func main() {
	csvFile, err := os.Create("/tmp/foo.csv")
	if err != nil {
		log.Panic(err)
	}
	defer csvFile.Close()

	w := csv.NewWriter(csvFile)
	w.Write([]string{"id1", "id2", "id3"})

	count := 100
	done := make(chan bool, count)
	for i := 0; i < count; i++ {
		go func(i int) {
			// Unsafe: many goroutines call Write on the same csv.Writer at once.
			w.Write([]string{strconv.Itoa(i), strconv.Itoa(i), strconv.Itoa(i)})
			done <- true
		}(i)
	}

	// Wait for every goroutine to finish before flushing.
	for i := 0; i < count; i++ {
		<-done
	}
	w.Flush()
}
This script should write each number from 0 to 99 three times on its own line, in whatever order the goroutines happen to run. Some rows in the file are written correctly, but as we can see below, some aren’t:
40,40,40
37,37,37
38,38,38
18,18,39
^@,39,39
...
67,67,70,^@70,70
65,65,65
73,73,73
66,66,66
72,72,72
75,74,75,74,75
74
7779^@,79,77
...
One way to make the script safe is to guard every call on the CSV writer with a mutex. I wrote the following code to do this:
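package main

import (
	"encoding/csv"
	"log" // used by main below
	"os"
	"strconv" // used by main below
	"sync"
)

// CsvWriter wraps a csv.Writer and serialises access to it with a mutex.
type CsvWriter struct {
	mutex     *sync.Mutex
	csvWriter *csv.Writer
}

// NewCsvWriter creates (or truncates) fileName and returns a writer for it.
func NewCsvWriter(fileName string) (*CsvWriter, error) {
	csvFile, err := os.Create(fileName)
	if err != nil {
		return nil, err
	}
	w := csv.NewWriter(csvFile)
	return &CsvWriter{csvWriter: w, mutex: &sync.Mutex{}}, nil
}

// Write appends one row while holding the lock.
func (w *CsvWriter) Write(row []string) error {
	w.mutex.Lock()
	defer w.mutex.Unlock()
	return w.csvWriter.Write(row)
}

// Flush pushes buffered rows to the file while holding the lock.
func (w *CsvWriter) Flush() {
	w.mutex.Lock()
	defer w.mutex.Unlock()
	w.csvWriter.Flush()
}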
We create a mutex when NewCsvWriter instantiates CsvWriter and then use it in the Write and Flush methods so that only one goroutine at a time can access the underlying csv.Writer. We then tweak the initial script to use this wrapper type instead of calling csv.Writer directly:
func main() {
	w, err := NewCsvWriter("/tmp/foo-safe.csv")
	if err != nil {
		log.Panic(err)
	}
	w.Write([]string{"id1", "id2", "id3"})

	count := 100
	done := make(chan bool, count)
	for i := 0; i < count; i++ {
		go func(i int) {
			w.Write([]string{strconv.Itoa(i), strconv.Itoa(i), strconv.Itoa(i)})
			done <- true
		}(i)
	}

	for i := 0; i < count; i++ {
		<-done
	}
	w.Flush()
}
And now if we inspect the CSV file, every row has been written intact (the order still varies, because it depends on how the goroutines are scheduled):
...
25,25,25
13,13,13
29,29,29
32,32,32
26,26,26
30,30,30
27,27,27
31,31,31
28,28,28
34,34,34
35,35,35
33,33,33
37,37,37
36,36,36
...
That’s all for now. If you have any suggestions for a better way to do this, do let me know in the comments or on Twitter, where I’m @markhneedham.
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.