Haskell: Reading files
While writing the clustering algorithm that I’ve mentioned way too many times already, I needed to process a text file containing all the points. My initial approach looked like this:
import System.IO

main = do
  withFile "clustering2.txt" ReadMode (\handle -> do
    contents <- hGetContents handle
    putStrLn contents)
It felt a bit clunky, but I didn’t realise there was an easier way until I came across this thread. We can simplify reading a file to the following by using the 'readFile' function (http://zvon.org/other/haskell/Outputprelude/readFile_f.html):
main = do
  contents <- readFile "clustering2.txt"
  putStrLn contents
Reading the file is an action in the IO monad, which explains why we have the 'do' notation on the first line: it sequences 'readFile' with 'putStrLn'.
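To make the role of 'do' a bit more concrete, here is a minimal sketch of the same read-and-print written both with 'do' notation and with the '>>=' operator that it desugars to. The names 'printFileDo' and 'printFileBind' and the file 'sample.txt' are made up for illustration and aren't part of the original code. The sketch writes the file itself so that it runs on its own.

```haskell
-- Read a file and print it, using 'do' notation as in the post.
printFileDo :: FilePath -> IO ()
printFileDo path = do
  contents <- readFile path
  putStrLn contents

-- The same thing without 'do': feed the result of readFile to putStrLn.
printFileBind :: FilePath -> IO ()
printFileBind path = readFile path >>= putStrLn

main :: IO ()
main = do
  -- "sample.txt" is a hypothetical file, created here only for the demo
  writeFile "sample.txt" "1 2\n3 4\n"
  printFileDo "sample.txt"
  printFileBind "sample.txt"
```

Both functions should behave identically, so which one to use is mostly a matter of taste. The 'do' version tends to read better once there are more than a couple of steps.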
Another thing I didn’t realise until recently was that you don’t actually need to worry about the 'do' notation if you try to read from the IO monad inside GHCi.
In this context we’re reading from the IO monad when we bind the result of 'readFile' to the variable 'contents', since 'readFile' returns a value of type 'IO String':
> :t readFile
readFile :: FilePath -> IO String
We can therefore play around with the code pretty easily:
> contents <- readFile "clustering2.txt"
> let (bits, nodes) = process contents
> bits
24
> length nodes
19981
> take 10 nodes
[379,1669,5749,6927,7420,9030,9188,9667,11878,12169]
I think we’re able to do this because the GHCi prompt behaves like a 'do' block in the IO monad, so '<-' and 'let' work there just as they would inside 'main', but I’m happy to be corrected if I haven’t explained that correctly.
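As a rough illustration of that idea, the GHCi session above could be written as an ordinary 'do' block in 'main'. The post doesn't show 'process', so the version below is a hypothetical stand-in that assumes the first line of the file holds the number of bits and every following line holds one integer. The file 'sample-nodes.txt' and its contents are likewise invented so that the sketch runs on its own.

```haskell
-- Hypothetical stand-in for the post's 'process', which isn't shown.
-- Assumes: first line = number of bits, each following line = one node.
process :: String -> (Int, [Int])
process contents =
  case lines contents of
    []              -> (0, [])
    (header : rest) -> (read header, map read rest)

main :: IO ()
main = do
  -- Made-up sample data, written here only so the example is runnable
  writeFile "sample-nodes.txt" "24\n379\n1669\n5749\n"
  -- Each line below mirrors one line typed at the GHCi prompt
  contents <- readFile "sample-nodes.txt"
  let (bits, nodes) = process contents
  print bits            -- GHCi shows an expression's value via print
  print (length nodes)
  print (take 10 nodes)
```

The only real difference is that GHCi prints the value of a bare expression for us, whereas in a compiled program we have to call 'print' ourselves.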