Look at this program:
    main = do
      contents <- readFile "foo.txt"
      writeFile "foo.txt" ('a':contents)
What does it do?
The tale of a contrived example
A little over a year ago, I was working on internationalizing the Trello web client, and I wrote a little Haskell program which went through and parsed several hundred Mustache templates, extracted all the English-looking strings from them, and spat out Teacup templates that had those strings replaced with lookups in a table of translations.
The program was interactive, prompting you to enter a reasonable “key name” for each English string, which it saved across runs in a little text file. If you’d already identified a string before, it wouldn’t bother to ask you again, even across multiple files.
But sometimes – just sometimes! – when I ran this program, it would spit out the following error message:
openFile: resource busy (file is locked)
This was mysterious, especially given that it was not happening consistently.
The program in question looked something like this:
- construct a Bimap of keys-to-strings by reading a file
- do some stuff, consulting the bimap and adding new entries to it over time
- serialize the bimap and overwrite the file it came from with the new mappings
After a bit of squinting at this error, trying out similar but simpler things, I happened upon a three-line program, the same one you’ve already seen:
    main = do
      contents <- readFile "foo.txt"
      writeFile "foo.txt" ('a':contents)
And what does that program do?
It crashes with the same error – consistently.
Now we’re talkin’
I suspected that this had something to do with lazy I/O, that bogeyman of which I had heard whispers in the past. I figured that Haskell’s readFile had decided not to actually, you know, read the file contents until someone asked for them. As such, readFile would have to keep the file handle open until the contents were requested, which wouldn’t happen until after writeFile attempted to open the same file. Which would fail, naturally.
So, to fix this, all we need to do is force evaluation before writeFile, and we’ll be golden. Right? Right.
Up to this point, I am on the right track. That will not last long.
Crazy little thing called seq
I vaguely recalled something called seq, which could be used to force evaluation of thunks. As I understood at the time, it was generally used to improve memory behavior of programs that would allocate a bunch of intermediate thunks. But why not use it to control evaluation order as well?

    main = do
      contents <- readFile "foo.txt"
      seq contents (writeFile "foo.txt" ('a':contents))
Hmm. Still doesn’t work. Why not?
If you have any significant Haskell experience, the answer is probably obvious. Stay with me! It’s going to get a lot worse before it gets any better.
We have a mystery on our hands
The first thing I tried, naturally, was sprinkling some printfs over the code, just to make sure it was crashing where I thought it was:

    main = do
      putStrLn "about to open for reading"
      contents <- readFile "foo.txt"
      putStrLn "that statement is over but we don't know what was read"
      seq contents (return ())
      putStrLn "about to open for writing"
      writeFile "foo.txt" ('a':contents)
      putStrLn "done writing"
And running this, I got the following output:
    about to open for reading
    that statement is over but we don't know what was read
    about to open for writing
    test: foo.txt: openFile: resource busy (file is locked)
Which confirmed my suspicion: somehow, the file was still open for reading when we tried to open it for writing.
“Hmm,” past me thought, “Either seq isn’t actually forcing evaluation (I vaguely remember something about it not being intuitive…) or something deeper and weirder is happening here.” Maybe the seq call is being optimized away, since its second argument isn’t actually used? That’s a thing that can happen, right?
Something deeper and weirder
Since I was on a Mac, I fired up dtruss to try to decide whether or not the seq call was actually doing anything:

    $ echo "b" > foo.txt
    $ ghc Prepend.hs -o prepend
    $ sudo dtruss -f ./prepend
One root password later, and I got some interesting output. Actually a ton of output, condensed to the relevant parts here:[1]
    62260/0x3d67c0: write(0x1, "about to open for reading\n\0", 0x1A) = 26 0
    62260/0x3d67c0: open("foo.txt\0", 0x20004, 0x1B6) = 3 0
    62260/0x3d67c0: fstat64(0x3, 0x10F508070, 0x1B6) = 0 0
    62260/0x3d67c0: write(0x1, "that statement is over but we don't know what was read\n\0", 0x37) = 55 0
    62260/0x3d67c0: read(0x3, "b\n(\0", 0x1FA0) = 2 0
    62260/0x3d67c0: write(0x1, "about to open for writing\n(\0", 0x1A) = 26 0
    62260/0x3d67c0: open("foo.txt\0", 0x20205, 0x1B6) = 4 0
    62260/0x3d67c0: fstat64(0x4, 0x10F508170, 0x1B6) = 0 0
    62260/0x3d67c0: close(0x4) = 0 0
    62260/0x3d67c0: write_nocancel(0x2, "prepend: \0", 0x9) = 9 0
    62260/0x3d67c0: write_nocancel(0x2, "foo.txt: openFile: resource busy (file is locked)\0", 0x31) = 49 0
    62260/0x3d67c0: write_nocancel(0x2, "\n\0", 0x1) = 1 0
Getting at this precise output was a little tricky, as GHC by default will line-buffer output to a terminal and block-buffer output otherwise, and dtruss redirects all of the examined program’s output to stderr (for some reason). So if you try to redirect the dtruss output to a file, you’ll see something very different, unless you manually set the buffering mode. Figuring that out was another fun part of debugging this program.
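For anyone following along, here is a minimal sketch of what “manually set the buffering mode” could look like. The choice of LineBuffering is my assumption for illustration; the point is only that hSetBuffering from System.IO pins the mode so it no longer depends on whether stdout is a terminal.

```haskell
import System.IO

main :: IO ()
main = do
  -- Ask for line buffering explicitly, so each putStrLn is flushed as
  -- its own write even when stdout is redirected to a file or pipe.
  hSetBuffering stdout LineBuffering
  putStrLn "about to open for reading"
  putStrLn "about to open for writing"
```

With the mode fixed like this, the trace should look the same whether the output goes to a terminal or to a file.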
dtruss output doesn’t make for great skimming, so I’ll prettify it a bit:
    write(stdout, "about to open for reading\n(\0", 26 bytes) = 26 bytes written
    open("foo.txt\0", for reading, 0666) = opened as FD 3
    fstat64(FD 3, struct address, 0666) = information put into the provided struct
    write(stdout, "that statement is over but we don't know what was read\n\0", 55 bytes) = 55 bytes written
    read(FD 3, string address, no more than 8096 bytes please) = 2 bytes read: "b\n"
    write(stdout, "about to open for writing\n\0", 26) = 26 bytes written
    open("foo.txt\0", for writing, 0666) = opened as FD 4
    fstat64(FD 4, struct address, 0666) = information put into the provided struct
    close(FD 4) = closed successfully
Presumably whatever it saw in the second fstat64 call was not to its liking, so it decided to close the file descriptor and begin printing the error messages (which I omitted from the prettified output).
But look at that: it actually did read the file! When I called seq, it read the whole thing – we can see b\n right there. Whatever that strange thing was that I didn’t quite remember about seq clearly wasn’t all that important. This code is fine.
Give past me some time; I’ll get there eventually.
At this point I was quite confused, so I tried something radical:
    import System.IO

    main :: IO ()
    main = do
      readHandle <- openFile "foo.txt" ReadMode
      contents <- hGetContents readHandle
      seq contents (return ())
      hClose readHandle
      writeHandle <- openFile "foo.txt" WriteMode
      hPutStr writeHandle ('a':contents)
      hClose writeHandle
And that appeared to work perfectly.[2] So it is possible to do this in Haskell, and I have again verified that I totally get seq. Sanity check complete.
Actually, the first time I tried this I forgot to close the writeHandle, and thought the sanity check had failed, and spent a good amount of time staring at the dtruss output to this program before it clicked. Yeah… this was not one of my better days.
Encouraged by these results, I tried another, slightly less intense sanity check, expecting this one to work too (for some reason):
    import System.IO

    main :: IO ()
    main = do
      putStrLn "about to do reading"
      contents <- withFile "foo.txt" ReadMode hGetContents
      putStrLn "about to seq"
      seq contents (return ())
      putStrLn "about to do writing"
      withFile "foo.txt" WriteMode (flip hPutStr ('a':contents))
      putStrLn "done writing"

Which actually does not work at all. The (prettified) dtruss output reveals why this is:

    write(stdout, "about to do reading\n\200\004(\0", 20 bytes) = 20 bytes written
    open("foo.txt\0", for reading, 0666) = opened as FD 3
    fstat64(0x3, struct address, 0666) = information put into the provided struct
    close(FD 3) = closed successfully
    write(stdout, "about to do writing\n@\004\0", 20 bytes) = 20 bytes written
    open("foo.txt\0", 0x20205, 0666) = opened as FD 3
    fstat64(FD 3, struct address, 0666) = information put into the provided struct
    ftruncate(FD 3, 0x0, 0666) = file truncated successfully
    write(FD 3, "a\004\0", 1 byte) = 1 byte written
    close(FD 3) = closed successfully
    write(stdout, "done writing\n\004\b\0", 13 bytes) = 13 bytes written
Of course the withFile command closed the handle we wanted to read from before we forced it to read, so this doesn’t work.
This just confirmed what I already knew: hGetContents doesn’t do any reading. Only the seq call causes the actual read to happen, and that happens after the handle has already been closed by withFile.
Now I don’t think it’s completely unreasonable, at this point, to expect some kind of error. Am I not trying to read from a file handle that’s already been closed? Isn’t that bad?
I would have liked to see a big red “Hey! You already closed that file handle!” message to pop up on my screen and for my monitor to go dark and start flashing a skull and crossbones and for a calm woman’s voice to chant “ACCESS DENIED” over the intercom, but I would have settled for a non-zero exit code.
What did I get instead? Nothing.
Actually worse than nothing, because the end result of running this program is that foo.txt gets truncated and replaced with the single character a. Silently, without complaint. Insult to injury!
But, unfortunately for my sense of indignation, this is very much the documented behavior:
Once a semi-closed handle becomes closed, the contents of the associated list becomes fixed. The contents of this final list is only partially specified: it will contain at least all the items of the stream that were evaluated prior to the handle becoming closed.
So what’s happening here is this:
- withFile gets a file handle and hands it to hGetContents.
- hGetContents says “Alright, I’ll create this empty list of characters, and if anyone asks what’s in it, I’ll load some up. But as soon as that handle is closed, I’m freezing the list.”
- withFile immediately closes the file handle.
- hGetContents says “Oh, well, the list is set in stone now. It shall be forever empty.”
I have no one to blame but myself, I suppose, bringing my own preconceived notions of what hGetContents “should” do to the table. But the principle of least surprise might have something to say about this.
Once I understood the behavior, the fix was clear: I just need to force evaluation of the list-of-characters inside the function passed to withFile:
    import System.IO

    main :: IO ()
    main = do
      contents <- withFile "foo.txt" ReadMode strictRead
      withFile "foo.txt" WriteMode (flip hPutStr ('a':contents))
      where
        strictRead handle = do
          str <- hGetContents handle
          seq str (return str)
And now everything’s fine. Or so I thought.
We’ll come back to the subtle (or glaring) bug in this code soon. But first…
Why didn’t the other thing work
Even though I thought I had it working, I still wanted to understand where my simpler approach went wrong:
    main = do
      contents <- readFile "foo.txt"
      seq contents (return ())
      writeFile "foo.txt" ('a':contents)

Because the documentation for hGetContents is rather clear on one point:
A semi-closed handle becomes closed […] once the entire contents of the handle has been read.
Now, it doesn’t exactly say that it becomes closed immediately. But I was assuming that’s what it meant. And – as you can see in the dtruss output up above – it certainly read the entire contents of the handle. All two bytes of it!
The historical record is a little fuzzy on what happened next. I believe I was talking to the friend and colleague who had gotten me interested in Haskell in the first place, and he handed me this very similar code for consideration:
    main = do
      contents <- readFile "foo.txt"
      putStrLn contents
      writeFile "foo.txt" ('a':contents)
A minor variation on my attempt above, replacing the seq expression with putStrLn.
And, surprisingly to me, this worked.
And that cracked the case wide open. Because dtruss revealed a key difference between this implementation and the one that used seq:
    open("foo.txt\0", for reading, 0666) = opened as FD 3
    read(FD 3, string address, up to 8096 bytes) = 2 bytes read: "b\n"
    write(stdout, "b\n\024\b\0", 2 bytes) = 2 bytes written
    read(FD 3, string address, up to 8096 bytes) = 0 bytes read   <-- I say!
    close(FD 3) = closed successfully
    write(stdout, "\n\004\0", 1 byte) = 1 byte written
    open("foo.txt\0", for writing, 0666) = opened as FD 3
    ftruncate(FD 3, 0x0, 0666) = truncated successfully
    write(FD 3, "ab\n\0", 3 bytes) = 3 bytes written
    close(FD 3) = closed successfully
Aha! A fascinating mistake.
Even though, in the seq case, we were reading the entire contents of the file, hGetContents has no way of knowing that. It asked for 8096 bytes, and it only got 2 back, but that doesn’t necessarily mean that there aren’t any more out there. From man 2 read:
The system guarantees to read the number of bytes requested if the descriptor references a normal file that has that many bytes left before the end-of-file, but in no other case.
hGetContents has no way of knowing that we’re talking to a normal file here, so it needs to do the second read in order to know that it is finished reading from the handle:
If successful, the number of bytes actually read is returned. Upon reading end-of-file, zero is returned. Otherwise, a -1 is returned and the global variable errno is set to indicate the error.
And that’s exactly what we see in the dtruss output above. read has no way of saying “I read two bytes and there aren’t any more.” The second read call is required to know that we’ve reached the end of the file.
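The same “keep asking until you get nothing back” protocol can be sketched at the Haskell level. This is only an illustration, not how hGetContents is implemented: readChunks and the demo file name are hypothetical, and hGetSome from Data.ByteString stands in for the raw read call. One request made here is not guaranteed to map to exactly one system call.

```haskell
import qualified Data.ByteString as B
import System.IO (Handle, IOMode (ReadMode), withBinaryFile)

-- Read a handle to the end in chunks of at most chunkSize bytes.
-- Returns the chunks plus the number of read requests that were made.
readChunks :: Int -> Handle -> IO ([B.ByteString], Int)
readChunks chunkSize h = go [] 0
  where
    go acc n = do
      chunk <- B.hGetSome h chunkSize
      if B.null chunk
        -- An empty result is the only end-of-file signal we get,
        -- so this final request always has to be made.
        then return (reverse acc, n + 1)
        else go (chunk : acc) (n + 1)

main :: IO ()
main = do
  writeFile "chunks-demo.txt" "b\n"
  (chunks, requests) <- withBinaryFile "chunks-demo.txt" ReadMode (readChunks 8096)
  print (B.concat chunks)
  print requests
```

Even for a two-byte file, the loop cannot stop after the first chunk: a short read on its own does not prove there is nothing left.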
And why didn’t seq cause two reads?
Because it didn’t need to.
seqing the truth
See, as you may already know, I was using seq completely wrong.
In order to explain why, I would like to present the only definition of seq you’ll ever need:

    seq ⊥ x = ⊥
    seq _ x = x
Or in English: if the first argument to seq is bottom, then seq returns bottom. Otherwise, seq returns its second argument.
Which to me is far more intuitive than talking about “weak head normal form” or “evaluate until you find a data constructor or lambda abstraction or primitive value” or however I had seen seq presented at the time.[3]
seq is, of course, lazy. It’s not going to do any more evaluation than it needs to in order to determine whether its first argument is undefined or not. Returning to our example, even if contents turns out to be 'b' : '\n' : undefined, that’s still distinct from undefined, so it won’t bother to check if there’s any more to the file. No second read, no handle closing, no joy.
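That claim is easy to check without touching a file at all. Here is a small sketch in which the name partial is mine, standing in for contents that have only been partly read; evaluate and try from Control.Exception are used to observe bottom without crashing the program.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- The first two cells are defined, but the tail is bottom.
partial :: String
partial = 'b' : '\n' : undefined

main :: IO ()
main = do
  -- seq only needs to see the outermost (:) to know partial is not bottom.
  putStrLn (partial `seq` "seq partial: not bottom")
  -- length has to walk the whole list, so it runs into the undefined tail.
  r <- try (evaluate (length partial)) :: IO (Either SomeException Int)
  putStrLn (either (const "length partial: bottom") show r)
```

The first line goes through because seq stops at the first constructor; the second reports bottom because length cannot.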
A working solution
At last, I understood what was happening well enough to write a working solution:
    main = do
      contents <- readFile "foo.txt"
      seq (length contents) (return ())
      writeFile "foo.txt" ('a':contents)

The only way seq can determine if length returns bottom is to evaluate it, and the only way length can determine how many characters are in the file is to read the whole thing.
I don’t feel great about that assertion, though: even though it’s true, it requires a little bit of indirect reasoning that shouldn’t really be in our application code. If I return to this code later on, will I still remember why that seq is there?
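If you would rather observe the difference than reason about it, one option is to hold on to the handle and ask it directly. The sketch below is an assumption-laden experiment, not part of the original program: closedAfter and the demo file name are hypothetical, and it uses openFile plus hGetContents in place of readFile so that hIsClosed has a handle to inspect.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Open path lazily, force the contents with the given function, and
-- report whether the lazy reader has closed the handle on its own.
closedAfter :: (String -> ()) -> FilePath -> IO Bool
closedAfter force path = do
  h <- openFile path ReadMode
  contents <- hGetContents h        -- h is now semi-closed
  _ <- evaluate (force contents)
  closed <- hIsClosed h
  hClose h                          -- tidy up either way
  return closed

main :: IO ()
main = do
  writeFile "seq-demo.txt" "b\n"
  -- Forcing only the first constructor of the list.
  closedAfter (\s -> s `seq` ()) "seq-demo.txt" >>= print
  -- Forcing the whole list by computing its length.
  closedAfter (\s -> length s `seq` ()) "seq-demo.txt" >>= print
```

If the behavior described above holds, only the length version should report a closed handle, because it is the only one that provokes the read that hits end-of-file.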
So we could add a comment, or we could just require the strict package and write:

    import Prelude hiding (readFile)
    import System.IO.Strict (readFile)

    main = do
      contents <- readFile "foo.txt"
      writeFile "foo.txt" ('a':contents)

Which uses the same length trick to fully evaluate the file contents, but keeps that detail out of our application code.
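For readers who don’t want the extra dependency, a helper along these lines could be written by hand. To be clear, this is a sketch of the idea and not the strict package’s actual source; strictReadFile and the demo file name are hypothetical.

```haskell
import Control.Exception (evaluate)

-- Hypothetical helper: read a file lazily, then walk the whole string
-- so the lazy reader reaches end-of-file before we return.
strictReadFile :: FilePath -> IO String
strictReadFile path = do
  contents <- readFile path
  _ <- evaluate (length contents)
  return contents

main :: IO ()
main = do
  writeFile "strict-demo.txt" "b\n"
  contents <- strictReadFile "strict-demo.txt"
  writeFile "strict-demo.txt" ('a' : contents)
  strictReadFile "strict-demo.txt" >>= putStr
```

The comment now lives next to the trick it explains, which addresses the “will I remember why that seq is there?” worry.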
Or we could stop using the String type to read from files at all, you monster, and use a library like bytestring, which ships strict equivalents of the file-interacting functions in the Prelude. This is the correct choice in real life, but was not the first thing I reached for in a little script that just shows a data structure.
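As a rough sketch of that option, here is the same prepend written against Data.ByteString.Char8, whose readFile is strict. The function name prependChar and the demo file name are mine; note that the Char8 interface truncates characters above 255, so it only suits byte-oriented data.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical helper: prepend one character to a file using strict I/O.
prependChar :: FilePath -> Char -> IO ()
prependChar path c = do
  -- Strict read: the whole file is in memory and the handle is closed
  -- by the time this returns.
  contents <- B.readFile path
  B.writeFile path (B.cons c contents)

main :: IO ()
main = do
  B.writeFile "bs-demo.txt" (B.pack "b\n")
  prependChar "bs-demo.txt" 'a'
  B.readFile "bs-demo.txt" >>= B.putStr
```

No forcing tricks are needed here, because nothing about the read is deferred.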
And, while we’re at it, we could also not overwrite files like this, for a thousand reasons, and instead write to a temporary file and mv it over the old one once it has been successfully written. This is also the correct choice in real life, but if we’d done that then we never would have embarked on this fun journey!
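A minimal sketch of the temporary-file approach follows, assuming the directory and filepath packages that normally ship with GHC. The names prependViaTemp and "prepend.tmp" are hypothetical, and the sketch skips cleanup of the temporary file if something fails partway through.

```haskell
import System.Directory (renameFile)
import System.FilePath (takeDirectory)
import System.IO (hClose, hPutStr, openTempFile)

-- Hypothetical helper: write the new contents to a temporary file,
-- then rename it over the original.
prependViaTemp :: FilePath -> Char -> IO ()
prependViaTemp path c = do
  contents <- readFile path
  -- Create the temporary file next to the target, so the rename does
  -- not have to cross filesystems.
  (tmpPath, tmpHandle) <- openTempFile (takeDirectory path) "prepend.tmp"
  -- Writing consumes all of contents, so the lazy read runs to the end
  -- before the original file is replaced.
  hPutStr tmpHandle (c : contents)
  hClose tmpHandle
  renameFile tmpPath path

main :: IO ()
main = do
  writeFile "tmp-demo.txt" "b\n"
  prependViaTemp "tmp-demo.txt" 'a'
  readFile "tmp-demo.txt" >>= putStr
```

Because the original file is never opened for writing, the lazy readFile never gets the chance to collide with it. One design caveat: the replacement file gets the temporary file’s permissions, not the original’s.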
Tying up loose ends
But we’re not totally in the clear yet.
Remember my first sanity check, where I thought I got it working by manually hCloseing the file handle?
That only appeared to work because foo.txt had fewer than 8096 characters in it. If it had been longer, I would have seen the same truncation behavior as in the withFile example, just truncated to 8096 bytes instead of 0. Subtle! At least, subtle enough to fool me.
Now look back at my second attempt:
    main = do
      contents <- readFile "foo.txt"
      seq contents (writeFile "foo.txt" ('a':contents))

Even if I replaced seq contents (writeFile ...) with seq (length contents) (writeFile ...), this would still be wrong, because seq does not guarantee that it will evaluate its first argument before its second argument. seq isn’t really about evaluation or order, despite the unfortunate name. It just provides a way to distinguish bottom from not-bottom.[4]
But would this work in practice? Sure! Sometimes.
Mendel Feygelson pointed out in the comments that there is a flaw in my reasoning here. It doesn’t matter whether the writeFile expression is evaluated before the length contents expression; what matters is that it’s not executed before length contents is evaluated. Evaluation ≠ execution in Haskell… except when lazy I/O is involved, and execution happens as a side effect of evaluation, which is the point of this whole post and is what makes this so confusing to reason about.
The takeaway is that, in this case, seq (length contents) (writeFile ...) is totally fine, regardless of the order in which those expressions are evaluated, because the actual write operation won’t occur until the writeFile expression gets threaded into the IO operation, which is guaranteed to be after length contents is evaluated.
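The evaluation-versus-execution distinction can be seen in isolation with a counter instead of a file. This is a sketch of my own, with hypothetical names, that uses an IORef so the effect of executing an action is observable.

```haskell
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Returns the counter's value after merely evaluating an action,
-- and again after actually executing it.
evaluateVersusExecute :: IO (Int, Int)
evaluateVersusExecute = do
  counter <- newIORef 0
  let bump = modifyIORef counter (+ 1)   -- an IO action, not yet run
  -- Evaluating bump only builds the action; it does not perform it.
  bump `seq` return ()
  afterSeq <- readIORef counter
  -- Sequencing bump into this do block is what executes it.
  bump
  afterRun <- readIORef counter
  return (afterSeq, afterRun)

main :: IO ()
main = evaluateVersusExecute >>= print   -- prints (0,1)
```

The writeFile expression behaves like bump here: seq can evaluate it whenever it likes, but nothing is written until the resulting action is run.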
Bring it back; bring it home
Returning to the Trello internationalization problem: I suspect I was seeing the error whenever I hit a template file that had no English strings in it, so it never had to consult or add to the bimap it was maintaining, which meant that it never had to evaluate the contents and never had to close the file handle it was reading from.
I don’t think I noticed this at the time (I don’t remember; it was over a year ago) because I was running it from a shell for loop on a few hundred files at once, and didn’t bother to check which specific files it was failing on. Or maybe I did, and that’s why I thought it had to do with lazy evaluation… I don’t know. I shouldn’t have waited so long to write this up.
How do we feel about all of this
I’m glad I hit this bug. The experience really made me think deeply about evaluation and seq and I/O and all sorts of things in Haskell. I had fun the entire time I was debugging it, which is one of the reasons why I wanted to share it with the rest of the world.
From a pedagogical standpoint, I can recognize that this isn’t great.
I’m not the first person to encounter this problem, and I won’t be the last.
My experience was, I hope, much worse than average: I had just enough misinformation to be dangerous, just enough false hypothesis confirmation to keep me looking in the wrong direction…
But still, how many people give up on Haskell because of things like this? Not because they somehow have a completely wrong model of how seq behaves, but just because the three-line example at the beginning of this post fails in the first place. Even the jump from that example to “use the strict package” requires figuring out how to use cabal, and by then the Ruby equivalent is already finished running.
It’s not exactly novel to say that maybe lazy I/O isn’t all that great. And there are certainly arguments for keeping the Prelude’s file interaction functions lazy by default. I don’t really want to get into that.
All I want is to present the unedited mistakes of one beginner.
[1] Try it yourself! There’s a long prologue of non-app code that I assume is the Haskell runtime initialization. Also, each write is paired with a select, which I elided; there are some fun ioctl calls that fail because they’re not talking to a teletype; and more.
[2] More on that later.
[3] Writing this post, I learned that this is actually how it’s presented in the documentation, but I didn’t encounter this definition until I read Haskell, A History. I should really read more documentation.
[4] There’s some “can’t tell your head normal form from your bottom” joke hiding in here, but I decided it wasn’t worth looking for.