haskell - Why wrapping the Data.Binary.Put monad creates a memory leak? (Part 2)

Question

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

As in my previous question, I'm trying to wrap the Data.Binary.Put monad into another monad so that later I can ask it questions like "how many bytes it's going to write" or "what is the current position in file".

Before, I thought that understanding why it leaks memory while using a trivial (IdentityT?) wrapper would lead me to solving my problem. But even though you guys have helped me resolve the problem with the trivial wrapper, wrapping it with something useful like StateT or WriterT still consumes too much memory (and usually crashes).

For example, this is one way I'm trying to wrap it and which leaks memory for big input:

type Out = StateT Integer P.PutM ()

writeToFile :: String -> Out -> IO ()
writeToFile path out = BL.writeFile path $ P.runPut $ do runStateT out 0
                                                         return ()

Here is a more complete code sample that demonstrates the problem.

What I would like to know is this:

What is happening inside the program that causes the memory leak?
What can I do to fix it?

For my second question, I think I should explain in more detail what I intend the data to look like on disk: it is basically a tree structure where each node of the tree is represented as an offset table to its children (plus some additional data). So to calculate the offset of the n-th child for the offset table, I need to know the sizes of children 0 to n-1 plus the current offset (to simplify things, let's say each node has a fixed number of children).
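The offset arithmetic described above can be sketched in pure code. This is only an illustration: the name childOffsets is hypothetical, and it assumes the children are laid out back to back starting at a known base offset.

```haskell
module Main where

-- Hypothetical helper: given the offset at which the first child starts
-- and the sizes of all children, return the starting offset of each child.
-- Child n starts at base + (sum of the sizes of children 0 to n-1).
childOffsets :: Int -> [Int] -> [Int]
childOffsets base sizes = init (scanl (+) base sizes)

main :: IO ()
main = print (childOffsets 100 [10, 20, 30])  -- prints [100,110,130]
```

The point of the sketch is that every entry depends on the sizes of all earlier siblings, which is why the writer needs to know sizes before it emits the table.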

Thanks for looking.

UPDATE: Thanks to nominolo I can now create a monad that wraps around the Data.Binary.Put, tracks current offset and uses almost no memory. This is done by dropping the use of StateT transformer in favor of a different state threading mechanism that uses Continuations.

Like this:

type Offset = Int

newtype MyPut a = MyPut
  { unS :: forall r . (Offset -> a -> P.PutM r) -> Offset -> P.PutM r }

instance Monad MyPut where
  return a = MyPut $ \f s -> f s a
  ma >>= f = MyPut $ \fb s -> unS ma (\s' a -> unS (f a) fb s') s

writeToFile :: String -> MyPut () -> IO ()
writeToFile path put =
  BL.writeFile path $ P.runPut $ peal put >> return ()
  where peal myput = unS myput (\o -> return) 0

getCurrentOffset :: MyPut Int
getCurrentOffset = MyPut $ \f o -> f o o

lift' n ma = MyPut $ \f s -> ma >>= f (s+n)
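For readers on a recent GHC, here is a self-contained sketch of the same wrapper. Since GHC 7.10 a Monad instance also needs Functor and Applicative instances, so those are added here. The helpers word8, word32, runMyPut and example are hypothetical names introduced only for this illustration.

```haskell
{-# LANGUAGE RankNTypes #-}
module Main where

import Control.Monad (ap, liftM)
import qualified Data.Binary.Put as P
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word32, Word8)

type Offset = Int

-- The CPS wrapper from the post: the continuation receives the current
-- offset together with the result.
newtype MyPut a = MyPut
  { unS :: forall r. (Offset -> a -> P.PutM r) -> Offset -> P.PutM r }

instance Functor MyPut where
  fmap = liftM

instance Applicative MyPut where
  pure a = MyPut (\k s -> k s a)
  (<*>) = ap

instance Monad MyPut where
  ma >>= f = MyPut (\k s -> unS ma (\s' a -> unS (f a) k s') s)

-- Lift a raw Put action that is known to write n bytes.
lift' :: Int -> P.PutM a -> MyPut a
lift' n ma = MyPut (\k s -> ma >>= k (s + n))

getCurrentOffset :: MyPut Offset
getCurrentOffset = MyPut (\k o -> k o o)

-- Hypothetical primitives built on lift'.
word8 :: Word8 -> MyPut ()
word8 = lift' 1 . P.putWord8

word32 :: Word32 -> MyPut ()
word32 = lift' 4 . P.putWord32be

-- Hypothetical runner: start at offset 0 and discard the final result.
runMyPut :: MyPut a -> BL.ByteString
runMyPut m = P.runPut (unS m (\_ _ -> return ()) 0)

example :: MyPut Offset
example = do
  word8 1
  word32 42
  getCurrentOffset

main :: IO ()
main = do
  let (off, bytes) = P.runPutM (unS example (\_ a -> return a) 0)
  print off                -- prints 5
  print (BL.unpack bytes)  -- prints [1,0,0,0,42]
```

The constructor is applied with parentheses rather than `$` here only to avoid relying on how a particular GHC version typechecks `$` with a rank-2 argument.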

However, I still have a problem with tracking how many bytes MyPut is going to write to disk. In particular, I need to have a function with a signature like this:

getSize :: MyPut a -> MyPut Int

or

getSize :: MyPut a -> Int

My approach was to wrap the MyPut monad inside the WriterT transformer (something like this). But that started to consume too much memory again. As sclv mentions in the comments under nominolo's answer, WriterT somehow cancels out the effect of the continuations. He also mentions that getting the size should be possible directly from the MyPut monad that I already have, but all my attempts to do so ended in non-compilable code or an infinite loop :-|.
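One possible reading of "directly from the MyPut monad" is to run the action from offset 0 with a final continuation that returns the offset instead of the result. The sketch below is not taken from the thread: sizeOf is a hypothetical name, the measured action is evaluated separately from the real write, and its memory behaviour on large inputs has not been checked here.

```haskell
{-# LANGUAGE RankNTypes #-}
module Main where

import Control.Monad (ap, liftM, replicateM_)
import qualified Data.Binary.Put as P

type Offset = Int

-- Same CPS wrapper as in the question.
newtype MyPut a = MyPut
  { unS :: forall r. (Offset -> a -> P.PutM r) -> Offset -> P.PutM r }

instance Functor MyPut where
  fmap = liftM

instance Applicative MyPut where
  pure a = MyPut (\k s -> k s a)
  (<*>) = ap

instance Monad MyPut where
  ma >>= f = MyPut (\k s -> unS ma (\s' a -> unS (f a) k s') s)

lift' :: Int -> P.PutM a -> MyPut a
lift' n ma = MyPut (\k s -> ma >>= k (s + n))

-- Hypothetical helper: run the action starting at offset 0 and return the
-- final offset, which equals the number of bytes accounted for by lift'.
-- Only the result component of runPutM is used; the ByteString is ignored.
sizeOf :: MyPut a -> Int
sizeOf m = fst (P.runPutM (unS m (\o _ -> return o) 0))

main :: IO ()
main = print (sizeOf (replicateM_ 3 (lift' 4 (P.putWord32be 7))))  -- prints 12
```

Because the measurement starts at offset 0, it only gives the right answer when the number of bytes written does not itself depend on the current offset.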

Could someone please help further?

1 Answer

深蓝 · Answer 1 · 2021-10-23T21:38:05+0000

It looks like the monad transformer is too lazy. You can create a heap profile (without having to build it specially) by running the program with:

$ ./myprog +RTS -hT
$ hp2ps myprog.hp
$ open myprog.ps    # Or whichever viewer you have

In this case it's not particularly helpful, because it only shows lots of PAPs, FUN_1_0s and FUN_2_0s. This means the heap is made up of lots of partially applied functions (PAP) and function closures (FUN; the numeric suffix describes the closure's payload, here one or two pointer fields and no non-pointer fields). This usually means that something is not evaluated enough. Monad transformers are somewhat notorious for this.

The workaround is to use a stricter monad transformer written in continuation-passing style. (This requires {-# LANGUAGE Rank2Types #-}.)

newtype MyStateT s m a =
  MyStateT { unMyStateT :: forall r. (s -> a -> m r) -> s -> m r }

Continuation-passing style means that instead of returning a result directly, we call another function, the continuation, with our result, in this case s and a. The instance definitions look a bit funny. To understand them, read the Wikipedia article on continuation-passing style.

instance Monad m => Monad (MyStateT s m) where
  return x = MyStateT (\k s -> k s x)
  MyStateT f >>= kk = MyStateT (\k s ->
    f (\s' a -> unMyStateT (kk a) k s') s)

runMyStateT :: Monad m => MyStateT s m a -> s -> m (a, s)
runMyStateT (MyStateT f) s0 = f (\s a -> return (a, s)) s0

instance MonadTrans (MyStateT s) where
  lift act = MyStateT (\k s -> do a <- act; k s a)

type Out = MyStateT Integer P.PutM ()
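The pieces above can be put together into one file that compiles on a recent GHC. This is a sketch under a few assumptions: Functor and Applicative instances are added because GHC has required them since 7.10, and liftS, modifyS' and putByte are hypothetical helpers (liftS stands in for the MonadTrans instance to keep the imports small).

```haskell
{-# LANGUAGE RankNTypes #-}
module Main where

import Control.Monad (ap, liftM)
import qualified Data.Binary.Put as P
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)

newtype MyStateT s m a = MyStateT
  { unMyStateT :: forall r. (s -> a -> m r) -> s -> m r }

instance Functor (MyStateT s m) where
  fmap = liftM

instance Applicative (MyStateT s m) where
  pure x = MyStateT (\k s -> k s x)
  (<*>) = ap

instance Monad (MyStateT s m) where
  MyStateT f >>= kk =
    MyStateT (\k s -> f (\s' a -> unMyStateT (kk a) k s') s)

runMyStateT :: Monad m => MyStateT s m a -> s -> m (a, s)
runMyStateT (MyStateT f) s0 = f (\s a -> return (a, s)) s0

-- Hypothetical stand-in for lift: run an action of the underlying monad
-- and pass its result on with the state unchanged.
liftS :: Monad m => m a -> MyStateT s m a
liftS act = MyStateT (\k s -> act >>= k s)

-- Hypothetical helper: update the state and force it to weak head normal
-- form before calling the continuation.
modifyS' :: (s -> s) -> MyStateT s m ()
modifyS' f = MyStateT (\k s -> let s' = f s in s' `seq` k s' ())

type Out = MyStateT Integer P.PutM ()

-- Write one byte and count it in the state.
putByte :: Word8 -> Out
putByte w = liftS (P.putWord8 w) >> modifyS' (+ 1)

main :: IO ()
main = do
  let (((), n), bytes) = P.runPutM (runMyStateT (mapM_ putByte [1 .. 10]) 0)
  print n                  -- prints 10
  print (BL.length bytes)  -- prints 10
```

Note that the Monad instance needs no constraint on m: the underlying monad is only touched by liftS and runMyStateT.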

Running it now gives constant space (the "maximum residency" bit):

$ ./so1 +RTS -s 
begin
end
   8,001,343,308 bytes allocated in the heap
     877,696,096 bytes copied during GC
          46,628 bytes maximum residency (861 sample(s))
          33,196 bytes maximum slop
            2 MB total memory in use (0 MB lost due to fragmentation)

Generation 0: 14345 collections,     0 parallel,  3.32s,  3.38s elapsed
Generation 1:   861 collections,     0 parallel,  0.08s,  0.08s elapsed

The downside of using such strict transformers is that you can no longer define MonadFix instances and certain laziness tricks no longer work.
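As an illustration of the kind of laziness trick meant here, the lazy State monad from the transformers package has a MonadFix instance, so a computation can refer to its own result. The sketch below uses that library type, not MyStateT; the name ones is hypothetical.

```haskell
module Main where

import Control.Monad.Fix (mfix)
import Control.Monad.Trans.State.Lazy (State, evalState)

-- Ties a knot through mfix: the returned list is defined in terms of itself.
-- This relies on the lazy pattern matching inside the lazy State monad.
ones :: State () [Int]
ones = mfix (\xs -> return (1 : xs))

main :: IO ()
main = print (take 3 (evalState ones ()))  -- prints [1,1,1]
```

According to the answer, this is what is given up with the CPS version: no such instance can be defined for it.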
