Dynamic graph survey #11019
doc/survey/dynamic_graph.md
There seems to be a misconception about "tape" here. According to this well-known paper in the AD field, a tape is a "log stream" generated by running a program:
Derivatives are typically computed by recording a ‘tape’ of the computation and interpreting (or run-time compiling) a transformation of the tape played back in reverse.
This tape is a different kind of entity than the original program.
So the tape is a log of one run, not a "representation" of the program.
I did notice that this well-known tutorial says:
The nodes themselves are stored in a common array (Vec) that is shared by the entire expression graph, which also acts as the allocation arena. In AD literature, this shared array is often called a tape (or Wengert list).
This statement is very misleading, because the tutorial goes on to say:
The tape can be thought of as a record of all the operations performed during the evaluation of the expression, which in turn contains all the information required to compute its gradient when read in reverse.
The problem is that the tape IS a record of the operations performed; it is not something that merely "can be thought of" as one.
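To make the distinction concrete, here is a minimal Python sketch. It is my own illustration, not code from the paper or the tutorial, and the names `Tape`, `var`, `add`, `mul`, `grad`, and `f` are all hypothetical. The same program `f` is run twice with different loop counts; each run leaves a tape of a different length, which is what I mean by the tape being a log of one execution rather than a representation of the program.

```python
class Tape:
    """Append-only log of the operations executed during one forward run."""

    def __init__(self):
        self.values = []   # value produced by each recorded step
        self.records = []  # per step: list of (input_index, local_partial)

    def _push(self, value, parents):
        # Record one executed operation and return its index on the tape.
        self.values.append(value)
        self.records.append(parents)
        return len(self.values) - 1

    def var(self, value):
        # An input has no parents, so nothing flows further back from it.
        return self._push(value, [])

    def add(self, a, b):
        return self._push(self.values[a] + self.values[b], [(a, 1.0), (b, 1.0)])

    def mul(self, a, b):
        va, vb = self.values[a], self.values[b]
        # d(a*b)/da = b and d(a*b)/db = a, evaluated at the recorded values.
        return self._push(va * vb, [(a, vb), (b, va)])

    def grad(self, out):
        # Play the log back in reverse, accumulating adjoints.
        adj = [0.0] * len(self.values)
        adj[out] = 1.0
        for i in range(out, -1, -1):
            for j, partial in self.records[i]:
                adj[j] += partial * adj[i]
        return adj


def f(tape, x, n):
    # The *program*: a loop whose trip count depends on the run-time value n.
    y = x
    for _ in range(n):
        y = tape.mul(y, x)
    return y


for n in (2, 4):
    tape = Tape()
    x = tape.var(3.0)
    y = f(tape, x, n)
    print(n, len(tape.records), tape.values[y], tape.grad(y)[x])
```

The loop in `f` never appears on the tape; only the multiplications that actually executed do. A different `n` therefore yields a different tape for the same source program.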
aee3708 to b258c12
kexinzhao left a comment
LGTM after offline discussion with @tonyyang-svail.
Survey on Differentiable Programming through Dynamic Graph. Discussions are welcome.