I was reading through the original DALL-E paper (https://arxiv.org/pdf/2102.12092.pdf) and found Appendix A.3 pretty interesting. The motivation behind the logit-Laplace loss makes sense, but if you plot the negative log of their logit-Laplace PDF, it's not strictly positive as a function of x. Does that mean the authors are allowing for a potentially negative loss with this reformulation?
It's just that every other assumed prior (Gaussian, Laplacian) for the reconstruction error gives a nonnegative error when the negative log is taken, so I expected this one to as well.
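A minimal sketch of the check, assuming the density from Appendix A.3 is f(x | mu, b) = 1 / (2 b x (1 - x)) * exp(-|logit(x) - mu| / b) on (0, 1). The function name and the choice mu = 0, b = 1 are only for illustration and are not taken from the paper.

```python
import math


def logit_laplace_nll(x, mu=0.0, b=1.0):
    """Negative log of the assumed logit-Laplace density at x in (0, 1)."""
    logit_x = math.log(x / (1.0 - x))
    # -log f = log(2b) + log(x(1 - x)) + |logit(x) - mu| / b
    return math.log(2.0 * b) + math.log(x * (1.0 - x)) + abs(logit_x - mu) / b


# Evaluate at a few points; mu = 0 and b = 1 are illustrative values only.
for x in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"x = {x:.1f}  nll = {logit_laplace_nll(x):+.4f}")
```

With these illustrative parameters the value comes out negative around x = 0.5 and positive near the edges, since log(x(1 - x)) is at most log(1/4) and can outweigh the other two terms.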