@avi I'm looking into Adam, but all the descriptions I could find assume familiarity with stochastic gradient descent. When I look into that, I can't figure out how to map the terminology onto what I'm doing, or whether it even applies. In the image here (from Wikipedia), I don't know what the summand functions Q_i are supposed to correspond to. I have many parameters/dimensions (104) but only one evaluation function. I also don't know whether I have anything corresponding to the "i-th observation".
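
For reference, and assuming the image is the standard finite-sum objective from the Wikipedia article on stochastic gradient descent, the formula in question is:

$$Q(w) = \frac{1}{n} \sum_{i=1}^{n} Q_i(w)$$

Here w would be the parameter vector (104-dimensional in my case), n the number of observations, and each Q_i the summand associated with the i-th observation. My single evaluation function would presumably play the role of Q, which is why I can't tell what the individual Q_i would be.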