Coherent rendering in augmented reality deals with synthesizing virtual content that seamlessly blends in with the real content. Unfortunately, capturing or modeling every real aspect in the virtual rendering process is often unfeasible or too expensive. We present a post-processing method that improves the look of rendered overlays in a dental virtual try-on application. We combine the original frame and the default rendered frame in an autoencoder neural network in order to obtain a more natural output, inspired by artistic style transfer research. Specifically, we apply the original frame as style on the rendered frame as content, repeating the process with each new pair of frames. Our method requires only a single forward pass, our shallow architecture ensures fast execution, and our internal feedback loop inherently enforces temporal consistency.