In recent years, OpenCL has been increasingly adopted as it enables software programmers to harness the performance and power efficiency of FPGAs. Despite simplifying the FPGA programming challenge, achieving high performance and energy efficiency with OpenCL is still a difficult task. In order to further contribute to the advance of the OpenCL usage for FPGAs, we utilize a realistic application scenario as our case study: the AutoDock molecular docking software. While OpenCL has proven its effectiveness in accelerating molecular docking on GPUs, for FPGA-based AutoDock accelerators it struggles with difficult design patterns. Besides complex multiple-producers to single-consumer datapaths, these include time-intensive loops with variable runtimes. Therefore, this work presents the design and optimization steps for implementing AutoDock in OpenCL targeting an Arria-10 FPGA, as well as a corresponding execution runtime and energy-efficiency evaluation. Applying these techniques improved the performance of the initial OpenCL implementation for FPGAs by three orders of magnitude, with the final version of the code now yielding speed-ups of up to ∼2.7x, and energy-efficiency gains of up to ∼1.8x over the original serial AutoDock version executing on a current-generation CPU.