Have you tried the GraphRAG approach: rerun the same prompt several times, then give all the results back to the model along with a prompt telling it to extract the true text and fix any mistakes? With mini this seems like a very workable solution. You could even include one or more attempts from whatever OCR you were using previously.
I think that is one of the key findings from the GraphRAG paper: GPT can replace the human in the loop.
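
Roughly what I have in mind, as a minimal sketch rather than anything taken from GraphRAG itself. The names `transcribe` and `ask_model` are hypothetical callables you would supply yourself (one wraps your vision/OCR prompt, the other sends a text prompt to the model), and the prompt wording is just a guess at something reasonable:

```python
from typing import Callable, List, Optional


def build_reconcile_prompt(candidates: List[str]) -> str:
    """Combine several noisy transcriptions into one correction prompt."""
    numbered = "\n\n".join(
        f"--- Attempt {i} ---\n{text}"
        for i, text in enumerate(candidates, start=1)
    )
    return (
        "Below are several independent transcriptions of the same document. "
        "They may disagree or contain mistakes. Extract the true text, fix "
        "any errors, and return only the corrected text.\n\n" + numbered
    )


def reconcile(
    transcribe: Callable[[], str],      # hypothetical: one run of your OCR prompt
    ask_model: Callable[[str], str],    # hypothetical: sends a prompt, returns text
    runs: int = 3,
    extra_candidates: Optional[List[str]] = None,
) -> str:
    """Run the same transcription several times, then ask the model to merge."""
    candidates = [transcribe() for _ in range(runs)]
    # Optionally mix in output from the OCR tool you were using before.
    candidates.extend(extra_candidates or [])
    return ask_model(build_reconcile_prompt(candidates))
```

You would call it with something like `reconcile(my_transcribe, my_ask, runs=3, extra_candidates=[old_ocr_text])`. Keeping the model calls behind plain callables means you can swap models or test the merging step with stubs.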