As much as I wish it were that simple, the behavior you see is determined by what you are inputting: the formatting, the structure, the model and its training, and so on. When you get inconsistent results, it could mean your instructions are off, or it could mean your model is off.
It would take more detail to look at the inputs, the reasoning you are using, and any... let's call them gotchas: places where the model may need to apply corrections based on formatting, different document types, hierarchy conditions for discounts, etc.
It would also, from a purely debugging and help perspective, require us to understand when it seems to work correctly and when it doesn't.
If a process fails, does it always fail, and always fail the same way? Meaning, if you feed in the same set of documents/details, does it always fail with the exact same wrong discounts (for example), or does it fail differently each time? Whereas when it works, does it always work with those same inputs and never fail?
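One way to answer that question systematically is to run the exact same input several times and count the distinct outputs you get back. Here is a minimal sketch of that idea; `run_extraction` is a hypothetical stand-in for your actual model/pipeline call (stubbed here so the harness runs on its own), not a real API:

```python
# Sketch: check whether a pipeline fails deterministically for one input.
from collections import Counter

def run_extraction(document: str) -> str:
    # Hypothetical stub standing in for your real model call.
    # Replace this with however you invoke your model/pipeline.
    return f"discount=10% for {document}"

def consistency_report(document: str, runs: int = 5) -> Counter:
    """Run the same input several times and tally distinct outputs.

    One bucket  -> deterministic: if the answer is wrong, suspect the
                   prompt/instructions or the input itself.
    Many buckets -> variability: suspect sampling settings (temperature),
                   context differences, or the model.
    """
    return Counter(run_extraction(document) for _ in range(runs))

report = consistency_report("invoice_001.txt")
print(len(report), "distinct output(s) across runs")
```

Even a crude tally like this tells you which of the two failure stories you are in, which changes where you look first.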
Best practices are difficult because this whole thing is still being... built, which means we are still expanding models (which helps), expanding AI input detection/comprehension, etc. That places more responsibility on us (as designers and users) to sometimes... finagle it more than we will have to in 1, 2, 5, or 10 years.
So in your case, it would require a review of what you have (even if everything were perfect): your prompt, your inputs, how you are using them, your model training (if any), etc.
I wish I could just rattle off specifics and solve your issues, but it's just not that simple right now.
If you have specific issues, we can work on that though.
If these suggestions help resolve your issue, please consider marking the answer as such, and maybe also a like.
Thank you!
Sincerely, Michael Gernaey