Hey all, I'm using Copilot Studio and have connected a SharePoint site as a knowledge source. The site hosts around 1800 PDF documents, and I've added the entire site as a single knowledge source for my agent.
The problem is that the agent frequently gives factually incorrect answers, even when the correct information clearly exists in the documents; it often "hallucinates" and/or mixes up details. I'm assuming it's trying to parse too much unstructured data at once.
What I did:
Turned on orchestration (alongside generative AI) and turned off "Allow the AI to use its own general knowledge".
Turned on enhanced search results (yes, I do have a Microsoft 365 Copilot license in the same tenant as the agent).
Set content moderation to medium (maybe I should try high as well now).
Possible next steps:
I saw in some documentation that a SharePoint knowledge source is limited to a total of 200 files, so maybe I should split my site into 9 separate knowledge sources of 200 PDF documents each (docs: Unstructured data as a knowledge source - Microsoft Copilot Studio | Microsoft Learn). However, I'm not sure this limit applies in my case, since my understanding is that there are two types of SharePoint knowledge source: a SharePoint URL, and documents added directly from SharePoint.
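To make the split concrete, here is a rough Python sketch of the arithmetic. The 200-file limit is only my reading of the docs, and the file names are made-up placeholders, not my real documents.

```python
from math import ceil

FILES_PER_SOURCE = 200  # assumed per-source limit, based on my reading of the docs


def split_into_sources(paths, limit=FILES_PER_SOURCE):
    """Split a list of file paths into consecutive groups of at most `limit` items."""
    return [paths[i:i + limit] for i in range(0, len(paths), limit)]


# Hypothetical file names standing in for the ~1800 PDFs on the site.
pdfs = [f"doc_{n:04d}.pdf" for n in range(1800)]
sources = split_into_sources(pdfs)

# Number of groups produced, and the same figure computed directly.
print(len(sources), ceil(len(pdfs) / FILES_PER_SOURCE))
```

This just slices the list in order. In practice I would probably group by folder or topic instead, so that each knowledge source can get a meaningful description.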
Does anyone have tips on the best way to move forward? I understand that I will most likely need to improve the description of the knowledge source and clean up the source some more (some folders are unnecessary and can be removed). Would it help to add some sort of semantic mapping to the general instructions? By that I mean explaining where in the knowledge source to look (by relative path) when a certain type of question is asked.
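To illustrate what I mean by semantic mapping, here is a small Python sketch that turns a topic-to-folder table into instruction text. The topics and paths are invented examples, and I don't know whether the agent would actually follow instructions written like this.

```python
# Hypothetical topic -> relative folder mapping; none of these paths exist on my site.
TOPIC_PATHS = {
    "HR policies": "Shared Documents/HR",
    "Product manuals": "Shared Documents/Manuals",
    "Safety procedures": "Shared Documents/Safety",
}


def build_instructions(mapping):
    """Render the mapping as plain-text guidance for the general instructions field."""
    lines = ["When answering, prefer documents from the folder that matches the topic:"]
    for topic, path in mapping.items():
        lines.append(f"- {topic}: look under '{path}'")
    return "\n".join(lines)


instructions = build_instructions(TOPIC_PATHS)
print(instructions)
```

The output would be pasted into the general instructions, one line per topic.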
I would greatly appreciate any tips or opinions. :)