So you wrote this:
I am creating a Chatbot that will help answer questions related to Standard Operating Procedures.
There are over 6000 SOP Word & PDF documents.
--I am guessing you are storing these in SharePoint? Is your knowledge source set up at the root? How many levels deep do the documents go? How many documents are in a given document library and/or subfolder?
--First, there is a limit to the number of documents that an Agent will index. It will also limit the amount of data, based on the size and complexity of your documents. My point is that an Agent by itself is not really intended to handle this many. Training an LLM or extending a model is another story, but simply pointing an Agent at 6,000 documents goes beyond its limits. In many cases you will need to partition your data into subfolders, put the relevant documents in each one, and make those subfolders the knowledge sources, rather than pointing at SharePoint at some higher level and expecting it to find things deeper down.
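--To make the partitioning idea concrete, here is a rough Python sketch you could run outside the Agent to plan the folder layout. Everything in it is an assumption for illustration: the `line` and `machine` tags, the `part-N` folder naming, and the `max_per_source` cap (set it to whatever limit actually applies to you). It only plans the layout; it does not touch SharePoint.

```python
from collections import defaultdict


def partition_documents(docs, max_per_source):
    """Plan one folder per (line, machine), splitting oversized groups.

    `docs` is a list of dicts with hypothetical keys "name", "line", "machine".
    Returns a dict mapping a folder path to the document names placed in it.
    """
    if max_per_source < 1:
        raise ValueError("max_per_source must be at least 1")

    # Group document names by the focus area they belong to.
    groups = defaultdict(list)
    for doc in docs:
        groups[(doc["line"], doc["machine"])].append(doc["name"])

    # Split each group into chunks no larger than the cap.
    sources = {}
    for (line, machine), names in sorted(groups.items()):
        names.sort()
        for start in range(0, len(names), max_per_source):
            part = start // max_per_source + 1
            folder = f"{line}/{machine}/part-{part}"
            sources[folder] = names[start:start + max_per_source]
    return sources
```

--Each key in the result would become one folder, and each folder one knowledge source, so no single source has to cover all 6,000 documents.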
I am running into an interesting issue of how to resolve ambiguity.
1. It finds answers from irrelevant documents.
There are multiple SOPs that have information about what PPE (Personal Protective Equipment) a Machine Operator should wear when troubleshooting a specific machine.
It will randomly pick an answer from different documents.
--This would be expected given the volume of data you have. Think about it like this: you have a zillion books in a room, and the person trying to find the answer (let's call them a librarian) has no idea which books have the information you want, because they cannot summarize (index) them all. So they look around, and when they find something that seems to match, they go, "Oh, this is it."
How can I make the Chatbot ask the user some more questions before answering when the information is found in multiple documents, and then use that new information to search only specific documents?
So, for a question like:
User: What are the PPE requirements?
Chatbot: Can you please tell me which Machine & Assembly Line you are talking about?
User: Machine A on Assembly Line 1.
<Out of 6000 SharePoint documents, it should only search the documents that are relevant for Machine A and Assembly Line 1>
How do I achieve that?
--So to mix up the answer a little bit (from the above questions), essentially you need to use scoring. If the agent scores the answers too low, then you would want it to ask more questions. This is harder to do. It is not impossible, but it is not exactly an OOB agent thing; it is a prompt against a real model.
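--As a rough illustration of the scoring decision, here is a Python sketch of the logic you would wrap around whatever retrieval you run against a real model. The function name, the `(document, score)` input shape, and both thresholds are hypothetical. The "too close to call" margin check is my own addition on top of the "scored too low" rule; drop it if you do not want it.

```python
def decide_next_step(scored_hits, min_score=0.6, min_margin=0.15):
    """Decide whether to answer or ask a clarifying question.

    `scored_hits` is a list of (document_name, score) pairs, higher is better.
    Returns ("answer", document_name) or ("clarify", reason).
    """
    if not scored_hits:
        return ("clarify", "no matching documents")

    ranked = sorted(scored_hits, key=lambda hit: hit[1], reverse=True)
    top_doc, top_score = ranked[0]

    # Rule from the discussion: a low score means ask more questions.
    if top_score < min_score:
        return ("clarify", "best match scored too low")

    # Assumed extra rule: near-ties between documents also count as ambiguous.
    if len(ranked) > 1 and top_score - ranked[1][1] < min_margin:
        return ("clarify", "several documents scored about the same")

    return ("answer", top_doc)
```

--The thresholds would need tuning against your own documents; the values here are placeholders, not recommendations.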
If you want it to search only the documents related to Machine A, then you have to separate your data so that the knowledge sources can be indexed properly against the topics (and by topic I do not mean Agent topics, but the focus area the user is asking about). It is never going to be "just smart" enough to take thousands of documents dropped somewhere and do that OOB.
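--Once the data is separated by focus area, narrowing the search is just a filter on those tags. Here is a minimal Python sketch, assuming a hypothetical catalog where every document carries `machine` and `line` tags. The catalog shape and the tag names are my assumptions, not something the Agent gives you.

```python
def select_documents(catalog, machine, line):
    """Return the names of documents tagged with both the machine and the line.

    `catalog` is a list of dicts with hypothetical keys "name", "machine", "line".
    Matching ignores case and surrounding whitespace in the user's input.
    """
    wanted = (machine.strip().lower(), line.strip().lower())
    return [
        doc["name"]
        for doc in catalog
        if (doc["machine"].lower(), doc["line"].lower()) == wanted
    ]
```

--Only the documents this returns would then be searched for the PPE answer, instead of all 6,000.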
Also, I only want the Chatbot to ask the user if they have not already provided that information. If they have already provided it, the Chatbot should not ask for it again.
--Conversations are only so large. Whether or not your Topic ends the conversation, an Agent will not automatically remember everything forever; it will reset at some point. That means you would have to leverage global variables to track everything, and even those can be wiped over time, so there is no guaranteed way (in memory) to make it remember it all. There are ways to cache it, write it to a file, and read it back on the fly. But it is not as simple as it would be if you were going against an actual model rather than knowledge sources.
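--Here is a rough Python sketch of the "write it to a file and read it back" approach, so the bot only asks for what it does not already have. The slot names (`machine`, `assembly_line`), the JSON file, and all function names are hypothetical; in practice you would key the file (or whatever store you use) by conversation or user.

```python
import json
from pathlib import Path

# Assumed pieces of information the bot needs before it can search.
REQUIRED_SLOTS = ("machine", "assembly_line")


def load_slots(path):
    """Read previously saved answers, or start empty if nothing was saved."""
    file = Path(path)
    return json.loads(file.read_text()) if file.exists() else {}


def save_slots(path, slots):
    """Persist the answers so they survive a conversation reset."""
    Path(path).write_text(json.dumps(slots))


def missing_slots(slots):
    """List the required slots the user has not provided yet."""
    return [name for name in REQUIRED_SLOTS if not slots.get(name)]
```

--On each turn you would load the slots, ask only about whatever `missing_slots` returns, and save again after the user answers.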