Python Forum

Building a chatbot that fetches data from MongoDB using RAG
I am developing a chatbot that fetches the required data from MongoDB based on the question the user asks. For example, if the user asks for the total number of employees, it should return the number of employees stored in the employee collection.

What I have done:

1. Load the employee collection from the database into a CSV file.
2. Use the transformers pipeline to get the answer, as shown below.

import csv

import pandas as pd
from pymongo import MongoClient
from transformers import pipeline

# mongo_uri and database_name are assumed to be defined elsewhere.


def mongo_to_csv(collection_name, input_text):
    # Connect to MongoDB
    client = MongoClient(mongo_uri)
    db = client[database_name]
    collection = db[collection_name]

    # Read all documents once so the cursor does not have to be re-queried
    documents = list(collection.find())

    # Collect every field name, since documents may have varying fields
    all_keys = set()
    for document in documents:
        all_keys.update(document.keys())

    # Open the CSV in write mode; append mode would repeat the header
    # and all rows on every call
    with open(f"{collection_name}.csv", mode="w", newline="", encoding="utf-8") as csv_file:
        # DictWriter expects an ordered sequence of field names, so sort the set
        csv_writer = csv.DictWriter(csv_file, fieldnames=sorted(all_keys))
        # Write the header (field names)
        csv_writer.writeheader()

        # Write one row per document; missing fields are left empty
        csv_writer.writerows(documents)

    print(
        f"Data from {collection_name} collection in {database_name} database has been written to {collection_name}.csv"
    )
    return responses_from_db(collection_name, input_text)


def responses_from_db(collection_name, input_text):
    tqa = pipeline(task="table-question-answering", model="google/tapas-base-finetuned-wtq")

    # TAPAS expects a DataFrame in which every cell is a string
    table = pd.read_csv(f"{collection_name}.csv").astype(str)
    print(table)

    # Run the model once and reuse the result
    output = tqa(table=table, query=input_text)["answer"]
    print(output)
    return output

mongo_to_csv("employee_collection","how many employees present ")
For the user input "how many employees present", this is my output:
Output:
"The interns details requested are - 25875, 30503, 49530"
I want the count, but I am getting the employee IDs of all employees present (there are only 3 entries in my employee collection).
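
One possible explanation, offered as a sketch rather than a confirmed diagnosis: for a WTQ-fine-tuned TAPAS model, the table-question-answering pipeline appears to return the selected cells together with a predicted aggregator ("NONE", "SUM", "AVERAGE" or "COUNT") and leaves the aggregation itself to the caller. Assuming the result is a dict with "cells" and "aggregator" keys, a small helper (the name apply_aggregator is hypothetical) could compute the final value. The sample dict below is hand-written to mirror the three IDs above; it is not real model output.

```python
from statistics import mean


def apply_aggregator(result):
    """Turn a table-QA result into a final value.

    Assumes `result` has "aggregator" and "cells" keys, which is how the
    pipeline output is believed to look for WTQ-style TAPAS models.
    """
    aggregator = result.get("aggregator", "NONE")
    cells = result.get("cells", [])

    if aggregator == "COUNT":
        # The model only selects the cells; counting them is up to us
        return len(cells)
    if aggregator in ("SUM", "AVERAGE"):
        numbers = [float(cell) for cell in cells]
        if not numbers:
            return None
        return sum(numbers) if aggregator == "SUM" else mean(numbers)
    # "NONE": return the selected cells unchanged
    return cells


# Hand-written stand-in for tqa(table=table, query=query)
sample = {
    "answer": "COUNT > 25875, 30503, 49530",
    "cells": ["25875", "30503", "49530"],
    "aggregator": "COUNT",
}
print(apply_aggregator(sample))  # prints 3
```

With this approach, responses_from_db would keep the whole result dict instead of only result["answer"], and pass it through the helper before returning.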

Also, sometimes I get an incomplete answer.

For example, if I ask "what is employeename who is working in python", I get only the first employee working in Python, not all of the employees working in Python.

My output is as below, but there are other employees also working in Python:

Output:
"The details are - john"
Can my code be modified to better suit my requirement?

And is there any other approach or model that would better suit my requirement?
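
One alternative worth considering, sketched here under assumptions: for questions with an exact answer (counts, "all employees where ..."), the chatbot could map the question to a MongoDB query and let the database compute the result, rather than asking a model to read a CSV. The functions below use the pymongo methods count_documents and find. The field names "employeename" and "skill", the names other than john, and the InMemoryCollection class are all hypothetical; the class is only a stand-in so the sketch runs without a database.

```python
class InMemoryCollection:
    """Tiny stand-in for a pymongo collection, for demonstration only."""

    def __init__(self, documents):
        self._documents = list(documents)

    @staticmethod
    def _matches(doc, query_filter):
        # Supports simple equality filters only
        return all(doc.get(key) == value for key, value in query_filter.items())

    def count_documents(self, query_filter):
        return sum(1 for doc in self._documents if self._matches(doc, query_filter))

    def find(self, query_filter=None, projection=None):
        query_filter = query_filter or {}
        wanted = [key for key, keep in (projection or {}).items() if keep]
        for doc in self._documents:
            if self._matches(doc, query_filter):
                yield {k: doc[k] for k in wanted if k in doc} if wanted else dict(doc)


def count_employees(collection, query_filter=None):
    """Return the number of matching documents, computed by the database."""
    return collection.count_documents(query_filter or {})


def employee_names_by_skill(collection, skill):
    """Return every matching name, not only the first one."""
    cursor = collection.find({"skill": skill}, {"employeename": 1, "_id": 0})
    return [doc["employeename"] for doc in cursor]


# Hypothetical sample data; with a real database this would be
# MongoClient(mongo_uri)[database_name]["employee_collection"]
employees = InMemoryCollection([
    {"_id": 25875, "employeename": "john", "skill": "python"},
    {"_id": 30503, "employeename": "asha", "skill": "python"},
    {"_id": 49530, "employeename": "maria", "skill": "java"},
])

print(count_employees(employees))                    # prints 3
print(employee_names_by_skill(employees, "python"))  # prints ['john', 'asha']
```

The remaining work in such a design is deciding which query a question corresponds to, which could be done with keyword rules or with a language model that outputs a query specification instead of the answer itself.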