My goal is to display a summarized response alongside the source document in a LangChain QA bot.
My input JSON data has three keys: page_name, page_data, and page_url. I want to store the value of page_data in the page_content attribute of LangChain's Document class using the jq package, and to store page_name and page_url in the metadata. How can I accomplish this?
    from langchain_community.document_loaders import JSONLoader

    ### data exported from MongoDB database
    json_data = [
        {"_id": {"$oid": "65ed5d18b251090135c27d98"}, "page_name": "Homepage", "page_data": "Content about homepage", "page_url": "https://mywebsite.com/homepage"},
        {"_id": {"$oid": "65ed5d2fb251090135c27d99"}, "page_name": "Contact US", "page_data": "Content about Contact US", "page_url": "https://mywebsite.com/contactus"}
    ]

    ### LangChain JSON loader class
    loader = JSONLoader(
        file_path="/content/web_data.json",
        jq_schema='.[].page_data',
        text_content=False)

    docs = loader.load()
    print(docs)

This prints:

    [Document(page_content="Content about homepage", metadata={'source': '/content/json_data.json', 'seq_num': 1}),
     Document(page_content="Content about Contact US", metadata={'source': '/content/json_data.json', 'seq_num': 2})]

The final docs should look like this:

    [Document(page_content="Content about homepage", metadata={'source': '/content/json_data.json', 'seq_num': 1, 'page_name': 'Homepage', 'page_url': 'https://mywebsite.com/homepage'}),
     Document(page_content="Content about Contact US", metadata={'source': '/content/json_data.json', 'seq_num': 2, 'page_name': 'Contact US', 'page_url': 'https://mywebsite.com/contactus'})]
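One possible approach, sketched here under assumptions, is to have the jq schema select each whole record (`.[]`) instead of only `page_data`, and then use the `content_key` and `metadata_func` parameters of `JSONLoader` to split the record into content and metadata. The function names `add_page_metadata` and `load_pages` are hypothetical, chosen only for illustration, and the file is assumed to contain the top-level JSON array shown in the question.

```python
from typing import Any, Dict


def add_page_metadata(record: Dict[str, Any], metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Copy page_name and page_url from the raw JSON record into the metadata.

    JSONLoader passes in the record selected by jq_schema together with the
    default metadata dict (which already holds 'source' and 'seq_num').
    """
    metadata["page_name"] = record.get("page_name")
    metadata["page_url"] = record.get("page_url")
    return metadata


def load_pages(file_path: str):
    """Hypothetical helper: load one Document per object in the JSON array."""
    # Imported lazily so add_page_metadata can be used without LangChain installed.
    from langchain_community.document_loaders import JSONLoader

    loader = JSONLoader(
        file_path=file_path,
        jq_schema=".[]",                  # select each whole object, not just page_data
        content_key="page_data",          # this field becomes page_content
        metadata_func=add_page_metadata,  # extra fields are added to the metadata
    )
    return loader.load()
```

Calling `docs = load_pages("/content/web_data.json")` should then yield documents whose metadata carries `page_name` and `page_url` in addition to `source` and `seq_num`. Because `page_data` is already a string, the default `text_content=True` is left in place in this sketch.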