[GPT][QUIZGPT] Controlling the Data Format with an Output Parser


으누아빠 2024. 4. 18. 19:15

On the previous page, a Formatter Prompt was used to get the output into the desired shape. This post changes that code so the result is passed through an output parser.

import json
from langchain.document_loaders import UnstructuredFileLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.callbacks import StreamingStdOutCallbackHandler
import streamlit as st
from langchain.retrievers import WikipediaRetriever
from langchain.schema import BaseOutputParser

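class JsonOutputParser(BaseOutputParser):
    def parse(self, text):
        # Remove the Markdown fence and the "json" tag the model wraps around its answer,
        # then load the remaining text as JSON.
        text = text.replace("```", "").replace("json", "")
        return json.loads(text)


output_parser = JsonOutputParser()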
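One thing to watch in the `parse` method above: `replace("json", "")` removes every occurrence of the word, so a question such as "What does json stand for?" would be altered before parsing. Below is a minimal sketch of a narrower alternative, assuming the model wraps its whole answer in a single Markdown fence. `strip_code_fence` and `parse_quiz_json` are hypothetical helper names used for illustration, not part of LangChain.

```python
import json
import re

# Three backticks, built programmatically so this listing stays easy to embed.
FENCE = "`" * 3

# Matches one leading fence (optionally tagged "json") and one trailing fence.
_FENCE_RE = re.compile(
    r"^\s*" + FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE + r"\s*$",
    re.DOTALL | re.IGNORECASE,
)


def strip_code_fence(text: str) -> str:
    """Return the text inside a surrounding Markdown fence, or the stripped text if there is none."""
    match = _FENCE_RE.match(text)
    return match.group(1) if match else text.strip()


def parse_quiz_json(text: str) -> dict:
    """Parse model output as JSON without modifying the payload itself."""
    return json.loads(strip_code_fence(text))


# Hypothetical model output whose question text contains the word "json".
raw = (
    FENCE
    + 'json\n{"questions": [{"question": "What does json stand for?", "answers": []}]}\n'
    + FENCE
)
quiz = parse_quiz_json(raw)
print(quiz["questions"][0]["question"])
```

If this suits your case, `JsonOutputParser.parse` could simply return `parse_quiz_json(text)`.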

st.set_page_config(
    page_title="QuizGPT",
    page_icon="❓",
)

st.title("QuizGPT")

llm = ChatOpenAI(
    temperature=0.1,
    model="gpt-3.5-turbo-1106",
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
)

def format_docs(docs):
    return "\n\n".join(document.page_content for document in docs)

questions_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """
    You are a helpful assistant that is role playing as a teacher.

    Based ONLY on the following context make 10 questions to test the user's knowledge about the text.

    Each question should have 4 answers, three of them must be incorrect and one should be correct.

    Use (o) to signal the correct answer.

    Question examples:

    Question: What is the color of the ocean?
    Answers: Red|Yellow|Green|Blue(o)

    Question: What is the capital of Georgia?
    Answers: Baku|Tbilisi(o)|Manila|Beirut

    Question: When was Avatar released?
    Answers: 2007|2001|2009(o)|1998

    Question: Who was Julius Caesar?
    Answers: A Roman Emperor(o)|Painter|Actor|Model

    Your turn!

    Context: {context}
    """,
        )
    ]
)

questions_chain = {"context": format_docs} | questions_prompt | llm

formatting_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """
    You are a powerful formatting algorithm.

    You format exam questions into JSON format.
    Answers with (o) are the correct ones.

    Example Input:

    Question: What is the color of the ocean?
    Answers: Red|Yellow|Green|Blue(o)

    Question: What is the capital of Georgia?
    Answers: Baku|Tbilisi(o)|Manila|Beirut

    Question: When was Avatar released?
    Answers: 2007|2001|2009(o)|1998

    Question: Who was Julius Caesar?
    Answers: A Roman Emperor(o)|Painter|Actor|Model

    Example Output:

    ```json
    {{ "questions": [
            {{
                "question": "What is the color of the ocean?",
                "answers": [
                        {{
                            "answer": "Red",
                            "correct": false
                        }},
                        {{
                            "answer": "Yellow",
                            "correct": false
                        }},
                        {{
                            "answer": "Green",
                            "correct": false
                        }},
                        {{
                            "answer": "Blue",
                            "correct": true
                        }}
                ]
            }},
            {{
                "question": "What is the capital of Georgia?",
                "answers": [
                        {{
                            "answer": "Baku",
                            "correct": false
                        }},
                        {{
                            "answer": "Tbilisi",
                            "correct": true
                        }},
                        {{
                            "answer": "Manila",
                            "correct": false
                        }},
                        {{
                            "answer": "Beirut",
                            "correct": false
                        }}
                ]
            }},
            {{
                "question": "When was Avatar released?",
                "answers": [
                        {{
                            "answer": "2007",
                            "correct": false
                        }},
                        {{
                            "answer": "2001",
                            "correct": false
                        }},
                        {{
                            "answer": "2009",
                            "correct": true
                        }},
                        {{
                            "answer": "1998",
                            "correct": false
                        }}
                ]
            }},
            {{
                "question": "Who was Julius Caesar?",
                "answers": [
                        {{
                            "answer": "A Roman Emperor",
                            "correct": true
                        }},
                        {{
                            "answer": "Painter",
                            "correct": false
                        }},
                        {{
                            "answer": "Actor",
                            "correct": false
                        }},
                        {{
                            "answer": "Model",
                            "correct": false
                        }}
                ]
            }}
        ]
    }}
    ```

    Your turn!

    Questions: {context}
    """,
        )
    ]
)

formatting_chain = formatting_prompt | llm

@st.cache_data(show_spinner="Loading file...")
def split_file(file):
    file_content = file.read()
    # The ./.cache/quiz_files/ directory must already exist, or open() will fail.
    file_path = f"./.cache/quiz_files/{file.name}"
    with open(file_path, "wb") as f:
        f.write(file_content)
    splitter = CharacterTextSplitter.from_tiktoken_encoder(
        separator="\n",
        chunk_size=600,
        chunk_overlap=100,
    )
    loader = UnstructuredFileLoader(file_path)
    docs = loader.load_and_split(text_splitter=splitter)
    return docs
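`split_file` writes the upload into `./.cache/quiz_files/`, and `open` raises `FileNotFoundError` when that directory is missing. Here is a small sketch of creating it on demand, assuming that behavior is what you want. `save_upload` is a hypothetical helper, and it takes the file name and bytes directly instead of a Streamlit upload object so it can be tried on its own.

```python
from pathlib import Path


def save_upload(name: str, content: bytes, cache_dir: str = "./.cache/quiz_files") -> Path:
    """Write uploaded bytes under cache_dir, creating the directory if needed."""
    directory = Path(cache_dir)
    # parents=True builds intermediate folders; exist_ok=True makes repeat calls safe.
    directory.mkdir(parents=True, exist_ok=True)
    # Keep only the final path component so the file stays inside cache_dir.
    file_path = directory / Path(name).name
    file_path.write_bytes(content)
    return file_path
```

Inside `split_file`, this could be called as `save_upload(file.name, file.read())`, with the returned path handed to `UnstructuredFileLoader` as a string.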

with st.sidebar:
    docs = None
    choice = st.selectbox(
        "Choose what you want to use.",
        (
            "File",
            "Wikipedia Article",
        ),
    )
    if choice == "File":
        file = st.file_uploader(
            "Upload a .docx, .txt or .pdf file",
            type=["pdf", "txt", "docx"],
        )
        if file:
            docs = split_file(file)
    else:
        topic = st.text_input("Search Wikipedia...")
        if topic:
            retriever = WikipediaRetriever(top_k_results=5)
            with st.status("Searching Wikipedia..."):
                docs = retriever.get_relevant_documents(topic)

if not docs:
    st.markdown(
        """
    Welcome to QuizGPT.

    I will make a quiz from Wikipedia articles or files you upload to test your knowledge and help you study.

    Get started by uploading a file or searching on Wikipedia in the sidebar.
    """
    )
else:
    start = st.button("Generate Quiz")
    if start:
        # questions_chain drafts the quiz text, formatting_chain rewrites it as JSON,
        # and output_parser turns that JSON string into a Python dict.
        chain = {"context": questions_chain} | formatting_chain | output_parser
        response = chain.invoke(docs)
        st.write(response)
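`st.write(response)` just displays the parsed dictionary. To illustrate the shape the output parser hands back, here is a minimal sketch that scores a list of chosen answers against it. The `sample` data mirrors part of the example output in the formatting prompt. `grade` and `picks` are hypothetical names used only for this illustration.

```python
# Sample data in the same shape as the formatting prompt's example output.
sample = {
    "questions": [
        {
            "question": "What is the color of the ocean?",
            "answers": [
                {"answer": "Red", "correct": False},
                {"answer": "Yellow", "correct": False},
                {"answer": "Green", "correct": False},
                {"answer": "Blue", "correct": True},
            ],
        },
        {
            "question": "What is the capital of Georgia?",
            "answers": [
                {"answer": "Baku", "correct": False},
                {"answer": "Tbilisi", "correct": True},
                {"answer": "Manila", "correct": False},
                {"answer": "Beirut", "correct": False},
            ],
        },
    ]
}


def grade(quiz: dict, picks: list) -> int:
    """Count how many picked answers are marked correct (hypothetical helper)."""
    score = 0
    # Pair each question with the answer text picked for it, in order.
    for question, pick in zip(quiz["questions"], picks):
        if any(a["answer"] == pick and a["correct"] for a in question["answers"]):
            score += 1
    return score


print(grade(sample, ["Blue", "Baku"]))
```

Because the parser returns plain dicts and lists, nothing LangChain-specific is needed to work with the result.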
