
[GPT][SITEGPT] AsyncChromiumLoader

으누아빠 2024. 5. 2. 12:55


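# In newer LangChain releases these classes are imported from
# langchain_community.document_loaders / langchain_community.document_transformers.
from langchain.document_loaders import AsyncChromiumLoader
from langchain.document_transformers import Html2TextTransformer
import streamlit as st

st.set_page_config(
    page_title="SiteGPT",
    page_icon="🖥️",
)

# Converts the raw HTML returned by the loader into readable text
# (requires the html2text package).
html2text_transformer = Html2TextTransformer()

st.markdown(
    """
    # SiteGPT

    Ask questions about the content of a website.

    Start by writing the URL of the website on the sidebar.
"""
)


with st.sidebar:
    url = st.text_input(
        "Write down a URL",
        placeholder="https://example.com",
    )


if url:
    # Render the page in headless Chromium (requires Playwright and its
    # browser binaries: `pip install playwright`, then `playwright install`).
    loader = AsyncChromiumLoader([url])
    docs = loader.load()
    # Strip the HTML markup, keeping the page text.
    transformed = html2text_transformer.transform_documents(docs)
    # Show the converted text rather than the raw HTML documents.
    st.write(transformed)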


AsyncChromiumLoader()


Chromium is one of the browsers supported by Playwright, a library for controlling browser automation.
AsyncChromiumLoader runs a headless instance of Chromium.
Headless mode means the browser runs without a graphical user interface.
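To see what "headless" means in practice, here is a minimal sketch that drives Chromium through Playwright's async API directly. It roughly mirrors what AsyncChromiumLoader does for each URL (launch headless Chromium, open a page, read the rendered HTML), but it is an illustration, not LangChain's source; the function name `fetch_rendered_html` is hypothetical. Playwright is imported inside the function only so the module can still be imported where Playwright is not installed.

```python
async def fetch_rendered_html(url: str) -> str:
    """Hypothetical helper: return the HTML of `url` as rendered by headless Chromium."""
    # Imported lazily so this module loads even where Playwright is not installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        # headless=True: the browser runs without opening any window (no GUI).
        browser = await p.chromium.launch(headless=True)
        try:
            page = await browser.new_page()
            await page.goto(url)
            # page.content() returns the page's current DOM serialized as HTML.
            return await page.content()
        finally:
            await browser.close()
```

Usage, assuming Playwright and its Chromium build are installed (`pip install playwright`, then `playwright install chromium`): `html = asyncio.run(fetch_rendered_html("https://example.com"))`. The result is still raw HTML, which is why a text transformer is applied afterwards.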

AsyncChromiumLoader loads the page, and Html2TextTransformer then converts the loaded HTML into text.
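The transform step can be illustrated without any dependencies. Html2TextTransformer delegates to the html2text package, which produces Markdown-flavored text; the sketch below is a much cruder, stdlib-only stand-in (the names `TextExtractor` and `html_to_text` are hypothetical) that keeps only the visible text and drops `<script>`/`<style>` contents, just to show what "converting HTML to text" means.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical stand-in: collects visible text, skipping <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text nodes that are not inside <script>/<style>.
        if not self._skip_depth:
            text = data.strip()
            if text:
                self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the visible text of `html`, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

For example, `html_to_text("<h1>SiteGPT</h1><p>Ask questions.</p>")` returns `"SiteGPT\nAsk questions."`. Unlike html2text, this stand-in discards all formatting such as headings, links, and lists.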