OpenAIEmbeddings
The embedding model used is "text-embedding-3-large".
This is one of the large text embedding models provided by OpenAI.
The resulting embeddings object is used to convert text into vectors.
[Result]
[-0.03629460018406378, -0.007184663262302321, -0.03371515688153162, -0.028660489003748433, -0.02683663892458552, 0.03460102723929279, -0.012421715775472752, -0.007764386917723339, 0.0019410967294308348, -0.002639696151225509, 0.024739212823664366, -0.002437770038811742, -0.005784207816316093, -0.0029621265640420295, 0.00670915973114638, -0.0030207502966296384, 0.033871484972453327, -0.0015348017617673204, 0.021078487187073648, -0.008949888444724344, -0.021755916364982045, 0.010337317015461743, 0.006249940376128115, 0.007034846979190435, -0.01223933114008551,
0.002673893386775945, -0.025976825111289882, -0.00974456555346927, 0.0014957192733755812, ...]
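Query embedding
embeddings.embed_query(text) is a function that converts the given text into an embedding vector.
It takes the text passed as the text parameter as input.
The text is passed to the embedding model, which generates a vector representation of it.
The resulting embedding vector is stored in the query_result variable.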
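The call flow above can be sketched as follows. To keep the example runnable without an API key, `ToyEmbeddings` is a hypothetical stand-in that mimics only the `embed_query` method; its values come from a hash and carry no meaning. In real use the `embeddings` object would presumably be `OpenAIEmbeddings(model="text-embedding-3-large")` from the `langchain_openai` package, and the vector would come from the OpenAI API.

```python
import hashlib
from typing import List


class ToyEmbeddings:
    """Hypothetical stand-in that exposes embed_query like a LangChain embedder."""

    def __init__(self, dimensions: int = 8) -> None:
        # A SHA-256 digest has 32 bytes, so at most 32 dimensions are available.
        self.dimensions = min(dimensions, 32)

    def embed_query(self, text: str) -> List[float]:
        # Derive deterministic pseudo-values from a SHA-256 digest.
        # A real embedding model returns semantically meaningful values instead.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [b / 255.0 - 0.5 for b in digest[: self.dimensions]]


embeddings = ToyEmbeddings(dimensions=8)
text = "Winston turned round abruptly."

# Pass the text in, receive its vector representation, and store it.
query_result = embeddings.embed_query(text)

print(len(query_result))  # number of dimensions in the vector
print(query_result[:3])   # first few components
```

The only part that carries over to the real model is the shape of the call: one string in, one fixed-length list of floats out.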
[Result]
[Document(page_content="Winston turned round abruptly. He had set his features into the expression of quiet optimism which it was advisable to wear when facing the telescreen. He crossed the room into the tiny kitchen. By leaving the Ministry at this time of day he had sacrificed his lunch in the canteen, and he was aware that there was no food in the kitchen except a hunk of dark-coloured bread which had got to be saved for tomorrow's breakfast. He took down from the shelf a bottle of colourless liquid with a plain white label marked VICTORY GIN. It gave off a sickly, oily smell, as of Chinese ricespirit. Winston poured out nearly a teacupful, nerved himself for a shock, and gulped it down like a dose of medicine.", metadata={'source': './files/chapter_one.docx'}), Document(page_content="Winston turned round abruptly. He had set his features into the expression of quiet optimism which it was advisable to wear when facing the telescreen. He crossed the room into the tiny kitchen. By leaving the Ministry at this time of day he had sacrificed his lunch in the canteen, and he was aware that there was no food in the kitchen except a hunk of dark-coloured bread which had got to be saved for tomorrow's breakfast. He took down from the shelf a bottle of colourless liquid with a plain white label marked VICTORY GIN. It gave off a sickly, oily smell, as of Chinese ricespirit. Winston poured out nearly a teacupful, nerved himself for a shock, and gulped it down like a dose of medicine.", metadata={'source': './files/chapter_one.docx'}), Document(page_content='Winston kept his back turned to the telescreen. It was safer, though, as he well knew, even a back can be revealing. A kilometre away the Ministry of Truth, his place of work, towered vast and white above the grimy landscape. This, he thought with a sort of vague distaste -- this was London, chief city of Airstrip One, itself the third most populous of the provinces of Oceania. 
He tried to squeeze out some childhood memory that should tell him whether London had always been quite like this. Were there always these vistas of rotting nineteenth-century houses, their sides shored up with baulks of timber, their windows patched with cardboard and their roofs with corrugated iron, their crazy garden walls sagging in all directions? And the bombed sites where the plaster dust swirled in the air and the willow-herb straggled over the heaps of rubble; and the places where the bombs had cleared a larger patch and there had sprung up sordid colonies of wooden dwellings like chicken-houses? But it was no use, he could not remember: nothing remained of his childhood except a series of bright-lit tableaux occurring against no background and mostly unintelligible.', metadata={'source': './files/chapter_one.docx'}), Document(page_content='Winston kept his back turned to the telescreen. It was safer, though, as he well knew, even a back can be revealing. A kilometre away the Ministry of Truth, his place of work, towered vast and white above the grimy landscape. This, he thought with a sort of vague distaste -- this was London, chief city of Airstrip One, itself the third most populous of the provinces of Oceania. He tried to squeeze out some childhood memory that should tell him whether London had always been quite like this. Were there always these vistas of rotting nineteenth-century houses, their sides shored up with baulks of timber, their windows patched with cardboard and their roofs with corrugated iron, their crazy garden walls sagging in all directions? And the bombed sites where the plaster dust swirled in the air and the willow-herb straggled over the heaps of rubble; and the places where the bombs had cleared a larger patch and there had sprung up sordid colonies of wooden dwellings like chicken-houses? 
But it was no use, he could not remember: nothing remained of his childhood except a series of bright-lit tableaux occurring against no background and mostly unintelligible.', metadata={'source': './files/chapter_one.docx'})]
If cache_dir holds no cached vector for a text, the vector is requested from OpenAI; if a cached vector exists, it is used instead.
CacheBackedEmbeddings
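Embeddings can be stored or temporarily cached to avoid recomputing them.
Caching embeddings is done with CacheBackedEmbeddings.
A cache-backed embedder is a wrapper around an embedder that caches embeddings in a key-value store.
The text is hashed, and the hash is used as the key in the cache.
The main supported way to initialize CacheBackedEmbeddings is from_bytes_store.
Parameters
underlying_embedder: the embedder used for embedding.
document_embedding_cache: any ByteStore used for caching document embeddings.
namespace: (optional, defaults to "") the namespace used for the document cache. It is used to avoid collisions with other caches; for example, set it to the name of the embedding model used.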
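The caching idea described above (hash the text, use the hash as the key, prefix it with a namespace) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not LangChain's implementation: `SimpleCachedEmbedder`, `fake_embedder`, the SHA-256 hash, the JSON serialization, and the use of a plain dict in place of a ByteStore are all choices made for this example.

```python
import hashlib
import json
from typing import Callable, Dict, List


class SimpleCachedEmbedder:
    """Toy cache wrapper: hashes each text and uses the hash as the store key."""

    def __init__(
        self,
        underlying_embedder: Callable[[str], List[float]],
        document_embedding_cache: Dict[str, bytes],
        namespace: str = "",
    ) -> None:
        self.underlying_embedder = underlying_embedder
        self.store = document_embedding_cache
        self.namespace = namespace

    def _key(self, text: str) -> str:
        # The namespace prefix keeps vectors from different models apart.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return self.namespace + digest

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            key = self._key(text)
            if key not in self.store:
                # Cache miss: compute the vector once and store it as bytes.
                vector = self.underlying_embedder(text)
                self.store[key] = json.dumps(vector).encode("utf-8")
            vectors.append(json.loads(self.store[key].decode("utf-8")))
        return vectors


calls: List[str] = []


def fake_embedder(text: str) -> List[float]:
    # Hypothetical stand-in for a real embedding model; records each call.
    calls.append(text)
    return [float(len(text)), float(text.count(" "))]


store: Dict[str, bytes] = {}
cached = SimpleCachedEmbedder(fake_embedder, store, namespace="toy-model-")

first = cached.embed_documents(["hello world", "hi"])   # computed and cached
second = cached.embed_documents(["hello world", "hi"])  # served from the cache

print(len(calls))  # how many times the underlying embedder actually ran
```

Because the second call finds every key already in the store, the underlying embedder is not invoked again. Using a different namespace for each model prevents a vector cached for one model from being returned for another.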
Chroma
Chroma is an open-source vector database.