From 53752a4682ef26c1d4d46bc6f3dfc893585960ac Mon Sep 17 00:00:00 2001
From: Noah Jefferson Baumann <noah.jefferson.baumann.1@hu-berlin.de>
Date: Mon, 24 Feb 2025 18:09:06 +0100
Subject: [PATCH] =?UTF-8?q?improved=20Einf=C3=BChrung?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../notebooks/00_Einf\303\274hruhg_RAG.ipynb" | 831 -----------
 .../notebooks/00_Einf\303\274hrung_RAG.ipynb" | 1212 +++++++++++++++++
 2 files changed, 1212 insertions(+), 831 deletions(-)
 delete mode 100644 "rag_lb/jupyter-book/notebooks/00_Einf\303\274hruhg_RAG.ipynb"
 create mode 100644 "rag_lb/jupyter-book/notebooks/00_Einf\303\274hrung_RAG.ipynb"

diff --git "a/rag_lb/jupyter-book/notebooks/00_Einf\303\274hruhg_RAG.ipynb" "b/rag_lb/jupyter-book/notebooks/00_Einf\303\274hruhg_RAG.ipynb"
deleted file mode 100644
index 5d3cbe8..0000000
--- "a/rag_lb/jupyter-book/notebooks/00_Einf\303\274hruhg_RAG.ipynb"
+++ /dev/null
@@ -1,831 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# Retrieval Augmented Generation (RAG) für historische Textanalyse\n",
- "## Eine Einführung anhand deutscher Soldatenbriefe (1745-1872)\n",
- "\n",
- "### 1. Einleitung und Überblick\n",
- "\n",
- "#### 1.1 Was erwartet Sie in diesem Jupyter Book?\n",
- "Dieses Jupyter Book führt Sie in die Methode des Retrieval Augmented Generation (RAG) ein und zeigt deren Anwendung in der historischen Textanalyse. RAG kombiniert die Fähigkeiten von Sprachmodellen mit gezielter Informationssuche in spezifischen Dokumenten - in unserem Fall historischen Quellen.\n",
- "\n",
- "#### 1.2 Lernziele\n",
- "Nach diesem einführenden Notebook werden Sie:\n",
- "- Die Grundidee von RAG verstehen\n",
- "- Den Wert dieser Methode für die historische Forschung einschätzen können\n",
- "- Eine erste RAG-Pipeline in Aktion gesehen haben\n",
- "- Die Möglichkeiten und Grenzen der Methode kennen\n",
- "\n",
- "#### 1.3 Aufbau des Jupyter Books\n",
- "Das Book ist modular aufgebaut und umfasst:\n",
- "1. **Einführung** (dieses Notebook)\n",
- " - Überblick und erste Demonstration\n",
- "2. **Datenaufbereitung**\n",
- " - Textverarbeitung\n",
- " - Chunking-Strategien\n",
- "3. **Embedding und Retrieval**\n",
- " - Vektorisierung von Text\n",
- " - Ähnlichkeitssuche\n",
- "4. **LLM-Integration**\n",
- " - Prompt-Entwicklung\n",
- " - Antwortgenerierung\n",
- "5. **Anwendungsfälle**\n",
- " - Spezifische historische Analysen\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### 2. Unser Quellenkorpus: Soldatenbriefe 1745-1872\n",
- "\n",
- "#### 2.1 Beschreibung der Quelle\n",
- "Das Korpus \"Soldatenbriefe\" ist eine Sammlung von 170 Briefen aus dem deutschsprachigen Raum, die mehrere wichtige historische Epochen umfasst:\n",
- "- Koalitions- und Befreiungskriege (1792−1815)\n",
- "- Deutscher Krieg (1866)\n",
- "- Deutsch-Französischer Krieg (1870/71)\n",
- "\n",
- "Besonders wertvoll für die historische Forschung sind:\n",
- "- Die soziale Bandbreite (Offiziere bis einfache Soldaten)\n",
- "- Die zeitliche Tiefe (über 125 Jahre)\n",
- "- Die persönliche Perspektive auf historische Ereignisse\n",
- "\n",
- "#### 2.2 Herkunft und Rechtliches\n",
- "- Quelle: Deutsches Textarchiv (DTA)\n",
- "- Lizenz: CC BY-SA 4.0\n",
- "- GitHub: [deutschestextarchiv/soldatenbriefe](https://github.com/deutschestextarchiv/soldatenbriefe)\n",
- "- Publikation: Marko Neumann (2019): \"Soldatenbriefe des 18. und 19. Jahrhunderts\"\n",
- "\n",
- "#### 2.3 Datenformate und Struktur\n",
- "Die Briefe liegen in verschiedenen Formaten vor:\n",
- "- **TEI-XML**: Ursprungsformat mit detaillierten Metadaten\n",
- "- **CSV**: Aufbereitete Version für die Analyse\n",
- "- **Plain Text**: Extrahierte Brieftexte"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### 3. 
Erste Schritte mit RAG\n", - "\n", - "#### 3.1 Technische Voraussetzungen\n", - "Für dieses Notebook benötigen wir einige Python-Bibliotheken:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Collecting langchainNote: you may need to restart the kernel to use updated packages.\n", - "\n", - " Downloading langchain-0.3.19-py3-none-any.whl.metadata (7.9 kB)\n", - "Collecting langchain-community\n", - " Downloading langchain_community-0.3.18-py3-none-any.whl.metadata (2.4 kB)\n", - "Collecting langchain_openai\n", - " Downloading langchain_openai-0.3.6-py3-none-any.whl.metadata (2.3 kB)\n", - "Collecting sentence_transformers\n", - " Downloading sentence_transformers-3.4.1-py3-none-any.whl.metadata (10 kB)\n", - "Collecting huggingface-hub\n", - " Downloading huggingface_hub-0.29.1-py3-none-any.whl.metadata (13 kB)\n", - "Collecting chromadb\n", - " Downloading chromadb-0.6.3-py3-none-any.whl.metadata (6.8 kB)\n", - "Collecting langchain_huggingface\n", - " Using cached langchain_huggingface-0.1.2-py3-none-any.whl.metadata (1.3 kB)\n", - "Requirement already satisfied: pandas in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (2.2.3)\n", - "Collecting langchain-core<1.0.0,>=0.3.35 (from langchain)\n", - " Downloading langchain_core-0.3.37-py3-none-any.whl.metadata (5.9 kB)\n", - "Collecting langchain-text-splitters<1.0.0,>=0.3.6 (from langchain)\n", - " Downloading langchain_text_splitters-0.3.6-py3-none-any.whl.metadata (1.9 kB)\n", - "Collecting langsmith<0.4,>=0.1.17 (from langchain)\n", - " Downloading langsmith-0.3.10-py3-none-any.whl.metadata (14 kB)\n", - "Requirement already satisfied: pydantic<3.0.0,>=2.7.4 in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (2.10.5)\n", - "Requirement already satisfied: SQLAlchemy<3,>=1.4 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (2.0.37)\n", - "Requirement already satisfied: requests<3,>=2 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (2.32.3)\n", - "Requirement already satisfied: PyYAML>=5.3 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (6.0.2)\n", - "Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (3.11.11)\n", - "Requirement already satisfied: tenacity!=8.4.0,<10,>=8.1.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (9.0.0)\n", - "Requirement already satisfied: numpy<2,>=1.26.4 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (1.26.4)\n", - "Requirement already satisfied: dataclasses-json<0.7,>=0.5.7 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain-community) (0.6.7)\n", - "Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain-community)\n", - " Downloading 
pydantic_settings-2.8.0-py3-none-any.whl.metadata (3.5 kB)\n", - "Collecting httpx-sse<1.0.0,>=0.4.0 (from langchain-community)\n", - " Using cached httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)\n", - "Requirement already satisfied: openai<2.0.0,>=1.58.1 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain_openai) (1.60.0)\n", - "Requirement already satisfied: tiktoken<1,>=0.7 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain_openai) (0.8.0)\n", - "Collecting transformers<5.0.0,>=4.41.0 (from sentence_transformers)\n", - " Downloading transformers-4.49.0-py3-none-any.whl.metadata (44 kB)\n", - "Requirement already satisfied: tqdm in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from sentence_transformers) (4.67.1)\n", - "Requirement already satisfied: torch>=1.11.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from sentence_transformers) (2.5.1)\n", - "Requirement already satisfied: scikit-learn in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from sentence_transformers) (1.6.1)\n", - "Requirement already satisfied: scipy in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from sentence_transformers) (1.12.0)\n", - "Requirement already satisfied: Pillow in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages 
(from sentence_transformers) (11.0.0)\n", - "Requirement already satisfied: filelock in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from huggingface-hub) (3.16.1)\n", - "Requirement already satisfied: fsspec>=2023.5.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from huggingface-hub) (2024.12.0)\n", - "Requirement already satisfied: packaging>=20.9 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from huggingface-hub) (24.2)\n", - "Requirement already satisfied: typing-extensions>=3.7.4.3 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from huggingface-hub) (4.12.2)\n", - "Collecting build>=1.0.3 (from chromadb)\n", - " Downloading build-1.2.2.post1-py3-none-any.whl.metadata (6.5 kB)\n", - "Collecting chroma-hnswlib==0.7.6 (from chromadb)\n", - " Downloading chroma_hnswlib-0.7.6-cp311-cp311-win_amd64.whl.metadata (262 bytes)\n", - "Collecting fastapi>=0.95.2 (from chromadb)\n", - " Downloading fastapi-0.115.8-py3-none-any.whl.metadata (27 kB)\n", - "Collecting uvicorn>=0.18.3 (from uvicorn[standard]>=0.18.3->chromadb)\n", - " Downloading uvicorn-0.34.0-py3-none-any.whl.metadata (6.5 kB)\n", - "Collecting posthog>=2.4.0 (from chromadb)\n", - " Downloading posthog-3.15.1-py2.py3-none-any.whl.metadata (2.9 kB)\n", - "Collecting onnxruntime>=1.14.1 (from chromadb)\n", - " Downloading onnxruntime-1.20.1-cp311-cp311-win_amd64.whl.metadata (4.7 kB)\n", - "Collecting opentelemetry-api>=1.2.0 (from chromadb)\n", - " Downloading opentelemetry_api-1.30.0-py3-none-any.whl.metadata (1.6 kB)\n", - "Collecting 
opentelemetry-exporter-otlp-proto-grpc>=1.2.0 (from chromadb)\n", - " Downloading opentelemetry_exporter_otlp_proto_grpc-1.30.0-py3-none-any.whl.metadata (2.4 kB)\n", - "Collecting opentelemetry-instrumentation-fastapi>=0.41b0 (from chromadb)\n", - " Downloading opentelemetry_instrumentation_fastapi-0.51b0-py3-none-any.whl.metadata (2.2 kB)\n", - "Collecting opentelemetry-sdk>=1.2.0 (from chromadb)\n", - " Downloading opentelemetry_sdk-1.30.0-py3-none-any.whl.metadata (1.6 kB)\n", - "Collecting tokenizers>=0.13.2 (from chromadb)\n", - " Using cached tokenizers-0.21.0-cp39-abi3-win_amd64.whl.metadata (6.9 kB)\n", - "Collecting pypika>=0.48.9 (from chromadb)\n", - " Downloading PyPika-0.48.9.tar.gz (67 kB)\n", - " Installing build dependencies: started\n", - " Installing build dependencies: finished with status 'done'\n", - " Getting requirements to build wheel: started\n", - " Getting requirements to build wheel: finished with status 'done'\n", - " Preparing metadata (pyproject.toml): started\n", - " Preparing metadata (pyproject.toml): finished with status 'done'\n", - "Collecting overrides>=7.3.1 (from chromadb)\n", - " Downloading overrides-7.7.0-py3-none-any.whl.metadata (5.8 kB)\n", - "Collecting importlib-resources (from chromadb)\n", - " Downloading importlib_resources-6.5.2-py3-none-any.whl.metadata (3.9 kB)\n", - "Collecting grpcio>=1.58.0 (from chromadb)\n", - " Downloading grpcio-1.70.0-cp311-cp311-win_amd64.whl.metadata (4.0 kB)\n", - "Collecting bcrypt>=4.0.1 (from chromadb)\n", - " Downloading bcrypt-4.2.1-cp39-abi3-win_amd64.whl.metadata (10 kB)\n", - "Requirement already satisfied: typer>=0.9.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (0.15.1)\n", - "Collecting kubernetes>=28.1.0 (from chromadb)\n", - " Downloading kubernetes-32.0.1-py2.py3-none-any.whl.metadata (1.7 kB)\n", - "Collecting mmh3>=4.0.1 (from chromadb)\n", - 
" Downloading mmh3-5.1.0-cp311-cp311-win_amd64.whl.metadata (16 kB)\n", - "Collecting orjson>=3.9.12 (from chromadb)\n", - " Downloading orjson-3.10.15-cp311-cp311-win_amd64.whl.metadata (42 kB)\n", - "Requirement already satisfied: httpx>=0.27.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (0.28.1)\n", - "Requirement already satisfied: rich>=10.11.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (13.9.4)\n", - "Requirement already satisfied: python-dateutil>=2.8.2 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from pandas) (2.9.0.post0)\n", - "Requirement already satisfied: pytz>=2020.1 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from pandas) (2024.2)\n", - "Requirement already satisfied: tzdata>=2022.7 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from pandas) (2024.2)\n", - "Requirement already satisfied: aiohappyeyeballs>=2.3.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (2.4.4)\n", - "Requirement already satisfied: aiosignal>=1.1.2 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.3.2)\n", - "Requirement already satisfied: attrs>=17.3.0 in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (24.3.0)\n", - "Requirement already satisfied: frozenlist>=1.1.1 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.5.0)\n", - "Requirement already satisfied: multidict<7.0,>=4.5 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (6.1.0)\n", - "Requirement already satisfied: propcache>=0.2.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (0.2.1)\n", - "Requirement already satisfied: yarl<2.0,>=1.17.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.18.3)\n", - "Collecting pyproject_hooks (from build>=1.0.3->chromadb)\n", - " Downloading pyproject_hooks-1.2.0-py3-none-any.whl.metadata (1.3 kB)\n", - "Requirement already satisfied: colorama in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from build>=1.0.3->chromadb) (0.4.6)\n", - "Requirement already satisfied: marshmallow<4.0.0,>=3.18.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from dataclasses-json<0.7,>=0.5.7->langchain-community) (3.26.0)\n", - "Requirement already satisfied: typing-inspect<1,>=0.4.0 in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from dataclasses-json<0.7,>=0.5.7->langchain-community) (0.9.0)\n", - "Collecting starlette<0.46.0,>=0.40.0 (from fastapi>=0.95.2->chromadb)\n", - " Downloading starlette-0.45.3-py3-none-any.whl.metadata (6.3 kB)\n", - "Requirement already satisfied: anyio in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from httpx>=0.27.0->chromadb) (4.8.0)\n", - "Requirement already satisfied: certifi in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from httpx>=0.27.0->chromadb) (2024.12.14)\n", - "Requirement already satisfied: httpcore==1.* in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from httpx>=0.27.0->chromadb) (1.0.7)\n", - "Requirement already satisfied: idna in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from httpx>=0.27.0->chromadb) (3.10)\n", - "Requirement already satisfied: h11<0.15,>=0.13 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from httpcore==1.*->httpx>=0.27.0->chromadb) (0.14.0)\n", - "Requirement already satisfied: six>=1.9.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from kubernetes>=28.1.0->chromadb) (1.17.0)\n", - "Collecting google-auth>=1.0.1 (from kubernetes>=28.1.0->chromadb)\n", - " Downloading google_auth-2.38.0-py2.py3-none-any.whl.metadata (4.8 kB)\n", 
- "Collecting websocket-client!=0.40.0,!=0.41.*,!=0.42.*,>=0.32.0 (from kubernetes>=28.1.0->chromadb)\n", - " Downloading websocket_client-1.8.0-py3-none-any.whl.metadata (8.0 kB)\n", - "Collecting requests-oauthlib (from kubernetes>=28.1.0->chromadb)\n", - " Downloading requests_oauthlib-2.0.0-py2.py3-none-any.whl.metadata (11 kB)\n", - "Collecting oauthlib>=3.2.2 (from kubernetes>=28.1.0->chromadb)\n", - " Downloading oauthlib-3.2.2-py3-none-any.whl.metadata (7.5 kB)\n", - "Requirement already satisfied: urllib3>=1.24.2 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from kubernetes>=28.1.0->chromadb) (2.2.3)\n", - "Collecting durationpy>=0.7 (from kubernetes>=28.1.0->chromadb)\n", - " Downloading durationpy-0.9-py3-none-any.whl.metadata (338 bytes)\n", - "Collecting jsonpatch<2.0,>=1.33 (from langchain-core<1.0.0,>=0.3.35->langchain)\n", - " Using cached jsonpatch-1.33-py2.py3-none-any.whl.metadata (3.0 kB)\n", - "Collecting requests-toolbelt<2.0.0,>=1.0.0 (from langsmith<0.4,>=0.1.17->langchain)\n", - " Using cached requests_toolbelt-1.0.0-py2.py3-none-any.whl.metadata (14 kB)\n", - "Collecting zstandard<0.24.0,>=0.23.0 (from langsmith<0.4,>=0.1.17->langchain)\n", - " Downloading zstandard-0.23.0-cp311-cp311-win_amd64.whl.metadata (3.0 kB)\n", - "Collecting coloredlogs (from onnxruntime>=1.14.1->chromadb)\n", - " Downloading coloredlogs-15.0.1-py2.py3-none-any.whl.metadata (12 kB)\n", - "Collecting flatbuffers (from onnxruntime>=1.14.1->chromadb)\n", - " Downloading flatbuffers-25.2.10-py2.py3-none-any.whl.metadata (875 bytes)\n", - "Requirement already satisfied: protobuf in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from onnxruntime>=1.14.1->chromadb) (5.29.3)\n", - "Requirement already satisfied: sympy in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from onnxruntime>=1.14.1->chromadb) (1.13.1)\n", - "Requirement already satisfied: distro<2,>=1.7.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from openai<2.0.0,>=1.58.1->langchain_openai) (1.9.0)\n", - "Requirement already satisfied: jiter<1,>=0.4.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from openai<2.0.0,>=1.58.1->langchain_openai) (0.8.2)\n", - "Requirement already satisfied: sniffio in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from openai<2.0.0,>=1.58.1->langchain_openai) (1.3.1)\n", - "Requirement already satisfied: deprecated>=1.2.6 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from opentelemetry-api>=1.2.0->chromadb) (1.2.16)\n", - "Collecting importlib-metadata<=8.5.0,>=6.0 (from opentelemetry-api>=1.2.0->chromadb)\n", - " Downloading importlib_metadata-8.5.0-py3-none-any.whl.metadata (4.8 kB)\n", - "Collecting googleapis-common-protos~=1.52 (from opentelemetry-exporter-otlp-proto-grpc>=1.2.0->chromadb)\n", - " Downloading googleapis_common_protos-1.68.0-py2.py3-none-any.whl.metadata (5.1 kB)\n", - "Collecting opentelemetry-exporter-otlp-proto-common==1.30.0 (from opentelemetry-exporter-otlp-proto-grpc>=1.2.0->chromadb)\n", - " Downloading opentelemetry_exporter_otlp_proto_common-1.30.0-py3-none-any.whl.metadata (1.9 kB)\n", - "Collecting opentelemetry-proto==1.30.0 (from opentelemetry-exporter-otlp-proto-grpc>=1.2.0->chromadb)\n", - " Downloading 
opentelemetry_proto-1.30.0-py3-none-any.whl.metadata (2.4 kB)\n", - "Collecting opentelemetry-instrumentation-asgi==0.51b0 (from opentelemetry-instrumentation-fastapi>=0.41b0->chromadb)\n", - " Downloading opentelemetry_instrumentation_asgi-0.51b0-py3-none-any.whl.metadata (2.1 kB)\n", - "Collecting opentelemetry-instrumentation==0.51b0 (from opentelemetry-instrumentation-fastapi>=0.41b0->chromadb)\n", - " Downloading opentelemetry_instrumentation-0.51b0-py3-none-any.whl.metadata (6.3 kB)\n", - "Collecting opentelemetry-semantic-conventions==0.51b0 (from opentelemetry-instrumentation-fastapi>=0.41b0->chromadb)\n", - " Downloading opentelemetry_semantic_conventions-0.51b0-py3-none-any.whl.metadata (2.5 kB)\n", - "Collecting opentelemetry-util-http==0.51b0 (from opentelemetry-instrumentation-fastapi>=0.41b0->chromadb)\n", - " Downloading opentelemetry_util_http-0.51b0-py3-none-any.whl.metadata (2.6 kB)\n", - "Requirement already satisfied: wrapt<2.0.0,>=1.0.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from opentelemetry-instrumentation==0.51b0->opentelemetry-instrumentation-fastapi>=0.41b0->chromadb) (1.17.2)\n", - "Collecting asgiref~=3.0 (from opentelemetry-instrumentation-asgi==0.51b0->opentelemetry-instrumentation-fastapi>=0.41b0->chromadb)\n", - " Downloading asgiref-3.8.1-py3-none-any.whl.metadata (9.3 kB)\n", - "Requirement already satisfied: monotonic>=1.5 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from posthog>=2.4.0->chromadb) (1.6)\n", - "Collecting backoff>=1.10.0 (from posthog>=2.4.0->chromadb)\n", - " Downloading backoff-2.2.1-py3-none-any.whl.metadata (14 kB)\n", - "Requirement already satisfied: annotated-types>=0.6.0 in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from pydantic<3.0.0,>=2.7.4->langchain) (0.7.0)\n", - "Requirement already satisfied: pydantic-core==2.27.2 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from pydantic<3.0.0,>=2.7.4->langchain) (2.27.2)\n", - "Collecting python-dotenv>=0.21.0 (from pydantic-settings<3.0.0,>=2.4.0->langchain-community)\n", - " Using cached python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)\n", - "Requirement already satisfied: charset-normalizer<4,>=2 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from requests<3,>=2->langchain) (3.4.0)\n", - "Requirement already satisfied: markdown-it-py>=2.2.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from rich>=10.11.0->chromadb) (3.0.0)\n", - "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from rich>=10.11.0->chromadb) (2.18.0)\n", - "Requirement already satisfied: greenlet!=0.4.17 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from SQLAlchemy<3,>=1.4->langchain) (3.1.1)\n", - "Requirement already satisfied: regex>=2022.1.18 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from tiktoken<1,>=0.7->langchain_openai) (2024.11.6)\n", - "Requirement already satisfied: networkx in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from torch>=1.11.0->sentence_transformers) (3.4.2)\n", - "Requirement already satisfied: jinja2 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from torch>=1.11.0->sentence_transformers) (3.1.5)\n", - "Requirement already satisfied: mpmath<1.4,>=1.1.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from sympy->onnxruntime>=1.14.1->chromadb) (1.3.0)\n", - "Collecting safetensors>=0.4.1 (from transformers<5.0.0,>=4.41.0->sentence_transformers)\n", - " Downloading safetensors-0.5.2-cp38-abi3-win_amd64.whl.metadata (3.9 kB)\n", - "Requirement already satisfied: click>=8.0.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from typer>=0.9.0->chromadb) (8.1.8)\n", - "Requirement already satisfied: shellingham>=1.3.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from typer>=0.9.0->chromadb) (1.5.4)\n", - "Collecting httptools>=0.6.3 (from uvicorn[standard]>=0.18.3->chromadb)\n", - " Downloading httptools-0.6.4-cp311-cp311-win_amd64.whl.metadata (3.7 kB)\n", - "Collecting watchfiles>=0.13 (from uvicorn[standard]>=0.18.3->chromadb)\n", - " Downloading watchfiles-1.0.4-cp311-cp311-win_amd64.whl.metadata (5.0 kB)\n", - "Collecting websockets>=10.4 (from uvicorn[standard]>=0.18.3->chromadb)\n", - " Downloading websockets-15.0-cp311-cp311-win_amd64.whl.metadata (7.0 kB)\n", - "Requirement already satisfied: joblib>=1.2.0 in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from scikit-learn->sentence_transformers) (1.4.2)\n", - "Requirement already satisfied: threadpoolctl>=3.1.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from scikit-learn->sentence_transformers) (3.5.0)\n", - "Collecting cachetools<6.0,>=2.0.0 (from google-auth>=1.0.1->kubernetes>=28.1.0->chromadb)\n", - " Downloading cachetools-5.5.2-py3-none-any.whl.metadata (5.4 kB)\n", - "Collecting pyasn1-modules>=0.2.1 (from google-auth>=1.0.1->kubernetes>=28.1.0->chromadb)\n", - " Downloading pyasn1_modules-0.4.1-py3-none-any.whl.metadata (3.5 kB)\n", - "Collecting rsa<5,>=3.1.4 (from google-auth>=1.0.1->kubernetes>=28.1.0->chromadb)\n", - " Downloading rsa-4.9-py3-none-any.whl.metadata (4.2 kB)\n", - "Collecting zipp>=3.20 (from importlib-metadata<=8.5.0,>=6.0->opentelemetry-api>=1.2.0->chromadb)\n", - " Downloading zipp-3.21.0-py3-none-any.whl.metadata (3.7 kB)\n", - "Collecting jsonpointer>=1.9 (from jsonpatch<2.0,>=1.33->langchain-core<1.0.0,>=0.3.35->langchain)\n", - " Using cached jsonpointer-3.0.0-py2.py3-none-any.whl.metadata (2.3 kB)\n", - "Requirement already satisfied: mdurl~=0.1 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from markdown-it-py>=2.2.0->rich>=10.11.0->chromadb) (0.1.2)\n", - "Requirement already satisfied: mypy-extensions>=0.3.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from typing-inspect<1,>=0.4.0->dataclasses-json<0.7,>=0.5.7->langchain-community) (1.0.0)\n", - "Collecting humanfriendly>=9.1 (from coloredlogs->onnxruntime>=1.14.1->chromadb)\n", - " Downloading 
humanfriendly-10.0-py2.py3-none-any.whl.metadata (9.2 kB)\n", - "Requirement already satisfied: MarkupSafe>=2.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from jinja2->torch>=1.11.0->sentence_transformers) (3.0.2)\n", - "Collecting pyreadline3 (from humanfriendly>=9.1->coloredlogs->onnxruntime>=1.14.1->chromadb)\n", - " Downloading pyreadline3-3.5.4-py3-none-any.whl.metadata (4.7 kB)\n", - "Collecting pyasn1<0.7.0,>=0.4.6 (from pyasn1-modules>=0.2.1->google-auth>=1.0.1->kubernetes>=28.1.0->chromadb)\n", - " Downloading pyasn1-0.6.1-py3-none-any.whl.metadata (8.4 kB)\n", - "Downloading langchain-0.3.19-py3-none-any.whl (1.0 MB)\n", - " ---------------------------------------- 0.0/1.0 MB ? eta -:--:--\n", - " ---------------------------------------- 1.0/1.0 MB 9.6 MB/s eta 0:00:00\n", - "Downloading langchain_community-0.3.18-py3-none-any.whl (2.5 MB)\n", - " ---------------------------------------- 0.0/2.5 MB ? eta -:--:--\n", - " ---------------------------------------- 2.5/2.5 MB 18.1 MB/s eta 0:00:00\n", - "Downloading langchain_openai-0.3.6-py3-none-any.whl (54 kB)\n", - "Downloading sentence_transformers-3.4.1-py3-none-any.whl (275 kB)\n", - "Downloading huggingface_hub-0.29.1-py3-none-any.whl (468 kB)\n", - "Downloading chromadb-0.6.3-py3-none-any.whl (611 kB)\n", - " ---------------------------------------- 0.0/611.1 kB ? 
eta -:--:--\n", - " ---------------------------------------- 611.1/611.1 kB 5.5 MB/s eta 0:00:00\n", - "Downloading chroma_hnswlib-0.7.6-cp311-cp311-win_amd64.whl (151 kB)\n", - "Using cached langchain_huggingface-0.1.2-py3-none-any.whl (21 kB)\n", - "Downloading bcrypt-4.2.1-cp39-abi3-win_amd64.whl (153 kB)\n", - "Downloading build-1.2.2.post1-py3-none-any.whl (22 kB)\n", - "Downloading fastapi-0.115.8-py3-none-any.whl (94 kB)\n", - "Downloading grpcio-1.70.0-cp311-cp311-win_amd64.whl (4.3 MB)\n", - " ---------------------------------------- 0.0/4.3 MB ? eta -:--:--\n", - " ---------------------------------------- 4.3/4.3 MB 36.7 MB/s eta 0:00:00\n", - "Using cached httpx_sse-0.4.0-py3-none-any.whl (7.8 kB)\n", - "Downloading kubernetes-32.0.1-py2.py3-none-any.whl (2.0 MB)\n", - " ---------------------------------------- 0.0/2.0 MB ? eta -:--:--\n", - " ---------------------------------------- 2.0/2.0 MB 27.8 MB/s eta 0:00:00\n", - "Downloading langchain_core-0.3.37-py3-none-any.whl (413 kB)\n", - "Downloading langchain_text_splitters-0.3.6-py3-none-any.whl (31 kB)\n", - "Downloading langsmith-0.3.10-py3-none-any.whl (333 kB)\n", - "Downloading mmh3-5.1.0-cp311-cp311-win_amd64.whl (41 kB)\n", - "Downloading onnxruntime-1.20.1-cp311-cp311-win_amd64.whl (11.3 MB)\n", - " ---------------------------------------- 0.0/11.3 MB ? 
eta -:--:--\n", - " ---------------------------------------- 11.3/11.3 MB 58.9 MB/s eta 0:00:00\n", - "Downloading opentelemetry_api-1.30.0-py3-none-any.whl (64 kB)\n", - "Downloading opentelemetry_exporter_otlp_proto_grpc-1.30.0-py3-none-any.whl (18 kB)\n", - "Downloading opentelemetry_exporter_otlp_proto_common-1.30.0-py3-none-any.whl (18 kB)\n", - "Downloading opentelemetry_proto-1.30.0-py3-none-any.whl (55 kB)\n", - "Downloading opentelemetry_instrumentation_fastapi-0.51b0-py3-none-any.whl (12 kB)\n", - "Downloading opentelemetry_instrumentation-0.51b0-py3-none-any.whl (30 kB)\n", - "Downloading opentelemetry_instrumentation_asgi-0.51b0-py3-none-any.whl (16 kB)\n", - "Downloading opentelemetry_semantic_conventions-0.51b0-py3-none-any.whl (177 kB)\n", - "Downloading opentelemetry_util_http-0.51b0-py3-none-any.whl (7.3 kB)\n", - "Downloading opentelemetry_sdk-1.30.0-py3-none-any.whl (118 kB)\n", - "Downloading orjson-3.10.15-cp311-cp311-win_amd64.whl (133 kB)\n", - "Downloading overrides-7.7.0-py3-none-any.whl (17 kB)\n", - "Downloading posthog-3.15.1-py2.py3-none-any.whl (74 kB)\n", - "Downloading pydantic_settings-2.8.0-py3-none-any.whl (30 kB)\n", - "Using cached tokenizers-0.21.0-cp39-abi3-win_amd64.whl (2.4 MB)\n", - "Downloading transformers-4.49.0-py3-none-any.whl (10.0 MB)\n", - " ---------------------------------------- 0.0/10.0 MB ? 
eta -:--:--\n", - " ---------------------------------------- 10.0/10.0 MB 56.8 MB/s eta 0:00:00\n", - "Downloading uvicorn-0.34.0-py3-none-any.whl (62 kB)\n", - "Downloading importlib_resources-6.5.2-py3-none-any.whl (37 kB)\n", - "Downloading backoff-2.2.1-py3-none-any.whl (15 kB)\n", - "Downloading durationpy-0.9-py3-none-any.whl (3.5 kB)\n", - "Downloading google_auth-2.38.0-py2.py3-none-any.whl (210 kB)\n", - "Downloading googleapis_common_protos-1.68.0-py2.py3-none-any.whl (164 kB)\n", - "Downloading httptools-0.6.4-cp311-cp311-win_amd64.whl (88 kB)\n", - "Downloading importlib_metadata-8.5.0-py3-none-any.whl (26 kB)\n", - "Using cached jsonpatch-1.33-py2.py3-none-any.whl (12 kB)\n", - "Downloading oauthlib-3.2.2-py3-none-any.whl (151 kB)\n", - "Using cached python_dotenv-1.0.1-py3-none-any.whl (19 kB)\n", - "Using cached requests_toolbelt-1.0.0-py2.py3-none-any.whl (54 kB)\n", - "Downloading safetensors-0.5.2-cp38-abi3-win_amd64.whl (303 kB)\n", - "Downloading starlette-0.45.3-py3-none-any.whl (71 kB)\n", - "Downloading watchfiles-1.0.4-cp311-cp311-win_amd64.whl (284 kB)\n", - "Downloading websocket_client-1.8.0-py3-none-any.whl (58 kB)\n", - "Downloading websockets-15.0-cp311-cp311-win_amd64.whl (176 kB)\n", - "Downloading zstandard-0.23.0-cp311-cp311-win_amd64.whl (495 kB)\n", - "Downloading coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)\n", - "Downloading flatbuffers-25.2.10-py2.py3-none-any.whl (30 kB)\n", - "Downloading pyproject_hooks-1.2.0-py3-none-any.whl (10 kB)\n", - "Downloading requests_oauthlib-2.0.0-py2.py3-none-any.whl (24 kB)\n", - "Downloading asgiref-3.8.1-py3-none-any.whl (23 kB)\n", - "Downloading cachetools-5.5.2-py3-none-any.whl (10 kB)\n", - "Downloading humanfriendly-10.0-py2.py3-none-any.whl (86 kB)\n", - "Using cached jsonpointer-3.0.0-py2.py3-none-any.whl (7.6 kB)\n", - "Downloading pyasn1_modules-0.4.1-py3-none-any.whl (181 kB)\n", - "Downloading rsa-4.9-py3-none-any.whl (34 kB)\n", - "Downloading zipp-3.21.0-py3-none-any.whl (9.6 
kB)\n", - "Downloading pyasn1-0.6.1-py3-none-any.whl (83 kB)\n", - "Downloading pyreadline3-3.5.4-py3-none-any.whl (83 kB)\n", - "Building wheels for collected packages: pypika\n", - " Building wheel for pypika (pyproject.toml): started\n", - " Building wheel for pypika (pyproject.toml): finished with status 'done'\n", - " Created wheel for pypika: filename=PyPika-0.48.9-py2.py3-none-any.whl size=53885 sha256=6b6a554b0bacfceec9064f993dcd113fa0b32b2d7109aac9ed8a54b26c75a85b\n", - " Stored in directory: c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local\\pip\\cache\\wheels\\a3\\01\\bd\\4c40ceb9d5354160cb186dcc153360f4ab7eb23e2b24daf96d\n", - "Successfully built pypika\n", - "Installing collected packages: pypika, flatbuffers, durationpy, zstandard, zipp, websockets, websocket-client, safetensors, python-dotenv, pyreadline3, pyproject_hooks, pyasn1, overrides, orjson, opentelemetry-util-http, opentelemetry-proto, oauthlib, mmh3, jsonpointer, importlib-resources, httpx-sse, httptools, grpcio, googleapis-common-protos, chroma-hnswlib, cachetools, bcrypt, backoff, asgiref, watchfiles, uvicorn, starlette, rsa, requests-toolbelt, requests-oauthlib, pyasn1-modules, posthog, opentelemetry-exporter-otlp-proto-common, jsonpatch, importlib-metadata, humanfriendly, huggingface-hub, build, tokenizers, pydantic-settings, opentelemetry-api, langsmith, google-auth, fastapi, coloredlogs, transformers, opentelemetry-semantic-conventions, onnxruntime, langchain-core, kubernetes, sentence_transformers, opentelemetry-sdk, opentelemetry-instrumentation, langchain-text-splitters, langchain_openai, opentelemetry-instrumentation-asgi, opentelemetry-exporter-otlp-proto-grpc, langchain_huggingface, langchain, opentelemetry-instrumentation-fastapi, langchain-community, chromadb\n", - "Successfully installed asgiref-3.8.1 backoff-2.2.1 bcrypt-4.2.1 build-1.2.2.post1 cachetools-5.5.2 chroma-hnswlib-0.7.6 chromadb-0.6.3 
coloredlogs-15.0.1 durationpy-0.9 fastapi-0.115.8 flatbuffers-25.2.10 google-auth-2.38.0 googleapis-common-protos-1.68.0 grpcio-1.70.0 httptools-0.6.4 httpx-sse-0.4.0 huggingface-hub-0.29.1 humanfriendly-10.0 importlib-metadata-8.5.0 importlib-resources-6.5.2 jsonpatch-1.33 jsonpointer-3.0.0 kubernetes-32.0.1 langchain-0.3.19 langchain-community-0.3.18 langchain-core-0.3.37 langchain-text-splitters-0.3.6 langchain_huggingface-0.1.2 langchain_openai-0.3.6 langsmith-0.3.10 mmh3-5.1.0 oauthlib-3.2.2 onnxruntime-1.20.1 opentelemetry-api-1.30.0 opentelemetry-exporter-otlp-proto-common-1.30.0 opentelemetry-exporter-otlp-proto-grpc-1.30.0 opentelemetry-instrumentation-0.51b0 opentelemetry-instrumentation-asgi-0.51b0 opentelemetry-instrumentation-fastapi-0.51b0 opentelemetry-proto-1.30.0 opentelemetry-sdk-1.30.0 opentelemetry-semantic-conventions-0.51b0 opentelemetry-util-http-0.51b0 orjson-3.10.15 overrides-7.7.0 posthog-3.15.1 pyasn1-0.6.1 pyasn1-modules-0.4.1 pydantic-settings-2.8.0 pypika-0.48.9 pyproject_hooks-1.2.0 pyreadline3-3.5.4 python-dotenv-1.0.1 requests-oauthlib-2.0.0 requests-toolbelt-1.0.0 rsa-4.9 safetensors-0.5.2 sentence_transformers-3.4.1 starlette-0.45.3 tokenizers-0.21.0 transformers-4.49.0 uvicorn-0.34.0 watchfiles-1.0.4 websocket-client-1.8.0 websockets-15.0 zipp-3.21.0 zstandard-0.23.0\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "\n", - "[notice] A new release of pip is available: 24.3.1 -> 25.0.1\n", - "[notice] To update, run: C:\\Users\\baumanoa\\AppData\\Local\\Microsoft\\WindowsApps\\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\\python.exe -m pip install --upgrade pip\n" - ] - } - ], - "source": [ - "# Installation der benötigten Bibliotheken\n", - "\n", - "%pip install langchain langchain-community langchain_openai sentence_transformers huggingface-hub chromadb langchain_huggingface pandas" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Erklärung der verwendeten 
Bibliotheken:\n", - "- langchain & langchain-community: Framework für die Entwicklung von LLM-Anwendungen\n", - " - Bietet Werkzeuge für Dokumentenverarbeitung\n", - " - Ermöglicht strukturierte Arbeit mit verschiedenen LLMs\n", - " - Vereinfacht die RAG-Pipeline-Entwicklung\n", - "\n", - "- sentence_transformers: Bibliothek für Text-Embeddings\n", - " - Wandelt Text in numerische Vektoren um\n", - " - Unterstützt mehrsprachige Modelle\n", - " - Speziell für semantische Textähnlichkeit optimiert\n", - "\n", - "- huggingface-hub & langchain_huggingface: Zugriff auf KI-Modelle\n", - " - Bietet Zugang zu Open-Source-Modellen\n", - " - Ermöglicht lokale Nutzung von Embedding-Modellen\n", - " - Integration mit LangChain\n", - "\n", - "- chromadb: Vektordatenbank für Embedding-Speicherung\n", - " - Effiziente Speicherung und Suche von Vektoren\n", - " - Unterstützt Metadaten\n", - " - Ermöglicht Ähnlichkeitssuche\n", - "\n", - "- pandas: Bibliothek für Datenanalyse und -manipulation\n", - " - Effiziente Verarbeitung tabellarischer Daten\n", - " - Unterstützt komplexe Datenoperationen\n", - " - Vereinfacht Datenexploration" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Erste Erkundung der Daten" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "### Daten laden und erkunden\n", - "\n", - "import pandas as pd\n", - "from pathlib import Path" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", 
- " <th>brief_nr</th>\n", - " <th>datum</th>\n", - " <th>rang</th>\n", - " <th>name</th>\n", - " <th>quelle</th>\n", - " <th>text</th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>0</th>\n", - " <td>1.0</td>\n", - " <td>19.06.1745</td>\n", - " <td>Unteroffizier</td>\n", - " <td>Nikolaus Binn</td>\n", - " <td>Liebe, Georg (1912, 4–6)</td>\n", - " <td>Gott zum gruß. Mit Wünschung ales Libes und Gu...</td>\n", - " </tr>\n", - " <tr>\n", - " <th>1</th>\n", - " <td>2.0</td>\n", - " <td>21.09.1756</td>\n", - " <td>Mannschaften</td>\n", - " <td>Christian Arnholtz</td>\n", - " <td>Liebe, Georg (1912, 25–26)</td>\n", - " <td>Gott zum Gruß.\\nMein lieber bruder ich kan nic...</td>\n", - " </tr>\n", - " <tr>\n", - " <th>2</th>\n", - " <td>3.0</td>\n", - " <td>11.11.1756</td>\n", - " <td>Mannschaften</td>\n", - " <td>Kaspar Kalberlah</td>\n", - " <td>Liebe, Georg (1912, 28–29)</td>\n", - " <td>Gott zum Grus. Lieber Vater und Mutter, Brüder...</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>" - ], - "text/plain": [ - " brief_nr datum rang name \\\n", - "0 1.0 19.06.1745 Unteroffizier Nikolaus Binn \n", - "1 2.0 21.09.1756 Mannschaften Christian Arnholtz \n", - "2 3.0 11.11.1756 Mannschaften Kaspar Kalberlah \n", - "\n", - " quelle \\\n", - "0 Liebe, Georg (1912, 4–6) \n", - "1 Liebe, Georg (1912, 25–26) \n", - "2 Liebe, Georg (1912, 28–29) \n", - "\n", - " text \n", - "0 Gott zum gruß. Mit Wünschung ales Libes und Gu... \n", - "1 Gott zum Gruß.\\nMein lieber bruder ich kan nic... \n", - "2 Gott zum Grus. Lieber Vater und Mutter, Brüder... 
" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Laden der vorverarbeiteten Daten\n", - "\n", - "data_path = Path(\"../scripts/data/soldatenbriefe.csv\")\n", - "df = pd.read_csv(data_path)\n", - "\n", - "df.head(3)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Die Daten sind bereits gut strukturiert und mit hilfreichen Metadaten erfasst. Wir bemerken aber das im Text noch " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import re\n", - "\n", - "def clean_text(text):\n", - " \"\"\"Bereinigt den Text von unerwünschten Formatierungen.\"\"\"\n", - " # Ersetze mehrfache Zeilenumbrüche durch einen einzelnen\n", - " text = re.sub(r'\\n+', '\\n', text)\n", - " \n", - " # Ersetze einzelne Zeilenumbrüche durch Leerzeichen\n", - " text = text.replace('\\n', ' ')\n", - " \n", - " # Normalisiere Whitespace\n", - " text = ' '.join(text.split())\n", - " \n", - " return text\n", - "\n", - "df['text'] = df['text'].apply(clean_text)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Überblick über das Korpus:\n", - "Anzahl Briefe: 170\n", - "\n", - "Verteilung nach Dienstgrad:\n", - "rang\n", - "Mannschaften 95\n", - "Offizier 51\n", - "Unteroffizier 24\n", - "Name: count, dtype: int64\n", - "\n", - "Zeitliche Verteilung:\n", - "jahr\n", - "1745.0 1\n", - "1756.0 2\n", - "1757.0 3\n", - "1758.0 3\n", - "1759.0 1\n", - "1762.0 2\n", - "1776.0 1\n", - "1777.0 1\n", - "1794.0 1\n", - "1796.0 2\n", - "1798.0 1\n", - "1807.0 1\n", - "1809.0 2\n", - "1811.0 3\n", - "1812.0 5\n", - "1813.0 5\n", - "1814.0 1\n", - "1815.0 7\n", - "1816.0 1\n", - "1832.0 1\n", - "1833.0 1\n", - "1834.0 1\n", - "1848.0 4\n", - "1849.0 9\n", - "1850.0 1\n", - "1851.0 1\n", - "1861.0 1\n", - "1863.0 1\n", - "1864.0 7\n", - "1865.0 3\n", - 
"1866.0 30\n", - "1867.0 2\n", - "1868.0 1\n", - "1869.0 3\n", - "1870.0 31\n", - "1871.0 26\n", - "1872.0 1\n", - "Name: count, dtype: int64\n" - ] - } - ], - "source": [ - "# Grundlegende Statistiken\n", - "print(\"Überblick über das Korpus:\")\n", - "print(f\"Anzahl Briefe: {len(df)}\")\n", - "print(\"\\nVerteilung nach Dienstgrad:\")\n", - "print(df['rang'].value_counts())\n", - "\n", - "# Zeitliche Verteilung\n", - "df['jahr'] = pd.to_datetime(df['datum'], format='%d.%m.%Y').dt.year\n", - "print(\"\\nZeitliche Verteilung:\")\n", - "print(df['jahr'].value_counts().sort_index())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Gesamtanzahl Zeichen: 514,220\n", - "Gesamtanzahl Wörter: 87,250\n", - "Durchschnittliche Brieflänge: 3025 Zeichen\n", - "Durchschnittliche Wörter pro Brief: 513\n" - ] - } - ], - "source": [ - "def analyze_corpus_size(df):\n", - " \"\"\"Analysiert die Textmenge im Korpus.\"\"\"\n", - " total_chars = df['text'].str.len().sum()\n", - " total_words = df['text'].str.split().str.len().sum()\n", - " avg_chars_per_letter = df['text'].str.len().mean()\n", - " avg_words_per_letter = df['text'].str.split().str.len().mean()\n", - " \n", - " print(f\"Gesamtanzahl Zeichen: {total_chars:,}\")\n", - " print(f\"Gesamtanzahl Wörter: {total_words:,}\")\n", - " print(f\"Durchschnittliche Brieflänge: {avg_chars_per_letter:.0f} Zeichen\")\n", - " print(f\"Durchschnittliche Wörter pro Brief: {avg_words_per_letter:.0f}\")\n", - "\n", - "\n", - "analyze_corpus_size(df)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Was passiert hier?\n", - "\n", - "Wir laden die Daten mit pandas\n", - "Das CSV-Format ermöglicht einfachen Zugriff auf die Struktur\n", - "Wir sehen erste Muster in den Daten" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 3.3 Eine erste RAG-Pipeline\n", - "Hier demonstrieren wir 
die grundlegenden Schritte einer RAG-Pipeline:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", - "from langchain_community.vectorstores import Chroma\n", - "from langchain_huggingface import HuggingFaceEmbeddings\n", - "\n", - "# 1. Text in kleinere Stücke teilen (Chunking)\n", - "# Wir teilen die Briefe in überschaubare Abschnitte\n", - "splitter = RecursiveCharacterTextSplitter(\n", - " chunk_size=500, # Länge der Textabschnitte\n", - " chunk_overlap=50 # Überlappung zwischen Abschnitten\n", - ")\n", - "\n", - "# Beispiel mit einem Brief\n", - "beispiel_brief = df['text'].iloc[0]\n", - "chunks = splitter.split_text(beispiel_brief)\n", - "\n", - "print(f\"Ein Brief wurde in {len(chunks)} Abschnitte geteilt\")\n", - "print(\"\\nErster Abschnitt:\")\n", - "print(chunks[0])" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "C:\\Users\\baumanoa\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python311\\site-packages\\tqdm\\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", - " from .autonotebook import tqdm as notebook_tqdm\n", - "C:\\Users\\baumanoa\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python311\\site-packages\\huggingface_hub\\file_download.py:142: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\\Users\\baumanoa\\.cache\\huggingface\\hub\\models--sentence-transformers--paraphrase-multilingual-mpnet-base-v2. 
Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.\n", - "To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development\n", - " warnings.warn(message)\n" - ] - } - ], - "source": [ - "# 2. Embedding-Modell initialisieren\n", - "embedding_model = HuggingFaceEmbeddings(\n", - " model_name=\"sentence-transformers/paraphrase-multilingual-mpnet-base-v2\"\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# 3. Vektordatenbank erstellen\n", - "def create_vectorstore(chunks, metadaten):\n", - " \"\"\"\n", - " Erstellt eine Vektordatenbank aus den Chunks und Metadaten.\n", - " \"\"\"\n", - " vectorstore = Chroma.from_texts(\n", - " texts=chunks,\n", - " embedding=embedding_model,\n", - " metadatas=metadaten,\n", - " persist_directory=\"./chroma_db\"\n", - " )\n", - " return vectorstore" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Pipeline ausführen\n", - "chunks, metadaten = prepare_chunks(df)\n", - "vectorstore = create_vectorstore(chunks, metadaten)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# 4. 
Beispielabfrage\n", - "query = \"Wie beschreiben Soldaten ihre Erlebnisse im Krieg?\"\n", - "results = vectorstore.similarity_search_with_relevance_scores(query, k=3)\n", - "\n", - "print(\"\\nBeispielabfrage:\")\n", - "print(f\"Frage: {query}\\n\")\n", - "print(\"Relevante Textstellen:\")\n", - "for doc, score in results:\n", - " print(f\"\\nRelevanz: {score:.4f}\")\n", - " print(f\"Brief Nr. {doc.metadata['brief_nr']} von {doc.metadata['name']} ({doc.metadata['rang']})\")\n", - " print(f\"Datum: {doc.metadata['datum']}\")\n", - " print(f\"Text: {doc.page_content[:200]}...\")" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.9" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git "a/rag_lb/jupyter-book/notebooks/00_Einf\303\274hrung_RAG.ipynb" "b/rag_lb/jupyter-book/notebooks/00_Einf\303\274hrung_RAG.ipynb" new file mode 100644 index 0000000..db1c23c --- /dev/null +++ "b/rag_lb/jupyter-book/notebooks/00_Einf\303\274hrung_RAG.ipynb" @@ -0,0 +1,1212 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Retrieval Augmented Generation (RAG) für historische Textanalyse\n", + "## Eine Einführung anhand deutscher Soldatenbriefe (1745-1872)\n", + "\n", + "### Einleitung und Überblick\n", + "\n", + "#### Was erwartet Sie in diesem Jupyter Book?\n", + "Dieses Jupyter Book führt Sie in die Methode des Retrieval Augmented Generation (RAG) ein und zeigt deren Anwendung in der historischen Textanalyse. 
RAG kombiniert die Fähigkeiten von Sprachmodellen mit gezielter Informationssuche in spezifischen Dokumenten - in unserem Fall historischen Quellen.\n", + "\n", + "#### Lernziele\n", + "Nach diesem einführenden Notebook werden Sie:\n", + "- Die Grundidee von RAG verstehen\n", + "- Den Wert dieser Methode für die historische Forschung einschätzen können\n", + "- Eine erste RAG-Pipeline in Aktion gesehen haben\n", + "- Die Möglichkeiten und Grenzen der Methode kennen\n", + "\n", + "#### Aufbau des Jupyter Books\n", + "Das Book ist modular aufgebaut und umfasst:\n", + "1. **Einführung** (dieses Notebook)\n", + " - Überblick und erste Demonstration\n", + "2. **Datenaufbereitung**\n", + " - Textverarbeitung\n", + " - Chunking-Strategien\n", + "3. **Embedding und Retrieval**\n", + " - Vektorisierung von Text\n", + " - Ähnlichkeitssuche\n", + "4. **LLM-Integration**\n", + " - Prompt-Entwicklung\n", + " - Antwortgenerierung\n", + "5. **Anwendungsfälle**\n", + " - Spezifische historische Analysen\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Unser Quellenkorpus: Soldatenbriefe 1745-1872\n", + "\n", + "#### Beschreibung der Quelle\n", + "Das Korpus \"Soldatenbriefe\" ist eine Sammlung von 170 Briefen aus dem deutschsprachigen Raum, die mehrere wichtige historische Epochen umfasst:\n", + "- Koalitions- und Befreiungskriege (1792−1815)\n", + "- Deutscher Krieg (1866)\n", + "- Deutsch-Französischer Krieg (1870/71)\n", + "\n", + "Besonders wertvoll für die historische Forschung sind:\n", + "- Die soziale Bandbreite (Offiziere bis einfache Soldaten)\n", + "- Die zeitliche Tiefe (über 125 Jahre)\n", + "- Die persönliche Perspektive auf historische Ereignisse\n", + "\n", + "#### Herkunft und Rechtliches\n", + "- Quelle: Deutsches Textarchiv (DTA)\n", + "- Lizenz: CC BY-SA 4.0\n", + "- GitHub: [deutschestextarchiv/soldatenbriefe](https://github.com/deutschestextarchiv/soldatenbriefe)\n", + "- Publikation: Marko Neumann (2019): \"Soldatenbriefe des 
18. und 19. Jahrhunderts\"\n", + "\n", + "#### Datenformate und Struktur\n", + "Die Briefe liegen in verschiedenen Formaten vor:\n", + "- **TEI-XML**: Ursprungsformat mit detaillierten Metadaten\n", + "- **CSV**: Aufbereitete Version für die Analyse\n", + "- **Plain Text**: Extrahierte Brieftexte" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: pandas in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (2.2.3)Note: you may need to restart the kernel to use updated packages.\n", + "\n", + "Requirement already satisfied: numpy>=1.23.2 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from pandas) (1.26.4)\n", + "Requirement already satisfied: python-dateutil>=2.8.2 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from pandas) (2.9.0.post0)\n", + "Requirement already satisfied: pytz>=2020.1 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from pandas) (2024.2)\n", + "Requirement already satisfied: tzdata>=2022.7 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from pandas) (2024.2)\n", + "Requirement already satisfied: six>=1.5 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from python-dateutil>=2.8.2->pandas) (1.17.0)\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n", + 
"[notice] A new release of pip is available: 24.3.1 -> 25.0.1\n", + "[notice] To update, run: C:\\Users\\baumanoa\\AppData\\Local\\Microsoft\\WindowsApps\\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\\python.exe -m pip install --upgrade pip\n" + ] + } + ], + "source": [ + "# Installation der benötigten Bibliotheken\n", + "\n", + "%pip install pandas" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Erste Erkundung der Daten" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "### Daten laden und erkunden\n", + "\n", + "import pandas as pd\n", + "from pathlib import Path" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "<div>\n", + "<style scoped>\n", + " .dataframe tbody tr th:only-of-type {\n", + " vertical-align: middle;\n", + " }\n", + "\n", + " .dataframe tbody tr th {\n", + " vertical-align: top;\n", + " }\n", + "\n", + " .dataframe thead th {\n", + " text-align: right;\n", + " }\n", + "</style>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr style=\"text-align: right;\">\n", + " <th></th>\n", + " <th>brief_nr</th>\n", + " <th>datum</th>\n", + " <th>rang</th>\n", + " <th>name</th>\n", + " <th>quelle</th>\n", + " <th>text</th>\n", + " </tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr>\n", + " <th>0</th>\n", + " <td>1.0</td>\n", + " <td>19.06.1745</td>\n", + " <td>Unteroffizier</td>\n", + " <td>Nikolaus Binn</td>\n", + " <td>Liebe, Georg (1912, 4–6)</td>\n", + " <td>Gott zum gruß. 
Mit Wünschung ales Libes und Gu...</td>\n", + " </tr>\n", + " <tr>\n", + " <th>1</th>\n", + " <td>2.0</td>\n", + " <td>21.09.1756</td>\n", + " <td>Mannschaften</td>\n", + " <td>Christian Arnholtz</td>\n", + " <td>Liebe, Georg (1912, 25–26)</td>\n", + " <td>Gott zum Gruß.\\nMein lieber bruder ich kan nic...</td>\n", + " </tr>\n", + " <tr>\n", + " <th>2</th>\n", + " <td>3.0</td>\n", + " <td>11.11.1756</td>\n", + " <td>Mannschaften</td>\n", + " <td>Kaspar Kalberlah</td>\n", + " <td>Liebe, Georg (1912, 28–29)</td>\n", + " <td>Gott zum Grus. Lieber Vater und Mutter, Brüder...</td>\n", + " </tr>\n", + " </tbody>\n", + "</table>\n", + "</div>" + ], + "text/plain": [ + " brief_nr datum rang name \\\n", + "0 1.0 19.06.1745 Unteroffizier Nikolaus Binn \n", + "1 2.0 21.09.1756 Mannschaften Christian Arnholtz \n", + "2 3.0 11.11.1756 Mannschaften Kaspar Kalberlah \n", + "\n", + " quelle \\\n", + "0 Liebe, Georg (1912, 4–6) \n", + "1 Liebe, Georg (1912, 25–26) \n", + "2 Liebe, Georg (1912, 28–29) \n", + "\n", + " text \n", + "0 Gott zum gruß. Mit Wünschung ales Libes und Gu... \n", + "1 Gott zum Gruß.\\nMein lieber bruder ich kan nic... \n", + "2 Gott zum Grus. Lieber Vater und Mutter, Brüder... " + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Laden der vorverarbeiteten Daten\n", + "\n", + "data_path = Path(\"../scripts/data/soldatenbriefe.csv\")\n", + "df = pd.read_csv(data_path)\n", + "\n", + "df.head(3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Die Daten sind bereits gut strukturiert und mit hilfreichen Metadaten erfasst. Wir bemerken aber, dass im Text noch Zeilenumbrüche als \"\\n\" kodiert sind. Diese sollten wir durch ein einfaches Leerzeichen ersetzen." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "import re\n", + "\n", + "def clean_text(text):\n", + " \"\"\"Bereinigt den Text von unerwünschten Formatierungen.\"\"\"\n", + " # Ersetze mehrfache Zeilenumbrüche durch einen einzelnen\n", + " text = re.sub(r'\\n+', '\\n', text)\n", + " \n", + " # Ersetze einzelne Zeilenumbrüche durch Leerzeichen\n", + " text = text.replace('\\n', ' ')\n", + " \n", + " # Normalisiere Whitespace\n", + " text = ' '.join(text.split())\n", + " \n", + " return text\n", + "\n", + "df['text'] = df['text'].apply(clean_text)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Nun können wir unser Korpus quantitativ untersuchen. Vor einer Analyse sollten wir folgende Details zum Korpus kennen:\n", + "- Wie viele Dokumente haben wir?\n", + "- Wie lang sind die Dokumente?\n", + "- Welche Metadaten haben wir für unsere Texte?" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Überblick über das Korpus:\n", + "Anzahl Briefe: 170\n", + "\n", + "Verteilung nach Dienstgrad:\n", + "rang\n", + "Mannschaften 95\n", + "Offizier 51\n", + "Unteroffizier 24\n", + "Name: count, dtype: int64\n", + "\n", + "Zeitliche Verteilung:\n", + "jahr\n", + "1745.0 1\n", + "1756.0 2\n", + "1757.0 3\n", + "1758.0 3\n", + "1759.0 1\n", + "1762.0 2\n", + "1776.0 1\n", + "1777.0 1\n", + "1794.0 1\n", + "1796.0 2\n", + "1798.0 1\n", + "1807.0 1\n", + "1809.0 2\n", + "1811.0 3\n", + "1812.0 5\n", + "1813.0 5\n", + "1814.0 1\n", + "1815.0 7\n", + "1816.0 1\n", + "1832.0 1\n", + "1833.0 1\n", + "1834.0 1\n", + "1848.0 4\n", + "1849.0 9\n", + "1850.0 1\n", + "1851.0 1\n", + "1861.0 1\n", + "1863.0 1\n", + "1864.0 7\n", + "1865.0 3\n", + "1866.0 30\n", + "1867.0 2\n", + "1868.0 1\n", + "1869.0 3\n", + "1870.0 31\n", + "1871.0 26\n", + "1872.0 1\n", + "Name: count, dtype: int64\n" + ] + } 
+ ], + "source": [ + "# Grundlegende Statistiken\n", + "print(\"Überblick über das Korpus:\")\n", + "print(f\"Anzahl Briefe: {len(df)}\")\n", + "print(\"\\nVerteilung nach Dienstgrad:\")\n", + "print(df['rang'].value_counts())\n", + "\n", + "# Zeitliche Verteilung\n", + "df['jahr'] = pd.to_datetime(df['datum'], format='%d.%m.%Y').dt.year\n", + "print(\"\\nZeitliche Verteilung:\")\n", + "print(df['jahr'].value_counts().sort_index())" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Gesamtanzahl Zeichen: 514,220\n", + "Gesamtanzahl Wörter: 87,250\n", + "Durchschnittliche Brieflänge: 3025 Zeichen\n", + "Durchschnittliche Wörter pro Brief: 513\n", + "Länge des kürzesten Briefs: 378 Zeichen\n", + "Länge des längsten Briefs: 13641 Zeichen\n" + ] + } + ], + "source": [ + "def analyze_corpus_size(df):\n", + " \"\"\"Analysiert die Textmenge im Korpus.\"\"\"\n", + " total_chars = df['text'].str.len().sum()\n", + " total_words = df['text'].str.split().str.len().sum()\n", + " avg_chars_per_letter = df['text'].str.len().mean()\n", + " avg_words_per_letter = df['text'].str.split().str.len().mean()\n", + " min_chars = df['text'].str.len().min()\n", + " max_chars = df['text'].str.len().max()\n", + " \n", + " print(f\"Gesamtanzahl Zeichen: {total_chars:,}\")\n", + " print(f\"Gesamtanzahl Wörter: {total_words:,}\")\n", + " print(f\"Durchschnittliche Brieflänge: {avg_chars_per_letter:.0f} Zeichen\")\n", + " print(f\"Durchschnittliche Wörter pro Brief: {avg_words_per_letter:.0f}\")\n", + " print(f\"Länge des kürzesten Briefs: {min_chars} Zeichen\")\n", + " print(f\"Länge des längsten Briefs: {max_chars} Zeichen\")\n", + "\n", + "\n", + "analyze_corpus_size(df)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Nun haben wir eine erste quantitative Analyse unseres Datensatzes und können uns überlegen, warum RAG für historische Textanalyse nützlich sein 
könnte gegenüber einer analogen Analyse und anderen Methoden wie zum Beispiel die welche durch [Voyant Tools](https://voyant-tools.org/) zur verfügung stehen?\n", + "\n", + "### Die Herausforderung historischer Textanalyse\n", + "\n", + "Historische Textanalyse steht vor besonderen Herausforderungen:\n", + "- Große Textmengen müssen systematisch erschlossen werden\n", + "- Kontextuelle Zusammenhänge müssen bewahrt bleiben\n", + "- Sprachliche und historische Besonderheiten müssen berücksichtigt werden\n", + "\n", + "Unser Korpus umfasst:\n", + "- 170 Briefe\n", + "- 514.220 Zeichen\n", + "- 87.250 Wörter\n", + "- Durchschnittlich 3.025 Zeichen pro Brief\n", + "- Durchschnittlich 513 Wörter pro Brief\n", + "\n", + "Diese Menge mag auf den ersten Blick überschaubar erscheinen, stellt aber für manuelle Analyse bereits eine Herausforderung dar:\n", + "- Ein Mensch bräuchte ca. 7-8 Stunden, um alle Briefe einmal aufmerksam zu lesen\n", + "- Die gleichzeitige Berücksichtigung aller Briefe für eine Analyse ist kaum möglich\n", + "- Das Auffinden spezifischer Themen oder Muster über alle Briefe hinweg ist zeitaufwändig\n", + "\n", + "### Warum RAG die Lösung ist\n", + "\n", + "RAG (Retrieval Augmented Generation) bietet hier mehrere Vorteile:\n", + "\n", + "1. **Effizientes Retrieval**\n", + " - Automatische Identifikation relevanter Textstellen\n", + " - Berücksichtigung semantischer Ähnlichkeiten\n", + " - Schnelle Durchsuchung des gesamten Korpus\n", + "\n", + "2. **Kontextbewusstsein**\n", + " - Die verstellbare Chunk-Größe bewahrt den lokalen Kontext\n", + " - Überlappungen zwischen Chunks erhalten Zusammenhänge\n", + " - Metadaten (Datum, Rang, etc.) bleiben erhalten\n", + "\n", + "3. **Skalierbarkeit**\n", + " - Die Methode funktioniert auch bei wachsendem Korpus\n", + " - Neue Briefe können einfach integriert werden\n", + " - Die Analyse bleibt konsistent\n", + "\n", + "4. 
**Qualitative Analyse**\n", + " - LLMs können sprachliche Feinheiten erkennen\n", + " - Historische Kontexte können berücksichtigt werden\n", + " - Komplexe Zusammenhänge werden erkannt\n", + "\n", + "### Warum unser Korpus ideal für RAG ist\n", + "\n", + "Die Größe unseres Korpus ist aus mehreren Gründen ideal für RAG:\n", + "\n", + "1. **Chunking-Perspektive**\n", + " - Bei durchschnittlich 3.025 Zeichen pro Brief\n", + " - Und einer Chunk-Größe von 500 Zeichen\n", + " - Erhalten wir ca. 1.000-1.200 Chunks\n", + " - Dies ist optimal für präzises Retrieval\n", + "\n", + "2. **Kontextuelle Tiefe**\n", + " - Jeder Brief ist lang genug für bedeutungsvolle Analyse\n", + " - Kurz genug für präzise Chunk-Bildung\n", + " - Genug Material für Vergleiche und Muster\n", + "\n", + "3. **Metadaten-Struktur**\n", + " - Reiche Kontextinformationen (Rang, Datum, etc.)\n", + " - Ermöglicht vielfältige Analyseperspektiven\n", + " - Unterstützt historische Einordnung\n", + "\n", + "### Praktische Bedeutung\n", + "\n", + "RAG ermöglicht:\n", + "- Schnelles Auffinden thematisch relevanter Passagen\n", + "- Systematische Analyse über das gesamte Korpus\n", + "- Entdeckung von Mustern und Zusammenhängen\n", + "- Kombination quantitativer und qualitativer Analyse\n", + "\n", + "Diese Vorteile machen RAG zu einem wertvollen Werkzeug für die historische Forschung, besonders bei der Analyse von Korrespondenzen und persönlichen Dokumenten." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Anforderungen an ein Korpus für RAG\n", + "- wann lohnt es sich, wann lohnt es sich nicht. \n", + "- Welche Formate funktionieren?\n", + "- Was muss man sich überlegen.\n", + "\n", + "### Wie müssen die Daten aufbereitet werden?" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### First steps with RAG\n", + "\n", + "#### Technical requirements\n", + "This notebook requires a few Python libraries:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: langchain in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (0.3.19)\n", + "Requirement already satisfied: langchain-community in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (0.3.18)\n", + "Requirement already satisfied: langchain_openai in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (0.3.6)\n", + "Requirement already satisfied: sentence_transformers in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (3.4.1)\n", + "Requirement already satisfied: huggingface-hub in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (0.29.1)\n", + "Requirement already satisfied: chromadb in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (0.6.3)\n", + "Requirement already satisfied: langchain_huggingface in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (0.1.2)\n", + "Requirement already satisfied: langchain-core<1.0.0,>=0.3.35 in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (0.3.37)\n", + "Requirement already satisfied: langchain-text-splitters<1.0.0,>=0.3.6 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (0.3.6)\n", + "Requirement already satisfied: langsmith<0.4,>=0.1.17 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (0.3.10)\n", + "Requirement already satisfied: pydantic<3.0.0,>=2.7.4 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (2.10.5)\n", + "Requirement already satisfied: SQLAlchemy<3,>=1.4 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (2.0.37)\n", + "Requirement already satisfied: requests<3,>=2 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (2.32.3)\n", + "Requirement already satisfied: PyYAML>=5.3 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (6.0.2)\n", + "Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (3.11.11)\n", + "Requirement already satisfied: tenacity!=8.4.0,<10,>=8.1.0 in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (9.0.0)\n", + "Requirement already satisfied: numpy<2,>=1.26.4 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (1.26.4)\n", + "Requirement already satisfied: dataclasses-json<0.7,>=0.5.7 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain-community) (0.6.7)\n", + "Requirement already satisfied: pydantic-settings<3.0.0,>=2.4.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain-community) (2.8.0)\n", + "Requirement already satisfied: httpx-sse<1.0.0,>=0.4.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain-community) (0.4.0)\n", + "Requirement already satisfied: openai<2.0.0,>=1.58.1 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain_openai) (1.60.0)\n", + "Requirement already satisfied: tiktoken<1,>=0.7 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain_openai) (0.8.0)\n", + "Requirement already satisfied: transformers<5.0.0,>=4.41.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from sentence_transformers) (4.49.0)\n", + "Requirement already satisfied: tqdm in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from sentence_transformers) (4.67.1)\n", + "Requirement already satisfied: torch>=1.11.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from sentence_transformers) (2.5.1)\n", + "Requirement already satisfied: scikit-learn in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from sentence_transformers) (1.6.1)\n", + "Requirement already satisfied: scipy in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from sentence_transformers) (1.12.0)\n", + "Requirement already satisfied: Pillow in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from sentence_transformers) (11.0.0)\n", + "Requirement already satisfied: filelock in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from huggingface-hub) (3.16.1)\n", + "Requirement already satisfied: fsspec>=2023.5.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from huggingface-hub) (2024.12.0)\n", + "Requirement already satisfied: packaging>=20.9 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from huggingface-hub) (24.2)\n", + "Requirement already satisfied: typing-extensions>=3.7.4.3 in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from huggingface-hub) (4.12.2)\n", + "Requirement already satisfied: build>=1.0.3 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (1.2.2.post1)\n", + "Requirement already satisfied: chroma-hnswlib==0.7.6 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (0.7.6)\n", + "Requirement already satisfied: fastapi>=0.95.2 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (0.115.8)\n", + "Requirement already satisfied: uvicorn>=0.18.3 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from uvicorn[standard]>=0.18.3->chromadb) (0.34.0)\n", + "Requirement already satisfied: posthog>=2.4.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (3.15.1)\n", + "Requirement already satisfied: onnxruntime>=1.14.1 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (1.20.1)\n", + "Requirement already satisfied: opentelemetry-api>=1.2.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (1.30.0)\n", + "Requirement already satisfied: opentelemetry-exporter-otlp-proto-grpc>=1.2.0 in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (1.30.0)\n", + "Requirement already satisfied: opentelemetry-instrumentation-fastapi>=0.41b0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (0.51b0)\n", + "Requirement already satisfied: opentelemetry-sdk>=1.2.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (1.30.0)\n", + "Requirement already satisfied: tokenizers>=0.13.2 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (0.21.0)\n", + "Requirement already satisfied: pypika>=0.48.9 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (0.48.9)\n", + "Requirement already satisfied: overrides>=7.3.1 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (7.7.0)\n", + "Requirement already satisfied: importlib-resources in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (6.5.2)\n", + "Requirement already satisfied: grpcio>=1.58.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (1.70.0)\n", + "Requirement already satisfied: bcrypt>=4.0.1 in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (4.2.1)\n", + "Requirement already satisfied: typer>=0.9.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (0.15.1)\n", + "Requirement already satisfied: kubernetes>=28.1.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (32.0.1)\n", + "Requirement already satisfied: mmh3>=4.0.1 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (5.1.0)\n", + "Requirement already satisfied: orjson>=3.9.12 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (3.10.15)\n", + "Requirement already satisfied: httpx>=0.27.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (0.28.1)\n", + "Requirement already satisfied: rich>=10.11.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (13.9.4)\n", + "Requirement already satisfied: aiohappyeyeballs>=2.3.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (2.4.4)\n", + "Requirement already satisfied: aiosignal>=1.1.2 in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.3.2)\n", + "Requirement already satisfied: attrs>=17.3.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (24.3.0)\n", + "Requirement already satisfied: frozenlist>=1.1.1 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.5.0)\n", + "Requirement already satisfied: multidict<7.0,>=4.5 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (6.1.0)\n", + "Requirement already satisfied: propcache>=0.2.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (0.2.1)\n", + "Requirement already satisfied: yarl<2.0,>=1.17.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.18.3)\n", + "Requirement already satisfied: pyproject_hooks in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from build>=1.0.3->chromadb) (1.2.0)\n", + "Requirement already satisfied: colorama in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from build>=1.0.3->chromadb) (0.4.6)\n", + "Requirement 
already satisfied: marshmallow<4.0.0,>=3.18.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from dataclasses-json<0.7,>=0.5.7->langchain-community) (3.26.0)\n", + "Requirement already satisfied: typing-inspect<1,>=0.4.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from dataclasses-json<0.7,>=0.5.7->langchain-community) (0.9.0)\n", + "Requirement already satisfied: starlette<0.46.0,>=0.40.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from fastapi>=0.95.2->chromadb) (0.45.3)\n", + "Requirement already satisfied: anyio in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from httpx>=0.27.0->chromadb) (4.8.0)\n", + "Requirement already satisfied: certifi in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from httpx>=0.27.0->chromadb) (2024.12.14)\n", + "Requirement already satisfied: httpcore==1.* in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from httpx>=0.27.0->chromadb) (1.0.7)\n", + "Requirement already satisfied: idna in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from httpx>=0.27.0->chromadb) (3.10)\n", + "Requirement already satisfied: h11<0.15,>=0.13 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from 
httpcore==1.*->httpx>=0.27.0->chromadb) (0.14.0)\n", + "Requirement already satisfied: six>=1.9.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from kubernetes>=28.1.0->chromadb) (1.17.0)\n", + "Requirement already satisfied: python-dateutil>=2.5.3 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from kubernetes>=28.1.0->chromadb) (2.9.0.post0)\n", + "Requirement already satisfied: google-auth>=1.0.1 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from kubernetes>=28.1.0->chromadb) (2.38.0)\n", + "Requirement already satisfied: websocket-client!=0.40.0,!=0.41.*,!=0.42.*,>=0.32.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from kubernetes>=28.1.0->chromadb) (1.8.0)\n", + "Requirement already satisfied: requests-oauthlib in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from kubernetes>=28.1.0->chromadb) (2.0.0)\n", + "Requirement already satisfied: oauthlib>=3.2.2 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from kubernetes>=28.1.0->chromadb) (3.2.2)\n", + "Requirement already satisfied: urllib3>=1.24.2 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from kubernetes>=28.1.0->chromadb) (2.2.3)\n", + "Requirement already satisfied: durationpy>=0.7 in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from kubernetes>=28.1.0->chromadb) (0.9)\n", + "Requirement already satisfied: jsonpatch<2.0,>=1.33 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain-core<1.0.0,>=0.3.35->langchain) (1.33)\n", + "Requirement already satisfied: requests-toolbelt<2.0.0,>=1.0.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langsmith<0.4,>=0.1.17->langchain) (1.0.0)\n", + "Requirement already satisfied: zstandard<0.24.0,>=0.23.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langsmith<0.4,>=0.1.17->langchain) (0.23.0)\n", + "Requirement already satisfied: coloredlogs in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from onnxruntime>=1.14.1->chromadb) (15.0.1)\n", + "Requirement already satisfied: flatbuffers in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from onnxruntime>=1.14.1->chromadb) (25.2.10)\n", + "Requirement already satisfied: protobuf in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from onnxruntime>=1.14.1->chromadb) (5.29.3)\n", + "Requirement already satisfied: sympy in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from onnxruntime>=1.14.1->chromadb) 
(1.13.1)\n", + "Requirement already satisfied: distro<2,>=1.7.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from openai<2.0.0,>=1.58.1->langchain_openai) (1.9.0)\n", + "Requirement already satisfied: jiter<1,>=0.4.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from openai<2.0.0,>=1.58.1->langchain_openai) (0.8.2)\n", + "Requirement already satisfied: sniffio in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from openai<2.0.0,>=1.58.1->langchain_openai) (1.3.1)\n", + "Requirement already satisfied: deprecated>=1.2.6 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from opentelemetry-api>=1.2.0->chromadb) (1.2.16)\n", + "Requirement already satisfied: importlib-metadata<=8.5.0,>=6.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from opentelemetry-api>=1.2.0->chromadb) (8.5.0)\n", + "Requirement already satisfied: googleapis-common-protos~=1.52 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from opentelemetry-exporter-otlp-proto-grpc>=1.2.0->chromadb) (1.68.0)\n", + "Requirement already satisfied: opentelemetry-exporter-otlp-proto-common==1.30.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from opentelemetry-exporter-otlp-proto-grpc>=1.2.0->chromadb) (1.30.0)\n", + "Requirement already satisfied: 
opentelemetry-proto==1.30.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from opentelemetry-exporter-otlp-proto-grpc>=1.2.0->chromadb) (1.30.0)\n", + "Requirement already satisfied: opentelemetry-instrumentation-asgi==0.51b0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from opentelemetry-instrumentation-fastapi>=0.41b0->chromadb) (0.51b0)\n", + "Requirement already satisfied: opentelemetry-instrumentation==0.51b0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from opentelemetry-instrumentation-fastapi>=0.41b0->chromadb) (0.51b0)\n", + "Requirement already satisfied: opentelemetry-semantic-conventions==0.51b0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from opentelemetry-instrumentation-fastapi>=0.41b0->chromadb) (0.51b0)\n", + "Requirement already satisfied: opentelemetry-util-http==0.51b0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from opentelemetry-instrumentation-fastapi>=0.41b0->chromadb) (0.51b0)\n", + "Requirement already satisfied: wrapt<2.0.0,>=1.0.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from opentelemetry-instrumentation==0.51b0->opentelemetry-instrumentation-fastapi>=0.41b0->chromadb) (1.17.2)\n", + "Requirement already satisfied: asgiref~=3.0 in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from opentelemetry-instrumentation-asgi==0.51b0->opentelemetry-instrumentation-fastapi>=0.41b0->chromadb) (3.8.1)\n", + "Requirement already satisfied: monotonic>=1.5 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from posthog>=2.4.0->chromadb) (1.6)\n", + "Requirement already satisfied: backoff>=1.10.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from posthog>=2.4.0->chromadb) (2.2.1)\n", + "Requirement already satisfied: annotated-types>=0.6.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from pydantic<3.0.0,>=2.7.4->langchain) (0.7.0)\n", + "Requirement already satisfied: pydantic-core==2.27.2 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from pydantic<3.0.0,>=2.7.4->langchain) (2.27.2)\n", + "Requirement already satisfied: python-dotenv>=0.21.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from pydantic-settings<3.0.0,>=2.4.0->langchain-community) (1.0.1)\n", + "Requirement already satisfied: charset-normalizer<4,>=2 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from requests<3,>=2->langchain) (3.4.0)\n", + "Requirement already satisfied: markdown-it-py>=2.2.0 in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from rich>=10.11.0->chromadb) (3.0.0)\n", + "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from rich>=10.11.0->chromadb) (2.18.0)\n", + "Requirement already satisfied: greenlet!=0.4.17 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from SQLAlchemy<3,>=1.4->langchain) (3.1.1)\n", + "Requirement already satisfied: regex>=2022.1.18 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from tiktoken<1,>=0.7->langchain_openai) (2024.11.6)\n", + "Requirement already satisfied: networkx in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from torch>=1.11.0->sentence_transformers) (3.4.2)\n", + "Requirement already satisfied: jinja2 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from torch>=1.11.0->sentence_transformers) (3.1.5)\n", + "Requirement already satisfied: mpmath<1.4,>=1.1.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from sympy->onnxruntime>=1.14.1->chromadb) (1.3.0)\n", + "Requirement already satisfied: safetensors>=0.4.1 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from 
transformers<5.0.0,>=4.41.0->sentence_transformers) (0.5.2)\n", + "Requirement already satisfied: click>=8.0.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from typer>=0.9.0->chromadb) (8.1.8)\n", + "Requirement already satisfied: shellingham>=1.3.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from typer>=0.9.0->chromadb) (1.5.4)\n", + "Requirement already satisfied: httptools>=0.6.3 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from uvicorn[standard]>=0.18.3->chromadb) (0.6.4)\n", + "Requirement already satisfied: watchfiles>=0.13 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from uvicorn[standard]>=0.18.3->chromadb) (1.0.4)\n", + "Requirement already satisfied: websockets>=10.4 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from uvicorn[standard]>=0.18.3->chromadb) (15.0)\n", + "Requirement already satisfied: joblib>=1.2.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from scikit-learn->sentence_transformers) (1.4.2)\n", + "Requirement already satisfied: threadpoolctl>=3.1.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from scikit-learn->sentence_transformers) (3.5.0)\n", + "Requirement already satisfied: cachetools<6.0,>=2.0.0 in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from google-auth>=1.0.1->kubernetes>=28.1.0->chromadb) (5.5.2)\n", + "Requirement already satisfied: pyasn1-modules>=0.2.1 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from google-auth>=1.0.1->kubernetes>=28.1.0->chromadb) (0.4.1)\n", + "Requirement already satisfied: rsa<5,>=3.1.4 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from google-auth>=1.0.1->kubernetes>=28.1.0->chromadb) (4.9)\n", + "Requirement already satisfied: zipp>=3.20 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from importlib-metadata<=8.5.0,>=6.0->opentelemetry-api>=1.2.0->chromadb) (3.21.0)\n", + "Requirement already satisfied: jsonpointer>=1.9 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from jsonpatch<2.0,>=1.33->langchain-core<1.0.0,>=0.3.35->langchain) (3.0.0)\n", + "Requirement already satisfied: mdurl~=0.1 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from markdown-it-py>=2.2.0->rich>=10.11.0->chromadb) (0.1.2)\n", + "Requirement already satisfied: mypy-extensions>=0.3.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from typing-inspect<1,>=0.4.0->dataclasses-json<0.7,>=0.5.7->langchain-community) (1.0.0)\n", + "Requirement already satisfied: humanfriendly>=9.1 in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from coloredlogs->onnxruntime>=1.14.1->chromadb) (10.0)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from jinja2->torch>=1.11.0->sentence_transformers) (3.0.2)\n", + "Requirement already satisfied: pyreadline3 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from humanfriendly>=9.1->coloredlogs->onnxruntime>=1.14.1->chromadb) (3.5.4)\n", + "Requirement already satisfied: pyasn1<0.7.0,>=0.4.6 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from pyasn1-modules>=0.2.1->google-auth>=1.0.1->kubernetes>=28.1.0->chromadb) (0.6.1)\n", + "Note: you may need to restart the kernel to use updated packages.\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n", + "[notice] A new release of pip is available: 24.3.1 -> 25.0.1\n", + "[notice] To update, run: C:\\Users\\baumanoa\\AppData\\Local\\Microsoft\\WindowsApps\\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\\python.exe -m pip install --upgrade pip\n" + ] + } + ], + "source": [ + "%pip install langchain langchain-community langchain_openai sentence_transformers huggingface-hub chromadb langchain_huggingface" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Explanation of the libraries used:\n", + "- langchain & langchain-community: framework for developing LLM applications\n", + " - Provides tools for document processing\n", + " - Enables structured work with different LLMs\n", + " - Simplifies RAG pipeline development\n", 
+ "\n", + "- sentence_transformers: library for text embeddings\n", + " - Converts text into numerical vectors\n", + " - Supports multilingual models\n", + " - Optimized specifically for semantic text similarity\n", + "\n", + "- huggingface-hub & langchain_huggingface: access to AI models\n", + " - Provides access to open-source models\n", + " - Allows embedding models to be run locally\n", + " - Integrates with LangChain\n", + "\n", + "- chromadb: vector database for storing embeddings\n", + " - Stores and searches vectors efficiently\n", + " - Supports metadata\n", + " - Enables similarity search" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### A first RAG pipeline\n", + "Here we demonstrate the basic steps of a RAG pipeline:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: langchain in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (0.3.19)\n", + "Requirement already satisfied: langchain-community in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (0.3.18)\n", + "Requirement already satisfied: langchain_openai in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (0.3.6)\n", + "Requirement already satisfied: sentence_transformers in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (3.4.1)\n", + "Requirement already satisfied: huggingface-hub in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (0.29.1)\n", + "Requirement already satisfied: chromadb in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (0.6.3)\n", + "Requirement already satisfied: langchain_huggingface in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (0.1.2)\n", + "Requirement already satisfied: langchain-core<1.0.0,>=0.3.35 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (0.3.37)\n", + "Requirement already satisfied: langchain-text-splitters<1.0.0,>=0.3.6 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (0.3.6)\n", + "Requirement already satisfied: langsmith<0.4,>=0.1.17 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (0.3.10)\n", + "Requirement already satisfied: pydantic<3.0.0,>=2.7.4 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (2.10.5)\n", + "Requirement already satisfied: SQLAlchemy<3,>=1.4 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (2.0.37)\n", + "Requirement already satisfied: requests<3,>=2 in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (2.32.3)\n", + "Requirement already satisfied: PyYAML>=5.3 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (6.0.2)\n", + "Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (3.11.11)\n", + "Requirement already satisfied: tenacity!=8.4.0,<10,>=8.1.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (9.0.0)\n", + "Requirement already satisfied: numpy<2,>=1.26.4 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain) (1.26.4)\n", + "Requirement already satisfied: dataclasses-json<0.7,>=0.5.7 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain-community) (0.6.7)\n", + "Requirement already satisfied: pydantic-settings<3.0.0,>=2.4.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain-community) (2.8.0)\n", + "Requirement already satisfied: httpx-sse<1.0.0,>=0.4.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain-community) (0.4.0)\n", + "Requirement already satisfied: openai<2.0.0,>=1.58.1 in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain_openai) (1.60.0)\n", + "Requirement already satisfied: tiktoken<1,>=0.7 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain_openai) (0.8.0)\n", + "Requirement already satisfied: transformers<5.0.0,>=4.41.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from sentence_transformers) (4.49.0)\n", + "Requirement already satisfied: tqdm in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from sentence_transformers) (4.67.1)\n", + "Requirement already satisfied: torch>=1.11.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from sentence_transformers) (2.5.1)\n", + "Requirement already satisfied: scikit-learn in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from sentence_transformers) (1.6.1)\n", + "Requirement already satisfied: scipy in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from sentence_transformers) (1.12.0)\n", + "Requirement already satisfied: Pillow in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from sentence_transformers) (11.0.0)\n", + "Requirement already satisfied: filelock in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from huggingface-hub) (3.16.1)\n", + "Requirement already satisfied: fsspec>=2023.5.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from huggingface-hub) (2024.12.0)\n", + "Requirement already satisfied: packaging>=20.9 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from huggingface-hub) (24.2)\n", + "Requirement already satisfied: typing-extensions>=3.7.4.3 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from huggingface-hub) (4.12.2)\n", + "Requirement already satisfied: build>=1.0.3 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (1.2.2.post1)\n", + "Requirement already satisfied: chroma-hnswlib==0.7.6 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (0.7.6)\n", + "Requirement already satisfied: fastapi>=0.95.2 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (0.115.8)\n", + "Requirement already satisfied: uvicorn>=0.18.3 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from uvicorn[standard]>=0.18.3->chromadb) (0.34.0)\n", + "Requirement already satisfied: posthog>=2.4.0 in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (3.15.1)\n", + "Requirement already satisfied: onnxruntime>=1.14.1 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (1.20.1)\n", + "Requirement already satisfied: opentelemetry-api>=1.2.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (1.30.0)\n", + "Requirement already satisfied: opentelemetry-exporter-otlp-proto-grpc>=1.2.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (1.30.0)\n", + "Requirement already satisfied: opentelemetry-instrumentation-fastapi>=0.41b0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (0.51b0)\n", + "Requirement already satisfied: opentelemetry-sdk>=1.2.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (1.30.0)\n", + "Requirement already satisfied: tokenizers>=0.13.2 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (0.21.0)\n", + "Requirement already satisfied: pypika>=0.48.9 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (0.48.9)\n", + "Requirement already satisfied: overrides>=7.3.1 in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (7.7.0)\n", + "Requirement already satisfied: importlib-resources in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (6.5.2)\n", + "Requirement already satisfied: grpcio>=1.58.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (1.70.0)\n", + "Requirement already satisfied: bcrypt>=4.0.1 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (4.2.1)\n", + "Requirement already satisfied: typer>=0.9.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (0.15.1)\n", + "Requirement already satisfied: kubernetes>=28.1.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (32.0.1)\n", + "Requirement already satisfied: mmh3>=4.0.1 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (5.1.0)\n", + "Requirement already satisfied: orjson>=3.9.12 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (3.10.15)\n", + "Requirement already satisfied: httpx>=0.27.0 in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (0.28.1)\n", + "Requirement already satisfied: rich>=10.11.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from chromadb) (13.9.4)\n", + "Requirement already satisfied: aiohappyeyeballs>=2.3.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (2.4.4)\n", + "Requirement already satisfied: aiosignal>=1.1.2 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.3.2)\n", + "Requirement already satisfied: attrs>=17.3.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (24.3.0)\n", + "Requirement already satisfied: frozenlist>=1.1.1 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.5.0)\n", + "Requirement already satisfied: multidict<7.0,>=4.5 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (6.1.0)\n", + "Requirement already satisfied: propcache>=0.2.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (0.2.1)\n", + "Requirement already 
satisfied: yarl<2.0,>=1.17.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.18.3)\n", + "Requirement already satisfied: pyproject_hooks in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from build>=1.0.3->chromadb) (1.2.0)\n", + "Requirement already satisfied: colorama in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from build>=1.0.3->chromadb) (0.4.6)\n", + "Requirement already satisfied: marshmallow<4.0.0,>=3.18.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from dataclasses-json<0.7,>=0.5.7->langchain-community) (3.26.0)\n", + "Requirement already satisfied: typing-inspect<1,>=0.4.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from dataclasses-json<0.7,>=0.5.7->langchain-community) (0.9.0)\n", + "Requirement already satisfied: starlette<0.46.0,>=0.40.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from fastapi>=0.95.2->chromadb) (0.45.3)\n", + "Requirement already satisfied: anyio in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from httpx>=0.27.0->chromadb) (4.8.0)\n", + "Requirement already satisfied: certifi in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from 
httpx>=0.27.0->chromadb) (2024.12.14)\n", + "Requirement already satisfied: httpcore==1.* in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from httpx>=0.27.0->chromadb) (1.0.7)\n", + "Requirement already satisfied: idna in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from httpx>=0.27.0->chromadb) (3.10)\n", + "Requirement already satisfied: h11<0.15,>=0.13 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from httpcore==1.*->httpx>=0.27.0->chromadb) (0.14.0)\n", + "Requirement already satisfied: six>=1.9.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from kubernetes>=28.1.0->chromadb) (1.17.0)\n", + "Requirement already satisfied: python-dateutil>=2.5.3 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from kubernetes>=28.1.0->chromadb) (2.9.0.post0)\n", + "Requirement already satisfied: google-auth>=1.0.1 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from kubernetes>=28.1.0->chromadb) (2.38.0)\n", + "Requirement already satisfied: websocket-client!=0.40.0,!=0.41.*,!=0.42.*,>=0.32.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from kubernetes>=28.1.0->chromadb) (1.8.0)\n", + "Requirement already satisfied: requests-oauthlib in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from kubernetes>=28.1.0->chromadb) (2.0.0)\n", + "Requirement already satisfied: oauthlib>=3.2.2 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from kubernetes>=28.1.0->chromadb) (3.2.2)\n", + "Requirement already satisfied: urllib3>=1.24.2 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from kubernetes>=28.1.0->chromadb) (2.2.3)\n", + "Requirement already satisfied: durationpy>=0.7 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from kubernetes>=28.1.0->chromadb) (0.9)\n", + "Requirement already satisfied: jsonpatch<2.0,>=1.33 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langchain-core<1.0.0,>=0.3.35->langchain) (1.33)\n", + "Requirement already satisfied: requests-toolbelt<2.0.0,>=1.0.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langsmith<0.4,>=0.1.17->langchain) (1.0.0)\n", + "Requirement already satisfied: zstandard<0.24.0,>=0.23.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from langsmith<0.4,>=0.1.17->langchain) (0.23.0)\n", + "Requirement already satisfied: coloredlogs in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from 
onnxruntime>=1.14.1->chromadb) (15.0.1)\n", + "Requirement already satisfied: flatbuffers in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from onnxruntime>=1.14.1->chromadb) (25.2.10)\n", + "Requirement already satisfied: protobuf in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from onnxruntime>=1.14.1->chromadb) (5.29.3)\n", + "Requirement already satisfied: sympy in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from onnxruntime>=1.14.1->chromadb) (1.13.1)\n", + "Requirement already satisfied: distro<2,>=1.7.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from openai<2.0.0,>=1.58.1->langchain_openai) (1.9.0)\n", + "Requirement already satisfied: jiter<1,>=0.4.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from openai<2.0.0,>=1.58.1->langchain_openai) (0.8.2)\n", + "Requirement already satisfied: sniffio in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from openai<2.0.0,>=1.58.1->langchain_openai) (1.3.1)\n", + "Requirement already satisfied: deprecated>=1.2.6 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from opentelemetry-api>=1.2.0->chromadb) (1.2.16)\n", + "Requirement already satisfied: importlib-metadata<=8.5.0,>=6.0 in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from opentelemetry-api>=1.2.0->chromadb) (8.5.0)\n", + "Requirement already satisfied: googleapis-common-protos~=1.52 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from opentelemetry-exporter-otlp-proto-grpc>=1.2.0->chromadb) (1.68.0)\n", + "Requirement already satisfied: opentelemetry-exporter-otlp-proto-common==1.30.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from opentelemetry-exporter-otlp-proto-grpc>=1.2.0->chromadb) (1.30.0)\n", + "Requirement already satisfied: opentelemetry-proto==1.30.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from opentelemetry-exporter-otlp-proto-grpc>=1.2.0->chromadb) (1.30.0)\n", + "Requirement already satisfied: opentelemetry-instrumentation-asgi==0.51b0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from opentelemetry-instrumentation-fastapi>=0.41b0->chromadb) (0.51b0)\n", + "Requirement already satisfied: opentelemetry-instrumentation==0.51b0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from opentelemetry-instrumentation-fastapi>=0.41b0->chromadb) (0.51b0)\n", + "Requirement already satisfied: opentelemetry-semantic-conventions==0.51b0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from 
opentelemetry-instrumentation-fastapi>=0.41b0->chromadb) (0.51b0)\n", + "Requirement already satisfied: opentelemetry-util-http==0.51b0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from opentelemetry-instrumentation-fastapi>=0.41b0->chromadb) (0.51b0)\n", + "Requirement already satisfied: wrapt<2.0.0,>=1.0.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from opentelemetry-instrumentation==0.51b0->opentelemetry-instrumentation-fastapi>=0.41b0->chromadb) (1.17.2)\n", + "Requirement already satisfied: asgiref~=3.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from opentelemetry-instrumentation-asgi==0.51b0->opentelemetry-instrumentation-fastapi>=0.41b0->chromadb) (3.8.1)\n", + "Requirement already satisfied: monotonic>=1.5 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from posthog>=2.4.0->chromadb) (1.6)\n", + "Requirement already satisfied: backoff>=1.10.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from posthog>=2.4.0->chromadb) (2.2.1)\n", + "Requirement already satisfied: annotated-types>=0.6.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from pydantic<3.0.0,>=2.7.4->langchain) (0.7.0)\n", + "Requirement already satisfied: pydantic-core==2.27.2 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from 
pydantic<3.0.0,>=2.7.4->langchain) (2.27.2)\n", + "Requirement already satisfied: python-dotenv>=0.21.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from pydantic-settings<3.0.0,>=2.4.0->langchain-community) (1.0.1)\n", + "Requirement already satisfied: charset-normalizer<4,>=2 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from requests<3,>=2->langchain) (3.4.0)\n", + "Requirement already satisfied: markdown-it-py>=2.2.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from rich>=10.11.0->chromadb) (3.0.0)\n", + "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from rich>=10.11.0->chromadb) (2.18.0)\n", + "Requirement already satisfied: greenlet!=0.4.17 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from SQLAlchemy<3,>=1.4->langchain) (3.1.1)\n", + "Requirement already satisfied: regex>=2022.1.18 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from tiktoken<1,>=0.7->langchain_openai) (2024.11.6)\n", + "Requirement already satisfied: networkx in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from torch>=1.11.0->sentence_transformers) (3.4.2)\n", + "Requirement already satisfied: jinja2 in 
c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from torch>=1.11.0->sentence_transformers) (3.1.5)\n", + "Requirement already satisfied: mpmath<1.4,>=1.1.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from sympy->onnxruntime>=1.14.1->chromadb) (1.3.0)\n", + "Requirement already satisfied: safetensors>=0.4.1 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from transformers<5.0.0,>=4.41.0->sentence_transformers) (0.5.2)\n", + "Requirement already satisfied: click>=8.0.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from typer>=0.9.0->chromadb) (8.1.8)\n", + "Requirement already satisfied: shellingham>=1.3.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from typer>=0.9.0->chromadb) (1.5.4)\n", + "Requirement already satisfied: httptools>=0.6.3 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from uvicorn[standard]>=0.18.3->chromadb) (0.6.4)\n", + "Requirement already satisfied: watchfiles>=0.13 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from uvicorn[standard]>=0.18.3->chromadb) (1.0.4)\n", + "Requirement already satisfied: websockets>=10.4 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from 
uvicorn[standard]>=0.18.3->chromadb) (15.0)\n", + "Requirement already satisfied: joblib>=1.2.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from scikit-learn->sentence_transformers) (1.4.2)\n", + "Requirement already satisfied: threadpoolctl>=3.1.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from scikit-learn->sentence_transformers) (3.5.0)\n", + "Requirement already satisfied: cachetools<6.0,>=2.0.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from google-auth>=1.0.1->kubernetes>=28.1.0->chromadb) (5.5.2)\n", + "Requirement already satisfied: pyasn1-modules>=0.2.1 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from google-auth>=1.0.1->kubernetes>=28.1.0->chromadb) (0.4.1)\n", + "Requirement already satisfied: rsa<5,>=3.1.4 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from google-auth>=1.0.1->kubernetes>=28.1.0->chromadb) (4.9)\n", + "Requirement already satisfied: zipp>=3.20 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from importlib-metadata<=8.5.0,>=6.0->opentelemetry-api>=1.2.0->chromadb) (3.21.0)\n", + "Requirement already satisfied: jsonpointer>=1.9 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from jsonpatch<2.0,>=1.33->langchain-core<1.0.0,>=0.3.35->langchain) (3.0.0)\n", + "Requirement already 
satisfied: mdurl~=0.1 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from markdown-it-py>=2.2.0->rich>=10.11.0->chromadb) (0.1.2)\n", + "Requirement already satisfied: mypy-extensions>=0.3.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from typing-inspect<1,>=0.4.0->dataclasses-json<0.7,>=0.5.7->langchain-community) (1.0.0)\n", + "Requirement already satisfied: humanfriendly>=9.1 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from coloredlogs->onnxruntime>=1.14.1->chromadb) (10.0)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from jinja2->torch>=1.11.0->sentence_transformers) (3.0.2)\n", + "Requirement already satisfied: pyreadline3 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from humanfriendly>=9.1->coloredlogs->onnxruntime>=1.14.1->chromadb) (3.5.4)\n", + "Requirement already satisfied: pyasn1<0.7.0,>=0.4.6 in c:\\users\\baumanoa\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\site-packages (from pyasn1-modules>=0.2.1->google-auth>=1.0.1->kubernetes>=28.1.0->chromadb) (0.6.1)\n", + "Note: you may need to restart the kernel to use updated packages.\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n", + "[notice] A new release of pip is available: 24.3.1 -> 25.0.1\n", + "[notice] To update, run: 
C:\\Users\\baumanoa\\AppData\\Local\\Microsoft\\WindowsApps\\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\\python.exe -m pip install --upgrade pip\n" + ] + } + ], + "source": [ + "# Install the required libraries\n", + "%pip install langchain langchain-community langchain_openai sentence_transformers huggingface-hub chromadb langchain_huggingface" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Overview of the libraries used:\n", + "- langchain & langchain-community: framework for developing LLM applications\n", + " - Provides tools for document processing\n", + " - Enables structured work with different LLMs\n", + " - Simplifies building the RAG pipeline\n", + "\n", + "- sentence_transformers: library for text embeddings\n", + " - Converts text into numerical vectors\n", + " - Supports multilingual models\n", + " - Optimized for semantic text similarity\n", + "\n", + "- huggingface-hub & langchain_huggingface: access to AI models\n", + " - Provides access to open-source models\n", + " - Allows embedding models to be run locally\n", + " - Integrates with LangChain\n", + "\n", + "- chromadb: vector database for storing embeddings\n", + " - Efficient storage and search of vectors\n", + " - Supports metadata\n", + " - Enables similarity search" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Ein Brief wurde in 13 Abschnitte geteilt\n", + "\n", + "Erster Abschnitt:\n", + "Gott zum gruß. Mit Wünschung ales Libes und Gutes. 
Hertz Liebe Frau, wen ich Dir mit meinen wenigen schreiben noch möchte bey guter Gesundheit andreffen so sol es mir von hertzen lib sein benebst auch Meine Liben Eltern und schwiger Eltern, auch meine Libe schwester und Bruder auch schwegers und ale\n" + ] + } + ], + "source": [ + "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", + "from langchain_community.vectorstores import Chroma\n", + "from langchain_huggingface import HuggingFaceEmbeddings\n", + "\n", + "# 1. Split the text into smaller pieces (chunking)\n", + "# We divide the letters into manageable sections\n", + "splitter = RecursiveCharacterTextSplitter(\n", + " chunk_size=300, # maximum chunk length in characters\n", + " chunk_overlap=50 # overlap between consecutive chunks, in characters\n", + ")\n", + "\n", + "# Example with a single letter\n", + "beispiel_brief = df['text'].iloc[0]\n", + "chunks = splitter.split_text(beispiel_brief)\n", + "\n", + "print(f\"Ein Brief wurde in {len(chunks)} Abschnitte geteilt\")\n", + "print(\"\\nErster Abschnitt:\")\n", + "print(chunks[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What happens here?\n", + "\n", + "- The text is split into smaller units (chunks)\n", + "- The chunk size (at most 300 characters here) is an important parameter\n", + "- The overlap (50 characters here) reduces the risk of \"cutting apart\" passages that belong together" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "C:\\Users\\baumanoa\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python311\\site-packages\\tqdm\\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", + " from .autonotebook import tqdm as notebook_tqdm\n" + ] + } + ], + "source": [ + "# 2. 
Convert the text into vectors (embedding)\n", + "embeddings = HuggingFaceEmbeddings(\n", + " model_name=\"sentence-transformers/paraphrase-multilingual-mpnet-base-v2\"\n", + ")\n", + "\n", + "# 3. Create the vector database (persisted in ./chroma_db; re-running this cell may add the same chunks again and lead to duplicate hits)\n", + "vectorstore = Chroma.from_texts(\n", + " texts=chunks,\n", + " embedding=embeddings,\n", + " persist_directory=\"./chroma_db\"\n", + ")\n" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Ergebnis 1:\n", + "getreuer Man biß in den Tott. Hertz Libe frau ich bericht dir auch daß ich deinen an mir geschribnen brif von 1. Mey hab ich wol erhalten und hab auch darauß erfaren daß ihr noch alle gesund seid und ist mir ser lib geweßen. ich wolte noch gern viel mer schreiben aber die zeit will nicht leiden und\n", + "\n", + "Ergebnis 2:\n", + "getreuer Man biß in den Tott. Hertz Libe frau ich bericht dir auch daß ich deinen an mir geschribnen brif von 1. Mey hab ich wol erhalten und hab auch darauß erfaren daß ihr noch alle gesund seid und ist mir ser lib geweßen. ich wolte noch gern viel mer schreiben aber die zeit will nicht leiden und\n" + ] + } + ], + "source": [ + "# 4. 
Similarity queries\n", + "# Example query for demonstration\n", + "frage = \"Wie berichten die Soldaten über ihre Gesundheit?\"\n", + "ergebnisse = vectorstore.similarity_search(frage, k=2)\n", + "\n", + "for i, dok in enumerate(ergebnisse):\n", + " print(f\"\\nErgebnis {i+1}:\")\n", + " print(dok.page_content)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What happens here?\n", + "\n", + "- Texts are converted into numerical vectors\n", + "- These vectors make similarity search possible\n", + "- The database returns the passages closest to the query" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "C:\\Users\\baumanoa\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python311\\site-packages\\huggingface_hub\\utils\\_deprecation.py:131: FutureWarning: 'post' (from 'huggingface_hub.inference._client') is deprecated and will be removed from version '0.31.0'. Making direct POST requests to the inference server is not supported anymore. Please use task methods instead (e.g. `InferenceClient.chat_completion`). If your use case is not supported, please open an issue in https://github.com/huggingface/huggingface_hub.\n", + " warnings.warn(warning_message, FutureWarning)\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Generation mit LLM:\n", + "Frage: Wie berichten die Soldaten über ihre Gesundheit?\n", + "\n", + "Antwort des Modells:\n", + "Die Soldaten berichten, dass sie und ihre Kameraden noch gesund sind. Sie melden, dass sie die Briefe der Angehörigen erhalten haben und davon profitiert haben, zu erfahren, dass ihre Lieben ebenfalls gesund sind. Die Briefe stammen von Man, Jogen Olrich, Haferland, Mein feder Erman und Mein feder Bin. 
Alle berichten, dass sie sich und ihre Angehörigen gut finden.\n", + "\n", + "Verwendete Quellen:\n", + "- Quelle: getreuer Man biß in den Tott. Hertz Libe frau ich bericht dir auch daß ich deinen an mir geschribnen...\n", + "- Quelle: getreuer Man biß in den Tott. Hertz Libe frau ich bericht dir auch daß ich deinen an mir geschribnen...\n", + "- Quelle: getreuer Man biß in den Tott. Hertz Libe frau ich bericht dir auch daß ich deinen an mir geschribnen...\n", + "- Quelle: si seind noch ale beide gesund. Jogen Olrich und Haferland laßen ihre Eltern auch grüßen und si sind...\n" + ] + } + ], + "source": [ + "# 5. Generation with an LLM via the Hugging Face Hub\n", + "from langchain_huggingface import HuggingFaceEndpoint\n", + "from langchain.chains import RetrievalQA\n", + "from langchain.prompts import PromptTemplate\n", + "from huggingface_hub import InferenceClient\n", + "\n", + "# Initialize the Hugging Face model\n", + "# You need a Hugging Face API token, which you can obtain free of charge at\n", + "# https://huggingface.co/settings/tokens\n", + "import os\n", + "os.environ.setdefault(\"HUGGINGFACEHUB_API_TOKEN\", \"<YOUR_HF_TOKEN>\") # Set the token as an environment variable or replace the placeholder; never commit a real token\n", + "\n", + "# Select the model - here an instruction-tuned model that can also handle German text\n", + "client = InferenceClient(model=\"mistralai/Mistral-7B-Instruct-v0.2\")\n", + "llm = HuggingFaceEndpoint(\n", + " client=client,\n", + " repo_id=\"mistralai/Mistral-7B-Instruct-v0.2\",\n", + " temperature=0.5\n", + ")\n", + "\n", + "# Create the prompt template (kept in German to match the letters)\n", + "template = \"\"\"\n", + "Du bist ein Historiker, der deutsche Soldatenbriefe untersucht.\n", + "Beantworte die folgende Frage basierend auf dem angegebenen Kontext.\n", + "Verwende nur Informationen aus dem Kontext und keine externen Kenntnisse.\n", + "\n", + "Kontext: {context}\n", + "\n", + "Frage: {question}\n", + "\n", + "Antwort:\n", + "\"\"\"\n", + "prompt = 
PromptTemplate(template=template, input_variables=[\"context\", \"question\"])\n", + "\n", + "# Build the RAG chain\n", + "qa_chain = RetrievalQA.from_chain_type(\n", + " llm=llm,\n", + " chain_type=\"stuff\", # \"stuff\" combines all retrieved documents into a single prompt\n", + " retriever=vectorstore.as_retriever(),\n", + " chain_type_kwargs={\"prompt\": prompt},\n", + " return_source_documents=True # also return the source passages\n", + ")\n", + "\n", + "# Ask the question\n", + "ergebnis = qa_chain.invoke({\"query\": \"Wie berichten die Soldaten über ihre Gesundheit?\"})\n", + "\n", + "print(\"\\nGeneration mit LLM:\")\n", + "print(\"Frage:\", \"Wie berichten die Soldaten über ihre Gesundheit?\")\n", + "print(\"\\nAntwort des Modells:\")\n", + "print(ergebnis[\"result\"])\n", + "print(\"\\nVerwendete Quellen:\")\n", + "for doc in ergebnis[\"source_documents\"]:\n", + " print(f\"- Quelle: {doc.page_content[:100]}...\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Particularities of historical language\n", + "\n", + "### Challenge: historical orthography and usage\n", + "\n", + "Our corpus highlights an important aspect of historical sources: non-standardized orthography and historical usage. Many letters were transcribed faithfully to the original and therefore contain numerous deviations from modern spelling:\n", + "\n", + "```\n", + "\"Gott zum gruß. Mit Wünschung ales Libes und Gutes. 
Hertz Liebe Frau, wen ich Dir mit meinen wenigen schreiben noch möchte bey guter Gesundheit andreffen so sol es mir von hertzen lib sein benebst auch Meine Liben Eltern und schwiger Eltern...\"\n", + "```\n", + "\n", + "These peculiarities are not errors but valuable linguistic and historical evidence of:\n", + "- Regional language varieties\n", + "- The writer's level of education\n", + "- The development of the language over time\n", + "- Historical pronunciation\n", + "\n", + "### Effects on RAG\n", + "\n", + "Historical language variation affects our RAG system:\n", + "\n", + "#### 1. Embedding models\n", + "Embedding models trained on modern language can struggle with historical spellings. However:\n", + "- Contextual embedding models often capture the meaning despite deviating spellings\n", + "- Semantic similarity is usually preserved\n", + "- Multilingually trained models are often more robust to language variation\n", + "\n", + "#### 2. Retrieval quality\n", + "For the best results, we:\n", + "- Use semantic rather than lexical search\n", + "- Pay attention to context rather than individual words\n", + "- Benefit from text vectorization, which can recognize similar concepts regardless of the exact spelling\n", + "\n", + "#### 3. Research potential\n", + "The original spelling even offers additional research potential:\n", + "- Analyzing differences in education between ranks\n", + "- Investigating regional language varieties\n", + "- Documenting historical language change\n", + "\n", + "### Example: robust search queries\n", + "\n", + "With RAG we can search effectively despite historical spellings. 
If we search for \"Gesundheit\" (health), for example, passages containing variants such as \"gesuntheit\" or \"Gesundtheit\" can also be found:\n", + "\n", + "```python\n", + "# Example query for demonstration\n", + "frage = \"Wie berichten die Soldaten über ihre Gesundheit?\"\n", + "ergebnisse = vectorstore.similarity_search(frage, k=2)\n", + "\n", + "for i, dok in enumerate(ergebnisse):\n", + " print(f\"\\nErgebnis {i+1}:\")\n", + " print(dok.page_content)\n", + "```\n", + "\n", + "This robustness is a key advantage over simple keyword searches, which would often fail on historical spelling variants.\n", + "\n", + "### Conclusion\n", + "\n", + "The historical orthography in our sources:\n", + "- Is not an obstacle but valuable data\n", + "- Is generally handled robustly by modern embedding models\n", + "- Opens up additional research perspectives\n", + "- Demonstrates the strength of semantic search methods" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Possible research questions\n", + "\n", + "With this technology we can investigate a range of historical questions:\n", + "\n", + "#### Linguistic analysis\n", + "- How does the language of different ranks differ?\n", + "- How formal or informal is the communication?\n", + "- Which period-specific expressions are used?\n", + "\n", + "#### Content analysis\n", + "- How is everyday life in wartime described?\n", + "- What role does the family play?\n", + "- How are historical events reflected?\n", + "\n", + "#### Social-history perspectives\n", + "- Which social networks become visible?\n", + "- How do the writers' lifeworlds differ?\n", + "- Which hierarchies are reflected?" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Exercises and reflection\n", + "\n", + "### Discussion questions\n", + "\n", + "- What advantages does RAG offer over classical text search?\n", + "- Which ethical aspects must be considered when analyzing personal letters?\n", + "- How could the method be transferred to other historical sources?\n", + "\n", + "### Practical exercises\n", + "\n", + "- Explore the metadata of the letters\n", + "- Formulate your own search queries\n", + "- Compare different letters written by soldiers of the same rank\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Outlook and next steps\n", + "In the following notebooks we will:\n", + "- Examine the individual components in detail\n", + "- Develop different analysis strategies\n", + "- Carry out concrete historical investigations" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Further resources\n", + "\n", + "- [Deutsches Textarchiv](https://www.deutschestextarchiv.de/)\n", + "- [LangChain documentation](https://python.langchain.com/docs/get_started/introduction)\n", + "- [Neumann, Marko (2019): Soldatenbriefe des 18. und 19. Jahrhunderts](https://www.winter-verlag.de/detail/978-3-8253-4642-3/Neumann_Soldatenbriefe/)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} -- GitLab