Data Sources

Data sources form the knowledge base that powers LLM apps in Wikit Semantics. They are the documents and other content an LLM app draws on when it responds to users.

What is a Data Source?

A data source is an organized collection of documents that serves as a knowledge base for LLM apps. It can contain different types of documents (PDF, Word, text, etc.), which the platform processes and optimizes for use by language models.

Key Features

  • Document Collection: A data source can contain a large number of documents
  • Automated Processing: Each document is automatically analyzed and prepared for optimal use
  • Association Flexibility: An LLM app can be connected to one or more data sources
  • Centralized Management: A single interface to manage all your knowledge bases
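
The relationship between apps and data sources described above can be pictured as a small data model. This is only an illustrative sketch: the class names DataSource and LLMApp, their fields, and the connect and searchable_documents methods are hypothetical and are not part of any Wikit Semantics API.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """Hypothetical model of a data source: a named collection of documents."""
    name: str
    documents: list[str] = field(default_factory=list)


@dataclass
class LLMApp:
    """Hypothetical model of an LLM app connected to one or more data sources."""
    name: str
    sources: list[DataSource] = field(default_factory=list)

    def connect(self, source: DataSource) -> None:
        # Connecting the same source twice has no effect.
        if source not in self.sources:
            self.sources.append(source)

    def searchable_documents(self) -> list[str]:
        # Everything the app can draw on, across all connected sources.
        return [doc for source in self.sources for doc in source.documents]


hr = DataSource("HR policies", ["leave-policy.pdf", "onboarding.docx"])
it = DataSource("IT guides", ["vpn-setup.txt"])

assistant = LLMApp("Employee assistant")
assistant.connect(hr)
assistant.connect(it)
```

In this sketch, the app sees the documents of every connected source as one pool, which is one plausible reading of "connected to one or more data sources".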

Document Processing Workflow

When a document is added to a data source, it goes through four processing steps:

  1. Initial Analysis: Content extraction and normalization
  2. Fragmentation: Breaking down content into optimized fragments (chunks)
  3. Vectorization: Creation of vector representations (embeddings) for each fragment
  4. Indexing: Organization of fragments for efficient searching
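
The four steps above can be illustrated with a toy pipeline. This is a minimal sketch, not the platform's implementation: this page does not describe how Wikit Semantics extracts, fragments, or vectorizes content, so the function names (normalize, fragment, embed, build_index, search), the fixed-size word windows, and the hashed bag-of-words vectors are all assumptions chosen to keep the example self-contained. A production system would typically use a trained embedding model and a dedicated vector index.

```python
import hashlib
import math
import re

DIM = 64  # size of the toy embedding vectors (an arbitrary assumption)


def normalize(text: str) -> str:
    """Step 1 (initial analysis): collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def fragment(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Step 2 (fragmentation): windows of `size` words sharing `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str) -> list[float]:
    """Step 3 (vectorization): hashed bag of words, scaled to unit length."""
    vec = [0.0] * DIM
    for word in re.findall(r"\w+", text.lower()):
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_index(document: str) -> list[tuple[str, list[float]]]:
    """Step 4 (indexing): keep each fragment next to its vector."""
    return [(chunk, embed(chunk)) for chunk in fragment(normalize(document))]


def search(index: list[tuple[str, list[float]]], query: str, k: int = 1) -> list[str]:
    """Return the k fragments whose vectors are most similar to the query."""
    q = embed(query)
    # Vectors are unit length, so the dot product equals cosine similarity.
    ranked = sorted(index, key=lambda item: -sum(a * b for a, b in zip(q, item[1])))
    return [chunk for chunk, _ in ranked[:k]]
```

Overlapping windows are used here so that a sentence cut at a fragment boundary still appears whole in at least one fragment; whether the platform does the same is not stated in this page.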

Usage in LLM Apps

Data sources keep your LLM app's responses relevant and specific to your content. They:

  • Provide the necessary context for responses
  • Enable responses based on your specific content
  • Ensure the consistency of shared information
  • Facilitate knowledge updates
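
To make "providing context" concrete, the sketch below shows one common way retrieved fragments can be placed in front of a user's question before it is sent to a language model. It is an assumption-laden illustration: the function name build_prompt, the prompt wording, and the max_chars budget are hypothetical, and this page does not state how Wikit Semantics assembles its prompts.

```python
def build_prompt(question: str, fragments: list[str], max_chars: int = 2000) -> str:
    """Assemble a prompt asking the model to answer from the given fragments."""
    context_lines = []
    used = 0
    for number, frag in enumerate(fragments, start=1):
        line = f"[{number}] {frag}"
        if used + len(line) > max_chars:
            break  # stop once the (hypothetical) context budget is exhausted
        context_lines.append(line)
        used += len(line)
    context = "\n".join(context_lines) if context_lines else "(no context found)"
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_prompt(
    "What is the refund period?",
    ["Refunds are accepted within 30 days.", "Offices open at 9 am."],
)
```

Because the model is instructed to rely on the supplied fragments, updating a document in the data source changes what the app can answer without changing the app itself.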

Data Source Management

The Wikit Semantics platform offers an intuitive interface to:

  • Create and organize your data sources
  • Add and delete documents
  • Monitor the document processing status
  • Associate sources with LLM apps
  • Maintain and update your knowledge bases

Best Practices

To optimize the use of data sources:

  • Organize your documents thematically
  • Keep your sources up to date
  • Check the quality and relevance of documents
  • Structure your content clearly
  • Regularly monitor the status of your sources

Data sources are a core component of Wikit Semantics: they let you build LLM apps whose responses are based on your own content.