Data Source Management

Data Source Overview

The Data Sources page is the central place for managing the platform's knowledge. It lists all the document sources that feed your LLM apps, giving you a global view of your knowledge base and a single place to manage it.

Main Interface

The interface consists of a grid of cards, each representing a distinct data source. At the top of the page, a blue "Create a data source" button allows you to add new sources to your collection.

Data Source Cards

Each data source is represented by a card containing essential information:

General Information

The name of the source appears as the card title
A counter indicates the number of documents contained in the source
The total volume of content is displayed as a word count
A label identifies the type of source (Manual, Connector, etc.)

Management Menu

A three-dot menu (⋮) at the top right of each card provides access to management options:

The "Manage data source" option allows access to settings and content
"Manage connector" offers control over the data import method (for connector-type sources)
"Delete" allows you to remove the source from the workspace

Available Source Types

The platform supports several types of data sources, each adapted to specific needs:

Manual Sources

These sources allow direct addition of documents. They are identified by the "Manual" label and are particularly suitable for internal or specific documents.

Connector Sources

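These sources are fed by a connector, which imports documents automatically from an external source. They are identified by the "Connector" label.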

Adding and Configuring Data Sources

Creating a Data Source

The data source creation page allows you to configure a new source that will feed knowledge into your LLM applications. It consists of several essential configuration fields.

Configuration Fields

Name

The name field identifies your data source. The name is limited to 40 characters so that it displays clearly in the interface.

Description

A text field in which you describe in detail the use and content of your data source. This description helps all users understand the source's role.

Origin

The origin of the source can be defined according to two options:

Manual: For direct addition of documents
Connector: For an automated connection to an external source

Embedding Models

A dropdown list that lets you select the embedding (vectorization) model used to process the documents in this source.

Public Link Creation

A toggle switch that enables or disables the creation of public links to documents in this source. This option is disabled by default.

Action Buttons

The page offers two options at the bottom:

Cancel: To abandon creation
Create: To create the source with the configured parameters
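As an illustration only, the creation form described above can be modeled as a small data structure. This is a minimal sketch: the class, field, and function names are hypothetical and do not come from the platform's API. Only the 40-character name limit, the two origins, and the default-disabled public link option are taken from this page.

```python
from dataclasses import dataclass
from enum import Enum

MAX_NAME_LENGTH = 40  # limit stated for the Name field


class Origin(Enum):
    MANUAL = "Manual"        # direct addition of documents
    CONNECTOR = "Connector"  # automated connection to an external source


@dataclass
class DataSourceConfig:
    """Hypothetical model of the creation form; not the platform's API."""
    name: str
    description: str
    origin: Origin
    embedding_model: str
    public_link_creation: bool = False  # disabled by default, as on the form

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the form is acceptable."""
        problems = []
        if len(self.name) > MAX_NAME_LENGTH:
            problems.append(f"name exceeds {MAX_NAME_LENGTH} characters")
        return problems


# Example: a manual source with the default public link setting.
config = DataSourceConfig(
    name="HR policies",
    description="Internal HR documents",
    origin=Origin.MANUAL,
    embedding_model="Azure Ada v2",
)
```

Calling config.validate() on this example returns an empty list, while a name longer than 40 characters would produce one problem message.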

Data Source Details

The data source details page is organized into three main tabs allowing you to view and manage all aspects of the source: Information, Documents, and Advanced Settings.

Information Tab

The Information tab presents the main characteristics of the data source:

Name: The name identifying the source
Description: The detailed description of the source
Embedding Models: The embedding model used to vectorize the documents (e.g., Azure Ada v2)
Number of documents/words: Content statistics (number of documents and total words)
Origin: The type of document import (Manual or via Connector)
Bucket Name: The technical storage identifier
Public Link Creation: The activation status of public links (Enabled/Disabled)

Documents Tab

The Documents tab allows you to manage the content of the data source.

Toolbar

A search bar to filter documents
An "Import documents" button to add content

Document List

Each document is presented with:

The file name
The import date and time
The file type (e.g., application/pdf)
The word count
The fragment count
An actions menu (⋮)
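To make the relationship between the document list, the search bar, and the source statistics concrete, here is a minimal sketch. All names are hypothetical, and the case-insensitive match on the file name is an assumption: this page does not specify how the search bar matches documents.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DocumentEntry:
    """Hypothetical row of the document list; field names are illustrative."""
    file_name: str
    imported_at: datetime
    file_type: str       # MIME type, e.g. "application/pdf"
    word_count: int
    fragment_count: int


def filter_documents(documents: list[DocumentEntry], query: str) -> list[DocumentEntry]:
    """Keep documents whose file name contains the query (assumed case-insensitive)."""
    needle = query.strip().lower()
    return [d for d in documents if needle in d.file_name.lower()]


def source_statistics(documents: list[DocumentEntry]) -> tuple[int, int]:
    """Return (number of documents, total word count), the two figures shown for a source."""
    return len(documents), sum(d.word_count for d in documents)


# Example data, invented for illustration.
docs = [
    DocumentEntry("Onboarding.pdf", datetime(2024, 1, 15, 9, 30), "application/pdf", 1200, 8),
    DocumentEntry("faq.md", datetime(2024, 2, 1, 14, 0), "text/markdown", 300, 3),
]
```

With this sample data, filter_documents(docs, "onb") keeps only the PDF, and source_statistics(docs) gives the document count and word total that a card would display.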

Advanced Settings Tab

The Advanced Settings tab contains the document fragmentation strategies.

Fragmentation Strategies

A table lists the fragmentation strategy that Wikit Semantics uses for each document type:

Microsoft Word documents (*.docx): By character count
PDF documents (*.pdf): By title (Recommended)
Text files (*.txt): By character count
Markdown files (*.md): By title (Recommended)
Web Pages (*.html): By title (Recommended)
JSON files (*.json): By title (Recommended)

Each strategy can be modified via the pencil icon to the right of the table.

This configuration lets you optimize how documents are split into fragments for analysis, based on their format and structure.
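The two strategies named in the table can be sketched as follows. This is an illustration under stated assumptions, not the Wikit Semantics implementation: the function names are hypothetical, the fragment size of 1000 characters is an arbitrary example value, and "by title" is interpreted here as starting a new fragment at each Markdown heading line. Only the mapping from file type to strategy is taken from the table.

```python
from pathlib import Path

# Strategy per file extension, copied from the table above.
DEFAULT_STRATEGIES = {
    ".docx": "character_count",
    ".pdf": "title",
    ".txt": "character_count",
    ".md": "title",
    ".html": "title",
    ".json": "title",
}


def strategy_for(filename: str) -> str:
    """Look up the listed strategy for a file name (KeyError for unlisted types)."""
    return DEFAULT_STRATEGIES[Path(filename).suffix.lower()]


def split_by_character_count(text: str, size: int = 1000) -> list[str]:
    """Cut text into consecutive fragments of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


def split_by_title(markdown_text: str) -> list[str]:
    """Start a new fragment at every line beginning with '#' (assumed heading marker)."""
    fragments: list[str] = []
    current: list[str] = []
    for line in markdown_text.splitlines():
        if line.startswith("#") and current:
            fragments.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        fragments.append("\n".join(current))
    return fragments
```

Splitting by title keeps each section together with its heading, which is a plausible reason for recommending it on structured formats, whereas splitting by character count works on any text but can cut a sentence in two.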
