Source Citation v2

Introduction

When a Wikit Semantics APP answers a question, it can cite its sources as numbered references (e.g., [1], [2], [10]). This option must be enabled in the prompt. This API lets you retrieve the full metadata of those sources so that you can display them in your user interface.

Benefits:

  • Trace the origin of the information provided
  • Ensure transparency for users
  • Allow verification by accessing source documents

Prerequisites

This documentation assumes that you have already:

  • Executed a query via the /query-executions endpoint
  • Retrieved the queryId from the response

For more information on executing queries, see the dedicated section.

Retrieve sources for a query

Endpoint

GET /semantics/apps/{llm_app_id}/query-execution-sources/{query_id}/get-quoted-sources

Request example

bash
curl -X GET "https://apis.wikit.ai/semantics/apps/$APP_ID/query-execution-sources/$QUERY_ID/get-quoted-sources" \
  -H "accept: application/json" \
  -H "Wikit-Semantics-API-Key: $API_KEY" \
  -H "X-Wikit-Organization-Id: $ORG_ID"

ℹ️ The API key (the Wikit-Semantics-API-Key header) is only required for private APPs.

The endpoint returns a list of passages identified in the document fragments.
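
As a rough Python equivalent of the curl call above, the following sketch builds the same request with the standard library only. The function names `build_sources_request` and `fetch_quoted_sources` are illustrative assumptions, not part of any Wikit SDK; the host, path, and header names are copied from the curl example.

```python
import json
import urllib.request

BASE_URL = "https://apis.wikit.ai"  # host taken from the curl example


def build_sources_request(app_id, query_id, org_id, api_key=None):
    """Build (but do not send) the GET request for a query's quoted sources."""
    url = (
        f"{BASE_URL}/semantics/apps/{app_id}"
        f"/query-execution-sources/{query_id}/get-quoted-sources"
    )
    headers = {
        "accept": "application/json",
        "X-Wikit-Organization-Id": org_id,
    }
    # The API key is only required for private APPs, so it is optional here.
    if api_key:
        headers["Wikit-Semantics-API-Key"] = api_key
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_quoted_sources(app_id, query_id, org_id, api_key=None, timeout=10):
    """Send the request and decode the JSON array of source objects."""
    request = build_sources_request(app_id, query_id, org_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from the network call makes the URL and headers easy to inspect or unit-test without contacting the API.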

Response example

json
[
  {
    "quoted_as": "1",
    "document": {
      "id": "686e90dd2ff687c57806d107",
      "name": "Remote_Chart_01_08_2023.pdf",
      "title": null,
      "url": null,
      "public_link": false,
      "storage_name": "storage_link",
      "data_source_id": "686d402ce52bbbf7116f55f8",
      "organization_id": "686d3ceb18a3087351f0fa35"
    },
    "chunk": {
      "id": "686e90e725e88b90bdd50eb2",
      "data": "## content of the chunk...",
      "document_id": "686e90dd2ff687c57806d107",
      "data_source_id": "686d402ce52bbbf7116f55f8",
      "llm_app_id": null,
      "user_id": null,
      "flags": null,
      "position": 29,
      "page_start": null,
      "page_end": null
    },
    "data_source": {
      "id": "686d402ce52bbbf7116f55f8",
      "name": "Human Resources"
    },
    "score": 0.88380575,
    "retriever_type": "SEMANTIC_SEARCH"
  }
]

Response structure

The response is an array of source objects. Each object contains the fields described below.

Main fields

| Field | Type | Description |
| --- | --- | --- |
| quoted_as | string | Citation number in the response (e.g., "1", "2", "10") |
| document | object | Source document metadata |
| chunk | object | Document excerpt used as a source |
| data_source | object | Information about the data source |
| score | float | Semantic relevance score (between 0 and 1) |
| retriever_type | string | Search type used (e.g., "SEMANTIC_SEARCH") |

document object

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique document identifier |
| name | string | Source file name |
| title | string/null | Document title (if available) |
| url | string/null | Document URL (if available) |
| public_link | boolean | Indicates whether the document has a public link |
| storage_name | string | Internal storage path |
| data_source_id | string | Data source identifier |
| organization_id | string | Organization identifier |

chunk object

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique chunk identifier |
| data | string | Source text excerpt (markdown content) |
| document_id | string | Reference to the parent document |
| position | integer | Position of the chunk in the document |
| page_start | integer/null | Start page (if applicable) |
| page_end | integer/null | End page (if applicable) |

data_source object

| Field | Type | Description |
| --- | --- | --- |
| id | string | Data source identifier |
| name | string | Data source name (e.g., "Human Resources") |
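
To illustrate how the fields above fit together, here is a minimal sketch that reduces one source object to the values a citation list typically needs. The function name `summarize_source` and the shape of the returned dictionary are assumptions made for illustration; only the input field names come from the response structure described above.

```python
def summarize_source(source):
    """Reduce one source object to the fields a citation list typically shows."""
    document = source["document"]
    chunk = source["chunk"]
    # title is nullable, so fall back to the file name.
    display_name = document.get("title") or document["name"]
    # page_start and page_end are nullable; report pages only when both exist.
    # The values are passed through unchanged, without assuming an index base.
    start, end = chunk.get("page_start"), chunk.get("page_end")
    pages = None if start is None or end is None else (start, end)
    return {
        "citation": f"[{source['quoted_as']}]",
        "display_name": display_name,
        "data_source": source["data_source"]["name"],
        "pages": pages,
        "score": source["score"],
    }
```

Applied to the response example, this yields the citation `[1]`, the file name as display name (because `title` is null), and no page range.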

Best practices

1. Call timing

Call the sources API only after the response has finished streaming.

2. Conditional display

Display the sources section only when the API returns at least one source.

3. URL management

Some sources do not have a URL. Handle this case in your UI.
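
The two display rules above (show the section only when sources exist, and tolerate a missing URL) could be handled along these lines. This is a sketch that renders plain text; the function name `render_sources_section` and the output layout are assumptions, and a real UI would produce its own markup.

```python
def render_sources_section(sources):
    """Return a plain-text sources block, or an empty string when there are none."""
    if not sources:
        # Nothing to cite: the caller can skip the section entirely.
        return ""
    lines = ["Sources:"]
    for source in sources:
        document = source["document"]
        label = document.get("title") or document["name"]
        entry = f"[{source['quoted_as']}] {label}"
        # url is nullable, so the link is appended only when present.
        url = document.get("url")
        if url:
            entry += f" ({url})"
        lines.append(entry)
    return "\n".join(lines)
```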

4. Duplicate documents

The API may return the same document multiple times, for example when several chunks of one document are cited. Handle this case according to your needs.
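
One possible way to handle repeated documents is to group the sources by document identifier, so that each document appears once with all of its citation numbers. The sketch below assumes that grouping on `document.id` is what your UI needs; the function name `group_by_document` and the result layout are illustrative.

```python
def group_by_document(sources):
    """Group sources by document id, preserving first-seen order."""
    grouped = {}
    for source in sources:
        doc_id = source["document"]["id"]
        entry = grouped.setdefault(
            doc_id,
            {"document": source["document"], "citations": [], "chunks": []},
        )
        entry["citations"].append(source["quoted_as"])
        entry["chunks"].append(source["chunk"])
    # dict preserves insertion order, so documents keep their first-seen order.
    return list(grouped.values())
```

A document cited as [1] and [3] would then be displayed once, with both citation numbers attached.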

5. Citation markers

If you wish, you can hide or transform the [1], [2], ... markers when displaying the response.
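
Hiding or transforming the markers can be done with a regular expression, as in the sketch below. It assumes the markers always have the exact form `[n]` with a decimal number; the function names are hypothetical. Note that the same pattern would also match any other bracketed number that appears in the answer text.

```python
import re

# Matches markers such as [1], [2], [10] and captures the number.
CITATION_PATTERN = re.compile(r"\[(\d+)\]")


def strip_citations(text):
    """Remove citation markers together with any whitespace just before them."""
    return re.sub(r"\s*\[\d+\]", "", text)


def transform_citations(text, render):
    """Replace each [n] marker with render(n), where n is the number as a string."""
    return CITATION_PATTERN.sub(lambda match: render(match.group(1)), text)
```

For example, `transform_citations(answer, lambda n: f"<sup>{n}</sup>")` would turn each marker into a superscript, while `strip_citations(answer)` removes the markers altogether.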

6. Common error codes

| Code | Meaning | Solution |
| --- | --- | --- |
| 401 | Invalid API key | Check the API key and Organization ID |
| 404 | Query not found | Check the queryId, or wait a few seconds and retry |
| 500 | Server error | Retry after a delay |
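
The error table above suggests a simple retry policy: retry on 404 and 500 after a pause, and fail immediately on anything else, such as 401. The sketch below is one way to express that. The name `fetch_with_retry`, the default attempt count, and the default delay are assumptions rather than documented values; `fetch` is any zero-argument callable that raises `urllib.error.HTTPError` on failure.

```python
import time
import urllib.error

# Status codes for which the table recommends waiting and trying again.
RETRYABLE_STATUS = {404, 500}


def fetch_with_retry(fetch, max_attempts=3, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch(), retrying on 404 and 500; other HTTP errors are raised at once."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except urllib.error.HTTPError as error:
            # Give up on non-retryable codes or when attempts are exhausted.
            if error.code not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            sleep(delay_seconds)
```

The `sleep` parameter is injectable so that the policy can be tested without real delays.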

Source Citation v1 (deprecated)

Retrieving source citations for a query

After executing a query, it is possible to extract the citations identified in the sources (i.e., document fragments) used by the LLM app.

POST /semantics/apps/{llm_app_id}/query-executions/{query_execution_id}/citations

bash
curl -X POST "https://apis.wikit.ai/semantics/apps/$SEMANTICS_APP_ID/query-executions/$SEMANTICS_QUERY_EXECUTION_ID/citations" \
  -H "Authorization: Bearer $SEMANTICS_TOKEN" \
  -H "X-Wikit-Organization-Id: $SEMANTICS_ORG_ID" \
  -H "X-Wikit-Response-Format: json" \
  -H "Content-Type: application/json" \
  -d "{}"

The endpoint returns a list of passages identified in the document fragments.

Example:

json
[
    {
        "_id": "668e9bf39bd4493d2557a6ea",
        "created_at": "2024-07-10T14:34:26Z",
        "query_execution_id": "668e9bf09bd4493d2557a6e6",
        "total_count": 1,
        "reply_sentence": "The colors of Wikit's visual identity are: Sky Blue, Midnight Blue, and Violet.",
        "source_sentence": "The colors of Wikit's visual identity are: - Sky Blue; - Midnight Blue; - Violet.",
        "chunk_data_snapshot": " [...]  {{source_sentence}}  [...] ",
        "start_char_idx": 0,
        "end_char_idx": 89,
        "chunk_id": "668e9b94ca8d43490c14ede3",
        "chunk_page_start": 2,
        "chunk_page_end": 2,
        "chunk_position": null,
        "document_id": "668e9b94ca8d43490c14ede0",
        "document_name": "wikit-visual-identity.pdf",
        "document_title": null,
        "document_url": "https://semantics-files.wikit.ai/v1/viewer/NjVkNWIzNDRhMDUxZWEyOGI3ZjU0ZDc2LzY2OGU5YjgzY2E4ZDQzNDkwYzE0ZWRkZi8xNzIwNjIxOTcxLjkwNzU3N193aWtpdF9kb2N1bWVudF9zYW1wbGUucGRm",
        "organization_id": "65d5b344a051ea28b7f54d76",
        "threshold": 50
    }
]

In this example, Wikit Semantics identified the fragment on page 3 of the document "wikit-visual-identity.pdf" (see the document_name field) as a source. The chunk_page_start and chunk_page_end indexes start at 0, so the value 2 denotes the third page.
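
Because the v1 page indexes are zero-based, a display layer usually needs to add 1 before showing them. A minimal sketch, with a hypothetical helper name `page_label`:

```python
def page_label(citation):
    """Convert the zero-based v1 page indexes into a human-readable label."""
    start = citation.get("chunk_page_start")
    end = citation.get("chunk_page_end")
    if start is None or end is None:
        return None
    # Indexes start at 0, so page numbers shown to users are index + 1.
    first, last = start + 1, end + 1
    return f"page {first}" if first == last else f"pages {first}-{last}"
```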