Weaviate • vector database(@weaviate_io)

Weaviate AI Database on X: A user searches for 'caffe crema' in your speciality coffee e-commerce store. The result? 0 matches.

TL;DR · AI Summary

Weaviate v1.37 introduces three features that address BM25 keyword-search misses caused by accent variations in spelling and by stopword lists that do not fit a property's language.

Key Takeaways

  • Weaviate v1.37 supports per-property accent folding, treating 'caffé' and 'caffe' as the same token.
  • New per-property stopword presets allow precise handling of multilingual descriptions.
  • A POST /v1/tokenize endpoint is provided to preview BM25 tokenization results.
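
To illustrate why the two spellings miss each other and what accent folding changes, here is a minimal Python sketch. It is not Weaviate's implementation: `fold_accents` and `tokenize` are hypothetical helpers, and the lowercase-and-split tokenizer is an assumption standing in for whatever tokenization a property actually uses.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Decompose characters and drop combining marks, so 'é' becomes 'e'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text: str, fold: bool = False) -> list[str]:
    """Lowercase, optionally fold accents, then split on whitespace."""
    text = text.lower()
    if fold:
        text = fold_accents(text)
    return text.split()


# Without folding, the indexed token and the query token are different strings.
print(tokenize("Caffé") == tokenize("caffe"))

# With folding applied on both sides, they become the same token.
print(tokenize("Caffé", fold=True) == tokenize("caffe", fold=True))
```

The important detail is that folding has to be applied to both the indexed text and the query, which matches the post's "at index time and query time".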

Outline

  1. A search for 'caffe crema' returns zero results, exposing how sensitive BM25 keyword matching is to exact spelling.

  2. BM25 treats 'caffé' and 'caffe' as different tokens, so the keyword search fails.

  3. Weaviate v1.37 introduces three features: accent folding, stopword presets, and a tokenize endpoint.

  4. One line of schema configuration makes 'caffé' and 'caffe' match at both index time and query time.

  5. Per-property stopword presets support multiple languages in one collection, so terms are not filtered out by the wrong list.

  6. The tokenize endpoint lets developers preview the exact tokens BM25 will score.
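
The idea of per-property stopword presets can be sketched in plain Python. Everything here is an assumption for illustration: the tiny stopword sets, the preset names, and the property names are hypothetical, and real presets would be far larger. The sketch only shows the principle that each property filters with its own list.

```python
# Hypothetical, deliberately tiny stopword lists (real presets are much larger).
STOPWORD_PRESETS = {
    "en": {"the", "a", "an", "of", "and"},
    "fr": {"le", "la", "les", "de", "et", "un", "une"},
    "none": set(),
}

# Hypothetical mapping from property name to preset within one collection.
PROPERTY_PRESETS = {
    "brand": "none",
    "description_en": "en",
    "description_fr": "fr",
}


def tokens_for(prop: str, text: str) -> list[str]:
    """Tokenize text and drop the stopwords configured for this property."""
    stopwords = STOPWORD_PRESETS[PROPERTY_PRESETS[prop]]
    return [token for token in text.lower().split() if token not in stopwords]


# A brand name keeps every word when its property uses no stopword list.
print(tokens_for("brand", "The North Face"))

# The same text loses "the" under the English list.
print(tokens_for("description_en", "The North Face"))

# French text is filtered with the French list instead.
print(tokens_for("description_fr", "le café de la maison"))
```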

Mindmap

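  • Weaviate v1.37 new features
    • Per-property Accent Folding
      • Resolves spelling-variation mismatches
      • Improves search accuracy
    • Per-property Stopword Presets
      • Supports multilingual stopwords
      • Avoids wrongly filtering descriptions
    • POST /v1/tokenize endpoint
      • Previews BM25 tokenization results
      • Helps tune query results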

Highlights

  • Your BM25 search just treated 'caffé' and 'caffe' as two different tokens, and the keyword half of your hybrid search dropped to zero.

  • One line of schema config and 'caffé' matches 'caffe' everywhere, at index time and query time.

  • Your French descriptions get a French preset. Same collection.

Tags: Weaviate, BM25, Vector Database, Text Analysis

Weaviate AI Database

@weaviate_io

A user searches for "caffe crema" in your speciality coffee e-commerce store. The result? 0 matches.

Your BM25 search just treated "caffé" and "caffe" as two different tokens, and the keyword half of your hybrid search dropped to zero.

Weaviate v1.37 ships three things to improve this:

  • Per-property accent folding. One line of schema config and "caffé" matches "caffe" everywhere, at index time and query time.
  • Per-property stopword presets. "The North Face" stops getting destroyed by an aggressive English stopword list. Your French descriptions get a French preset. Same collection.
  • A POST /v1/tokenize endpoint. You hand it text + an analyzer config, you get back the exact tokens BM25 will score.

Learn more in our blog: https://weaviate.io/blog/tokenization-text-analysis-weaviate?utm_source=channels&utm_medium=w_social&utm_campaign=1.37_release&utm_content=268019112…
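
As a rough sketch of what calling the tokenize endpoint might look like, the Python below builds (but does not send) a JSON POST request to `/v1/tokenize` using only the standard library. The post confirms the path and that the input is text plus an analyzer config; everything else is an assumption. In particular, the payload key names (`text`, `analyzer`, `accent_folding`, `stopwords`) are placeholders rather than the documented schema, and `http://localhost:8080` is an assumed local instance, so check the API reference before relying on any of it.

```python
import json
import urllib.request


def build_tokenize_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to /v1/tokenize."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/tokenize",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder payload: these key names are assumptions, not the documented schema.
example_payload = {
    "text": "Caffé Crema",
    "analyzer": {"accent_folding": True, "stopwords": "en"},
}

request = build_tokenize_request("http://localhost:8080", example_payload)
print(request.get_method(), request.full_url)

# To send it against a running instance, one could call:
#   urllib.request.urlopen(request)
# and inspect the returned tokens.
```

Previewing tokens this way is mainly a debugging aid: comparing the tokens produced for an indexed value with those produced for a query shows directly whether the two can match.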

1:16 PM · May 15, 2026
