Add a very important comment for SentenceSplitter #14257
base: main
Shouldn't we just fix the logic? Lol
@logan-markewich
@wencan maybe for now we can add a check to see whether the problem occurred and, if it did, raise a warning?
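Here is a minimal sketch of the check suggested above. It assumes "the problem" means a chunk whose token count exceeds chunk_size. `warn_on_oversized_chunks` is a hypothetical helper, not part of LlamaIndex. The tokenizer is passed in as a callable so the sketch depends only on the standard library.

```python
import warnings
from typing import Callable, List, Sequence


def warn_on_oversized_chunks(
    chunks: Sequence[str],
    chunk_size: int,
    tokenize: Callable[[str], Sequence[object]],
) -> List[int]:
    """Return the indices of chunks whose token count exceeds chunk_size,
    emitting one UserWarning for each of them (hypothetical helper)."""
    oversized: List[int] = []
    for i, chunk in enumerate(chunks):
        n_tokens = len(tokenize(chunk))
        if n_tokens > chunk_size:
            warnings.warn(
                f"chunk {i} has {n_tokens} tokens, exceeding chunk_size={chunk_size}",
                UserWarning,
                stacklevel=2,
            )
            oversized.append(i)
    return oversized


# Illustration only: `list` acts as a one-token-per-character tokenizer.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    bad = warn_on_oversized_chunks(["a" * 100, "b" * 600], 512, list)
print(bad)          # [1]
print(len(caught))  # 1
```

To use this with the repro, one could pass `[n.get_content(MetadataMode.ALL) for n in nodes]` together with a real tokenizer, such as the callable returned by `llama_index.core.utils.get_tokenizer()`. Whether that matches the tokenizer SentenceSplitter uses internally is an assumption worth verifying.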
@wencan actually, if you have a test case where this happens, I can probably just work backwards from that.
Yeah, this makes sense to me!
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.schema import Document, MetadataMode
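# Sample Chinese text (advice on coping with work stress, fatigue, and health);
# kept verbatim because the reported node lengths depend on its exact characters.
# The r prefix keeps the literal backslashes ("1\.", "\-") without invalid-escape warnings.
text = r"""你所描述的情况可能与身体健康有关,尤其是与压力、疲劳和动机相关的身体状态。长时间的工作压力和疲劳可能导致身体功能下降,包括记忆力、注意力和决策能力。此外,焦虑和压力可能会影响你的情绪状态和工作表现,从而形成一个恶性循环。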
以下是一些可能与你的情况相关的健康概念:
1\. **慢性疲劳**:长时间的工作和缺乏休息可能导致身体的疲劳,这种慢性疲劳可能会影响你的肌肉恢复和整体健康。
2\. **营养不足**:你提到的对工作的忽视可能导致饮食不规律和营养不足,这可能会影响你的体力和精力。
3\. **体能和耐力**:如果你的工作不再给你提供足够的体能锻炼,或者你感觉自己的体能有所下降,这可能会影响你的工作表现。
4\. **自我照顾**:如果你忽视了对身体的照顾,比如不按时吃饭、不运动,可能会导致身体机能的下降。
5\. **应对策略**:你可能会采取一些应对策略来处理工作压力,比如依赖咖啡或能量饮料来提神,或者熬夜来完成工作。
为了应对这些挑战,你可以尝试以下策略:
\- **休息和恢复**:确保你有足够的休息时间,这对于恢复体力和精神状态至关重要。
\- **时间管理和优先级设定**:尝试合理规划你的时间,优先处理最重要的任务。
\- **寻求支持**:和家人、朋友或同事交流你的感受,或者寻求专业的健康咨询。
\- **自我反思**:思考你的生活方式和工作习惯,以及它们是否对你的健康有益。
\- **健康规划**:考虑你的长期健康规划,是否需要调整你的生活方式或寻求更健康的习惯。
\- **身体保健**:如果可能,尝试一些提高身体机能的活动,如瑜伽、太极或其他健身课程。
记住,你的身体健康是生活的基础。如果工作压力和疲劳影响了你的生活质量,那么采取行动来改变这种状况是至关重要的。专业的健康支持可能会对你有所帮助。"""
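doc = Document(text=text, metadata={
    # Title: "The importance of education: education is the cornerstone of human social development"
    'title': '教育的主要性 教育是人类社会发展的基石',
    # Keywords: education, culture, learning, talent, growth, creation, future, resources, attention, talent and potential
    'keywords': '教育、 文化、 学习、 人才、 成长、 创造、 未来、 资源、 关注、 才华和潜力'
})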
# "Magic" workaround: split paragraphs on '\n' instead of the default paragraph_separator
# parser = SentenceSplitter(chunk_size=512, chunk_overlap=64, paragraph_separator='\n')
parser = SentenceSplitter(chunk_size=512, chunk_overlap=64)
nodes = parser.get_nodes_from_documents([doc])
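# Character length of each node's content, including metadata.
# Note: chunk_size is measured in tokens, so these are not token counts.
print([len(node.get_content(MetadataMode.ALL)) for node in nodes])
# output: [441, 537]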
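Suppose the problem in this repro is that a node's text plus its metadata can end up larger than chunk_size. In that case, one way to "fix the logic" is to reserve the metadata's tokens before packing any text. The sketch below shows that budgeting idea in isolation. `pack_with_metadata_budget` and its parameters are hypothetical, and this is not how SentenceSplitter is implemented. The character-level tokenizer in the example stands in for a real one.

```python
from typing import Callable, List, Sequence


def pack_with_metadata_budget(
    pieces: Sequence[str],
    metadata: str,
    chunk_size: int,
    tokenize: Callable[[str], Sequence[object]],
    joiner: str = "",
) -> List[str]:
    """Greedily merge pre-split pieces (e.g. sentences) into chunks so that
    tokens(metadata) + tokens(chunk) never exceeds chunk_size (hypothetical)."""
    # Reserve the metadata's tokens up front; only the rest is available for text.
    budget = chunk_size - len(tokenize(metadata))
    if budget <= 0:
        raise ValueError("metadata alone uses up chunk_size")
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = current + joiner + piece if current else piece
        if len(tokenize(candidate)) <= budget:
            current = candidate  # still fits: keep growing this chunk
            continue
        if current:
            chunks.append(current)  # close the chunk that is full
        if len(tokenize(piece)) > budget:
            # A real splitter would fall back to a finer split here;
            # this sketch simply rejects the oversized piece.
            raise ValueError("a single piece exceeds the remaining budget")
        current = piece
    if current:
        chunks.append(current)
    return chunks


# Illustration only: `list` counts one token per character.
out = pack_with_metadata_budget(["abc.", "def.", "ghi."], "title: x", 16, list)
print(out)  # ['abc.def.', 'ghi.']
```

With a real BPE tokenizer, counting metadata and text separately only approximates the token count of the combined string. The metadata template and separator also add a few tokens of their own, so a production version would keep some slack in the budget.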
Description
Add a very important comment for SentenceSplitter
Fixes # (issue)
New Package?
Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?
Version Bump?
Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)
Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
Suggested Checklist:
make format; make lint to appease the lint gods