Text Embeddings


In [ ]:
import os
from EdgeGPT import Chatbot as Bing, ConversationStyle

bing = Bing(cookiePath = os.path.expanduser('~/.config/EdgeGPT/cookies.json'))

async def ask(prompt):
    res = (await bing.ask(
        prompt = prompt,
        conversation_style = ConversationStyle.balanced,
    ))['item']['messages'][1]

    print(res['text'])
    print('\n---\n')
    print(res['adaptiveCards'][0]['body'][0]['text'])

In [ ]:
await ask('''
text-embedding-ada-002 是什么?
''')
text-embedding-ada-002 是 OpenAI 提供的一种嵌入模型,用于将文本转换为向量表示。[^1^] 你可以将文本字符串发送到嵌入 API 端点,同时选择嵌入模型 ID(例如 text-embedding-ada-002),响应将包含一个嵌入,你可以提取、保存和使用。[^1^] 

text-embedding-ada-002 模型是 OpenAI 提供的第二代嵌入模型之一,它比第一代模型更好、更便宜、更简单易用。[^5^] 

如果你想了解更多关于 text-embedding-ada-002 的信息,可以参考 OpenAI 的官方文档。[^5^]

---

[1]: https://platform.openai.com/docs/guides/embeddings/second-generation-models "Embeddings - OpenAI API"
[2]: https://juejin.cn/post/7216319218505220151 "OpenAI的Embedding - 掘金 - 稀土掘金"
[3]: https://cloud.tencent.com/developer/article/2222888 "GPT-3 vs Bert vs GloVe vs Word2vec 文本嵌入技术的性能对比测试 - 腾讯云开发者社区-腾讯云"
[4]: https://openai.com/blog/new-and-improved-embedding-model/ "New and improved embedding model - openai.com"
[5]: https://platform.openai.com/docs/guides/embeddings "Embeddings - OpenAI API"

text-embedding-ada-002 是 OpenAI 提供的一种嵌入模型,用于将文本转换为向量表示。[^1^][1] 你可以将文本字符串发送到嵌入 API 端点,同时选择嵌入模型 ID(例如 text-embedding-ada-002),响应将包含一个嵌入,你可以提取、保存和使用。[^1^][1] 

text-embedding-ada-002 模型是 OpenAI 提供的第二代嵌入模型之一,它比第一代模型更好、更便宜、更简单易用。[^2^][5] 

如果你想了解更多关于 text-embedding-ada-002 的信息,可以参考 OpenAI 的官方文档。[^2^][5]

In [ ]:
await ask('''
比较 GPT-3 嵌入、GloVe 嵌入、Word2vec 嵌入、MPNet 嵌入
''')
GPT-3 嵌入、GloVe 嵌入、Word2vec 嵌入和 MPNet 嵌入都是常见的文本嵌入模型。其中,GPT-3 是 OpenAI 开发的一种大型语言模型,可以生成自然语言文本[^1^]。GloVe 和 Word2vec 都是基于神经网络的词向量模型,用于将单词转换为向量表示,以便计算机可以更好地理解自然语言[^1^] [^2^]。MPNet 是一种基于多任务学习的预训练语言模型,可以用于各种自然语言处理任务[^1^] [^4^]。

据腾讯云开发者社区的一篇文章所述,该作者使用了 text-embedding-ada-002(GPT-3 的一个 Embeddings)与三种传统文本嵌入技术生成的嵌入进行比较,即 GloVe、Word2vec 和 MPNet。结果显示,GPT-3 嵌入在所有模型中获得了最高的精度[^1^]。但是,这些模型的性能可能会因应用场景而异。如果你需要更多信息,请告诉我!

---

[1]: https://cloud.tencent.com/developer/article/2222888 "GPT-3 vs Bert vs GloVe vs Word2vec 文本嵌入技术的性能对比测试 - 腾讯云开发者社区-腾讯云"
[2]: https://developer.aliyun.com/article/1159268 "GPT-3 vs Bert vs GloVe vs Word2vec 文本嵌入技术的性能对比测试-阿里云开发者社区"
[3]: https://new.qq.com/rain/a/20230212A01IWP00 "GPT-3 vs Bert vs GloVe vs Word2vec 文本嵌入技术的性能对比测试_腾讯新闻"
[4]: https://blog.csdn.net/tMb8Z9Vdm66wH68VX1/article/details/129153025 "GPT-3 vs Bert vs GloVe vs Word2vec 文本嵌入技术的性能对比测试"
[5]: https://zhuanlan.zhihu.com/p/605603003 "GPT-3 vs Bert vs GloVe 文本嵌入技术的性能对比测试 - 知乎"

GPT-3 嵌入、GloVe 嵌入、Word2vec 嵌入和 MPNet 嵌入都是常见的文本嵌入模型。其中,GPT-3 是 OpenAI 开发的一种大型语言模型,可以生成自然语言文本[^1^][1]。GloVe 和 Word2vec 都是基于神经网络的词向量模型,用于将单词转换为向量表示,以便计算机可以更好地理解自然语言[^1^][1] [^2^][2]。MPNet 是一种基于多任务学习的预训练语言模型,可以用于各种自然语言处理任务[^1^][1] [^3^][4]。

据腾讯云开发者社区的一篇文章所述,该作者使用了 text-embedding-ada-002(GPT-3 的一个 Embeddings)与三种传统文本嵌入技术生成的嵌入进行比较,即 GloVe、Word2vec 和 MPNet。结果显示,GPT-3 嵌入在所有模型中获得了最高的精度[^1^][1]。但是,这些模型的性能可能会因应用场景而异。如果你需要更多信息,请告诉我!

In [ ]:
await ask('''
还有哪些文本嵌入模型?
''')
除了 GPT-3 嵌入、GloVe 嵌入、Word2vec 嵌入和 MPNet 嵌入之外,还有许多其他文本嵌入模型。例如,ELMo、Skip-Thoughts、Quick-Thoughts、InferSent、MILA/MSR 的通用句子表示和 Google 的通用句子编码器等模型都是最先进的模型之一[^1^]。这些模型的性能可能会因应用场景而异。如果你需要更多信息,请告诉我!

---

[1]: https://cloud.tencent.com/developer/article/1167972 "干货 | 文本嵌入的经典模型与最新进展 - 腾讯云开发者社区-腾讯云"
[2]: https://blog.csdn.net/lingpy/article/details/80809926 "文本嵌入的经典模型与最新进展 - CSDN博客"
[3]: https://zhuanlan.zhihu.com/p/459295186 "5分钟 NLP系列—— 11 个词嵌入模型总结 - 知乎 - 知乎专栏"

除了 GPT-3 嵌入、GloVe 嵌入、Word2vec 嵌入和 MPNet 嵌入之外,还有许多其他文本嵌入模型。例如,ELMo、Skip-Thoughts、Quick-Thoughts、InferSent、MILA/MSR 的通用句子表示和 Google 的通用句子编码器等模型都是最先进的模型之一[^1^][1]。这些模型的性能可能会因应用场景而异。如果你需要更多信息,请告诉我!

In [ ]:
await ask('''
文本嵌入模型的英文?
''')
The English term for "文本嵌入模型" is "text embedding models". Some of the most common text embedding models include Word2Vec, GloVe, FastText and Embeddings from Language Models (ELMo)[^1^] [^2^]. If you need more information about these models or any other text embedding models, please let me know!

---

[1]: https://techblog.ezra.com/different-embedding-models-7874197dc410 "What are the common word embeddings? | The Ezra Tech Blog - Medium"
[2]: https://openai.com/blog/introducing-text-and-code-embeddings/ "Introducing text and code embeddings - OpenAI"
[3]: https://www.deepset.ai/blog/the-beginners-guide-to-text-embeddings "The Beginner’s Guide to Text Embeddings | deepset"
[4]: https://platform.openai.com/docs/guides/embeddings/embedding-models "Embeddings - OpenAI API"

The English term for "文本嵌入模型" is "text embedding models". Some of the most common text embedding models include Word2Vec, GloVe, FastText and Embeddings from Language Models (ELMo)[^1^][1] [^2^][2]. If you need more information about these models or any other text embedding models, please let me know!

In [ ]:
await ask('''
InstructorEmbedding 是什么?
''')
InstructorEmbedding is a text embedding model that can generate text embeddings tailored to any task (e.g., classification, retrieval, clustering, text evaluation, etc.) and domains (e.g., science, finance, etc.) by simply providing the task instruction, without any finetuning[^1^]. It is a single embedder that can generate text embeddings tailored to different downstream tasks and domains[^1^]. InstructorEmbedding is available as a Python package[^2^]. If you need more information about InstructorEmbedding or any other text embedding models, please let me know!

---

[1]: https://instructor-embedding.github.io/ "Instructor Text Embedding"
[2]: https://pypi.org/project/InstructorEmbedding/ "InstructorEmbedding · PyPI"
[3]: https://python.langchain.com/en/latest/_modules/langchain/embeddings/huggingface.html "langchain.embeddings.huggingface — \uD83E\uDD9C\uD83D\uDD17 LangChain 0.0.135"

InstructorEmbedding is a text embedding model that can generate text embeddings tailored to any task (e.g., classification, retrieval, clustering, text evaluation, etc.) and domains (e.g., science, finance, etc.) by simply providing the task instruction, without any finetuning[^1^][1]. It is a single embedder that can generate text embeddings tailored to different downstream tasks and domains[^1^][1]. InstructorEmbedding is available as a Python package[^2^][2]. If you need more information about InstructorEmbedding or any other text embedding models, please let me know!

In [ ]:
await ask('''
比较 HuggingFaceInstructEmbeddings 和 HuggingFaceHubEmbeddings
''')
HuggingFaceInstructEmbeddings and HuggingFaceHubEmbeddings are both wrappers around sentence_transformers embedding models[^2^] [^3^]. HuggingFaceInstructEmbeddings is a wrapper around InstructorEmbedding[^2^], while HuggingFaceHubEmbeddings is a wrapper around Hugging Face Hub embedding models[^3^]. To use HuggingFaceInstructEmbeddings, you should have the `sentence_transformers` and `InstructorEmbedding` Python packages installed[^2^]. To use HuggingFaceHubEmbeddings, you should have the `huggingface_hub` Python package installed and the environment variable `HUGGINGFACEHUB_API_TOKEN` set with your API token, or pass it as a named parameter to the constructor[^3^]. If you need more information about HuggingFaceInstructEmbeddings or HuggingFaceHubEmbeddings or any other text embedding models, please let me know!

---

[1]: https://huggingface.co/blog/getting-started-with-embeddings "Getting Started With Embeddings - Hugging Face"
[2]: https://python.langchain.com/en/latest/_modules/langchain/embeddings/huggingface.html "langchain.embeddings.huggingface — \uD83E\uDD9C\uD83D\uDD17 LangChain 0.0.135"
[3]: https://sj-langchain.readthedocs.io/en/latest/reference/modules/embeddings.html "Embeddings — \uD83E\uDD9C\uD83D\uDD17 LangChain 0.0.94"
[4]: https://stackoverflow.com/questions/64685243/getting-sentence-embedding-from-huggingface-feature-extraction-pipeline "machine learning - Getting sentence embedding from huggingface Feature ..."

HuggingFaceInstructEmbeddings and HuggingFaceHubEmbeddings are both wrappers around sentence_transformers embedding models[^1^][2] [^2^][3]. HuggingFaceInstructEmbeddings is a wrapper around InstructorEmbedding[^1^][2], while HuggingFaceHubEmbeddings is a wrapper around Hugging Face Hub embedding models[^2^][3]. To use HuggingFaceInstructEmbeddings, you should have the `sentence_transformers` and `InstructorEmbedding` Python packages installed[^1^][2]. To use HuggingFaceHubEmbeddings, you should have the `huggingface_hub` Python package installed and the environment variable `HUGGINGFACEHUB_API_TOKEN` set with your API token, or pass it as a named parameter to the constructor[^2^][3]. If you need more information about HuggingFaceInstructEmbeddings or HuggingFaceHubEmbeddings or any other text embedding models, please let me know!

In [ ]:
await ask('''
sentence_transformers 的 GitHub 链接?
''')
Here is the GitHub link for sentence_transformers: https://github.com/UKPLab/sentence-transformers [^1^] [^5^]. You can find more information about sentence_transformers and how to use it in your project by visiting this link. Let me know if you have any other questions!

---

[1]: https://github.com/taishan1994/sbert_text_similarity "GitHub - taishan1994/sbert_text_similarity: 使用sentence-transformers ..."
[2]: https://github.com/zhangyi24/sentence_transformer_zh "GitHub - zhangyi24/sentence_transformer_zh"
[3]: https://blog.csdn.net/RinaWong/article/details/122890394 "安装sentence-transformers_竹霖聴雨的博客-CSDN博客"
[4]: https://github.com/UKPLab/sentence-transformers "UKPLab/sentence-transformers - Github"
[5]: https://github.com/UKPLab/sentence-transformers/blob/master/docs/pretrained_models.md "sentence-transformers/pretrained_models.md at master · UKPLab ... - GitHub"
[6]: https://github.com/huggingface/blog/blob/main/how-to-train-sentence-transformers.md "blog/how-to-train-sentence-transformers.md at main - Github"

Here is the GitHub link for sentence_transformers: https://github.com/UKPLab/sentence-transformers [^1^][1] [^2^][5]. You can find more information about sentence_transformers and how to use it in your project by visiting this link. Let me know if you have any other questions!

In [ ]:
await ask('''
InstructorEmbedding 的 GitHub 链接?
''')
Here is the GitHub link for InstructorEmbedding: https://github.com/HKUNLP/instructor-embedding [^1^] [^2^]. You can find more information about InstructorEmbedding and how to use it in your project by visiting this link. Let me know if you have any other questions!

---

[1]: https://pypi.org/project/InstructorEmbedding/ "InstructorEmbedding · PyPI"
[2]: https://github.com/HKUNLP/instructor-embedding "GitHub - HKUNLP/instructor-embedding: One Embedder, Any Task ..."
[3]: https://github.com/basujindal/chatPDF "GitHub - basujindal/chatPDF: chat with PDF"

Here is the GitHub link for InstructorEmbedding: https://github.com/HKUNLP/instructor-embedding [^1^][1] [^2^][2]. You can find more information about InstructorEmbedding and how to use it in your project by visiting this link. Let me know if you have any other questions!

In [ ]:
await ask('''
比较 HuggingFaceInstructEmbeddings 和 LlamaCppEmbeddings
''')
HuggingFaceInstructEmbeddings 和 LlamaCppEmbeddings 都是用于自然语言处理的嵌入式模型。HuggingFaceInstructEmbeddings 是 sentence_transformers 的包装器,而 LlamaCppEmbeddings 是 llama.cpp 嵌入式模型的包装器。这两个模型都可以用于生成文本嵌入,但是它们的实现方式和性能可能有所不同。如果您需要更多的信息,请访问它们的 GitHub 页面:https://github.com/HKUNLP/instructor-embedding [^1^] [^2^] 和 https://github.com/ggerganov/llama.cpp [^3^] 。

---

[1]: https://python.langchain.com/en/latest/_modules/langchain/embeddings/llamacpp.html "langchain.embeddings.llamacpp — \uD83E\uDD9C\uD83D\uDD17 LangChain 0.0.136"
[2]: https://python.langchain.com/en/latest/_modules/langchain/embeddings/huggingface.html "langchain.embeddings.huggingface — \uD83E\uDD9C\uD83D\uDD17 LangChain 0.0.135"
[3]: https://sj-langchain.readthedocs.io/en/latest/_modules/langchain/embeddings/huggingface.html "langchain.embeddings.huggingface — \uD83E\uDD9C\uD83D\uDD17 LangChain 0.0.94"

HuggingFaceInstructEmbeddings 和 LlamaCppEmbeddings 都是用于自然语言处理的嵌入式模型。HuggingFaceInstructEmbeddings 是 sentence_transformers 的包装器,而 LlamaCppEmbeddings 是 llama.cpp 嵌入式模型的包装器。这两个模型都可以用于生成文本嵌入,但是它们的实现方式和性能可能有所不同。如果您需要更多的信息,请访问它们的 GitHub 页面:https://github.com/HKUNLP/instructor-embedding [^1^][1] [^2^][2] 和 https://github.com/ggerganov/llama.cpp [^3^][3] 。

In [ ]:
await ask('''
执行下面这段代码之后,HuggingFaceEmbeddings 的模型下载到哪里了?
```
from langchain.embeddings import HuggingFaceEmbeddings

# 准备文本
text = '这是一个测试文档。'

# 使用 HuggingFace 生成文本嵌入
embeddings = HuggingFaceEmbeddings()
query_result = embeddings.embed_query(text)
doc_result = embeddings.embed_documents([text])
```
''')
HuggingFaceEmbeddings 的模型下载路径是本地缓存。当您第一次使用该模型时,它会自动缓存在本地。如果您想查看缓存的位置,可以运行以下代码:
```
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.cache_dir)
```
如果您想将模型下载到特定的文件夹中,可以使用以下代码:
```
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", cache_dir="my_folder")
```
希望这可以帮助到您!

---

[1]: https://python.langchain.com/en/latest/_modules/langchain/embeddings/huggingface.html "langchain.embeddings.huggingface — \uD83E\uDD9C\uD83D\uDD17 LangChain 0.0.135"
[2]: https://stackoverflow.com/questions/67595500/how-to-download-model-from-huggingface "How to download model from huggingface? - Stack Overflow"
[3]: https://github.com/hwchase17/langchain/blob/master/langchain/embeddings/huggingface.py "langchain/huggingface.py at master · hwchase17/langchain"

HuggingFaceEmbeddings 的模型下载路径是本地缓存。当您第一次使用该模型时,它会自动缓存在本地。如果您想查看缓存的位置,可以运行以下代码:
```
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.cache_dir)
```
如果您想将模型下载到特定的文件夹中,可以使用以下代码:
```
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", cache_dir="my_folder")
```
希望这可以帮助到您!

In [ ]:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
dir(tokenizer)
In [ ]:
tokenizer.name_or_path
Out[ ]:
'bert-base-uncased'
In [ ]:
%%bash
ls -lah ~/.cache/huggingface/hub/models--bert-base-uncased
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
total 0
drwxr-xr-x  6 saintway  staff   192B Apr 12 14:43 .
drwxr-xr-x  6 saintway  staff   192B Apr 12 14:43 ..
drwxr-xr-x  3 saintway  staff    96B Apr 12 14:43 .no_exist
drwxr-xr-x  6 saintway  staff   192B Apr 12 14:43 blobs
drwxr-xr-x  3 saintway  staff    96B Apr 12 14:43 refs
drwxr-xr-x  3 saintway  staff    96B Apr 12 14:43 snapshots
In [ ]:
%%bash
du -sh ~/.cache/huggingface/hub/models--bert-base-uncased
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
696K	/Users/saintway/.cache/huggingface/hub/models--bert-base-uncased

If you're talking about vector database embeddings, check out Chroma.


If you're talking about fine tuning, check out alpaca-lora finetuning.

I recommend using gpt4xalpaca as a base model as it doesn't have the same level of censorship as vicuna. However, if you're using it purely for ethical tasks, vicuna is definitely better.

GPT4 x Alpaca

In [ ]:
await ask('''
Chroma Embedding Database 是什么
''')
Chroma Embedding Database 是一个开源的嵌入式数据库,用于存储人工智能模型的嵌入[^1^]。嵌入是向量的集合。Chroma 的数据库专门用于存储 AI 模型的嵌入[^2^]。它允许您通过最近邻搜索而不是子字符串搜索来搜索嵌入[^3^]。默认情况下,Chroma 使用 Sentence Transformers 进行嵌入,但您也可以使用 OpenAI 嵌入、Cohere(多语言)嵌入或自己的嵌入[^3^]。

希望这可以帮助您了解 Chroma Embedding Database。

---

[1]: https://docs.trychroma.com/ "\uD83C\uDFE1 Home | Chroma"
[2]: https://siliconangle.com/2023/04/06/chroma-bags-18m-speed-ai-models-embedding-database/ "Chroma bags $18M to speed up AI models with its embedding database"
[3]: https://pypi.org/project/chromadb/ "chromadb · PyPI"
[4]: https://www.trychroma.com/ "Chroma - the AI-native open-source embedding database"
[5]: https://github.com/chroma-core/chroma "GitHub - chroma-core/chroma: the open source embedding database"

Chroma Embedding Database 是一个开源的嵌入式数据库,用于存储人工智能模型的嵌入[^1^][1]。嵌入是向量的集合。Chroma 的数据库专门用于存储 AI 模型的嵌入[^2^][2]。它允许您通过最近邻搜索而不是子字符串搜索来搜索嵌入[^3^][3]。默认情况下,Chroma 使用 Sentence Transformers 进行嵌入,但您也可以使用 OpenAI 嵌入、Cohere(多语言)嵌入或自己的嵌入[^3^][3]。

希望这可以帮助您了解 Chroma Embedding Database。

In [ ]:
await ask('''
ggml 是什么
''')
ggml 是一个用于机器学习的张量库。该项目正在开发中,尚未准备好用于生产。它是用 C 语言编写的,支持 16 位浮点数和 4 位整数量化。它还支持自动微分(正在进行中)ADAM 和 L ...[^3^]。

希望这可以帮助到您!

---

[1]: https://gitee.com/mirrors/ggml "ggml: ggml 是一个用于机器学习的张量库"
[2]: https://www.oschina.net/p/ggml "ggml首页、文档和下载 - 机器学习的张量库 - OSCHINA - 中文开源技术交流社区"
[3]: https://github.com/ggerganov/ggml "GitHub - ggerganov/ggml: Tensor library for machine learning"
[4]: https://zhuanlan.zhihu.com/p/616037509 "介绍大模型LLaMa及周边 - 知乎 - 知乎专栏"
[5]: https://github.com/ggerganov/whisper.cpp "ggerganov/whisper.cpp: Port of OpenAI's Whisper model in C/C++ - Github"

ggml 是一个用于机器学习的张量库。该项目正在开发中,尚未准备好用于生产。它是用 C 语言编写的,支持 16 位浮点数和 4 位整数量化。它还支持自动微分(正在进行中)ADAM 和 L ...[^1^][3]。

希望这可以帮助到您!

In [ ]:
await ask('''
HUGGINGFACEHUB_API_TOKEN 有什么用?
''')
HUGGINGFACEHUB_API_TOKEN 是一个环境变量,用于验证您的 API 令牌。您可以使用它来访问 Hugging Face Hub API,以便在您的模型中使用预训练模型、上传和下载模型等。[^1^] [^3^]

希望这可以帮助到您!

---

[1]: https://sj-langchain.readthedocs.io/en/latest/ecosystem/huggingface.html "Hugging Face — \uD83E\uDD9C\uD83D\uDD17 LangChain 0.0.94"
[2]: https://zhuanlan.zhihu.com/p/344173246 "用户身份验证的令牌——Token - 知乎 - 知乎专栏"
[3]: https://github.com/hwchase17/langchain/blob/master/langchain/llms/huggingface_hub.py "langchain/huggingface_hub.py at master · hwchase17/langchain"

HUGGINGFACEHUB_API_TOKEN 是一个环境变量,用于验证您的 API 令牌。您可以使用它来访问 Hugging Face Hub API,以便在您的模型中使用预训练模型、上传和下载模型等。[^1^][1] [^2^][3]

希望这可以帮助到您!

In [ ]:
await ask('''
Hugging Face Hub API 有什么用?
''')
Hugging Face Hub API 是一个模型和数据集的存储库,可以让您轻松地共享、发现和使用模型和数据集。它提供了一组 API,可以让您从 Python 中上传、下载和管理模型和数据集。您可以使用 Hugging Face Hub API 来存储、共享和管理您的模型和数据集,也可以使用它来发现其他人共享的模型和数据集。[^1^]

Hugging Face Hub API 还提供了一些端点,您可以使用这些端点从 Hub 中检索信息,也可以执行某些操作,例如创建模型、数据集或 Space 存储库。[^1^]

---

[1]: https://huggingface.co/docs/hub/api "https://\uE000huggingface\uE001.co/docs/\uE000hub\uE001/\uE000api\uE001"

Hugging Face Hub API 是一个模型和数据集的存储库,可以让您轻松地共享、发现和使用模型和数据集。它提供了一组 API,可以让您从 Python 中上传、下载和管理模型和数据集。您可以使用 Hugging Face Hub API 来存储、共享和管理您的模型和数据集,也可以使用它来发现其他人共享的模型和数据集。[^1^][1]

Hugging Face Hub API 还提供了一些端点,您可以使用这些端点从 Hub 中检索信息,也可以执行某些操作,例如创建模型、数据集或 Space 存储库。[^1^][1]

In [ ]:
await ask('''
没有 HUGGINGFACEHUB_API_TOKEN 就不能使用 Hugging Face Hub API 吗?
''')
是的,Hugging Face Hub API 需要 HUGGINGFACEHUB_API_TOKEN。您需要在 Hugging Face 帐户设置中获取用户访问或 API 令牌。如果您在发送请求到 API 时不提交 API 令牌,则无法在私有模型上运行推理。[^1^] [^3^]

---

[1]: https://api-inference.huggingface.co/docs/python/html/quicktour.html "Overview - Hugging Face"
[2]: https://huggingface.co/docs/huggingface_hub/how-to-inference "Access the Inference API - Hugging Face"
[3]: https://python.langchain.com/en/latest/ecosystem/huggingface.html "Hugging Face — \uD83E\uDD9C\uD83D\uDD17 LangChain 0.0.137"
[4]: https://docs.fast.ai/huggingface.html "fastai - Hugging Face Hub"
[5]: https://zhuanlan.zhihu.com/p/619763221 "HuggingGPT: 用ChatGPT和它的朋友在HuggingFace中解决AI任务 - 知乎"

是的,Hugging Face Hub API 需要 HUGGINGFACEHUB_API_TOKEN。您需要在 Hugging Face 帐户设置中获取用户访问或 API 令牌。如果您在发送请求到 API 时不提交 API 令牌,则无法在私有模型上运行推理。[^1^][1] [^2^][3]

In [ ]:
await ask('''
没有 HUGGINGFACEHUB_API_TOKEN 就不能在 HuggingFace 下载模型吗?
''')
是的,您需要 HUGGINGFACEHUB_API_TOKEN 才能在 HuggingFace 下载模型。如果您在发送请求到 API 时不提交 API 令牌,则无法在私有模型上运行推理。[^1^] [^3^]

---

[1]: https://huggingface.co/docs/hub/security-tokens "User access tokens - Hugging Face"
[2]: https://blog.csdn.net/qq_41895747/article/details/125232359 "解决ValueError: You need to pass a valid `token` or login by using ..."
[3]: https://stackoverflow.com/questions/67595500/how-to-download-model-from-huggingface "How to download model from huggingface? - Stack Overflow"
[4]: https://zhuanlan.zhihu.com/p/475260268 "如何优雅的下载huggingface-transformers模型 - 知乎 - 知乎专栏"
[5]: https://juejin.cn/post/7220623225775095869 "ILLA Cloud- 调用 Hugging Face Inference Endpoints,开启大模型世界之门 - 掘金"

是的,您需要 HUGGINGFACEHUB_API_TOKEN 才能在 HuggingFace 下载模型。如果您在发送请求到 API 时不提交 API 令牌,则无法在私有模型上运行推理。[^1^][1] [^2^][3]

In [ ]:
await ask('''
在模型上运行推理是什么意思?训练模型是什么意思?
''')
在机器学习中,训练模型是指使用大量数据来训练模型,以便它可以自动地从新数据中进行预测。在训练期间,模型会根据输入数据进行调整,以便它可以更好地预测输出。训练完成后,模型就可以用于推理,也就是使用新数据进行预测。

推理是指使用训练好的模型来进行预测。在推理期间,模型会接收新的输入数据,并输出预测结果。推理通常比训练快得多,因为它不需要对模型进行调整。

希望这可以帮到你。如果你有其他问题,请随时问我。

---

在机器学习中,训练模型是指使用大量数据来训练模型,以便它可以自动地从新数据中进行预测。在训练期间,模型会根据输入数据进行调整,以便它可以更好地预测输出。训练完成后,模型就可以用于推理,也就是使用新数据进行预测。

推理是指使用训练好的模型来进行预测。在推理期间,模型会接收新的输入数据,并输出预测结果。推理通常比训练快得多,因为它不需要对模型进行调整。

希望这可以帮到你。如果你有其他问题,请随时问我。

In [ ]:
await ask('''
没有 HUGGINGFACEHUB_API_TOKEN 可以在 HuggingFace 下载他人公开的模型吗?
''')
HuggingFace 下载他人公开的模型不需要 HUGGINGFACEHUB_API_TOKEN。你可以通过以下方式下载他人公开的模型:

1. 通过 Hugging Face 官网下载,例如:https://huggingface.co/models
2. 通过 Transformers 库中的 from_pretrained 方法下载,例如:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
```

希望这可以帮到你。如果你有其他问题,请随时问我。

---

[1]: https://zhuanlan.zhihu.com/p/475260268 "如何优雅的下载huggingface-transformers模型 - 知乎 - 知乎专栏"
[2]: https://huggingface.co/inference-api "Inference API - Hugging Face"
[3]: https://zhuanlan.zhihu.com/p/448852278 "【Huggingface Transformers】保姆级使用教程—上 - 知乎"
[4]: https://pypi.org/project/huggingface-hub/ "huggingface-hub · PyPI"
[5]: https://stackoverflow.com/questions/67595500/how-to-download-model-from-huggingface "How to download model from huggingface? - Stack Overflow"

HuggingFace 下载他人公开的模型不需要 HUGGINGFACEHUB_API_TOKEN。你可以通过以下方式下载他人公开的模型:

1. 通过 Hugging Face 官网下载,例如:https://huggingface.co/models
2. 通过 Transformers 库中的 from_pretrained 方法下载,例如:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
```

希望这可以帮到你。如果你有其他问题,请随时问我。

In [ ]:
await ask('''
怎么用 Python 在 HuggingFace 下载模型?
''')
您可以使用以下代码在 HuggingFace 中下载模型:
```
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
```
这将下载 BERT 模型和 tokenizer。

---

您可以使用以下代码在 HuggingFace 中下载模型:
```
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
```
这将下载 BERT 模型和 tokenizer。

In [ ]:
await ask('''
在 HuggingFace 下载模型,如何断点续传?
''')
您可以使用以下代码在 HuggingFace 中断点续传下载模型:
```
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", resume_download=True)
```
这将下载 BERT 模型和 tokenizer,并在下载过程中断点续传。

---

[1]: https://zhuanlan.zhihu.com/p/475260268 "https://zhuanlan.zhihu.com/p/475260268"

您可以使用以下代码在 HuggingFace 中断点续传下载模型:
```
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", resume_download=True)
```
这将下载 BERT 模型和 tokenizer,并在下载过程中断点续传。

In [ ]:
await ask('''
PyTorch 模型可以分成好几个 bin 文件吗?
''')
PyTorch 模型可以分成多个文件,其中包括模型权重和其他模型参数。如果您的模型是由多个子模型组成的,则可以将每个子模型的权重和参数保存在单独的文件中。这些文件可以使用 PyTorch 的 state_dict 和 optimizer 保存在同一个文件中[^9^]. 

您可以使用以下代码将模型保存到多个文件中:

```python
torch.save(model.state_dict(), PATH)
```

其中,`model` 是您要保存的 PyTorch 模型,`PATH` 是您要保存的文件路径。这将会把模型权重和其他参数保存到一个名为 `PATH` 的文件中。

如果您想要加载这些文件并恢复模型,则可以使用以下代码:

```python
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
```

其中,`TheModelClass` 是您要恢复的 PyTorch 模型类,`*args` 和 `**kwargs` 是用于初始化模型的参数。这将会从名为 `PATH` 的文件中加载模型权重和其他参数,并将它们加载到新创建的模型实例中。

希望这可以帮助您!请让我知道是否还有其他问题需要解答。

---

[1]: https://blog.csdn.net/qq_39686950/article/details/116090826 "Windows系统下把PyTorch模型转为ncnn模型流程 - CSDN博客"
[2]: https://blog.csdn.net/season77us/article/details/104311195 "如何下载Hugging Face 模型(pytorch_model.bin, config.json, vocab.txt)以及如何在 ..."
[3]: https://www.cnblogs.com/cxq1126/p/14277134.html "将tensorflow版本的.ckpt模型转成pytorch的.bin模型 - 最咸的鱼 - 博客园"
[4]: https://zhuanlan.zhihu.com/p/620688513 "Pytorch格式 .pt .pth .bin 详解 - 知乎 - 知乎专栏"
[5]: https://blog.csdn.net/znevegiveup1/article/details/118093294 "读取pytorch.bin权重文件解读_pytorch_model.bin_唐僧爱吃唐僧肉的博客-CSDN博客"
[6]: https://blog.csdn.net/A_water_/article/details/128197563 "Pytorch框架的模型pth文件转换成C++ OpenVINO框架的bin和xml文件并运行_pth转bin_A_water_的博客-CSDN博客"
[7]: https://zhuanlan.zhihu.com/p/113694038 "PyTorch分布式训练简明教程(2022更新版) - 知乎 - 知乎专栏"
[8]: https://zhuanlan.zhihu.com/p/82038049 "PyTorch | 保存和加载模型 - 知乎"
[9]: https://pytorch.org/tutorials/recipes/recipes/saving_multiple_models_in_one_file.html "Saving and loading multiple models in one file using PyTorch"

PyTorch 模型可以分成多个文件,其中包括模型权重和其他模型参数。如果您的模型是由多个子模型组成的,则可以将每个子模型的权重和参数保存在单独的文件中。这些文件可以使用 PyTorch 的 state_dict 和 optimizer 保存在同一个文件中[^1^][9]. 

您可以使用以下代码将模型保存到多个文件中:

```python
torch.save(model.state_dict(), PATH)
```

其中,`model` 是您要保存的 PyTorch 模型,`PATH` 是您要保存的文件路径。这将会把模型权重和其他参数保存到一个名为 `PATH` 的文件中。

如果您想要加载这些文件并恢复模型,则可以使用以下代码:

```python
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
```

其中,`TheModelClass` 是您要恢复的 PyTorch 模型类,`*args` 和 `**kwargs` 是用于初始化模型的参数。这将会从名为 `PATH` 的文件中加载模型权重和其他参数,并将它们加载到新创建的模型实例中。

希望这可以帮助您!请让我知道是否还有其他问题需要解答。

In [ ]:
import os
import pickle
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.embeddings import LlamaCppEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores.faiss import FAISS

def get_docs(dir_name):
    # (1) Import a series of documents.
    loader = DirectoryLoader(dir_name, loader_cls=TextLoader, silent_errors=True)
    raw_documents = loader.load()
    # (2) Split them into small chunks.
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=512,
        chunk_overlap=128,
    )
    return text_splitter.split_documents(raw_documents)

def ingest_docs(dir_name):
    documents = get_docs(dir_name)
    # (3) Create embeddings for each document (using text-embedding-ada-002).
    embeddings = LlamaCppEmbeddings(model_path=os.path.expanduser('~/ggml-model-q4_1.bin'), n_ctx=1024)
    return FAISS.from_documents(documents, embeddings)

vectorstore = ingest_docs('_posts/ultimate-facts')
llama.cpp: loading model from /Users/saintway/ggml-model-q4_1.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 1024
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: f16        = 3
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =  73.73 KB
llama_model_load_internal: mem required  = 11359.03 MB (+ 3216.00 MB per state)
llama_init_from_file: kv self size  = 1600.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 58231.14 ms /   136 tokens (  428.17 ms per token)
llama_print_timings:        eval time =  1743.42 ms /     1 runs   ( 1743.42 ms per run)
llama_print_timings:       total time = 60005.23 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 178540.67 ms /   472 tokens (  378.26 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 178591.84 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 41458.73 ms /   139 tokens (  298.26 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 41490.46 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 13792.71 ms /    51 tokens (  270.45 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 13806.36 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 64205.43 ms /   192 tokens (  334.40 ms per token)
llama_print_timings:        eval time =   393.32 ms /     1 runs   (  393.32 ms per run)
llama_print_timings:       total time = 64615.16 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 312641.94 ms /   458 tokens (  682.62 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 312700.26 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 428699.27 ms /   478 tokens (  896.86 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 428799.03 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 268943.46 ms /   309 tokens (  870.37 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 268984.75 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 428036.12 ms /   648 tokens (  660.55 ms per token)
llama_print_timings:        eval time =   426.99 ms /     1 runs   (  426.99 ms per run)
llama_print_timings:       total time = 428658.90 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 10771.61 ms /    30 tokens (  359.05 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 10808.14 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 77845.98 ms /   109 tokens (  714.18 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 77920.76 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 31191.36 ms /   103 tokens (  302.83 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 31233.61 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 125583.51 ms /   111 tokens ( 1131.38 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 125624.03 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 26138.27 ms /    68 tokens (  384.39 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 26169.78 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 12869.16 ms /    35 tokens (  367.69 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 12877.02 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 349920.02 ms /   504 tokens (  694.29 ms per token)
llama_print_timings:        eval time =  1198.45 ms /     1 runs   ( 1198.45 ms per run)
llama_print_timings:       total time = 351175.09 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 235384.79 ms /   158 tokens ( 1489.78 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 235445.03 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 540504.96 ms /   542 tokens (  997.24 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 540623.58 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 351235.10 ms /   719 tokens (  488.51 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 351388.48 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 50923.63 ms /   165 tokens (  308.63 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 51015.38 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 184992.04 ms /   272 tokens (  680.12 ms per token)
llama_print_timings:        eval time =   411.72 ms /     1 runs   (  411.72 ms per run)
llama_print_timings:       total time = 185451.72 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 194429.41 ms /   622 tokens (  312.59 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 194464.86 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 113118.77 ms /   366 tokens (  309.07 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 113145.92 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 687028.24 ms /   728 tokens (  943.72 ms per token)
llama_print_timings:        eval time =   439.94 ms /     1 runs   (  439.94 ms per run)
llama_print_timings:       total time = 687586.24 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 189149.69 ms /   533 tokens (  354.88 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 189193.56 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 191291.91 ms /   560 tokens (  341.59 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 191325.52 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 262665.41 ms /   725 tokens (  362.30 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 262699.98 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 446825.63 ms /   744 tokens (  600.57 ms per token)
llama_print_timings:        eval time =  2820.09 ms /     1 runs   ( 2820.09 ms per run)
llama_print_timings:       total time = 449699.31 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 370878.55 ms /   717 tokens (  517.26 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 370985.32 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 316657.40 ms /   647 tokens (  489.42 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 316701.04 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 355381.24 ms /   504 tokens (  705.12 ms per token)
llama_print_timings:        eval time =   426.32 ms /     1 runs   (  426.32 ms per run)
llama_print_timings:       total time = 355866.70 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 306588.42 ms /   559 tokens (  548.46 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 306644.46 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 388570.23 ms /   565 tokens (  687.73 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 388642.65 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 123319.83 ms /   318 tokens (  387.80 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 123356.07 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 329182.49 ms /   811 tokens (  405.90 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 329242.86 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 210174.59 ms /   584 tokens (  359.89 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 210206.57 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 81113.52 ms /   286 tokens (  283.61 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 81134.54 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 439253.03 ms /   727 tokens (  604.20 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 439387.22 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 286354.39 ms /   751 tokens (  381.30 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 286420.33 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 444201.68 ms /   675 tokens (  658.08 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 444267.37 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 607155.44 ms /   620 tokens (  979.28 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 607229.15 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 674110.08 ms /   708 tokens (  952.13 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 674200.48 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 329078.28 ms /   470 tokens (  700.17 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 329136.29 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 774982.77 ms /   734 tokens ( 1055.83 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 775084.90 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 1736656.90 ms /   767 tokens ( 2264.22 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 1736891.33 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 1348035.02 ms /   608 tokens ( 2217.16 ms per token)
llama_print_timings:        eval time = 24584.54 ms /     1 runs   (24584.54 ms per run)
llama_print_timings:       total time = 1372887.22 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 1395140.63 ms /   472 tokens ( 2955.81 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 1395354.86 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 2404735.92 ms /   808 tokens ( 2976.16 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 2405157.08 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 1788561.84 ms /   640 tokens ( 2794.63 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 1788744.43 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 2043317.53 ms /   690 tokens ( 2961.33 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 2043549.39 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 1295899.29 ms /   402 tokens ( 3223.63 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 1296069.46 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 1882245.78 ms /   592 tokens ( 3179.47 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 1882491.04 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 2343689.96 ms /   741 tokens ( 3162.87 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 2343993.29 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 1234739.82 ms /   490 tokens ( 2519.88 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 1234957.50 ms

llama_print_timings:        load time =  8850.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 86554.06 ms /    53 tokens ( 1633.10 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 86652.29 ms
In [ ]:
import pickle
# Save vectorstore
with open('vectorstore_13B_1024.pkl', 'wb') as f:
    pickle.dump(vectorstore, f)
In [ ]:
import pickle
# Load vectorstore
with open('vectorstore_13B_1024.pkl', 'rb') as f:
    vectorstore = pickle.load(f)
llama.cpp: loading model from /Users/saintway/ggml-model-q4_1.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 1024
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: f16        = 3
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =  73.73 KB
llama_model_load_internal: mem required  = 11359.03 MB (+ 3216.00 MB per state)
llama_init_from_file: kv self size  = 1600.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
In [ ]:
question = '你知道什么?'
In [ ]:
# Get context related to the question from the embedding model
for context in vectorstore.similarity_search(question):
    print(f'{context}\n')
page_content='> 现在我们知道,事实上应当承认在主观的质和推断出来的客观世界之间有一种确切规定的、一义的配列关系。大量的经验材料告诉我们,我们可以发现,至少必须假设与所有经验唯一地联系着的“物理的”过程的存在。没有什么意识的质不可能受到作用于身体的力的影响。的确,我们甚至能够用一种简单的物理方法,例如吸进一种气体,就把意识全部消除掉。我们的行动与我们的意志经验相联系,幻觉与身体的疲惫相联系,抑郁症的发作与消化的紊乱相联系。为了研究这类相互联系,心的理论必须抛弃纯粹内省的方法而成为**生理的**心理学。只有这个学科才能在理论上达到对心理的东西的完全的知识。借助于这样一种心理学,我们就可以用概念和所与的主观的质相配列,正如我们能够用概念与推论出来的客观的质相配列一样。这样,主观的质就像客观的质一样成为可知的了。' metadata={'source': '_posts/ultimate-facts/Neuroscience.md'}

page_content='态度:认同、欣赏、尊重、重视、轻视、忽视、鄙视、反对\n客体所做:我对你的态度、你对我的态度、你对他的态度\n主体所想:我对你的态度、你对我的态度\n主体所说:我对他的态度、他对我的态度、他对你的态度\n对于某些次态度的记忆、拟构;怀疑、与、确信;\n你对我的态度、有正有误;作为某种主体效果的客体态度,对于客体态度的目标、意向;对于态度的态度、有正有误;\n渴望被重视、被认可、被理解、被公正地对待;\n虚荣;自负,轻视,反对、有正有误,对于他人的误解;对于自己态度的温和的怀疑;苏格拉底式教学法;偏见、综合比较,是,某种轻视;\n我对客体的态度:你对我的态度、你的确信,我的确信、我对你的态度;确信、对于确信的态度;我对自己的态度,耻辱之恨、丑陋之恨\n对于某种经历的认同。对于某种人生概括的认同。省略主语、所有格。怕丑。\n注意,是,自由的。?我是谁?' metadata={'source': '_posts/ultimate-facts/终极真实.md'}

page_content='> 我们刚刚知道自然科学借以掌握质的方法––形成量的概念的方法。我们必须提出的问题是,这种方法是不是也能够适用于主观的意识的质。按照我们前面所说,为了使这种方法能够加以运用,必须有与这些质充分确定地、唯一地联系着的空间变化。如果情况真的如此,那么这个问题就可以通过空间–时间的重合方法来解决,因而**测量**便是可能的。但是,这种重合的方法本质上就是进行物理的观察,而就内省法来说,却不存在物理的观察这种事情。由此立刻就可以得出结论:心理学沿着内省的途径决不可能达到知识的理想。因此,它必须尽量使用物理的观察方法来达到它的目的。但这是不是可能的呢?是不是有依存于意识的质的空间变化,就像例如在光学中干涉带的宽度依存于颜色,在电学中磁铁的偏转度依存于磁场的强度那样呢?' metadata={'source': '_posts/ultimate-facts/Neuroscience.md'}

page_content='---\n\n> \u3000\u3000有些权威认为,有必要把意识的内容 (content) 与“有意识状态的特性” (quality of being conscious) 或“意识本身” (consciousness as such) 区分开来²。这一划分与我的分类异曲同工。\n\u3000\u3000要想产生意识,必须先具备某些神经前提条件。我把这些条件称为 NCC_e。任一特定知觉的 NCC 都是局部作用的、高度特化的、转瞬即逝的,相比起来,NCC_e 的作用方式更全局化也更持久。要是没有相关的 NCC_e 的话,机体或许也还能有简单的行为,但在这样做时绝不会有意识(可能发生这种情形的某些病理条件将在第13章讨论)。根据定义可知,如果没有 NCC_e,就不可能形成任何 NCC。\n\u3000\u3000会不会有这样一种状态,即生物体虽然有意识,却意识不到任何具体内容?换句话说,NCC_e 能否脱离 NCC 而单独存在呢?某些冥想的目标就是要进入这种没有具体内容的意识形式³。但是在目前,还很难对它进行严格的分析。' metadata={'source': '_posts/ultimate-facts/Neuroscience.md'}

llama_print_timings:        load time = 15863.30 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 15862.61 ms /     8 tokens ( 1982.83 ms per token)
llama_print_timings:        eval time = 23200.83 ms /     1 runs   (23200.83 ms per run)
llama_print_timings:       total time = 39067.66 ms
In [ ]:
from langchain.chains.llm import LLMChain
from langchain.callbacks.base import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains.chat_vector_db.prompts import CONDENSE_QUESTION_PROMPT, QA_PROMPT
from langchain.chains.question_answering import load_qa_chain
from langchain.vectorstores.base import VectorStore
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI

# Callback function to stream answers to stdout.
manager = CallbackManager([StreamingStdOutCallbackHandler()])

streaming_llm = ChatOpenAI(streaming=True, callback_manager=manager, verbose=True, temperature=0)
question_gen_llm = ChatOpenAI(temperature=0, verbose=True, callback_manager=manager)
# Prompt to generate independent questions by incorporating chat history and a new question.
question_generator = LLMChain(llm=question_gen_llm, prompt=CONDENSE_QUESTION_PROMPT)
# Pass in documents and a standalone prompt to answer questions.
doc_chain = load_qa_chain(streaming_llm, chain_type='stuff', prompt=QA_PROMPT)
# Generate prompts from embedding model.
qa = ConversationalRetrievalChain(retriever=vectorstore.as_retriever(), combine_docs_chain=doc_chain, question_generator=question_generator)
In [ ]:
QA_PROMPT
Out[ ]:
PromptTemplate(input_variables=['context', 'question'], output_parser=None, partial_variables={}, template="Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\n{context}\n\nQuestion: {question}\nHelpful Answer:", template_format='f-string', validate_template=True)

Utilizing iPyMock

在 PyTest 中,可以使用 monkeypatch 来替换导入的库,从而进行测试。 monkeypatch 是一个 pytest 内置的 fixture,用于替换运行时的变量和对象,以便在测试期间使用自定义值。

对于要替换的库,可以使用 pytest_mock fixture 来自动将其传递给 monkeypatch。如果没有使用 pytest_mock,则需要手动使用 monkeypatch.setattr() 方法来替换导入的库。

以下是一个示例,假设我们有一个名为 example.py 的模块,它导入了 requests 库,并使用了该库的 get() 方法发送网络请求:

import requests

def get_example_data():
    response = requests.get('https://example.com')
    return response.content

要测试这个函数,我们需要使用 Dummy 对象替换 requests 库,以便我们可以模拟网络请求的响应。可以使用以下代码进行测试:

import pytest
import example

class DummyResponse:
    def __init__(self, content):
        self.content = content

@pytest.fixture
def mock_requests(monkeypatch):
    def mock_get(*args, **kwargs):
        return DummyResponse(b'Test data')
    monkeypatch.setattr(requests, 'get', mock_get)

def test_get_example_data(mock_requests):
    data = example.get_example_data()
    assert data == b'Test data'

在这个示例中,我们首先定义了一个名为 DummyResponse 的类,它代表了一个虚拟的 requests 库的响应。然后我们定义了一个名为 mock_requests 的 fixture,它使用 monkeypatch.setattr() 方法替换了 requests 库的 get() 方法,使其返回我们定义的 DummyResponse 对象。

最后,我们定义了一个测试函数 test_get_example_data,并将 mock_requests fixture 作为参数传递给它。在测试函数内部,我们调用 example.get_example_data(),这将调用我们已经用 Dummy 对象替换的 requests 库的 get() 方法,最终返回我们定义的虚拟响应。

这样我们就可以在测试中使用 Dummy 对象替换任何库,以便更好地控制测试环境。

In [ ]:
import openai
In [ ]:
help(openai.ChatCompletion.create)
Help on method create in module openai.api_resources.chat_completion:

create(*args, **kwargs) method of builtins.type instance
    Creates a new chat completion for the provided messages and parameters.
    
    See https://platform.openai.com/docs/api-reference/chat-completions/create
    for a list of valid parameters.

In [ ]:
import json, os
from revChatGPT.V1 import Chatbot, configure

# Open the JSON file and read the conversation_id
with open(os.path.expanduser('~/.config/revChatGPT/config.json'), 'r') as f:
    conversation_id = json.load(f).get('conversation_id', None)

bot = Chatbot(
    config = configure(),
    conversation_id = conversation_id,
    lazy_loading = True
)
In [ ]:
%%bash
pip install --upgrade git+https://github.com/seii-saintway/ipymock
In [ ]:
import pytest
In [ ]:
import markdown
import IPython
In [ ]:
def delta(prompt):
    res = ''
    for response in bot.ask(prompt):
        # IPython.display.display(IPython.core.display.Markdown(response['message']))
        # IPython.display.clear_output(wait=True)
        yield {
            'choices': [
                {
                    'index': 0,
                    'delta': {
                        'content': response['message'][len(res):],
                    }
                }
            ],
        }
        res = response['message']

def mock_create(*args, **kwargs):
    for message in kwargs['messages']:
        if message['role'] == 'user':
            break
    else:
        return {
            'choices': [{}],
        }

    if kwargs.get('stream', False):
        return delta(message['content'])

    for response in bot.ask(message['content']):
        # IPython.display.display(IPython.core.display.Markdown(response['message']))
        # IPython.display.clear_output(wait=True)
        pass
    return {
        'choices': [
            {
                'finish_reason': 'stop',
                'index': 0,
                'message': {
                    'content': response['message'],
                    'role': 'assistant',
                }
            }
        ],
    }
In [ ]:
@pytest.fixture
def mock_openai(monkeypatch):
    monkeypatch.setattr(openai.ChatCompletion, 'create', mock_create)
In [ ]:
question = '终极真实是什么?'
In [ ]:
answer = {}
In [ ]:
def test_qa(mock_openai):
    global answer
    answer = qa({'question': question, 'chat_history': []})
    print('\n')
    assert isinstance(answer, dict)
In [ ]:
from ipymock import do
In [ ]:
do(
    mock_openai=mock_openai,
    test_qa=test_qa,
)
=> no.0  ::source::test_qa  setup  passed

llama_print_timings:        load time = 14550.38 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 23539.97 ms /    15 tokens ( 1569.33 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 23551.68 ms
The author argues that to study the interconnections between subjective experiences and physical processes, we need to move away from purely introspective methods and embrace physiological psychology. Only through this approach can we achieve a complete understanding of psychological phenomena. The author also suggests that consciousness is inextricably linked to physical processes and can be affected by them.

=> no.0  ::source::test_qa  runtest  passed

In [ ]:
answer
Out[ ]:
{'question': '终极真实是什么?',
 'chat_history': [],
 'answer': 'The author argues that to study the interconnections between subjective experiences and physical processes, we need to move away from purely introspective methods and embrace physiological psychology. Only through this approach can we achieve a complete understanding of psychological phenomena. The author also suggests that consciousness is inextricably linked to physical processes and can be affected by them.'}
  • Chat with Open Large Language Models
    • Vicuna: a chat assistant fine-tuned from LLaMA on user-shared conversations. This one is expected to perform best according to our evaluation.
    • Koala: a chatbot fine-tuned from LLaMA on user-shared conversations and open-source datasets. This one performs similarly to Vicuna.
    • ChatGLM: an open bilingual dialogue language model | 开源双语对话语言模型
    • Alpaca: a model fine-tuned from LLaMA on 52K instruction-following demonstrations.
    • LLaMA: open and efficient foundation language models

I recommend using gpt4 x alpaca ggml as a base model as it doesn't have the same level of censorship as vicuna. However, if you're using it purely for ethical tasks, vicuna is definitely better.


In [ ]:
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id='Pi3141/alpaca-lora-7B-ggml', filename='ggml-model-q4_0.bin', resume_download=True)
Downloading ggml-model-q4_0.bin:   0%|          | 0.00/4.21G [00:00<?, ?B/s]
Out[ ]:
'/Users/saintway/.cache/huggingface/hub/models--Pi3141--alpaca-lora-7B-ggml/snapshots/fec53813efae6495f9b1f14aa4dedffc07bbf2e0/ggml-model-q4_0.bin'
In [ ]:
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id='Pi3141/alpaca-lora-7B-ggml', filename='ggml-model-q4_1.bin', resume_download=True)
Downloading ggml-model-q4_1.bin:   0%|          | 0.00/5.06G [00:00<?, ?B/s]
Out[ ]:
'/Users/saintway/.cache/huggingface/hub/models--Pi3141--alpaca-lora-7B-ggml/snapshots/fec53813efae6495f9b1f14aa4dedffc07bbf2e0/ggml-model-q4_1.bin'

In [ ]:
%%bash
pip install llama-cpp-python[server]
In [ ]:
%%bash
export MODEL=~/ggml-model-q4_1.bin
python3 -m llama_cpp.server
llama.cpp: loading model from /Users/saintway/ggml-model-q4_1.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: f16        = 3
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =  73.73 KB
llama_model_load_internal: mem required  = 11359.03 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size  = 1600.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
INFO:     Started server process [41176]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
INFO:     ::1:61438 - "GET / HTTP/1.1" 404 Not Found
INFO:     ::1:61438 - "GET /docs HTTP/1.1" 200 OK
INFO:     ::1:61438 - "GET /openapi.json HTTP/1.1" 200 OK
llama_print_timings:        load time = 11679.89 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 240847.82 ms /   478 tokens (  503.87 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 240873.74 ms
INFO:     ::1:61439 - "POST /v1/embeddings HTTP/1.1" 200 OK
llama_print_timings:        load time = 11679.89 ms
llama_print_timings:      sample time =    33.43 ms /    16 runs   (    2.09 ms per run)
llama_print_timings: prompt eval time = 255305.21 ms /   478 tokens (  534.11 ms per token)
llama_print_timings:        eval time = 28069.18 ms /    15 runs   ( 1871.28 ms per run)
llama_print_timings:       total time = 283470.31 ms
INFO:     ::1:61681 - "POST /v1/completions HTTP/1.1" 200 OK

In [ ]:
import os
from llama_cpp import Llama
llm = Llama(model_path=os.path.expanduser('~/ggml-model-q4_1.bin'))
llama.cpp: loading model from /Users/saintway/ggml-model-q4_1.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: f16        = 3
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =  73.73 KB
llama_model_load_internal: mem required  = 11359.03 MB (+ 3216.00 MB per state)
llama_init_from_file: kv self size  =  800.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
In [ ]:
llm.tokenize('''> 我们刚刚知道自然科学借以掌握质的方法––形成量的概念的方法。我们必须提出的问题是,这种方法是不是也能够适用于主观的意识的质。按照我们前面所说,为了使这种方法能够加以运用,必须有与这些质充分确定地、唯一地联系着的空间变化。如果情况真的如此,那么这个问题就可以通过空间–时间的重合方法来解决,因而**测量**便是可能的。但是,这种重合的方法本质上就是进行物理的观察,而就内省法来说,却不存在物理的观察这种事情。由此立刻就可以得出结论:心理学沿着内省的途径决不可能达到知识的理想。因此,它必须尽量使用物理的观察方法来达到它的目的。但这是不是可能的呢?是不是有依存于意识的质的空间变化,就像例如在光学中干涉带的宽度依存于颜色,在电学中磁铁的偏转度依存于磁场的强度那样呢?'''.encode('utf8'))
In [ ]:
llm.tokenize('''> 现在我们知道,事实上应当承认在主观的质和推断出来的客观世界之间有一种确切规定的、一义的配列关系。大量的经验材料告诉我们,我们可以发现,至少必须假设与所有经验唯一地联系着的“物理的”过程的存在。没有什么意识的质不可能受到作用于身体的力的影响。的确,我们甚至能够用一种简单的物理方法,例如吸进一种气体,就把意识全部消除掉。我们的行动与我们的意志经验相联系,幻觉与身体的疲惫相联系,抑郁症的发作与消化的紊乱相联系。为了研究这类相互联系,心的理论必须抛弃纯粹内省的方法而成为**生理的**心理学。只有这个学科才能在理论上达到对心理的东西的完全的知识。借助于这样一种心理学,我们就可以用概念和所与的主观的质相配列,正如我们能够用概念与推论出来的客观的质相配列一样。这样,主观的质就像客观的质一样成为可知的了。'''.encode('utf8'))

In [ ]:
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
In [ ]:
def get_docs(dir_name):
    # (1) Import a series of documents.
    loader = DirectoryLoader(dir_name, loader_cls=TextLoader, silent_errors=True)
    raw_documents = loader.load()
    # (2) Split them into small chunks.
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1024,
        chunk_overlap=64,
    )
    return text_splitter.split_documents(raw_documents)

In [ ]:
import os
from langchain.embeddings import LlamaCppEmbeddings
from langchain.vectorstores.faiss import FAISS
In [ ]:
def ingest_docs(dir_name):
    documents = get_docs(dir_name)
    # (3) Create embeddings for each document (using text-embedding-ada-002).
    embeddings = LlamaCppEmbeddings(model_path=os.path.expanduser(
        '~/.cache/huggingface/hub/models--Pi3141--alpaca-lora-7B-ggml/snapshots/fec53813efae6495f9b1f14aa4dedffc07bbf2e0/ggml-model-q4_1.bin'
    ), n_ctx=2048)
    return FAISS.from_documents(documents, embeddings)

vectorstore = ingest_docs('_posts/ultimate-facts')
llama.cpp: loading model from /Users/saintway/.cache/huggingface/hub/models--Pi3141--alpaca-lora-7B-ggml/snapshots/fec53813efae6495f9b1f14aa4dedffc07bbf2e0/ggml-model-q4_1.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: f16        = 3
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  59.11 KB
llama_model_load_internal: mem required  = 6612.57 MB (+ 2052.00 MB per state)
llama_init_from_file: kv self size  = 2048.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 121567.27 ms /   607 tokens (  200.28 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 121578.08 ms

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 55477.99 ms /   384 tokens (  144.47 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 55490.11 ms

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 185983.50 ms /  1245 tokens (  149.38 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 185993.29 ms

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 99895.20 ms /   648 tokens (  154.16 ms per token)
llama_print_timings:        eval time =   274.67 ms /     1 runs   (  274.67 ms per run)
llama_print_timings:       total time = 100174.12 ms

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time =  4480.73 ms /    30 tokens (  149.36 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time =  4483.89 ms

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 34421.56 ms /   208 tokens (  165.49 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 34428.46 ms

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 18916.80 ms /   115 tokens (  164.49 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 18922.95 ms

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 109097.79 ms /   672 tokens (  162.35 ms per token)
llama_print_timings:        eval time =   322.11 ms /     1 runs   (  322.11 ms per run)
llama_print_timings:       total time = 109426.49 ms

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 182994.18 ms /  1131 tokens (  161.80 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 183004.98 ms

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 159320.01 ms /  1040 tokens (  153.19 ms per token)
llama_print_timings:        eval time =   239.12 ms /     1 runs   (  239.12 ms per run)
llama_print_timings:       total time = 159568.61 ms

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 239876.70 ms /  1530 tokens (  156.78 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 239888.82 ms

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 217347.31 ms /  1428 tokens (  152.20 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 217358.50 ms

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 191811.75 ms /  1255 tokens (  152.84 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 191821.81 ms

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 226394.03 ms /  1406 tokens (  161.02 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 226403.94 ms

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 250274.44 ms /  1514 tokens (  165.31 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 250290.90 ms

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 245848.18 ms /  1459 tokens (  168.50 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 245869.09 ms

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 241036.46 ms /  1448 tokens (  166.46 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 241046.69 ms

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 90638.96 ms /   549 tokens (  165.10 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 90648.73 ms

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 243383.44 ms /  1456 tokens (  167.16 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 243395.79 ms

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 66549.32 ms /   407 tokens (  163.51 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 66557.07 ms

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 254780.55 ms /  1524 tokens (  167.18 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 254791.22 ms

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 63349.78 ms /   397 tokens (  159.57 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 63354.36 ms

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 174274.05 ms /  1092 tokens (  159.59 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 174282.70 ms

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 194346.63 ms /  1332 tokens (  145.91 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 194358.84 ms

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 69352.89 ms /   490 tokens (  141.54 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 69357.43 ms

llama_print_timings:        load time =  9764.51 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time =  7361.54 ms /    53 tokens (  138.90 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time =  7371.77 ms

I need a big memory to accelerate LLM inference.

In [ ]:
import pickle
In [ ]:
# Save vectorstore
with open('vectorstore_7B_2048.pkl', 'wb') as f:
    pickle.dump(vectorstore, f)
In [ ]:
# Load vectorstore
with open('vectorstore_7B_2048.pkl', 'rb') as f:
    vectorstore = pickle.load(f)
llama.cpp: loading model from /Users/saintway/.cache/huggingface/hub/models--Pi3141--alpaca-lora-7B-ggml/snapshots/fec53813efae6495f9b1f14aa4dedffc07bbf2e0/ggml-model-q4_1.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: f16        = 3
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  59.11 KB
llama_model_load_internal: mem required  = 6612.57 MB (+ 2052.00 MB per state)
llama_init_from_file: kv self size  = 2048.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 

In [ ]:
question = '你知道什么?'
In [ ]:
# Get context related to the question from the embedding model
for context in vectorstore.similarity_search(question):
    print(f'{context}\n')
page_content='title: Neuroscience\ndate: 2021-10-14 16:30:20\ntags: Neuro\n---\n\nThe [**ventral tegmental area**](https://en.wikipedia.org/wiki/Ventral_tegmental_area) (**VTA**) (**tegmentum** is Latin for covering), also known as the **ventral tegmental area of Tsai**, or simply **ventral tegmentum**, is a group of neurons located close to the midline on the floor of the midbrain.\n\n---\n\n> \u3000\u3000有些权威认为,有必要把意识的内容 (content) 与“有意识状态的特性” (quality of being conscious) 或“意识本身” (consciousness as such) 区分开来²。这一划分与我的分类异曲同工。\n\u3000\u3000要想产生意识,必须先具备某些神经前提条件。我把这些条件称为 NCC_e。任一特定知觉的 NCC 都是局部作用的、高度特化的、转瞬即逝的,相比起来,NCC_e 的作用方式更全局化也更持久。要是没有相关的 NCC_e 的话,机体或许也还能有简单的行为,但在这样做时绝不会有意识(可能发生这种情形的某些病理条件将在第13章讨论)。根据定义可知,如果没有 NCC_e,就不可能形成任何 NCC。\n\u3000\u3000会不会有这样一种状态,即生物体虽然有意识,却意识不到任何具体内容?换句话说,NCC_e 能否脱离 NCC 而单独存在呢?某些冥想的目标就是要进入这种没有具体内容的意识形式³。但是在目前,还很难对它进行严格的分析。' metadata={'source': '_posts/ultimate-facts/Neuroscience.md'}

page_content='有意注意,是指,对于某次效果的注意。\n无意注意,是指,对于某次非效果的注意。\n\n目标,是指,对于某种效果的某些次记忆所联结着的对于此种效果的拟构。\n意向,是指,对于某些种效果的某些次记忆所联结着的对于某种效果的拟构。\n\n懊悔,是指,对于某次弊害效果的某次记忆、对于某次功效的某次记忆所联结着的对于某次功效的拟构。\n焦虑,是指,对于某次弊害效果的某次记忆、对于某次功效的某次意向所联结着的对于某次弊害效果的拟构。\n\n对于某次功效的目标,联结着,对于此次功效的原因。\n对于某种功效的概括,联结着,对于此种功效的原因。\n\n兴趣,是指,联结着某次快乐的识。\n荒诞,是指,联结着某次乏味的识。\n苦毒,是指,联结着某次痛苦的识。\n\n慾望,是指,对于某次兴趣的表征。\n妄想,是指,对于某次荒诞的表征。?\n苦观,是指,对于某次苦毒的表征。\n\n苦观,分为,记忆苦观、拟构苦观。弊害,…、…\n\n有趣注意,是指,对于某次兴趣的注意。\n无趣注意,是指,对于某次荒诞的注意。\n\n意义,是指,值得的注意。\n神圣,是指,极其丰富的意义。\n积极的态度,是指,充满对于某种意义的信心。\n消极的态度,是指,缺乏对于某种意义的信心。\n积极的注意,导致着,快乐。\n消极的注意,导致着,乏味。\n对于某种意义的怀疑,是指,对于某种意义的信心的减弱。\n对于某种意义的确定,是指,对于某种意义的信心的增强。\n对于某种意义的静思,是指,对于某种意义的减弱。对于某种意义的静思,导致着,忧郁。\n对于某种意义的禅修,是指,对于某种意义的增强。对于某种意义的禅修,导致着,幸福。\n静思、禅修、祷告,都是,某种定觉练习。\n\n---\n\n> 因为我们得了救是因着盼望。只是所盼望的若已得看见,便不是盼望了;因为人所看见的、他何必还盼望呢?但我们若盼望所未看见的,就必坚忍切候着。\n(罗马书 8:24-25 吕振中)\n\n> 所以青春性的私欲、你总要逃避;你要跟那些用洁净心呼求主的人一同追求正义、忠信、仁爱、和平。\n(提摩太后书 2:22 吕振中)\n\n向内往最深处去:净心、呼求主名、并且、等待回应。' metadata={'source': '_posts/ultimate-facts/终极真实.md'}

page_content='> 我们刚刚知道自然科学借以掌握质的方法––形成量的概念的方法。我们必须提出的问题是,这种方法是不是也能够适用于主观的意识的质。按照我们前面所说,为了使这种方法能够加以运用,必须有与这些质充分确定地、唯一地联系着的空间变化。如果情况真的如此,那么这个问题就可以通过空间–时间的重合方法来解决,因而**测量**便是可能的。但是,这种重合的方法本质上就是进行物理的观察,而就内省法来说,却不存在物理的观察这种事情。由此立刻就可以得出结论:心理学沿着内省的途径决不可能达到知识的理想。因此,它必须尽量使用物理的观察方法来达到它的目的。但这是不是可能的呢?是不是有依存于意识的质的空间变化,就像例如在光学中干涉带的宽度依存于颜色,在电学中磁铁的偏转度依存于磁场的强度那样呢?\n> 现在我们知道,事实上应当承认在主观的质和推断出来的客观世界之间有一种确切规定的、一义的配列关系。大量的经验材料告诉我们,我们可以发现,至少必须假设与所有经验唯一地联系着的“物理的”过程的存在。没有什么意识的质不可能受到作用于身体的力的影响。的确,我们甚至能够用一种简单的物理方法,例如吸进一种气体,就把意识全部消除掉。我们的行动与我们的意志经验相联系,幻觉与身体的疲惫相联系,抑郁症的发作与消化的紊乱相联系。为了研究这类相互联系,心的理论必须抛弃纯粹内省的方法而成为**生理的**心理学。只有这个学科才能在理论上达到对心理的东西的完全的知识。借助于这样一种心理学,我们就可以用概念和所与的主观的质相配列,正如我们能够用概念与推论出来的客观的质相配列一样。这样,主观的质就像客观的质一样成为可知的了。\n> 我们很早就指出,客观世界中最直接地与自我的主观的质相联系的部分就是由大脑的概念,特别是大脑皮层的概念所表示的那一部分。因而在科学知识的精确的世界图景中,可用数值描述的概念代替的主观质的,只是某些大脑过程。相互依存的分析不可避免要引向这些大脑过程。虽然我们还远没有确切地知道所涉及的是何种个别的过程,但至少指出了一条途径:必须以大脑过程来代替主观的质。这就是我们能够充分认识主观的质所具有的唯一的希望。\n> ……' metadata={'source': '_posts/ultimate-facts/Neuroscience.md'}

page_content='客体方式,导致着、联结着,主体方式、机体状态\n形体,导致着、联结着,身体、快乐、痛苦\n轻蔑、轻视他人对自己的态度,损害着,羞耻心\n羞耻,对于亲密程度的重视;我们在争辩的时候,真正损害着羞耻心的,实际上是,轻视他人对自己的态度,而不是,轻视他人的(由父所创造的)信念?\n羞耻、光荣,重视他人对自己的态度、敬重\n恥辱、傲慢,轻视他人对自己的态度、轻蔑\n羞耻、羞辱,在含义上,有所不同吗?\n单方的轻视、双方的轻视?\n一方,是,非吾所显明出来的罪;一方,是,吾所显明出来的罪。\n狭隘、愚蠢、固执,轻视他人的信念\n开明、智慧、变通,重视他人的信念' metadata={'source': '_posts/ultimate-facts/终极真实.md'}

llama_print_timings:        load time = 14467.91 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 14467.34 ms /     8 tokens ( 1808.42 ms per token)
llama_print_timings:        eval time = 14635.33 ms /     1 runs   (14635.33 ms per run)
llama_print_timings:       total time = 29115.01 ms

In [ ]:
# from langchain.chains.chat_vector_db.prompts import CONDENSE_QUESTION_PROMPT, QA_PROMPT
In [ ]:
from langchain.prompts import PromptTemplate

CONDENSE_QUESTION_PROMPT = PromptTemplate(
    input_variables=['chat_history', 'question'],
    output_parser=None, partial_variables={},
    template='给定以下对话和后续问题,请重新表述后续问题以成为一个独立问题。\n\n聊天记录:\n{chat_history}\n后续问题:{question}\n独立问题:',
    template_format='f-string',
    validate_template=True
)

QA_PROMPT = PromptTemplate(
    input_variables=['context', 'question'],
    output_parser=None, partial_variables={},
    template='使用下面的背景信息回答最后的问题。如果您不知道答案,请直接说您不知道,不要试图编造一个答案。\n\n背景信息:\n{context}\n\n问题:{question}\n有用的答案:',
    template_format='f-string',
    validate_template=True
)
In [ ]:
from langchain.chains.llm import LLMChain
from langchain.callbacks.base import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains.question_answering import load_qa_chain
from langchain.vectorstores.base import VectorStore
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI

# Callback function to stream answers to stdout.
manager = CallbackManager([StreamingStdOutCallbackHandler()])

streaming_llm = ChatOpenAI(streaming=True, callback_manager=manager, verbose=True, temperature=0)
question_gen_llm = ChatOpenAI(temperature=0, verbose=True, callback_manager=manager)
# Prompt to generate independent questions by incorporating chat history and a new question.
question_generator = LLMChain(llm=question_gen_llm, prompt=CONDENSE_QUESTION_PROMPT)
# Pass in documents and a standalone prompt to answer questions.
doc_chain = load_qa_chain(streaming_llm, chain_type='stuff', prompt=QA_PROMPT)
# Generate prompts from embedding model.
qa = ConversationalRetrievalChain(retriever=vectorstore.as_retriever(), combine_docs_chain=doc_chain, question_generator=question_generator)
In [ ]:
import openai
In [ ]:
import json, os
from revChatGPT.V1 import Chatbot, configure

# Open the JSON file and read the conversation_id
with open(os.path.expanduser('~/.config/revChatGPT/config.json'), 'r') as f:
    conversation_id = json.load(f).get('conversation_id', None)

bot = Chatbot(
    config = configure(),
    conversation_id = conversation_id,
    lazy_loading = True
)
In [ ]:
import pytest
In [ ]:
def delta(prompt):
    res = ''
    for response in bot.ask(prompt):
        yield {
            'choices': [
                {
                    'index': 0,
                    'delta': {
                        'content': response['message'][len(res):],
                    }
                }
            ],
        }
        res = response['message']

def mock_create(*args, **kwargs):
    for message in kwargs['messages']:
        if message['role'] == 'user':
            break
    else:
        return {
            'choices': [{}],
        }

    if kwargs.get('stream', False):
        return delta(message['content'])

    for response in bot.ask(message['content']):
        pass
    return {
        'choices': [
            {
                'finish_reason': 'stop',
                'index': 0,
                'message': {
                    'content': response['message'],
                    'role': 'assistant',
                }
            }
        ],
    }
In [ ]:
@pytest.fixture
def mock_openai(monkeypatch):
    monkeypatch.setattr(openai.ChatCompletion, 'create', mock_create)
In [ ]:
question = '终极真实是什么?'
In [ ]:
answer = {}
In [ ]:
def test_qa(mock_openai):
    global answer
    answer = qa({'question': question, 'chat_history': []})
    print('\n')
    assert isinstance(answer, dict)
In [ ]:
from ipymock import do
In [ ]:
do(
    mock_openai=mock_openai,
    test_qa=test_qa,
)
=> no.0  ::source::test_qa  setup  passed

llama_print_timings:        load time = 14467.91 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 22490.19 ms /    15 tokens ( 1499.35 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 22496.21 ms
根据提供的背景信息,以下是问题的答案:

此处引用的文章是哪位心理学家写的? 

这篇文章是由德国心理学家威廉·威廉德撰写的,标题是《心理学的目标和方法》。

=> no.0  ::source::test_qa  runtest  passed

In [ ]:
answer
Out[ ]:
{'question': '终极真实是什么?',
 'chat_history': [],
 'answer': '根据提供的背景信息,以下是问题的答案:\n\n此处引用的文章是哪位心理学家写的? \n\n这篇文章是由德国心理学家威廉·威廉德撰写的,标题是《心理学的目标和方法》。'}

Comments

2023-04-12