huggingchat/chat-ui · [NEW] WebSearch 2.0

Hugging Chat org Sep 13, 2023

March 15th update: 🌐Internet Access for Assistants

Hi HuggingChat community!

We've just released a big update to the WebSearch feature: it now uses retrieval-augmented generation (RAG) to extract relevant information from multiple web pages! From our tests, it's much more powerful than before 🚀.
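
For readers unfamiliar with RAG, here is a minimal sketch of what the retrieval step can look like. It is not the chat-ui implementation: the function names, the 50-word chunk size, and the bag-of-words cosine scoring are all assumptions chosen for illustration, and a real system would more likely score chunks with an embedding model.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Split a page into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, pages: dict[str, str], top_k: int = 3) -> list[tuple[float, str, str]]:
    """Return the top_k (score, url, chunk) triples across all pages."""
    query_vec = Counter(tokenize(query))
    scored = []
    for url, text in pages.items():
        for chunk in chunk_text(text):
            scored.append((cosine(query_vec, Counter(tokenize(chunk))), url, chunk))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


def build_prompt(query: str, hits: list[tuple[float, str, str]]) -> str:
    """Assemble the retrieved chunks and the question into one prompt (hypothetical format)."""
    context = "\n\n".join(f"[{url}]\n{chunk}" for _, url, chunk in hits)
    return f"Answer using the context below.\n\n{context}\n\nQuestion: {query}"
```

The URLs attached to the retrieved chunks are what a UI could then display as sources next to the answer.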

We would love to get your feedback on it! Also, if you want to check the details or even contribute, take a look at the PR on GitHub.

See you soon!

victor changed discussion title from [NEW] - Updated WebSearch - feedback welcome! to [DRAFT][NEW] - Updated WebSearch - feedback welcome! Sep 13, 2023

victor changed discussion title from [DRAFT][NEW] - Updated WebSearch - feedback welcome! to [NEW] - Updated WebSearch - feedback welcome! Sep 13, 2023

victor pinned discussion Sep 13, 2023

victor changed discussion title from [NEW] - Updated WebSearch - feedback welcome! to [NEW] WebSearch 2.0 - feedback welcome! Sep 13, 2023

BramVanroy

Sep 13, 2023

I think I've asked this elsewhere but I'm not sure what the answer was. Do you use a paid API to query Google search? I'm asking because I can imagine that if it's through something hacky like Selenium, Google won't like it (and they'll miss ad revenue). So, in short, what does the technical pipeline look like for this, from user query to generated output?

nsarrazin

Hugging Chat org Sep 13, 2023

> I think I've asked this elsewhere but I'm not sure what the answer was. Do you use a paid API to query Google search? I'm asking because I can imagine that if it's through something hacky like Selenium, Google won't like it (and they'll miss ad revenue). So, in short, what does the technical pipeline look like for this, from user query to generated output?

You can check out the feature here!

MoritzLaurer

Sep 13, 2023

@BramVanroy, they use SerpAPI, which, as far as I understand, is paid and legal. See the source code here: https://github.com/huggingface/chat-ui/blob/main/src/lib/server/websearch/searchWeb.ts

MoritzLaurer

Sep 13, 2023

edited Sep 13, 2023

Question regarding sources/citations: Do I understand correctly that you currently display as sources all URLs that the retriever retrieved and gave the LLM as context (regardless of whether the LLM actually used or referred to the source)? In Bing chat, they somehow managed to attribute sources to specific parts/sentences of the generated output, and not only to the generated output as a whole. I've always wondered how they made that work (direct citations, i.e. attributing sources to specific sub-parts of a generated output). Does anyone know?

mishig

Hugging Chat org Sep 14, 2023

edited Sep 14, 2023

> Question regarding sources/citations: Do I understand correctly that you currently display all URLs as sources, which the retriever retrieved and gave the LLM as context

Yes

> In Bing chat, they somehow managed to attribute sources to specific parts/sentences.

Interesting. I guess one simple way would be: for every generated sentence, calculate its similarity against each source and pick the highest-scoring source as the source of that sentence. Not sure if that's what they are doing.

DarwinAnim8or

Sep 14, 2023

Any chance of a search API that isn't Google's? E.g. DuckDuckGo or Bing :)
Awesome work though! Really cool :D

MoritzLaurer

Sep 14, 2023

> In Bing chat, they somehow managed to attribute sources to specific parts/sentences.

> Interesting. I guess one simple way would be: for every generated sentence, calculate its similarity against each source and pick the highest-scoring source as the source of that sentence. Not sure if that's what they are doing.

@mishig, yeah, true. My first intuition was that I would hesitate to trust embeddings from a bi-encoder to be reliable enough for this. But if you took a cross-encoder, that should actually work quite well, especially if you set a high enough threshold to avoid false positives (then it could even work with a bi-encoder sentence transformer). Direct citations could be a nice feature to have :)
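
A minimal sketch of the per-sentence attribution idea discussed above, with a threshold to suppress weak matches. Nothing here reflects how Bing or chat-ui actually does it. The `similarity` function is a Jaccard token overlap used purely as a stand-in for a cross-encoder or bi-encoder score, and the function names and the 0.3 threshold are assumptions for illustration.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard overlap; a placeholder for a real model score."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def attribute(answer: str, sources: dict[str, str], threshold: float = 0.3):
    """Map each sentence of the answer to its best-matching source URL.

    A sentence whose best score falls below the threshold gets None,
    which avoids attaching a citation to an unsupported sentence.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    result = []
    for sentence in sentences:
        best_url, best_score = None, 0.0
        for url, text in sources.items():
            score = similarity(sentence, text)
            if score > best_score:
                best_url, best_score = url, score
        result.append((sentence, best_url if best_score >= threshold else None))
    return result
```

Swapping `similarity` for a model-based scorer would leave the rest of the loop unchanged; the threshold would then need tuning for that model's score range.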

126 hidden messages

acharyaaditya26

Jun 5

@victor @nsarrazin search is not working; it says "out of credits". Can you please help us out?

victor

Hugging Chat org Jun 5

It's back up, sorry about that @acharyaaditya26

pluhong

Jul 10

> Any chance of a search API that isn't Google's? E.g. DuckDuckGo or Bing :)

@DarwinAnim8or there have been updates since the last comment here. You can see this PR on how you.com is being added as a search engine, and do the same for DuckDuckGo or Bing :)

Is there a way to do this in https://huggingface.co/chat/ ? I would prefer to use SearXNG or DuckDuckGo as the search engine instead of Google.

Fikalis

Aug 22

Recently, web search started activating even when the default option is on. Is that a bug?

victor

Hugging Chat org Aug 26

> Recently, web search started activating even when the default option is on. Is that a bug?

Maybe you are using Tools models, where the model itself chooses whether or not to search the web?

Fikalis

Aug 29

> Recently, web search started activating even when the default option is on. Is that a bug?

> Maybe you are using Tools models, where the model itself chooses whether or not to search the web?

I'm using assistants (models either Cohere or Llama) with the default option on in the assistant settings ("Assistant will not use internet to do information retrieval and will respond faster. Recommended for most Assistants."), but sometimes the bot starts responding more slowly and uses web search. Do tools have something to do with that?

nsarrazin unpinned discussion Sep 10

jdoexbox10

Oct 5

But how do they send a web search request, the way ClosedAI prompts ChatGPT to issue a web search request? Is there a specific instruction the AI uses to search the web, or even to call tools for image generation etc. (e.g. default and third-party)?

acharyaaditya26

about 6 hours ago

I am getting this error during web search