[NEW] WebSearch 2.0 - feedback welcome!
March 15th update: Internet Access for Assistants
Hi HuggingChat community!
We've just released a big update to the WebSearch feature: it now uses retrieval-augmented generation (RAG) to extract relevant information from multiple web pages! In our tests, it is much more powerful than before.
We would love to get your feedback on it! Also, if you want to check the details or even contribute, take a look at the PR on GitHub.
See you soon!
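For readers wondering what "RAG over multiple web pages" means in practice, here is a minimal sketch of the general idea, not the actual chat-ui implementation. It assumes the page text has already been downloaded and extracted, and it uses a crude bag-of-words cosine similarity as a stand-in for a real embedding model. The function names (`chunk`, `retrieve`, `build_prompt`) and the prompt layout are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for a real embedding model."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def chunk(text, size=40):
    """Split page text into chunks of roughly `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def retrieve(query, pages, top_k=3):
    """Rank chunks from all pages by similarity to the query.

    `pages` maps a URL to the already-extracted text of that page.
    Returns a list of (score, url, chunk) tuples, best first.
    """
    q = Counter(tokenize(query))
    scored = []
    for url, text in pages.items():
        for piece in chunk(text):
            scored.append((cosine(q, Counter(tokenize(piece))), url, piece))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


def build_prompt(query, hits):
    """Prepend the retrieved chunks to the user question (hypothetical layout)."""
    context = "\n".join(f"[{url}] {piece}" for _, url, piece in hits)
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The point of the design is that only the best-matching chunks, rather than whole pages, are placed in the model's context, which keeps the prompt short while still drawing on several sources.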
I think I've asked this elsewhere but I'm not sure what the answer was. Do you use a paid API to query Google search? I'm asking because I can imagine that if it's through something hacky like Selenium, Google won't like it (and they'll miss out on ad revenue). So, in short, what does the technical pipeline look like for this, from user query to generated output?
You can check out the feature here!
@BramVanroy, they use SerpAPI which, as far as I understand, is paid and legal. See the source code here: https://github.com/huggingface/chat-ui/blob/main/src/lib/server/websearch/searchWeb.ts
Question regarding sources/citations: Do I understand correctly that you currently display as sources all URLs that the retriever retrieved and gave to the LLM as context (regardless of whether the LLM actually used or referred to the source)? In Bing Chat, they somehow managed to attribute sources to specific parts/sentences of the generated output and not only to the generated output as a whole. I've always wondered how they made that work (direct citations / attributing sources to specific sub-parts of a generated output). Does anyone know?
Question regarding sources/citations: Do I understand correctly that you currently display all URLs as sources, which the retriever retrieved and gave the LLM as context
Yes
In Bing chat, they somehow managed to attribute sources to specific parts/sentences.
Interesting. I guess one simple way would be: for every generated sentence, calculate its similarity against the sources and pick the highest-scoring source as the source of that sentence. Not sure if that's what they are doing.
Any chance of a search API that isn't Google's, e.g. DuckDuckGo or Bing? :)
Awesome work though! Really cool :D
@mishig, yeah, true. My first intuition was that I would hesitate to trust embeddings from a bi-encoder to be reliable enough for this. But a cross-encoder should actually work quite well, especially if you set a high enough threshold to avoid false positives (then it could even work with a bi-encoder sentence transformer). Direct citations could be a nice feature to have :)
@victor @nsarrazin search is not working; it says "out of credits". Can you please help us out?
It's back up, sorry about that @acharyaaditya26
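The idea discussed above (score each generated sentence against every source, keep the best match, and drop matches below a threshold) can be sketched roughly as follows. This is only an illustration: it is not known to be what Bing Chat does, and the bag-of-words cosine similarity is a deliberately simple stand-in for the bi-encoder or cross-encoder scores mentioned in the thread. The `attribute` function, its `threshold` default, and the example URLs are all hypothetical.

```python
import math
import re
from collections import Counter


def bag(text):
    """Bag-of-words Counter; a placeholder for a real sentence encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    """Cosine similarity between two bags of words."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def attribute(answer, sources, threshold=0.5):
    """Attach the best-matching source URL to each sentence of `answer`.

    `sources` maps a URL to the text that was given to the model as context.
    A sentence whose best score is below `threshold` gets None, which is the
    "avoid false positives" safeguard from the discussion.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    source_bags = {url: bag(text) for url, text in sources.items()}
    result = []
    for sentence in sentences:
        sentence_bag = bag(sentence)
        best_url, best_score = None, 0.0
        for url, source_bag in source_bags.items():
            score = similarity(sentence_bag, source_bag)
            if score > best_score:
                best_url, best_score = url, score
        result.append((sentence, best_url if best_score >= threshold else None))
    return result


sources = {
    "https://example.org/paris": "Paris is the capital of France.",
    "https://example.org/everest": "Mount Everest is the highest mountain on Earth.",
}
answer = "Paris is the capital of France. Everest is the highest mountain. I hope that helps!"
for sentence, url in attribute(answer, sources):
    print(url, "<-", sentence)
```

Swapping `similarity` for a cross-encoder score would leave the rest of the loop unchanged; only the threshold would need re-tuning.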
Any chance of a search API that isn't Google's, e.g. DuckDuckGo or Bing? :)
@DarwinAnim8or there have been updates since the last comment here. You can see in this PR how you.com is being added as a search engine, and the same could be done for DuckDuckGo or Bing :)
Is there a way to do this in https://huggingface.co/chat/ ? I would prefer to use SearXNG or DuckDuckGo as the search engine instead of Google.
Recently, web search has started activating even when the default option is on. Is that a bug?
Maybe you are using a Tools model, where the model itself chooses whether to search the web or not?
I'm using assistants (models are either Cohere or Llama) with the default option on in the assistant's settings ("Assistant will not use internet to do information retrieval and will respond faster. Recommended for most Assistants."), but sometimes the bot starts responding more slowly and uses web search. Do tools have something to do with that?
But how do they send a web search request, the way "ClosedAI" prompts ChatGPT to issue one? Is there a specific instruction the AI uses to search the web, or to call tools for image generation etc. (e.g. default and third-party tools)?
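On the tools question: one common pattern, shown here purely as an assumption and not confirmed for HuggingChat or ChatGPT, is that the system prompt lists the available tools and the model replies with a structured message when it wants to use one. The application then parses that message and runs the matching function. The JSON shape, the tool names, and the `handle_model_output` helper below are all hypothetical.

```python
import json


def web_search(query):
    """Hypothetical tool; a real one would call a search API."""
    return f"(search results for {query!r})"


def generate_image(prompt):
    """Hypothetical tool; a real one would call an image model."""
    return f"(image generated for {prompt!r})"


# Registry of tools the application is willing to run on the model's behalf.
TOOLS = {"web_search": web_search, "generate_image": generate_image}


def handle_model_output(text):
    """Run a tool if the model asked for one, otherwise return the text as-is.

    Assumes the model signals a tool call by emitting a JSON object such as
    {"tool": "web_search", "arguments": {"query": "..."}}.
    """
    try:
        call = json.loads(text)
    except json.JSONDecodeError:
        return text  # plain answer, no tool requested
    if not isinstance(call, dict):
        return text
    name = call.get("tool")
    if not isinstance(name, str) or name not in TOOLS:
        return text
    return TOOLS[name](**call.get("arguments", {}))
```

In a setup like this, the model never performs the search itself: it only emits the request, and the tool's result is fed back to it as extra context for the final answer.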