How would it check unordered lists?

#1
by MarkWard0110 - opened

I have a situation where the check should verify that each item appears in the correct list, regardless of order. The model also compares the order and rejects correct answers whose items are arranged differently. How would I instruct it to treat the lists as unordered?

For example, the following claim should be accepted because its list contains all the same items, just arranged differently. The model answers "No".

Document:
odd:[A0,A1,A2,B1,B2,C0,C1]
even:[B0,C2]

Claim:
odd:[C0,C1,B1,B2,A0,A1,A2]
even:[B0,C2]
assistant:
No

When the claim lists the items in exactly the same order as the document, it returns "Yes".

Document:
odd:[A0,A1,A2,B1,B2,C0,C1]
even:[B0,C2]

Claim:
odd:[A0,A1,A2,B1,B2,C0,C1]
even:[B0,C2]
assistant:
Yes

Change the document's order and update the claim to match, and it still returns "Yes".

Document:
odd:[C0,C1,B1,B2,A0,A1,A2]
even:[B0,C2]

Claim:
odd:[C0,C1,B1,B2,A0,A1,A2]
even:[B0,C2]
assistant:
Yes

Change the claim's order so that it differs from the document's, and it returns "No".

Document:
odd:[C0,C1,B1,B2,A0,A1,A2]
even:[B0,C2]

Claim:
odd:[C0,C1,A0,A1,A2,B1,B2]
even:[B0,C2]
assistant:
No
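
For lists this regular, the unordered comparison can also be done outside the model. The following is only a minimal sketch, assuming the `name:[a,b,c]` line format used in the examples above; `parse_lists` and `same_unordered` are hypothetical helper names, not part of any library.

```python
import re

def parse_lists(text: str) -> dict[str, frozenset[str]]:
    """Parse lines like 'odd:[A0,A1]' into {'odd': frozenset({'A0', 'A1'})}."""
    lists = {}
    for match in re.finditer(r"^(\w+):\[(.*?)\]\s*$", text, flags=re.MULTILINE):
        name, body = match.groups()
        # A frozenset ignores both order and duplicate items.
        lists[name] = frozenset(item.strip() for item in body.split(",") if item.strip())
    return lists

def same_unordered(document: str, claim: str) -> bool:
    """True when both texts name the same lists with the same items, ignoring order."""
    return parse_lists(document) == parse_lists(claim)

document = "odd:[C0,C1,B1,B2,A0,A1,A2]\neven:[B0,C2]"
claim = "odd:[C0,C1,A0,A1,A2,B1,B2]\neven:[B0,C2]"
print(same_unordered(document, claim))  # prints True
```

Because sets discard duplicates, this sketch would also treat `[A0,A0]` and `[A0]` as equal; comparing sorted lists instead would keep duplicates significant.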

I think I have discovered that the Claim: field can contain instructions for the LLM.

For example, if I add an instruction to the claim stating that the lists are unordered, I get "Yes" responses even when the claim's order differs from the document's.

Document:
odd:[C0,C1,B1,B2,A0,A1,A2]
even:[B0,C2]

Claim:
Odd and even are unordered sets.
odd:[A2,B1,C1,B2,C0,A0,A1]
even:[C2,B0]
assistant: 
Yes
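
A small helper can keep the instruction line and the list formatting consistent across claims. This is a sketch under the assumption that claims follow the `name:[a,b,c]` layout above; `build_claim` is a hypothetical name, and whether the model honors the instruction has only been observed in the example above.

```python
def build_claim(lists: dict[str, list[str]], note: str = "") -> str:
    """Format lists as 'name:[a,b,c]' lines, with an optional instruction line first."""
    lines = [note] if note else []
    lines += [f"{name}:[{','.join(items)}]" for name, items in lists.items()]
    return "\n".join(lines)

claim = build_claim(
    {"odd": ["A2", "B1", "C1", "B2", "C0", "A0", "A1"], "even": ["C2", "B0"]},
    note="Odd and even are unordered sets.",
)
print(claim)
```

The printed text matches the claim shown in the example, with the instruction on the first line.
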
Bespoke Labs org

Hi, this is an interesting use case of our model! The model was developed for fact-checking claims against documents, where the claims are usually written in natural language. In the example you provided, it is unclear what should be checked in the claim without a clear description of the task, so some prompt design may be necessary here. On the other hand, if each item in the list is a sentence, the model would work better if you (1) feed each item separately, (2) run a (document, item) check for each item, and (3) aggregate the final result.
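
The three steps above could be sketched roughly as follows. This is an illustration only: `check` stands for whatever function calls the model on a (document, claim) pair and returns a boolean, and `placeholder_check` is a made-up substring test used so the example runs on its own. The sentences in `document` are also invented for the demonstration.

```python
from typing import Callable, Iterable

def check_all_items(
    document: str,
    items: Iterable[str],
    check: Callable[[str, str], bool],
) -> bool:
    """Check each item against the document separately; pass only if all pass."""
    return all(check(document, item) for item in items)

def placeholder_check(document: str, claim: str) -> bool:
    # Placeholder only: an exact substring match, NOT the model.
    # Replace with a function that sends (document, claim) to the model.
    return claim in document

document = "A0 is odd. B0 is even. C2 is even."
items = ["C2 is even.", "A0 is odd."]  # order differs from the document
print(check_all_items(document, items, placeholder_check))  # prints True
```

Since every item is checked on its own, the order of the items no longer affects the aggregated result.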

Let me know if this answers your question!
