---
license: mit
datasets:
- abdoelsayed/Open-ArabicaQA
- abdoelsayed/ArabicaQA
language:
- ar
metrics:
- accuracy
library_name: transformers
pipeline_tag: question-answering
---

# AraDPR: Arabic Dense Passage Retrieval Model

AraDPR is a dense passage retrieval model designed specifically for Arabic. It encodes questions and passages into dense vectors so that question-answering systems can retrieve relevant passages efficiently and accurately by vector similarity.

## Model Details

### Model Description

- **Developed by:** 
- **Model type:** Dense Passage Retrieval (DPR)
- **Language(s) (NLP):** Arabic
- **License:** MIT
- **Finetuned from:** AraBERT 

### Model Sources

- **Repository:** https://github.com/DataScienceUIBK/ArabicaQA
- **Paper:** ArabicaQA: A Comprehensive Dataset for Arabic Question Answering (arXiv:2403.17848)
- **Demo:** will be available soon

## Uses

### Direct Use

AraDPR is designed for use in Arabic question-answering systems, enabling these systems to retrieve the most relevant passages from a large corpus efficiently.
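
As a rough illustration of the retrieval step, the sketch below ranks passages by the inner product between a question vector and each passage vector. The toy three-dimensional vectors stand in for real encoder outputs, and the helper names `dot` and `retrieve_top_k` are hypothetical; this is not the project's own code, and a real system would typically use an approximate nearest-neighbor index rather than a full scan.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(
    question_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
    k: int = 2,
) -> List[Tuple[int, float]]:
    """Return (passage_index, score) pairs for the k highest-scoring passages."""
    scores = [(i, dot(question_vec, p)) for i, p in enumerate(passage_vecs)]
    # Highest similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Toy embeddings (assumption: real vectors would come from the two encoders).
passages = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.9],
]
question = [0.2, 0.9, 0.0]

top = retrieve_top_k(question, passages, k=2)
print(top)  # list of (passage_index, score), best match first
```

Passage embeddings can be computed once offline, so only the question has to be encoded at query time.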

### Downstream Use

Beyond question answering, AraDPR can be integrated into other NLP applications that need a passage retrieval step, such as document summarization or information extraction.

### Out-of-Scope Use

AraDPR is not intended for languages other than Arabic or for tasks that do not involve passage retrieval.

## Bias, Risks, and Limitations

Users should be aware of the model's limitations, particularly when handling Arabic dialects or highly domain-specific texts. Further research and development are encouraged to address these challenges.

## How to Get Started with the Model

For setup instructions and code, see the GitHub repository: https://github.com/DataScienceUIBK/ArabicaQA

## Training Details

AraDPR was trained on a diverse corpus from Arabic Wikipedia, covering a wide range of topics to ensure comprehensive language representation.

## Results

AraDPR demonstrates superior performance over traditional retrieval methods, significantly improving the efficiency and accuracy of question answering in Arabic.

## Technical Specifications

### Model Architecture and Objective

AraDPR uses a dual-encoder architecture, with separate encoders for questions and passages. The model is optimized so that semantically related questions and passages are projected close together in the vector space.
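
This card does not state the exact training loss. One common way to train a dual encoder toward this objective, used in the original DPR work, is a negative log-likelihood over in-batch negatives: each question's paired passage is the positive, and the other passages in the batch serve as negatives. The sketch below assumes that setup for illustration only; the function name `in_batch_nll` and the toy vectors are hypothetical.

```python
import math
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_nll(
    question_vecs: Sequence[Sequence[float]],
    passage_vecs: Sequence[Sequence[float]],
) -> float:
    """Mean negative log-likelihood with in-batch negatives.

    Passage i is treated as the positive for question i; every other
    passage in the batch is treated as a negative.
    """
    losses = []
    for i, q in enumerate(question_vecs):
        scores = [dot(q, p) for p in passage_vecs]
        # Numerically stable log-sum-exp over all passages in the batch.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # -log softmax(scores)[i]
        losses.append(log_z - scores[i])
    return sum(losses) / len(losses)


# Toy batch: question i points in the same direction as passage i.
questions = [[1.0, 0.0], [0.0, 1.0]]
aligned_passages = [[2.0, 0.0], [0.0, 2.0]]
# Same passages with the pairing swapped, so each positive is a poor match.
shuffled_passages = [[0.0, 2.0], [2.0, 0.0]]

aligned_loss = in_batch_nll(questions, aligned_passages)
shuffled_loss = in_batch_nll(questions, shuffled_passages)
print(aligned_loss, shuffled_loss)
```

The loss is lower when each question scores highest against its own passage, which is what pulls related pairs together in the vector space.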


## Citation

If you find this code or data useful, please consider citing our paper:

```
@misc{abdallah2024arabicaqa,
      title={ArabicaQA: A Comprehensive Dataset for Arabic Question Answering}, 
      author={Abdelrahman Abdallah and Mahmoud Kasem and Mahmoud Abdalla and Mohamed Mahmoud and Mohamed Elkasaby and Yasser Elbendary and Adam Jatowt},
      year={2024},
      eprint={2403.17848},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```