beir/hotpotqa

1. Usage

1.1. Iterating over all data

1.2. Accessing individual records

2. Attributes

2.1. doc

2.2. query

2.3. qrel

3. Statistics

4. Citation

5. Source

6. License

1. Usage

1.1. Iterating over all data

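from hamu_tool.dataset import DataLoader

# Load the HotpotQA dataset from the BEIR collection.
loader = DataLoader.load('beir/hotpotqa')

# Each loop breaks after the first record to keep the output short.
for doc in loader.get_docs():
    print(doc.id, doc.text, doc.title, doc.url)
    break

for query in loader.get_queries():
    print(query.id, query.text)
    break

# Replace '[mode]' with 'test', 'dev', or 'train'.
for qrel in loader.get_qrels('[mode]'):
    print(qrel.qid, qrel.did, qrel.score)
    break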
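
The loops above stop after the first record. To keep every relevance judgment of one split in memory, a minimal sketch could look like the following. It relies only on `get_qrels` and the `qid`, `did`, and `score` attributes shown above; the helper name `collect_qrels` is hypothetical and not part of hamu_tool.

```python
from collections import defaultdict


def collect_qrels(loader, mode):
    """Group relevance judgments by query ID as {qid: {did: score}}.

    `loader` is assumed to behave like the object returned by
    DataLoader.load(...); `mode` is 'test', 'dev', or 'train'.
    """
    grouped = defaultdict(dict)
    for qrel in loader.get_qrels(mode):
        # A later judgment for the same (qid, did) pair overwrites an earlier one.
        grouped[qrel.qid][qrel.did] = qrel.score
    return dict(grouped)
```

With a loader in hand, `collect_qrels(loader, 'test')` would return a plain dictionary keyed by query ID, which is a convenient shape for most evaluation scripts.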

1.2. Accessing individual records

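from hamu_tool.dataset import DataLoader

loader = DataLoader.load('beir/hotpotqa')

# Replace '[did]' with a document ID.
doc = loader.get_doc('[did]')
print(doc)

# Replace '[qid]' with a query ID.
query = loader.get_query('[qid]')
print(query)

# Replace '[mode]' with 'test', 'dev', or 'train'.
qrel = loader.get_qrel('[mode]', '[qid]')
print(qrel)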

2. Attributes

2.1. doc

Attribute	Type
id	str
text	str
title	str
url	str

2.2. query

Attribute	Type
id	str
text	str

2.3. qrel

Attribute	Type
qid	str
did	str
score	int

[mode] is one of: test, dev, train
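
As an illustration of the three record shapes and the allowed split names, the tables above can be mirrored in plain Python. This is only a sketch: the class names `Doc`, `Query`, `Qrel` and the helper `check_mode` are hypothetical and are not claimed to match the types hamu_tool actually returns.

```python
from typing import NamedTuple

# Split names listed for [mode] above.
VALID_MODES = ("test", "dev", "train")


class Doc(NamedTuple):
    id: str
    text: str
    title: str
    url: str


class Query(NamedTuple):
    id: str
    text: str


class Qrel(NamedTuple):
    qid: str    # query ID
    did: str    # document ID
    score: int  # relevance score


def check_mode(mode: str) -> str:
    """Return `mode` unchanged if it is a known split, otherwise raise ValueError."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return mode
```

Validating the split name up front catches a forgotten `'[mode]'` placeholder before any data is requested.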

3. Statistics

Metric	Split	Value
Task		Question Answering
Domain		Wikipedia
# Query		97,852
# Doc		5,233,235
# Qrel	test	14,810
	dev	10,894
	train	170,000
Average Rel D/Q	test	0.15
	dev	0.11
	train	1.74
Average Query Length (words)		17.66
Average Doc Length (words)		43.72
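
The Average Rel D/Q figures appear to equal each split's qrel count divided by the total number of queries (97,852). That reading is inferred from the numbers in the table and is not stated by the source; the short check below reproduces the listed values under that assumption.

```python
# Counts copied from the statistics table above.
NUM_QUERIES = 97_852
QREL_COUNTS = {"test": 14_810, "dev": 10_894, "train": 170_000}

# Assumption: Average Rel D/Q = (# Qrel in split) / (# Query overall).
avg_rel_per_query = {
    split: round(count / NUM_QUERIES, 2)
    for split, count in QREL_COUNTS.items()
}

print(avg_rel_per_query)  # {'test': 0.15, 'dev': 0.11, 'train': 1.74}
```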

4. Citation

@inproceedings{Yang2018Hotpotqa,
    title = "{H}otpot{QA}: A Dataset for Diverse, Explainable Multi-hop Question Answering",
    author = "Yang, Zhilin  and
      Qi, Peng  and
      Zhang, Saizheng  and
      Bengio, Yoshua  and
      Cohen, William  and
      Salakhutdinov, Ruslan  and
      Manning, Christopher D.",
    booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
    month = oct # "-" # nov,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D18-1259",
    doi = "10.18653/v1/D18-1259",
    pages = "2369--2380"
}
@article{Thakur2021Beir,
  title = "BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models",
  author = "Thakur, Nandan and Reimers, Nils and Rücklé, Andreas and Srivastava, Abhishek and Gurevych, Iryna", 
  journal= "arXiv preprint arXiv:2104.08663",
  month = "4",
  year = "2021",
  url = "https://arxiv.org/abs/2104.08663",
}

5. Source

6. License

CC BY-SA 4.0