beir/nq

Contents

1. Usage

1.1. Iterating over all data

1.2. Accessing individual records

2. Attributes

2.1. doc

2.2. query

2.3. qrel

3. Statistics

4. Citation

5. Source

6. License



1. Usage

1.1. Iterating over all data

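from hamu_tool.dataset import DataLoader

# Load the dataset by its identifier.
loader = DataLoader.load('beir/nq')

# Each loop prints only the first record, then stops.
for doc in loader.get_docs():
    print(doc.id, doc.text, doc.title)
    break

for query in loader.get_queries():
    print(query.id, query.text)
    break

# Replace '[mode]' with a split name; 'test' is the only one listed for this dataset.
for qrel in loader.get_qrels('[mode]'):
    print(qrel.qid, qrel.did, qrel.score)
    break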
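
The qrel iterator yields one judgment at a time, while evaluation code usually wants judgments grouped per query. A minimal sketch of that grouping follows. It relies only on the qid, did, and score attributes documented on this page, and uses a hypothetical Qrel namedtuple as a stand-in so that it runs without hamu_tool installed. The helper name group_qrels is an assumption, not part of the library.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the records yielded by loader.get_qrels(...);
# only the documented attributes (qid, did, score) are assumed.
Qrel = namedtuple("Qrel", ["qid", "did", "score"])


def group_qrels(qrels):
    """Collect relevance judgments into {qid: {did: score}}."""
    grouped = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.qid][qrel.did] = qrel.score
    return dict(grouped)


# Made-up sample judgments, for illustration only.
sample = [
    Qrel("q1", "d10", 1),
    Qrel("q1", "d11", 1),
    Qrel("q2", "d20", 1),
]
grouped = group_qrels(sample)
print(grouped)
```

With the real loader, the equivalent call would presumably be group_qrels(loader.get_qrels('test')).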

1.2. Accessing individual records

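from hamu_tool.dataset import DataLoader

loader = DataLoader.load('beir/nq')

# Replace the bracketed placeholders with a real document ID, query ID, and split name.
doc = loader.get_doc('[did]')
print(doc)

query = loader.get_query('[qid]')
print(query)

# '[mode]' is the split name ('test' for this dataset).
qrel = loader.get_qrel('[mode]', '[qid]')
print(qrel)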
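
A common next step is to turn a relevance judgment into readable text by looking up the query and document it refers to. The sketch below takes the two lookups as plain callables, so it makes no assumption about what get_qrel returns; in practice loader.get_query and loader.get_doc would presumably be passed in. The Doc, Query, and Qrel namedtuples and the describe_qrel helper are hypothetical stand-ins, not library names.

```python
from collections import namedtuple

# Hypothetical stand-ins mirroring the attributes documented for this dataset.
Doc = namedtuple("Doc", ["id", "text", "title"])
Query = namedtuple("Query", ["id", "text"])
Qrel = namedtuple("Qrel", ["qid", "did", "score"])


def describe_qrel(qrel, get_query, get_doc):
    """Resolve a judgment's IDs into (query text, document title, score)."""
    query = get_query(qrel.qid)
    doc = get_doc(qrel.did)
    return (query.text, doc.title, qrel.score)


# Made-up in-memory data standing in for the real loader.
queries = {"q1": Query("q1", "example question")}
docs = {"d1": Doc("d1", "example passage text", "Example Title")}

row = describe_qrel(Qrel("q1", "d1", 1), queries.__getitem__, docs.__getitem__)
print(row)
```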

2. Attributes

2.1. doc

Attribute  Type
id         str
text       str
title      str

2.2. query

Attribute  Type
id         str
text       str

2.3. qrel

Attribute  Type
qid        str
did        str
score      int
  • [mode]: test
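
For type-checked code, the three record shapes in this section can be mirrored as small dataclasses. This is only a sketch of the documented attributes; the class names are assumptions, and the objects hamu_tool actually returns may be implemented differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    id: str
    text: str
    title: str


@dataclass(frozen=True)
class Query:
    id: str
    text: str


@dataclass(frozen=True)
class Qrel:
    qid: str    # query ID
    did: str    # document ID
    score: int  # relevance label


# Made-up record, for illustration only.
example = Qrel(qid="q1", did="d1", score=1)
print(example.qid, example.did, example.score)
```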

3. Statistics

Metric                          Value
Task                            Question Answering
Domain                          Wikipedia
# Query                         3,452
# Doc                           2,680,961
# Qrel (test)                   4,201
Average Rel D/Q (test)          1.22
Average Query Length (words)    9.16
Average Doc Length (words)      76.04
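
The Average Rel D/Q figure is consistent with the other counts: 4,201 test judgments over 3,452 queries gives about 1.22 relevant documents per query. The sketch below reproduces that arithmetic and shows one plausible way to measure length in words (whitespace splitting). The exact tokenization behind the reported averages is not stated here, so that part is an assumption, as is the helper name.

```python
num_queries = 3452   # "# Query" from the statistics above
num_qrels = 4201     # "# Qrel" for the test split

avg_rel_per_query = round(num_qrels / num_queries, 2)
print(avg_rel_per_query)


def average_length_in_words(texts):
    """Mean number of whitespace-separated tokens (an assumed definition)."""
    texts = list(texts)
    if not texts:
        return 0.0
    return sum(len(t.split()) for t in texts) / len(texts)


# Made-up strings, for illustration only.
print(average_length_in_words(["who wrote this", "when was it built here"]))
```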

4. Citation

@article{Kwiatkowski2019Nq,
  title = {Natural Questions: a Benchmark for Question Answering Research},
  author = {Tom Kwiatkowski and Jennimaria Palomaki and Olivia Redfield and Michael Collins and Ankur Parikh and Chris Alberti and Danielle Epstein and Illia Polosukhin and Matthew Kelcey and Jacob Devlin and Kenton Lee and Kristina N. Toutanova and Llion Jones and Ming-Wei Chang and Andrew Dai and Jakob Uszkoreit and Quoc Le and Slav Petrov},
  year = {2019},
  journal = {TACL}
}
@article{Thakur2021Beir,
  title = {BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models},
  author = {Thakur, Nandan and Reimers, Nils and Rücklé, Andreas and Srivastava, Abhishek and Gurevych, Iryna},
  journal = {arXiv preprint arXiv:2104.08663},
  month = {4},
  year = {2021},
  url = {https://arxiv.org/abs/2104.08663}
}

5. Source


6. License