beir/fever

목차

1. 사용법

1.1. 모든 데이터 순회

1.2. 개별 데이터 접근

2. 속성

2.1. doc

2.2. query

2.3. qrel

3. 통계

4. 인용

5. 출처

6. 라이센스



1. 사용법

1.1. 모든 데이터 순회

from hamu_tool.dataset import DataLoader

loader = DataLoader.load('beir/fever')

for doc in loader.get_docs():
    print(doc.id, doc.text, doc.title)
    break

for query in loader.get_queries():
    print(query.id, query.text)
    break

for qrel in loader.get_qrels('[mode]'):
    print(qrel.qid, qrel.did, qrel.score)
    break

1.2. 개별 데이터 접근

from hamu_tool.dataset import DataLoader

loader = DataLoader.load('beir/fever')

doc = loader.get_doc('[did]')
print(doc)

query = loader.get_query('[qid]')
print(query)

qrel = loader.get_qrel('[mode]', '[qid]')
print(qrel)

2. 속성

2.1. doc

속성자료형
idstr
textstr
titlestr

2.2. query

속성자료형
idstr
textstr

2.3. qrel

속성자료형
qidstr
didstr
scoreint
  • [mode]: test, dev, train

3. 통계

수치
TaskFact Checking
DomainWikipedia
# Query123,142
# Doc5,396,138
# Qreltest7,937
dev8,079
train140,082
Average Rel D/Qtest0.06
dev0.07
train1.14
Average Query Length (words)8.13
Average Doc Length (words)95.21

4. 인용

@inproceedings{Thorne2018Fever,
    title = "{FEVER}: a Large-scale Dataset for Fact Extraction and {VER}ification",
    author = "Thorne, James  and
      Vlachos, Andreas  and
      Christodoulopoulos, Christos  and
      Mittal, Arpit",
    booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)",
    month = jun,
    year = "2018",
    address = "New Orleans, Louisiana",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/N18-1074",
    doi = "10.18653/v1/N18-1074",
    pages = "809--819"
}
@article{Thakur2021Beir,
  title = "BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models",
  author = "Thakur, Nandan and Reimers, Nils and Rücklé, Andreas and Srivastava, Abhishek and Gurevych, Iryna", 
  journal= "arXiv preprint arXiv:2104.08663",
  month = "4",
  year = "2021",
  url = "https://arxiv.org/abs/2104.08663",
}

5. 출처


6. 라이센스