beir/trec-covid

Table of Contents

1. Usage
1.1. Iterating over all data
1.2. Accessing individual items
2. Attributes
2.1. doc
2.2. query
2.3. qrel
3. Statistics
4. Citation
5. Source
6. License



1. Usage

1.1. Iterating over all data

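from hamu_tool.dataset import DataLoader

# Load the dataset by its identifier.
loader = DataLoader.load('beir/trec-covid')

# Print the first document, then stop.
for doc in loader.get_docs():
    print(doc.id, doc.text, doc.title, doc.url, doc.pubmed_id)
    break

# Print the first query, then stop.
for query in loader.get_queries():
    print(query.id, query.text, query.narrative)
    break

# Print the first qrel; replace '[mode]' with a mode listed in section 2.3.
for qrel in loader.get_qrels('[mode]'):
    print(qrel.qid, qrel.did, qrel.score)
    break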
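
The loops above print one record and then break. As a minimal sketch, the same peek can be written as a reusable helper, assuming only that get_docs(), get_queries(), and get_qrels() return ordinary Python iterables (the page does not state their exact return types). The head helper and the fake_docs generator are hypothetical and are not part of hamu_tool.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def head(items: Iterable[T], n: int = 3) -> List[T]:
    """Return up to the first n items; the rest of the iterable is not read."""
    return list(islice(items, n))


def fake_docs() -> Iterator[str]:
    # Hypothetical stand-in for loader.get_docs(); yields made-up IDs.
    for i in range(1000):
        yield f"doc-{i}"


print(head(fake_docs(), 2))
```

With a real loader, head(loader.get_docs(), 2) would be the equivalent call, provided the iterable assumption holds.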

1.2. Accessing individual items

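from hamu_tool.dataset import DataLoader

loader = DataLoader.load('beir/trec-covid')

# Fetch a single document; replace '[did]' with a document ID.
doc = loader.get_doc('[did]')
print(doc)

# Fetch a single query; replace '[qid]' with a query ID.
query = loader.get_query('[qid]')
print(query)

# Fetch the qrel data for one query ID in the given mode.
qrel = loader.get_qrel('[mode]', '[qid]')
print(qrel)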

2. Attributes

2.1. doc

Attribute    Type
id           str
text         str
title        str
url          str
pubmed_id    str

2.2. query

Attribute    Type
id           str
text         str
narrative    str

2.3. qrel

Attribute    Type
qid          str
did          str
score        int

  • Available values for [mode]: test
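
Because each qrel carries a query ID, a document ID, and an integer score, a common follow-up step is to group judgments by query. The following is a minimal sketch under stated assumptions: the Qrel tuple is a hypothetical stand-in that only mirrors the three attributes listed above, group_by_query is not a hamu_tool function, and the sample values are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Qrel(NamedTuple):
    # Hypothetical stand-in with the three attributes listed for qrel.
    qid: str
    did: str
    score: int


def group_by_query(qrels: Iterable[Qrel]) -> Dict[str, Dict[str, int]]:
    """Map each query ID to a {document ID: score} dictionary."""
    grouped: Dict[str, Dict[str, int]] = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.qid][qrel.did] = qrel.score
    return dict(grouped)


# Made-up sample judgments, not taken from the dataset.
sample = [Qrel("q1", "d1", 2), Qrel("q1", "d2", 0), Qrel("q2", "d1", 1)]
lookup = group_by_query(sample)
print(lookup["q1"])
```

Any objects exposing qid, did, and score attributes could be passed to the same function in place of the stand-in tuples.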

3. Statistics

Statistic                       Value
Task                            Bio-Medical Information Retrieval
Domain                          Bio-Medical
# Query                         50
# Doc                           129,192
# Qrel (test)                   21,538
Average Rel D/Q (test)          430.76
Average Query Length (words)    10.60
Average Doc Length (words)      197.13
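
The Average Rel D/Q figure is consistent with dividing the number of test qrels by the number of queries. That reading is an assumption, since the page does not define the statistic, but the numbers listed above agree with it, as this short check shows.

```python
# Values copied from the statistics listed above.
num_queries = 50
num_qrels_test = 21_538

# Assumed definition: qrels in the test mode per query.
avg_rel_per_query = num_qrels_test / num_queries
print(f"{avg_rel_per_query:.2f}")
```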

4. Citation

@article{Wang2020Cord19,
  title = "CORD-19: The Covid-19 Open Research Dataset",
  author = "Lucy Lu Wang and Kyle Lo and Yoganand Chandrasekhar and Russell Reas and Jiangjiang Yang and Darrin Eide and K. Funk and Rodney Michael Kinney and Ziyang Liu and W. Merrill and P. Mooney and D. Murdick and Devvret Rishi and Jerry Sheehan and Zhihong Shen and B. Stilson and A. Wade and K. Wang and Christopher Wilhelm and Boya Xie and D. Raymond and Daniel S. Weld and Oren Etzioni and Sebastian Kohlmeier",
  journal = "ArXiv",
  year = "2020"
}
@article{Voorhees2020TrecCovid,
  title = "TREC-COVID: Constructing a Pandemic Information Retrieval Test Collection",
  author = "E. Voorhees and Tasmeer Alam and Steven Bedrick and Dina Demner-Fushman and W. Hersh and Kyle Lo and Kirk Roberts and I. Soboroff and Lucy Lu Wang",
  journal = "ArXiv",
  year = "2020",
  volume = "abs/2005.04474"
}
@article{Thakur2021Beir,
  title = "BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models",
  author = "Thakur, Nandan and Reimers, Nils and Rücklé, Andreas and Srivastava, Abhishek and Gurevych, Iryna", 
  journal = "arXiv preprint arXiv:2104.08663",
  month = "4",
  year = "2021",
  url = "https://arxiv.org/abs/2104.08663",
}

5. Source


6. License