beir/signal1m

Contents

1. Usage

1.1. Iterating over all data

1.2. Accessing individual records

2. Attributes

2.1. doc

2.2. query

2.3. qrel

3. Statistics

4. Citation

5. Source

6. License



1. Usage

1.1. Iterating over all data

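from hamu_tool.dataset import DataLoader

# Load the dataset by its identifier.
loader = DataLoader.load('beir/signal1m')

# Print the first document only; remove the break to visit every document.
for doc in loader.get_docs():
    print(doc.id, doc.text)
    break

# Print the first query only.
for query in loader.get_queries():
    print(query.id, query.text)
    break

# Print the first relevance judgment only. Replace '[mode]' with a split name
# (section 2.3 lists 'test' for this dataset).
for qrel in loader.get_qrels('[mode]'):
    print(qrel.qid, qrel.did, qrel.score)
    break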
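
The qrel loop above yields one judgment at a time. For evaluation it is often handier to group the judgments by query. The following is a minimal sketch of that grouping, not part of hamu_tool: `Qrel` is a hypothetical stand-in that assumes only the `qid`, `did`, and `score` attributes documented in section 2.3, `group_qrels` is an illustrative helper, and the sample records are made up.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the records yielded by loader.get_qrels('test');
# only the three documented attributes (qid, did, score) are assumed.
Qrel = namedtuple("Qrel", ["qid", "did", "score"])


def group_qrels(qrels):
    """Collect relevance judgments into a {qid: {did: score}} mapping."""
    grouped = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.qid][qrel.did] = qrel.score
    return dict(grouped)


# Made-up sample records, used only to show the shape of the result.
sample = [Qrel("q1", "d1", 1), Qrel("q1", "d2", 0), Qrel("q2", "d3", 2)]
by_query = group_qrels(sample)
print(by_query["q1"])
```

If the real loader behaves as in the snippet above, passing `loader.get_qrels('test')` in place of `sample` should work the same way, since the helper reads nothing beyond those three attributes.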

1.2. Accessing individual records

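from hamu_tool.dataset import DataLoader

loader = DataLoader.load('beir/signal1m')

# '[did]', '[qid]' and '[mode]' are placeholders: substitute a document ID,
# a query ID, and a split name ('test' for this dataset).

# Look up a single document by its ID.
doc = loader.get_doc('[did]')
print(doc)

# Look up a single query by its ID.
query = loader.get_query('[qid]')
print(query)

# Look up the qrel data for one query in the given split.
qrel = loader.get_qrel('[mode]', '[qid]')
print(qrel)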

2. Attributes

2.1. doc

Attribute  Type
id         str
text       str

2.2. query

Attribute  Type
id         str
text       str

2.3. qrel

Attribute  Type
qid        str
did        str
score      int

  • [mode]: test
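
The three attribute tables above can be read as simple record types. The sketch below mirrors them with dataclasses purely for illustration: the class names `Doc`, `Query`, and `Qrel` are assumptions, and the classes hamu_tool actually returns may be defined differently.

```python
from dataclasses import dataclass


# Illustrative record types only; field names and types follow the
# attribute tables in sections 2.1 to 2.3.
@dataclass(frozen=True)
class Doc:
    id: str
    text: str


@dataclass(frozen=True)
class Query:
    id: str
    text: str


@dataclass(frozen=True)
class Qrel:
    qid: str    # ID of the query being judged
    did: str    # ID of the judged document
    score: int  # relevance score


# Made-up values, used only to show how the fields fit together.
example = Qrel(qid="q1", did="d1", score=1)
print(example)
```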

3. Statistics

Item                          Value
Task                          Tweet Retrieval
Domain                        Twitter
# Query                       97
# Doc                         2,866,315
# Qrel (test)                 1,899
Average Rel D/Q (test)        19.58
Average Query Length (words)  9.40
Average Doc Length (words)    14.06
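
As a quick sanity check, the Average Rel D/Q figure is consistent with dividing the number of test qrels by the number of queries. The sketch below assumes that is how the figure was derived; the variable names are illustrative.

```python
# Values copied from the statistics table above.
num_queries = 97
num_test_qrels = 1899

# Assumed definition: judged documents per query, averaged over all queries.
avg_rel_docs_per_query = num_test_qrels / num_queries
print(f"{avg_rel_docs_per_query:.2f}")
```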

4. Citation

@inproceedings{Signal1M2016,
  author = {David Corney and Dyaa Albakour and Miguel Martinez and Samir Moussa},
  title = {What do a Million News Articles Look like?},
  booktitle = {Proceedings of the First International Workshop on Recent Trends in News Information Retrieval co-located with 38th European Conference on Information Retrieval {(ECIR} 2016), Padua, Italy, March 20, 2016.},
  pages = {42--47},
  year = {2016},
  url = {http://ceur-ws.org/Vol-1568/paper8.pdf}
}
@article{Thakur2021Beir,
  title = "BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models",
  author = "Thakur, Nandan and Reimers, Nils and Rücklé, Andreas and Srivastava, Abhishek and Gurevych, Iryna", 
  journal= "arXiv preprint arXiv:2104.08663",
  month = "4",
  year = "2021",
  url = "https://arxiv.org/abs/2104.08663",
}

5. Source


6. License

  • Unknown