Identifying Reliable Sources of Information about Companies in Multilingual Wikipedia
Włodzimierz Lewoniewski, Krzysztof Węcel, Witold Abramowicz
Citation: Proceedings of the 17th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 30, pages 705–714 (2022)
Abstract. For over 20 years, Wikipedia has been edited by volunteers from all over the world. These editors differ in education, culture, and competence. One of Wikipedia's core rules states that information in its articles should be based on reliable sources and that readers must be able to verify particular facts in the text. However, reliability is a subjective concept, and the reputation of the same source can be assessed differently depending on the person (or group of persons), language, and topic. Each language version of Wikipedia may therefore have its own rules or criteria for how a website must be assessed before it can be used as a source in references. At the same time, there are now over 1 billion websites on the Internet, and only a few well-developed Wikipedia language versions maintain non-exhaustive lists of popular websites with reliability assessments. Additionally, since the reputation of a source can change over time, such lists must be updated regularly.
This study presents the results of identifying reliable sources of information based on an analysis of over 200 million references extracted from over 40 million Wikipedia articles. Using DBpedia and Wikidata, we identified articles related to different kinds of companies and found the most important sources of information in this area. This makes it possible to find differences in source reliability between Wikipedia language versions and to identify important sources that provide information about various companies on Wikipedia.
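The general idea of counting which websites are cited in company articles, separately for each language version, can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the tuple layout of `references`, the `company_articles` set (standing in for articles identified as companies via DBpedia or Wikidata), and the toy data are all assumptions made for illustration. Plain citation frequency is used here as one simple signal and should not be read as the reliability measure of the study. A real pipeline would also likely need the Public Suffix List to reduce hosts to registrable domains, which this sketch does not do.

```python
from collections import Counter, defaultdict
from typing import Optional
from urllib.parse import urlsplit


def source_host(url: str) -> Optional[str]:
    """Return the lowercased host of a URL without a leading 'www.' (hypothetical helper)."""
    host = urlsplit(url).hostname  # None when the string has no network location
    if not host:
        return None
    return host[4:] if host.startswith("www.") else host


def rank_sources(references, company_articles):
    """Count cited hosts per language, restricted to company articles.

    references: iterable of (language, article_title, url) tuples (assumed layout).
    company_articles: set of (language, article_title) pairs (assumed to be
    prepared beforehand, e.g. from DBpedia or Wikidata types).
    """
    counts = defaultdict(Counter)
    for lang, title, url in references:
        if (lang, title) not in company_articles:
            continue  # skip references from articles that are not about companies
        host = source_host(url)
        if host:
            counts[lang][host] += 1
    return counts


# Toy data, invented for illustration only.
references = [
    ("en", "Apple Inc.", "https://www.reuters.com/a"),
    ("en", "Apple Inc.", "https://www.sec.gov/b"),
    ("en", "Tesla, Inc.", "https://www.reuters.com/c"),
    ("en", "Paris", "https://www.insee.fr/d"),
    ("pl", "Orlen", "https://www.money.pl/e"),
]
company_articles = {("en", "Apple Inc."), ("en", "Tesla, Inc."), ("pl", "Orlen")}

counts = rank_sources(references, company_articles)
top_en = counts["en"].most_common(1)
```

Comparing the per-language counters side by side is one way to see how the most frequently cited sources for the same topic can differ between language versions.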