Identifying Reliable Sources of Information about Companies in Multilingual Wikipedia

DOI: http://dx.doi.org/10.15439/2022F259

Citation: Proceedings of the 17th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 30, pages 705–714

Abstract. For over 20 years, Wikipedia has been edited by volunteers from all over the world. These editors differ in education, culture, and competences. One of Wikipedia's core rules states that information in its articles should be based on reliable sources and that readers must be able to verify particular facts in the text. However, reliability is a subjective concept, and the reputation of the same source can be assessed differently depending on the person (or group of persons), language, and topic. Each language version of Wikipedia may therefore have its own rules or criteria for how a website must be assessed before it can be used as a source in references. At the same time, there are now over 1 billion websites on the Internet, and only a few developed Wikipedia language versions maintain non-exhaustive lists of popular websites with reliability assessments. Additionally, since the reputation of a source can change over time, such lists must be updated regularly. This study presents the results of identifying reliable sources of information based on an analysis of over 200 million references extracted from over 40 million Wikipedia articles. Using DBpedia and Wikidata, we identified articles related to different kinds of companies and found the most important sources of information in each area. This makes it possible to find differences in source reliability between Wikipedia languages and to identify important sources that provide information about various companies on Wikipedia.
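
As a rough illustration of the kind of analysis described in the abstract, the following minimal sketch groups reference URLs by website and counts, per language version, how many distinct articles cite each one. It is not the authors' pipeline: the input layout (tuples of language, article title, URL), the function names `source_domain` and `rank_sources`, and the use of a simple article count as the importance measure are all assumptions made for this example.

```python
from collections import Counter
from urllib.parse import urlparse


def source_domain(url):
    """Return the lowercased host of a URL without a leading 'www.' (hypothetical helper)."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def rank_sources(references):
    """Count, per language, how many distinct articles cite each domain.

    `references` is an iterable of (language, article_title, url) tuples;
    this layout is an assumption made for illustration only.
    """
    seen = set()   # (language, article, domain) triples already counted
    counts = {}    # language -> Counter mapping domain to article count
    for lang, article, url in references:
        domain = source_domain(url)
        if not domain:
            continue  # skip strings that do not parse to a host
        key = (lang, article, domain)
        if key in seen:
            continue  # count each domain at most once per article
        seen.add(key)
        counts.setdefault(lang, Counter())[domain] += 1
    return counts
```

Calling `rank_sources(refs)["en"].most_common(10)` would then list the ten most frequently cited domains in the English-language sample, which can be compared with the same ranking for other languages.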


