Information retrieval systems for scholarly literature rely heavily not only on text matching but also on semantic and context-based features. Readers nowadays are deeply interested in how important an article is, what its purpose is, and how influential it is in follow-up research. Numerous techniques that tap the power of machine learning and artificial intelligence have been developed to enhance retrieval of the most influential scientific literature. In this paper, we compare and improve on four existing state-of-the-art techniques designed to identify influential citations. We consider 450 citations from the Association for Computational Linguistics corpus, classified by experts as either important or unimportant, and extract 64 features based on the methodology of these four techniques. We apply the Extra-Trees classifier to select the 29 best features and apply the Random Forest and Support Vector Machine classifiers to all selected techniques. Using the Random Forest classifier, our supervised model improves on the state-of-the-art method by 11.25%, with an area under the Precision-Recall curve of 89%. Finally, we present our deep-learning model, a Long Short-Term Memory network, which uses all 64 features to distinguish important from unimportant citations with 92.57% accuracy.
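The feature-selection and classification steps described in this abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' pipeline: the data are synthetic (generated with `make_classification` to match only the stated sizes of 450 samples and 64 features), and the tree counts, train/test split, and use of average precision as the Precision-Recall summary are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 450 labelled citations with 64 features (assumption).
X, y = make_classification(
    n_samples=450, n_features=64, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Rank features by Extra-Trees importance and keep the top 29.
# threshold=-inf makes max_features the only selection criterion.
selector = SelectFromModel(
    ExtraTreesClassifier(n_estimators=200, random_state=0),
    threshold=-np.inf,
    max_features=29,
)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Train a Random Forest on the selected features only.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train_sel, y_train)

# Summarise the Precision-Recall curve with average precision.
scores = clf.predict_proba(X_test_sel)[:, 1]
pr_auc = average_precision_score(y_test, scores)
print(f"selected features: {X_train_sel.shape[1]}, PR AUC: {pr_auc:.3f}")
```

The score printed here reflects only the synthetic data and says nothing about the results reported in the abstract.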
Purpose: The purpose of this paper is to analyze the scientific collaboration of institutions and its impact on institutional research performance in terms of productivity and quality. The researchers examined the local and international collaborations that have a great impact on institutional performance. Design/methodology/approach: A collaboration dependence measure was used to investigate the impact of an institution on external information. Based on this information, the authors used the “index of gain in impact through collaboration” to find the impact of collaborated publications on institutional research performance. Bibliographic data between 1996 and 2010 retrieved from Scopus were used to conduct the current study. The authors carried out a case study of the top institutes of Pakistan in terms of publication count to elaborate the difference between high-performing institutions and those that gain disproportionately in terms of the perceived quality of their output because of local or international collaboration. Findings: The results showed that international collaboration by institutes in developing countries had a great impact on institutional performance, and that these institutes benefit more from it than from local collaboration. Altogether, scientific collaboration has a positive impact on institutional performance as measured by the cumulative source normalized impact per paper of their publications. The findings could also help researchers to find appropriate collaboration partners. Originality/value: This study has revealed some salient characteristics of collaboration in academic research. It becomes apparent that collaboration intensity is not uniform, but in general, the average quality of scientific production is the variable that most often correlates positively with the collaboration intensity of universities.
A decade of in-text citation analysis based on natural language processing and machine learning techniques: an overview of empirical studies. Scientometrics.
The purpose of the study is to (a) contribute to annotating an Altmetrics dataset across five disciplines, (b) undertake sentiment analysis using various machine learning and natural language processing-based algorithms, (c) identify the best-performing model and (d) provide a Python library for sentiment analysis of an Altmetrics dataset. First, the researchers gave a set of guidelines to two human annotators familiar with the task of annotating tweets related to scientific literature. The annotators labelled the sentiments, achieving an inter-annotator agreement (IAA) of 0.80 (Cohen’s Kappa). Then, the same experiments were run on two versions of the dataset: one with tweets in English and the other with tweets in 23 languages, including English. Using 6388 tweets about 300 papers indexed in Web of Science, the effectiveness of the employed machine learning and natural language processing models was measured by comparing them with well-known sentiment analysis models, namely SentiStrength and Sentiment140, as baselines. The results showed that Support Vector Machine with uni-grams outperformed all the other classifiers and baseline methods employed, with an accuracy of over 85%, followed by Logistic Regression at 83% accuracy and Naïve Bayes at 80%. The precision, recall and F1 scores for Support Vector Machine, Logistic Regression and Naïve Bayes were (0.89, 0.86, 0.86), (0.86, 0.83, 0.80) and (0.85, 0.81, 0.76), respectively.
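For readers unfamiliar with the inter-annotator agreement figure quoted above, Cohen's Kappa compares the observed agreement between two annotators with the agreement expected by chance. The sketch below is a plain-Python illustration of the standard formula; the function name `cohens_kappa` and the toy labels are hypothetical and are not taken from the study's data or library.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's label frequencies,
    # summed over all labels.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example (made-up labels): the annotators agree on 3 of 4 tweets.
annotator_1 = ["positive", "positive", "negative", "negative"]
annotator_2 = ["positive", "negative", "negative", "negative"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 2))
```

In this toy case the observed agreement is 0.75 and the chance agreement is 0.5, so Kappa is 0.5.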
We argue that classic citation-based approaches to clustering scientific documents, such as co-citation or bibliographic coupling, fail to leverage the social usage of scientific literature that originates on online information dissemination platforms such as Twitter. In this paper, we present the tweet coupling methodology, which measures the similarity between two or more scientific documents when one or more Twitter users mention them in their tweets. We evaluate our proposal on an altmetric dataset consisting of 3,081 scientific documents and 8,299 unique Twitter users. By employing the clustering approaches of bibliographic coupling and tweet coupling, we examine the relationship between bibliographically coupled and tweet-coupled scientific documents. Further, using VOSviewer, we empirically show that tweet coupling appears to be a better clustering methodology for generating cohesive clusters, since it groups similar documents from the subfields of the selected field, in contrast to the bibliographic coupling approach, which groups cross-disciplinary documents in the same cluster.
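One simple way to picture tweet coupling is by analogy with bibliographic coupling: two documents are coupled whenever the same Twitter user mentions both, and the coupling strength is the number of such shared users. The sketch below assumes exactly that definition; the abstract does not specify the weighting the authors use, and the function name `tweet_coupling` and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def tweet_coupling(mentions):
    """Count, for each document pair, the users who mentioned both documents.

    mentions: iterable of (user_id, document_id) pairs.
    Returns a dict mapping sorted (doc_a, doc_b) tuples to shared-user counts.
    """
    # Collect the set of documents each user mentioned (duplicates ignored).
    docs_by_user = defaultdict(set)
    for user, doc in mentions:
        docs_by_user[user].add(doc)

    # Every pair of documents mentioned by the same user gains one unit
    # of coupling strength (assumed weighting).
    strength = defaultdict(int)
    for docs in docs_by_user.values():
        for doc_a, doc_b in combinations(sorted(docs), 2):
            strength[(doc_a, doc_b)] += 1
    return dict(strength)


# Made-up example: three users, three documents.
sample_mentions = [
    ("user1", "docA"), ("user1", "docB"),
    ("user2", "docA"), ("user2", "docB"), ("user2", "docC"),
    ("user3", "docC"),
]
coupling = tweet_coupling(sample_mentions)
print(coupling)
```

The resulting pair weights could then be passed to a clustering or network visualisation tool as edge strengths between documents.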
The potential benefits of learning analytics (LA) for improving students’ performance, predicting students’ success, and enhancing teaching and learning practice have increasingly been recognized in higher education. However, the adoption of LA in higher education institutions (HEIs) to date remains sporadic and predominantly small in scale due to several socio-technical challenges. To better understand why HEIs struggle to scale LA adoption, it is necessary to untangle adoption challenges and their related factors. This paper presents the findings of a study that investigated how adoption factors are associated with the challenges HEIs face in adopting LA, and how these associations compare among HEIs at different scopes of adoption. The study was based on a series of semi-structured interviews with senior managers in HEIs. The interview data were thematically analysed to identify the main challenges in LA adoption. The connections between challenges and other factors related to LA adoption were analysed using epistemic network analysis (ENA). From senior managers’ viewpoints, ethical issues of informed consent and a culture of resistance had the strongest links with the challenges of LA adoption in HEIs; this was especially true for institutions that had not adopted LA or that were in the initial phase of adoption (i.e., preparing for or partially implementing LA). By contrast, among HEIs that had fully adopted LA, the main challenges were found to be associated with centralized leadership, gaps in analytic capabilities, external stakeholders, and evaluations of technology. Based on the results, we discuss implications for LA strategy that can be useful for institutions at various stages of LA adoption, from early stages of interest to the full adoption phase.