Abstract. Based on state-of-the-art machine learning techniques, GROBID (GeneRation Of BIbliographic Data) performs reliable bibliographic data extraction from scholarly articles, combined with multi-level term extraction. These two types of extraction present synergies and correspond to complementary descriptions of an article. This tool is viewed as a component for enhancing existing and future large repositories of technical and scientific publications.

Objectives
The purpose of this demonstration is to show the digital library community a practical example of the accuracy of current state-of-the-art machine learning techniques applied to information extraction from scholarly articles. The demonstration is based on the web application at the following address: http://grobid.no-ip.org.

Bibliographical Data Extraction
After the selection of a PDF document, GROBID extracts the bibliographical data corresponding to the header information (title, authors, abstract, etc.) and to each reference (title, authors, journal title, issue, number, etc.). The references are associated with their respective citation contexts. The result of the citation extraction can be exported as a whole or per reference in different formats (BibTeX and TEI) and as COInS. The automatic extraction of bibliographical data is a challenging task because of the high variability of bibliographical formats and presentations. We have applied Conditional Random Fields to this task following the approach of [1], implemented with the Mallet toolkit [2], based on approximately 1000 training examples for header information and 1200 training examples for cited references. An evaluation with the reference CORA dataset showed a reliable level of accuracy of 98.6% per header field and 74.9% per complete header instance, and 95.7% per citation field and 78.9% per citation instance.
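
The gap between the per-field and per-instance figures reported above follows from how the two measures are counted: a record counts as correct at the instance level only if every one of its fields is correct. The following is a minimal sketch of that distinction, assuming exact string matching per field; the function names and the toy records are hypothetical, and the evaluation protocol actually used with the CORA dataset may differ (for example, it may score at token level).

```python
from typing import Dict, List

Record = Dict[str, str]  # field name -> extracted value (hypothetical layout)


def field_accuracy(gold: List[Record], pred: List[Record]) -> float:
    """Fraction of gold fields whose predicted value matches exactly."""
    total = 0
    correct = 0
    for g, p in zip(gold, pred):
        for name, value in g.items():
            total += 1
            if p.get(name) == value:
                correct += 1
    return correct / total if total else 0.0


def instance_accuracy(gold: List[Record], pred: List[Record]) -> float:
    """Fraction of records in which every gold field matches exactly."""
    if not gold:
        return 0.0
    hits = sum(
        1
        for g, p in zip(gold, pred)
        if all(p.get(name) == value for name, value in g.items())
    )
    return hits / len(gold)


# Toy data, invented for illustration only.
gold = [
    {"title": "A", "author": "X", "year": "2001"},
    {"title": "B", "author": "Y", "year": "2002"},
]
pred = [
    {"title": "A", "author": "X", "year": "2001"},
    {"title": "B", "author": "Z", "year": "2002"},  # one wrong field
]

print(field_accuracy(gold, pred))     # 5 of 6 fields match
print(instance_accuracy(gold, pred))  # 1 of 2 records fully match
```

A single wrong field lowers field accuracy only slightly but invalidates the whole record, which is why instance-level scores are typically well below field-level scores.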
The motion of a thin viscous layer of fluid on a horizontal solid surface bounded laterally by a dry spot and a vertical solid wall is considered. A lubrication model with contact line motion is studied. We find that for a container of fixed length the axisymmetric equilibrium solutions with small dry spots are unstable to axisymmetric disturbances. As the size of the dry spot increases, the equilibrium solutions become unstable to nonaxisymmetric disturbances. In addition, we present numerical solutions of the nonlinear evolution equations in the axisymmetric and nonaxisymmetric cases for different values of the parameters. The axisymmetric results show good agreement with existing experimental results.
A thin layer of liquid advancing over a dry, heated, inclined plate is studied. A lubrication model with contact line motion is derived. The plate is at constant temperature, and the surface Biot number is specified. The steady-state solution is obtained numerically. In addition, the steady-state solution is studied analytically in the neighbourhood of the contact line. A linear stability analysis about the steady state is then performed. The effects of gravity, thermocapillarity and contact line motion are discussed. In particular, we determine a band of unstable wavenumbers, and the maximum growth rate as a function of these parameters.
Software contributions to academic research are relatively invisible, especially to the formalized scholarly reputation system based on bibliometrics. In this article, we introduce a gold‐standard dataset of software mentions from the manual annotation of 4,971 academic PDFs in biomedicine and economics. The dataset is intended to be used for automatic extraction of software mentions from research publications in PDF format by supervised learning at scale. We provide a description of the dataset and an extended discussion of its creation process, including improved text conversion of academic PDFs. Finally, we reflect on the challenges and lessons learned during the dataset creation, in the hope of encouraging more discussion about creating datasets for machine learning use.
Citation indices are tools used by the academic community for research and research evaluation which aggregate scientific literature output and measure impact by collating citation counts. Citation indices help measure the interconnections between scientific papers but fall short because they fail to communicate contextual information about a citation. The usage of citations in research evaluation without consideration of context can be problematic, because a citation that presents contrasting evidence to a paper is treated the same as a citation that presents supporting evidence. To solve this problem, we have used machine learning, traditional document ingestion methods, and a network of researchers to develop a “smart citation index” called scite, which categorizes citations based on context. Scite shows how a citation was used by displaying the surrounding textual context from the citing paper and a classification from our deep learning model that indicates whether the statement provides supporting or contrasting evidence for a referenced work, or simply mentions it. Scite has been developed by analyzing over 25 million full-text scientific articles and currently has a database of more than 880 million classified citation statements. Here we describe how scite works and how it can be used to further research and research evaluation. Peer Review https://publons.com/publon/10.1162/qss_a_00146
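
The contrast with a plain citation count can be illustrated with a small tally. This is a minimal sketch under stated assumptions: the label names, the `tally` function, and the sample statements are hypothetical and are not scite's API or data; it only shows how classified citation statements carry more information than a single count.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Assumed label set, mirroring the three categories described above.
LABELS = ("supporting", "contrasting", "mentioning")


def tally(statements: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count citation statements per label; each item is (text, label)."""
    counts = Counter(label for _, label in statements)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return {label: counts.get(label, 0) for label in LABELS}


# Invented example statements citing one paper.
sample = [
    ("Our results agree with those of [12].", "supporting"),
    ("In contrast to [12], we observe no such effect.", "contrasting"),
    ("See [12] for background.", "mentioning"),
    ("This was first studied in [12].", "mentioning"),
]

summary = tally(sample)
print(len(sample))  # what a plain citation count reports
print(summary)      # per-category breakdown
```

A traditional index would report only the total of four citations, whereas the breakdown shows that one of them disputes the cited work.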
It has been observed that when a thin liquid film coats an initially dry inclined plane, a spanwise instability occurs at the leading edge. Here we develop a model for the evolution of this coating film which includes inertia, gravity, surface tension and the contact angle at the leading edge of the film. A Kármán–Pohlhausen method is used to include inertia. We determine steady state profiles of the film and investigate their stability. The predictions of the model are compared to some recent experiments and we find good agreement. This theory gives improvement over a lubrication theory in experiments where Reynolds numbers are significantly larger than one.
Experimental results are presented for the motion of a dry spot in a thin viscous film on a horizontal surface. These include global and spatial measurements of dry spot diameter, front velocities, static and dynamic contact angle, and the shape of the liquid–solid interface. Data are presented as a function of initial fluid depth for both an advancing fluid front of a collapsing dry spot and a receding fluid front of an opening dry spot. Results for both cases show that the final or static hole diameter increases as the initial fluid depth decreases. Also, insight is obtained into the relationship between the contact angle and the velocity for both advancing and receding fluid fronts. The experimental results are compared to a lubrication model, and good agreement is obtained.
Patent prior-art search is concerned with finding all filed patents relevant to a given patent application. We report a comparison between two search approaches representing the state-of-the-art in patent prior-art search. The first approach uses simple and straightforward information retrieval (IR) techniques, while the second uses much more sophisticated techniques which try to model the steps taken by a patent examiner in patent search. Experiments show that the retrieval effectiveness using both techniques is statistically indistinguishable when patent applications contain some initial citations. However, the advanced search technique is statistically better when no initial citations are provided. Our findings suggest that less time and effort can be exerted by applying simple IR approaches when initial citations are provided.