We propose miDatabase™, a new comprehensive literature-mining platform, in response to the need to identify potential miRNA biomarkers from the existing literature. miDatabase™ comprises four toolkits: a named entity recognition (NER) toolkit, a model toolkit, a query toolkit, and a real-time curation helper. Together they build a miRNA knowledge database from life science literature on various human cancers and diseases.
To date, miDatabase™ has processed more than 29,000 miRNA-related full-text articles. More than 6.6 million sentences and 8 million entities, categorized into 18 groups under three concepts, were extracted from these articles and collected in miDatabase™. Because the extracted dataset is so large, a stringent cutoff is used to remove false positives, and approximately 0.4 million informative relationships have been identified with a precision of ~0.95. Furthermore, with a query algorithm that mixes evidence and cognition, the top 50 query results were validated to have a precision of 0.84.
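To make the two precision figures concrete: precision is the fraction of retained results that are correct. The Python sketch below is illustrative only and is not miDatabase™ code. The function names (`precision`, `apply_cutoff`, `precision_at_k`), the score scale, and the toy data are all assumptions; it shows one common way to apply a score cutoff and to measure precision over the top k ranked results.

```python
from typing import List, Tuple

# Each item is (confidence_score, is_correct). Scores and labels are made up.
Scored = Tuple[float, bool]


def precision(items: List[Scored]) -> float:
    """Fraction of items labelled correct; 0.0 for an empty list."""
    if not items:
        return 0.0
    return sum(1 for _, ok in items if ok) / len(items)


def apply_cutoff(items: List[Scored], cutoff: float) -> List[Scored]:
    """Keep only items whose score is at or above the (hypothetical) cutoff."""
    return [item for item in items if item[0] >= cutoff]


def precision_at_k(items: List[Scored], k: int) -> float:
    """Precision over the k highest-scoring items."""
    ranked = sorted(items, key=lambda item: item[0], reverse=True)
    return precision(ranked[:k])


# Toy ranking: 50 results with descending scores, the first 42 labelled correct.
toy = [(1.0 - i * 0.01, i < 42) for i in range(50)]

top50 = precision_at_k(toy, 50)   # 42 correct out of 50
strict = apply_cutoff(toy, 0.80)  # a stricter cutoff keeps fewer, cleaner results
```

With this toy data, 42 correct results among the top 50 corresponds to a precision of 0.84. Raising the cutoff trades coverage for precision, which is the effect a stringent cutoff is meant to achieve.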






