27,806 compounds extracted from the English Wikipedia
Compounds: 27,806
Valid formulas: 68.4%
Has SMILES: 61.0%
Organic: 48.2%
Charts: PageRank (Co-citation vs Induced); Compounds by Category; MW Distribution by Category
Table columns: Title, Formula, MW, Category, Backlinks, PR Rank Induced, PR Rank Co-cite, CAS, Page Length, SMILES
Column definitions & methodology
Backlinks
Total number of English Wikipedia articles that contain a hyperlink
to this compound's page. Extracted from the Wikipedia SQL link-target
dump and counted across the full article namespace.
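As a rough illustration of the counting step, the sketch below tallies the distinct source articles that link to each compound page. It assumes the dump has already been reduced to (source, target) title pairs; that input format and the name count_backlinks are hypothetical and not taken from the actual pipeline.

```python
from collections import defaultdict

def count_backlinks(links, compound_titles):
    """Count distinct source articles linking to each compound page.

    links: iterable of (source_title, target_title) pairs (assumed format).
    compound_titles: set of page titles in the compound dataset.
    """
    sources = defaultdict(set)
    for source, target in links:
        if target in compound_titles:
            # A set ensures an article linking twice is counted once.
            sources[target].add(source)
    return {title: len(sources[title]) for title in compound_titles}
```

Compounds that nothing links to come back with a count of 0 rather than being omitted.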
PR Rank (Induced)
Rank (1 = highest) by PageRank computed on the
induced sub-graph: only hyperlinks between pages that are
themselves in this chemistry dataset are included. This measures how
central a compound is among the chemistry articles on Wikipedia:
compounds linked from many other well-linked chemistry articles rank highest.
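The induced ranking can be sketched as follows: filter the link list down to edges whose endpoints are both in the dataset, run PageRank by power iteration, and convert scores to ranks. This is a minimal sketch, not the dashboard's actual code. The damping factor of 0.85, the fixed iteration count, the uniform handling of pages with no outgoing links, and all function names are assumptions.

```python
def induced_edges(links, dataset):
    """Keep only links whose source and target are both in the dataset."""
    return [(s, t) for s, t in links if s in dataset and t in dataset]

def pagerank(nodes, edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed, unweighted graph."""
    nodes = sorted(nodes)
    out = {v: set() for v in nodes}
    for s, t in edges:
        out[s].add(t)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if out[v]:
                share = damping * rank[v] / len(out[v])
                for t in out[v]:
                    new[t] += share
        rank = new
    return rank

def rank_positions(scores):
    """Map each node to its rank, where 1 is the highest score."""
    ordered = sorted(scores, key=lambda v: (-scores[v], v))
    return {v: i + 1 for i, v in enumerate(ordered)}
```

Ties are broken alphabetically by title here, which is an arbitrary choice made only to keep the output deterministic.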
PR Rank (Co-cite)
Rank (1 = highest) by PageRank computed on the
co-citation graph: two compound pages are connected
(with edge weight equal to co-occurrence count) whenever they both
appear as link targets in the same source article. PageRank on this
graph identifies compounds that are conceptually grouped together
most often across Wikipedia, independent of direct links between them.
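One way to build such a graph is sketched below: group link targets by source article, count every pair of compounds that share a source, then run a weighted PageRank on the resulting undirected graph. The (source, target) input format, the damping factor of 0.85, the treatment of isolated compounds, and the function names are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

def cocitation_weights(links, dataset):
    """Return {(a, b): count} for compound pairs linked from the same source."""
    targets_by_source = defaultdict(set)
    for source, target in links:
        if target in dataset:
            targets_by_source[source].add(target)
    weights = Counter()
    for targets in targets_by_source.values():
        # Sorting gives each unordered pair one canonical key.
        for a, b in combinations(sorted(targets), 2):
            weights[(a, b)] += 1
    return weights

def weighted_pagerank(nodes, weights, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected, weighted graph."""
    nodes = sorted(nodes)
    adj = {v: {} for v in nodes}
    for (a, b), w in weights.items():
        adj[a][b] = w
        adj[b][a] = w
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Compounds with no co-citations spread their rank uniformly.
        isolated = sum(rank[v] for v in nodes if not adj[v])
        new = {v: (1 - damping) / n + damping * isolated / n for v in nodes}
        for v in nodes:
            total = sum(adj[v].values())
            for u, w in adj[v].items():
                new[u] += damping * rank[v] * w / total
        rank = new
    return rank
```

Because edges are weighted by co-occurrence count, a compound that repeatedly shares source articles with the same neighbors passes more of its score to them than to neighbors it co-occurs with only once.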
Category
Organic: SMILES parsed by Open Babel contains at least one C–H bond.
Inorganic: SMILES parses but no C–H bond found.
Elemental: single-element formula flag set.
Mineral: Infobox mineral template or title ends in -ite.
Protein: Infobox protein family or title ends in -ase.
Undetermined: none of the above.
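The rules above can be read as a simple decision chain, sketched below. The precedence (rules tried in the order listed) is an assumption, since the text does not say which rule wins when several match. The parameter names and the infobox string values are hypothetical, and the C–H bond check is taken as a precomputed flag rather than calling Open Babel.

```python
def classify(title, infobox=None, smiles_parsed=False,
             has_ch_bond=False, is_elemental=False):
    """Assign a category using the listed rules, tried in listed order.

    infobox: hypothetical template label, e.g. "mineral" or "protein family".
    smiles_parsed / has_ch_bond: assumed to be computed upstream.
    """
    if smiles_parsed and has_ch_bond:
        return "Organic"
    if smiles_parsed:
        return "Inorganic"
    if is_elemental:
        return "Elemental"
    if infobox == "mineral" or title.endswith("ite"):
        return "Mineral"
    if infobox == "protein family" or title.endswith("ase"):
        return "Protein"
    return "Undetermined"
```

Under this ordering, a page with a parseable SMILES string is never classified by its title suffix; a different precedence would change results for pages that match more than one rule.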
MW
Molecular weight in g/mol, derived from the parsed formula.
Blank when the formula could not be parsed.
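For simple formulas the derivation amounts to summing atomic weights over the parsed element counts. The sketch below handles only flat formulas such as H2O or C2H6O (no parentheses, hydrates, or charges) and uses a deliberately tiny atomic-weight table; both limits, and the function name molecular_weight, are assumptions for illustration. It returns None where the table would show a blank.

```python
import re

# Abridged standard atomic weights in g/mol; a real table covers all elements.
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "Na": 22.990, "S": 32.06, "Cl": 35.45,
}

# One element symbol followed by an optional count, e.g. "H2" or "O".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")

def molecular_weight(formula):
    """Return molecular weight in g/mol, or None if the formula cannot be parsed."""
    if not formula:
        return None
    pos = 0
    total = 0.0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:
            return None  # unparsed characters between tokens
        symbol, count = match.groups()
        if symbol not in ATOMIC_WEIGHTS:
            return None  # unknown element symbol
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
        pos = match.end()
    if pos != len(formula):
        return None  # trailing characters that are not element tokens
    return round(total, 3)
```

Returning None instead of raising keeps unparseable formulas in the dataset with an empty MW field, matching the behavior described above.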