English for Knowledge Graph Engineers: Graph Databases and Semantic Web Vocabulary

Learn the English vocabulary knowledge graph engineers use when discussing graph databases, RDF, ontologies, SPARQL, Cypher, and the semantic web.

Knowledge graphs power some of the most sophisticated systems in tech — Google’s Knowledge Graph, Amazon’s product graph, LinkedIn’s economic graph. Engineers who work in this domain speak a precise hybrid vocabulary drawn from database theory, formal logic, and web standards. Without it, discussions about ontologies, inference, and entity resolution sound impenetrable.

Graph Database Fundamentals

A graph database stores data as a network of nodes (entities — a person, a product, a company) and edges (relationships between entities — “WORKS_AT”, “PURCHASED”, “LOCATED_IN”). Both nodes and edges can have properties — key-value attributes like name, date, or price.

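Edges can be directed (the relationship has a direction: Person A FOLLOWS Person B) or undirected (the relationship is symmetric: Person A IS_FRIENDS_WITH Person B). Nodes and edges are also typed. In Neo4j's terminology, a node carries one or more labels and a relationship has exactly one type, written as (:Person), (:Product), [:PURCHASED]. Neo4j stores every relationship with a direction, but a query can choose to ignore it.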
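
The ideas above (nodes, edges, properties, direction) can be sketched in a few lines of plain Python. This is only an illustrative in-memory model, not how any particular graph database stores data; the class names, the sample people and the neighbours helper are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str                                   # kind of entity, e.g. "Person"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str                                  # id of the node the edge starts from
    rel_type: str                                # e.g. "WORKS_AT"
    target: str                                  # id of the node the edge points to
    properties: dict = field(default_factory=dict)
    directed: bool = True


# Hypothetical sample data.
nodes = {
    "p1": Node("p1", "Person", {"name": "Ada"}),
    "p2": Node("p2", "Person", {"name": "Grace"}),
    "c1": Node("c1", "Company", {"name": "Acme"}),
}
edges = [
    Edge("p1", "WORKS_AT", "c1", {"since": 2021}),
    Edge("p1", "IS_FRIENDS_WITH", "p2", directed=False),
]


def neighbours(node_id, rel_type):
    """Ids reachable from node_id over rel_type, honouring edge direction."""
    found = []
    for e in edges:
        if e.rel_type != rel_type:
            continue
        if e.source == node_id:
            found.append(e.target)
        elif not e.directed and e.target == node_id:
            found.append(e.source)               # undirected edges work both ways
    return found
```

With this data, neighbours("p1", "WORKS_AT") finds the company, while neighbours("c1", "WORKS_AT") finds nothing because the edge is directed. The undirected friendship can be followed from either person.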

Cypher is the query language used by Neo4j. Its syntax is designed to look like ASCII art of a graph: MATCH (p:Person)-[:WORKS_AT]->(c:Company) WHERE c.name = 'Acme' RETURN p.name. Engineers often say: “Write a Cypher query to find all customers who purchased from the same vendor as this customer, within the last 30 days.”
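
To see what the MATCH pattern is asking for, it can help to spell the same logic out as an ordinary loop. The sketch below is plain Python over a toy in-memory graph, not the Neo4j driver API; the sample data and the function name people_working_at are assumptions for illustration.

```python
# Toy data: node id -> (label, properties), plus (source, type, target) edges.
nodes = {
    1: ("Person", {"name": "Ada"}),
    2: ("Person", {"name": "Grace"}),
    3: ("Company", {"name": "Acme"}),
    4: ("Company", {"name": "Globex"}),
}
edges = [(1, "WORKS_AT", 3), (2, "WORKS_AT", 4)]


def people_working_at(company_name):
    """Rough equivalent of:
    MATCH (p:Person)-[:WORKS_AT]->(c:Company)
    WHERE c.name = company_name
    RETURN p.name
    """
    names = []
    for source, rel_type, target in edges:
        p_label, p_props = nodes[source]
        c_label, c_props = nodes[target]
        if (p_label == "Person" and rel_type == "WORKS_AT"
                and c_label == "Company"
                and c_props["name"] == company_name):
            names.append(p_props["name"])
    return names


print(people_working_at("Acme"))  # ['Ada']
```

The Cypher version states the shape of the match declaratively and leaves the looping to the database, which is why engineers describe a query by its pattern ("people who work at Acme") and not by its steps.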

Gremlin is a graph traversal language used by Apache TinkerPop-compatible databases (like Amazon Neptune and JanusGraph). It uses a fluent, step-based style: g.V().hasLabel('person').out('knows').values('name').
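
The "fluent, step-based style" means each step takes the current set of vertices and hands a new set to the next step. The toy class below imitates that chaining in plain Python so the idea is visible; it is not the gremlinpython client, and the Traversal class, the V helper and the sample graph are assumptions made for this sketch.

```python
class Traversal:
    """Toy fluent traversal: every step returns a new Traversal."""

    def __init__(self, graph, current):
        self.graph = graph
        self.current = current            # vertex ids reached so far

    def hasLabel(self, label):
        keep = [v for v in self.current if self.graph["labels"][v] == label]
        return Traversal(self.graph, keep)

    def out(self, edge_label):
        # Follow outgoing edges with the given label from every current vertex.
        reached = [dst for v in self.current
                   for (src, lbl, dst) in self.graph["edges"]
                   if src == v and lbl == edge_label]
        return Traversal(self.graph, reached)

    def values(self, key):
        return [self.graph["props"][v][key] for v in self.current]


def V(graph):
    """Start a traversal from all vertices, like g.V() in Gremlin."""
    return Traversal(graph, list(graph["labels"]))


# Hypothetical sample graph.
graph = {
    "labels": {1: "person", 2: "person", 3: "software"},
    "props": {1: {"name": "Ada"}, 2: {"name": "Grace"}, 3: {"name": "compiler"}},
    "edges": [(1, "knows", 2), (1, "created", 3)],
}

names = V(graph).hasLabel("person").out("knows").values("name")
print(names)  # ['Grace']
```

Reading the chain left to right gives the English description directly: start from all vertices, keep the people, follow "knows" edges outward, return the names.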

RDF and the Semantic Web

RDF (Resource Description Framework) is a W3C standard for representing knowledge as triples: subject-predicate-object. Each triple makes one statement about the world: (dbr:London, dbo:country, dbr:United_Kingdom). Here dbr: and dbo: are the prefixes DBpedia uses to abbreviate full URIs. A set of RDF triples forms an RDF graph, which is the data model behind many knowledge graphs.
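
A triple is small enough to model directly. The sketch below represents the London statement as a three-field record and expands the prefixed names into full URIs, using the namespace URIs that DBpedia publishes for dbr: and dbo:. The Triple class and the expand helper are illustrative names, not part of any RDF library.

```python
from typing import NamedTuple

# Namespace prefixes as used by DBpedia.
PREFIXES = {
    "dbr": "http://dbpedia.org/resource/",
    "dbo": "http://dbpedia.org/ontology/",
}


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def expand(prefixed_name: str) -> str:
    """Turn a prefixed name such as 'dbr:London' into a full URI."""
    prefix, local_name = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local_name


statement = Triple("dbr:London", "dbo:country", "dbr:United_Kingdom")
full = Triple(*(expand(term) for term in statement))

print(full.subject)    # http://dbpedia.org/resource/London
print(full.predicate)  # http://dbpedia.org/ontology/country
```

In conversation, engineers usually say the short form ("dbo country") and rely on the prefix declarations to make it unambiguous.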

SPARQL is the query language for RDF data — analogous to SQL for relational databases. Engineers say: “Run a SPARQL query against the Wikidata endpoint to extract all chemical compounds with a molecular weight above 500 daltons.”
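
At its core, a SPARQL query is a triple pattern in which some positions are variables. The following plain-Python sketch shows that idea on a handful of triples; it is a teaching stand-in, not a SPARQL engine, and the match function and the extra sample triples are assumptions for illustration.

```python
# A tiny triple store: a set of (subject, predicate, object) tuples.
triples = {
    ("dbr:London", "dbo:country", "dbr:United_Kingdom"),
    ("dbr:Manchester", "dbo:country", "dbr:United_Kingdom"),
    ("dbr:Paris", "dbo:country", "dbr:France"),
}


def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None plays the role of a variable."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Roughly: SELECT ?city WHERE { ?city dbo:country dbr:United_Kingdom }
uk_cities = [s for s, _, _ in match(p="dbo:country", o="dbr:United_Kingdom")]
print(uk_cities)  # ['dbr:London', 'dbr:Manchester']
```

Described in English: "select every subject whose dbo:country is the United Kingdom." The fixed positions are the constraints, and the open position is what the query returns.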

An ontology is a formal, machine-readable specification of concepts and the relationships between them. OWL (Web Ontology Language) is the W3C standard for writing ontologies. A reasoner is a software component that uses the axioms defined in an ontology to infer new facts from existing data. For example, if the ontology defines “cat subClassOf mammal” and the data states “Whiskers is a cat,” the reasoner infers that Whiskers is a mammal.
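
The Whiskers example can be worked through with a very small forward-chaining loop. The sketch below applies two RDFS-style rules (class membership is inherited through subClassOf, and subClassOf is transitive) until nothing new appears. It is a minimal illustration, far simpler than a real OWL reasoner, and the ex: names and the infer function are assumptions for the example.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Asserted facts: two ontology axioms and one data statement.
asserted = {
    ("ex:Cat", SUBCLASS_OF, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS_OF, "ex:Animal"),
    ("ex:Whiskers", RDF_TYPE, "ex:Cat"),
}


def infer(triples):
    """Apply the two rules repeatedly until no new triples are produced."""
    closure = set(triples)
    while True:
        new = {
            (s, p, o2)
            for (s, p, o) in closure
            for (s2, p2, o2) in closure
            # (x type C) or (x subClassOf C), combined with (C subClassOf D)
            if p in (RDF_TYPE, SUBCLASS_OF) and p2 == SUBCLASS_OF and o == s2
        } - closure
        if not new:
            return closure
        closure |= new


inferred = infer(asserted) - asserted
for triple in sorted(inferred):
    print(triple)
# ('ex:Cat', 'rdfs:subClassOf', 'ex:Animal')
# ('ex:Whiskers', 'rdf:type', 'ex:Animal')
# ('ex:Whiskers', 'rdf:type', 'ex:Mammal')
```

Engineers call the printed statements "inferred triples" or "entailments" to distinguish them from the "asserted triples" that were loaded explicitly.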

The semantic web is Tim Berners-Lee’s vision of a web of linked, machine-readable data. Linked data is the practice of publishing structured data on the web using URIs and RDF, so datasets can be connected across organisational boundaries. Schema.org is a shared vocabulary (a lightweight ontology) used by websites to annotate their HTML content for search engines.

Knowledge Graph Engineering in Practice

Entity resolution (also called record linkage or deduplication) is the process of identifying when two different data records refer to the same real-world entity. In a knowledge graph, you might have “Apple Inc.” and “Apple Computer Company” as separate nodes that should be merged. Engineers say: “The entity resolution pipeline is generating too many false positives — we need to tighten the blocking key strategy.”
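
A blocking key limits which records are compared at all, and a similarity threshold then decides which candidate pairs count as matches. The sketch below shows both steps using only the Python standard library. The blocking rule (first word of the name), the threshold of 0.7 and the sample records are arbitrary assumptions, and real pipelines typically use richer features than raw string similarity.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records from different source systems.
records = [
    {"id": 1, "name": "Apple Inc."},
    {"id": 2, "name": "Apple Computer Company"},
    {"id": 3, "name": "Acme Corp"},
    {"id": 4, "name": "Acme Corporation"},
]


def blocking_key(record):
    """Assumed blocking key: first word of the name, lower-cased."""
    return record["name"].split()[0].lower()


def candidate_pairs(recs):
    """Only records that share a blocking key are compared with each other."""
    blocks = {}
    for r in recs:
        blocks.setdefault(blocking_key(r), []).append(r)
    for block in blocks.values():
        yield from combinations(block, 2)


def similarity(a, b):
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


THRESHOLD = 0.7  # arbitrary cut-off for this illustration

matches = [(a["id"], b["id"])
           for a, b in candidate_pairs(records)
           if similarity(a, b) >= THRESHOLD]
print(matches)  # [(3, 4)]
```

Blocking puts the two Apple records in the same block, but their string similarity falls below the threshold, so they are not merged. That missed match is a false negative, the counterpart of the false positives mentioned in the quote.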

Knowledge graph embedding is the technique of representing entities (nodes) and relation types (edge labels) as dense vectors, called embeddings, in a continuous space. This lets ML models reason over the graph for tasks such as link prediction, entity classification, and similarity search. Methods include TransE, DistMult, and RotatE.
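
TransE is the easiest of these methods to picture: a relation is a translation vector, so a true triple should satisfy head + relation ≈ tail. The sketch below scores candidate tails by that distance. The two-dimensional vectors are hand-picked for illustration only (real embeddings are learned from data), and the function names are assumptions.

```python
import math

# Hand-picked 2-d vectors purely for illustration; real embeddings are learned.
entity = {
    "London": [1.0, 0.0],
    "United_Kingdom": [1.0, 1.0],
    "France": [3.0, 1.0],
}
relation = {"country": [0.0, 1.0]}


def transe_distance(head, rel, tail):
    """L2 norm of (head + relation - tail); smaller means more plausible."""
    return math.sqrt(sum(
        (h + r - t) ** 2
        for h, r, t in zip(entity[head], relation[rel], entity[tail])
    ))


def predict_tail(head, rel):
    """Link prediction: pick the entity closest to head + relation."""
    candidates = [e for e in entity if e != head]
    return min(candidates, key=lambda tail: transe_distance(head, rel, tail))


print(transe_distance("London", "country", "United_Kingdom"))  # 0.0
print(transe_distance("London", "country", "France"))          # 2.0
print(predict_tail("London", "country"))                       # United_Kingdom
```

In discussion this is phrased as "ranking candidate tails for the query (London, country, ?)", which is the standard framing of link prediction.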

Common phrases from engineering discussions:

  • “The ontology doesn’t model this relationship — we’ll need to extend the schema before ingesting this data source.”
  • “The reasoner is inferring too many spurious triples — the ontology has an over-broad axiom.”
  • “We need triple-level provenance: which source system each triple came from and when.”
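
The provenance requirement in the last phrase amounts to storing extra fields alongside each statement. One simple way to sketch it is a record that carries the triple together with its source and ingestion time. The field names, source system names and sample facts below are hypothetical. RDF stores usually express the same idea with named graphs or reification, not with a Python class.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SourcedTriple:
    subject: str
    predicate: str
    obj: str
    source_system: str        # hypothetical name of the upstream system
    ingested_at: datetime     # when the statement was loaded


# Two conflicting statements about the same entity, from different sources.
statements = [
    SourcedTriple("ex:Acme", "ex:locatedIn", "ex:London", "crm",
                  datetime(2024, 1, 5, tzinfo=timezone.utc)),
    SourcedTriple("ex:Acme", "ex:locatedIn", "ex:Paris", "web_scrape",
                  datetime(2024, 3, 9, tzinfo=timezone.utc)),
]


def provenance_of(subject, predicate, obj):
    """Return (source_system, ingested_at) for every record of this triple."""
    return [(s.source_system, s.ingested_at)
            for s in statements
            if (s.subject, s.predicate, s.obj) == (subject, predicate, obj)]


print(provenance_of("ex:Acme", "ex:locatedIn", "ex:Paris"))
```

With provenance attached, a conflict such as the two locations above can be described precisely: "the CRM says London, the scrape says Paris, and the scrape is more recent."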

Next Steps

Explore the Wikidata SPARQL endpoint (query.wikidata.org) and run a simple query — it has an interactive editor with examples. Write the query in SPARQL and then describe what it does in one or two English sentences using the vocabulary from this article. Translating between code and English is the core skill this vocabulary unlocks.