Advanced · 12 terms

Knowledge Graphs

Vocabulary for building, querying, and reasoning over knowledge graphs and graph databases in AI and enterprise data applications.

  • Knowledge Graph /ˈnɒlɪdʒ ɡrɑːf/

    A structured representation of real-world entities and the relationships between them, stored as a graph of nodes (entities) and edges (relationships). Knowledge graphs encode factual knowledge in a machine-readable form, enabling reasoning, inference, and semantic search. Used by Google, LinkedIn, and Amazon to power search and recommendations.

    "We built a product knowledge graph connecting products, suppliers, components, and regulatory categories — each as nodes with typed relationships between them. When a supplier is flagged for compliance issues, the graph traversal immediately identifies all affected products and downstream categories without running complex SQL joins."
  • Graph Database /ɡrɑːf ˈdeɪtəbeɪs/

    A database management system designed to store, manage, and query data whose natural representation is a graph: nodes, edges, and properties. It is optimised for traversing relationships, whereas relational databases are optimised for set-based operations over tables. Examples: Neo4j, Amazon Neptune, TigerGraph, ArangoDB.

    "We migrated our fraud detection system from a relational database to Neo4j — the graph database reduced our 6-hop relationship traversal query from 12 seconds to 40 milliseconds. The performance gain came from native graph storage: no JOIN operations, just pointer traversal between connected nodes."
  • Node (graph) /nəʊd/

    The fundamental entity in a graph — represents a real-world object, concept, or event (person, product, organisation, location). Nodes have labels (types) and properties (attributes). In a knowledge graph, nodes are the "things" that exist in the domain being modelled.

    "Our enterprise knowledge graph has six node types: Person, Organisation, Product, Location, Contract, and Event. Each Person node carries properties like name, role, and department. Two Person nodes connected by an REPORTS_TO edge represent an organisational hierarchy that drives our access control policies."
  • Edge (graph) /edʒ/

    A connection between two nodes in a graph, representing a relationship. In most property-graph models, edges are directed (A → B) and have a type (WORKS_FOR, PURCHASED, LOCATED_IN). Edge properties can store metadata about the relationship: weight, timestamp, confidence score. The quality of a knowledge graph is largely determined by the richness of its edges.

    "The PURCHASED edges in our customer graph carry three properties: purchase_date, quantity, and channel. This lets us query not just "which customers bought product X" but "which customers bought product X in Q4, in-store, more than twice" — a query impossible to express efficiently in a relational model without multiple JOINs."
  • RDF Triple /ɑː diː ef ˈtrɪpəl/

    The fundamental data unit in the Resource Description Framework (RDF), a W3C Semantic Web standard: a subject-predicate-object statement such as (Paris, isCapitalOf, France). RDF triples are the building blocks of linked data and SPARQL-queryable knowledge bases. Every fact in an RDF knowledge graph is expressed as one or more triples.

    "Our regulatory knowledge base stores all requirements as RDF triples: (Regulation-GDPR, requires, DataSubjectConsent), (DataSubjectConsent, appliesTo, PersonalDataProcessing). This structure lets us query which regulations apply to a given data processing activity by traversing the triple store."
  • Ontology /ɒnˈtɒlədʒi/

    A formal specification of the concepts (classes), properties (attributes), and relationships (predicates) within a domain — defining the vocabulary and rules for a knowledge graph. An ontology ensures consistency: it defines what a "Product" is, what properties it must have, and what relationships it can participate in.

    "Before building the product knowledge graph we spent 6 weeks on the ontology — defining 22 classes, 180 properties, and 47 relationship types with domain and range constraints. The ontology investment paid off: it caught 3 integration bugs where upstream systems were encoding the same concept differently, preventing silent knowledge graph corruption."
  • SPARQL /ˈspɑːkəl/

    SPARQL Protocol and RDF Query Language: the standard query language for RDF-based knowledge bases and triple stores. Analogous to SQL for relational databases. SPARQL supports pattern matching over graph data, federated queries across distributed knowledge bases, and, on stores that enable an entailment regime, queries over facts inferred from ontological rules.

    "We query our regulatory knowledge graph in SPARQL: SELECT ?regulation WHERE { ?regulation :requires :DataSubjectConsent . :OurDataPipeline :processes :PersonalData } finds all regulations that apply to our data pipeline based on its data processing activities, replacing hundreds of manual compliance checklists."
  • Entity Resolution /ˈentɪti ˌrezəˈluːʃən/

    The process of identifying when two or more records refer to the same real-world entity — deduplicating "Apple Inc.", "Apple Computer", and "Apple" into a single canonical node in the knowledge graph. Also called record linkage or entity matching. A key data quality challenge in knowledge graph construction.

    "Our customer graph had 40,000 duplicate organisation nodes — the same company entered differently by various salespeople. Entity resolution using name similarity, address matching, and domain-matching algorithms merged these to 28,000 unique organisations, dramatically improving account-based targeting accuracy."
  • Knowledge Graph Embedding /ˈnɒlɪdʒ ɡrɑːf ɪmˈbedɪŋ/

    A technique that learns low-dimensional vector representations of nodes and edges in a knowledge graph, enabling machine learning tasks like link prediction, entity classification, and similarity search. Models: TransE, RotatE, ComplEx. Embeddings capture structural patterns from the graph topology.

    "We trained knowledge graph embeddings on our product-supplier-category graph using TransE — the resulting node vectors placed chemically similar compounds near each other in embedding space. This let us use cosine similarity to find substitute suppliers for out-of-stock materials, something impossible with the raw graph structure alone."
  • Graph Neural Network (GNN) /ɡrɑːf ˈnjʊərəl ˈnetwɜːk/

    A class of deep learning models that operate directly on graph-structured data — learning node and edge representations by aggregating information from local neighbourhoods. GNNs extend neural networks to non-Euclidean data and are used for fraud detection, drug discovery, recommendation systems, and traffic forecasting.

    "We trained a GNN on our transaction graph for fraud detection — each transaction node aggregates features from the buyer, seller, device, and recent transaction history nodes. The GNN captures ring-shaped fraud patterns that were invisible to feature-based classifiers operating on individual transaction rows."
  • Link Prediction /lɪŋk prɪˈdɪkʃən/

    A graph ML task that predicts whether a relationship should exist between two nodes — inferring missing or future edges from observed graph structure and node properties. Applications: recommending connections in social networks, predicting drug-target interactions, completing knowledge graphs with missing facts.

    "We use link prediction on our customer-product purchase graph to power cross-sell recommendations: the model predicts which product-customer pairs are likely to form a PURCHASED edge based on purchase history patterns in similar customer subgraphs. NDCG@10 improved 23% over the previous collaborative filtering approach."
  • Wikidata /ˈwɪkiˌdeɪtə/

    A free, collaboratively edited knowledge graph hosted by the Wikimedia Foundation and maintained by a community of volunteer editors. It contains over 100 million data items representing entities, their properties, and relationships in a machine-readable, multilingual format. Its data is released under a CC0 licence, so it is widely used as an open training resource for AI models and as a knowledge base for entity linking.

    "We use Wikidata as the backbone for our entity linking pipeline — when the text mentions a company name, we resolve it to its Wikidata QID, which gives us structured facts about the company (headquarters, industry, subsidiaries) without building our own knowledge base from scratch."