* Conceptual Origins: The intellectual roots of knowledge graphs lie in the semantic networks of the 1950s and 1960s, developed by researchers like Richard H. Richens and M. Ross Quillian to model human memory and machine translation [cite: 1, 2, 3].
* Term Coinage (1972): Contrary to popular belief that Google invented the term, Edgar W. Schneider coined “knowledge graph” in 1972 within the context of modular instructional systems for linguistics and education [cite: 4, 5, 6].
* The Dutch School (1980s): In the 1980s, the University of Groningen and the University of Twente formally developed “Knowledge Graphs” as a specific system for representing natural language and expert knowledge, distinct from the later Semantic Web definition [cite: 4, 7, 8].
* The Semantic Web Era: The 2000s saw the rise of Linked Open Data, with foundational projects like DBpedia (2007), Freebase (2007), and YAGO (2008) creating massive, structured datasets from Wikipedia and other sources [cite: 9, 10, 11].
* Google’s Pivot (2012): The concept entered the mainstream when Google launched its Knowledge Graph in May 2012, shifting search from keyword matching (“strings”) to entity understanding (“things”) [cite: 12, 13].
* Enterprise Adoption: Following Google, major tech firms developed proprietary graphs: Facebook (Graph Search, 2013), LinkedIn (Economic Graph), Uber Eats (Food Graph, 2018), Airbnb (Travel Context, 2018), and Amazon (Product Graph) [cite: 14, 15, 16, 17].
* Modern Convergence: Today, knowledge graphs are merging with Large Language Models (LLMs) via techniques like GraphRAG and neuro-symbolic AI to provide factual grounding for generative AI [cite: 18, 19].
The history of the knowledge graph is a narrative of convergence, where cognitive psychology, linguistics, database theory, and artificial intelligence collided to solve a singular problem: how to represent knowledge in a way that machines can process and humans can understand. While the term “Knowledge Graph” is frequently associated with Google’s 2012 announcement, the architectural and theoretical foundations span over six decades. The trajectory moves from the early attempts to map human semantic memory in the 1960s, through the formal logic of the Semantic Web in the early 2000s, to the massive industrial-scale graphs that power modern commerce and search engines.
While modern knowledge graphs are digital, the underlying logic dates back centuries. The use of directed acyclic graphs as mnemonic tools and logical structures can be traced to the Tree of Porphyry in the 3rd century AD, a commentary on Aristotle’s categories [cite: 2]. These early structures established the fundamental principle of organizing concepts into hierarchical relationships (genus and differentia), a precursor to the “is-a” relationships found in modern ontologies.
The computational history of knowledge graphs begins not with databases, but with the attempt to model language and the human mind.
In 1956, Richard H. Richens of the Cambridge Language Research Unit implemented “Semantic Nets” for the propositional calculus. His work was intended as an “interlingua” for machine translation of natural languages [cite: 2, 3]. Richens realized that to translate text accurately, a computer needed to understand the semantic relationships between words, not just their syntactic placement. This early work laid the groundwork for representing concepts as nodes and their relationships as edges.
The most significant leap in the 1960s came from M. Ross Quillian and Allan M. Collins. In his 1966 PhD thesis at Carnegie Mellon University and subsequent papers, Quillian proposed the Semantic Network as a model of human long-term memory [cite: 1, 20]. Quillian’s model was explicitly designed to allow computers to explore the meaning of English words through relationships. His graphs featured:
* Nodes: Representing concepts or words (e.g., “Canary”, “Bird”).
* Associative Links: Representing relationships such as class membership (“is-a”), modification, conjunction, and disjunction [cite: 20].

In 1969, Collins and Quillian published the seminal paper “Retrieval Time from Semantic Memory” [cite: 21, 22]. They conducted experiments measuring how long it took human subjects to verify statements like “A canary is a bird” versus “A canary is an animal.” The results supported a hierarchical storage model where properties are stored at the highest applicable level of abstraction (cognitive economy). For instance, the property “can fly” is stored with “Bird,” not redundantly with “Canary” or “Robin.” This research provided the cognitive plausibility for the inheritance hierarchies used in object-oriented programming and modern knowledge graphs [cite: 1, 23].
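The principle of cognitive economy is easy to illustrate in code. The following is a minimal, illustrative sketch (not Quillian’s original program; the concepts and properties are assumed examples): each property is stored once at the most general node that holds it, and a lookup inherits it by walking up the “is-a” links.

```python
# Illustrative sketch of a Quillian-style inheritance hierarchy.
# Properties live at the most general applicable node ("cognitive economy")
# and are inherited by walking the "is-a" chain.

ISA = {"canary": "bird", "robin": "bird", "bird": "animal"}  # is-a edges
PROPERTIES = {
    "animal": {"breathes", "eats"},
    "bird":   {"can fly", "has feathers"},
    "canary": {"is yellow", "can sing"},
}

def has_property(concept: str, prop: str) -> bool:
    """Walk up the is-a chain until the property is found or the chain ends."""
    node = concept
    while node is not None:
        if prop in PROPERTIES.get(node, set()):
            return True
        node = ISA.get(node)
    return False

print(has_property("canary", "can fly"))   # True: inherited from "bird"
print(has_property("canary", "breathes"))  # True: inherited from "animal"
```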
A common misconception is that the term “Knowledge Graph” is a 21st-century invention. Historical records confirm its specific usage in academia decades prior.
The term “knowledge graph” was coined as early as 1972 by the Austrian linguist Edgar W. Schneider [cite: 4, 5, 6, 19]. Schneider used the term in the context of developing modular instructional systems for courses. His work focused on structuring information flows and dependencies in educational materials, effectively creating a graph of knowledge prerequisites and relationships. While his application was instructional rather than algorithmic AI, it established the nomenclature [cite: 6, 19].
In the late 1980s, a significant academic project explicitly titled “Knowledge Graphs” was initiated jointly by the University of Groningen and the University of Twente in the Netherlands [cite: 4, 7, 8]. Led by researchers such as C. Hoede and F.N. Stokman, this project sought to design semantic networks with a rigorous mathematical foundation. Unlike the broad, often loosely defined semantic networks of the 1960s, the Dutch Knowledge Graphs restricted edges to a limited set of relations to facilitate graph algebras [cite: 19, 24]. Their goal was to integrate knowledge from different sources to represent natural language and support expert systems. This work highlighted the tension between the expressive power of a graph and the computational complexity of reasoning over it—a trade-off that remains relevant in modern description logics [cite: 3, 25].
Parallel to the Dutch project, John F. Sowa introduced Conceptual Graphs in 1984 [cite: 8]. Sowa’s system was a logic-based knowledge representation formalism derived from Charles Sanders Peirce’s existential graphs. Conceptual graphs provided a way to map natural language to a logical system that a computer could process, serving as an intermediate step between the linguistic ambiguity of semantic networks and the rigid formalism of predicate logic.
While not using the term “knowledge graph,” the Cyc project, started by Doug Lenat in 1984, represents one of the most ambitious attempts to build a comprehensive knowledge base [cite: 4]. Cyc aimed to capture “common sense” knowledge—the millions of implicit facts humans know (e.g., “once people die, they stay dead”)—which are necessary for robust AI reasoning. Cyc’s ontology and inference engine prefigured many challenges in modern knowledge graph construction, particularly regarding manual curation versus automated extraction.
The turn of the millennium shifted the focus from isolated expert systems to the World Wide Web. Under the guidance of Tim Berners-Lee, the Semantic Web vision emerged: a web of data that machines could process as easily as humans process documents.
The W3C developed standards that became the technical backbone of knowledge graphs:
* RDF (Resource Description Framework): A standard for data interchange that structures information as triples (Subject-Predicate-Object) [cite: 5] (a short example follows this list).
* OWL (Web Ontology Language): A language for defining ontologies, allowing for rich semantic descriptions and inferencing [cite: 5].
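To make the triple model concrete, here is a minimal sketch using the open-source Python library rdflib; the example.org namespace and the facts about the Taj Mahal are purely illustrative.

```python
# Minimal RDF sketch using rdflib; the namespace and facts are illustrative.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/")
g = Graph()
g.bind("ex", EX)

# Each statement is a Subject-Predicate-Object triple.
g.add((EX.TajMahal, RDF.type, EX.Mausoleum))
g.add((EX.TajMahal, EX.locatedIn, EX.Agra))
g.add((EX.TajMahal, RDFS.label, Literal("Taj Mahal")))

# SPARQL, the W3C query language for RDF, traverses those triples.
query = """
PREFIX ex: <http://example.org/>
SELECT ?place WHERE { ex:TajMahal ex:locatedIn ?place }
"""
for row in g.query(query):
    print(row.place)  # http://example.org/Agra
```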
A pivotal moment in the history of open knowledge graphs was the launch of DBpedia in 2007 [cite: 9, 26]. Initiated by researchers at the Free University of Berlin and Leipzig University (including Sören Auer and Chris Bizer), DBpedia aimed to extract structured content from Wikipedia [cite: 27]. Wikipedia infoboxes contained semi-structured data (e.g., population of a city, birth date of a person). DBpedia parsed this information and published it as Linked Data using RDF. This created a massive, queryable dataset that served as the “nucleus” of the Linked Open Data (LOD) cloud, allowing disparate datasets to link to DBpedia entities as a central reference point [cite: 26, 28].
Also in 2007, the company Metaweb launched Freebase, described as a “collaborative knowledge base” [cite: 11]. Unlike DBpedia, which extracted from Wikipedia, Freebase allowed users to contribute data directly into a structured schema. It aimed to create a “global brain” of entities. Freebase introduced features like Freebase Suggest (autocomplete based on entities), which foreshadowed modern search experiences [cite: 11]. Google acquired Metaweb in 2010, and Freebase became the seed data for the Google Knowledge Graph.
YAGO (Yet Another Great Ontology), developed at the Max Planck Institute for Informatics, was released in 2008 [cite: 10]. YAGO distinguished itself by focusing on high precision (claimed 95% accuracy) and logical consistency. It achieved this by unifying the taxonomic structure of WordNet (a lexical database) with the encyclopedic knowledge of Wikipedia [cite: 29, 30]. YAGO enforced strict semantic constraints, making it highly suitable for logical reasoning tasks, unlike the noisier DBpedia [cite: 31].
On May 16, 2012, Google fundamentally changed the search industry and popularized the term “Knowledge Graph” globally [cite: 12, 13].
In a blog post titled “Introducing the Knowledge Graph: things, not strings,” Google executive Amit Singhal announced the shift [cite: 12]. For decades, search engines had operated on keyword matching (“strings”). If a user searched for “Taj Mahal,” the engine looked for pages containing those words. The Knowledge Graph allowed Google to understand “Taj Mahal” as an entity—specifically, to distinguish between the mausoleum in India, the Grammy Award-winning musician, and a casino in Atlantic City [cite: 12, 32].
* Scale at Launch: The graph launched with 500 million objects and 3.5 billion facts [cite: 12].
* Sources: It integrated data from Freebase, Wikipedia, and the CIA World Factbook [cite: 12, 13].
* User Experience: This powered the “Knowledge Panels” (infoboxes) on the right side of search results, providing direct answers rather than just links [cite: 13].

This launch marked the transition of knowledge graphs from academic research and the Semantic Web niche to critical internet infrastructure.
Following Google’s success, other technology giants realized that their data—users, products, locations, and skills—could be best represented as a graph. This led to a wave of “Enterprise Knowledge Graphs.”
In January 2013, Facebook announced Graph Search (Beta) [cite: 17, 33]. Unlike Google’s web search, Graph Search was designed to query the massive social graph of Facebook users. It allowed natural language queries like “Friends in New York who like Jay-Z” or “Restaurants my friends like” [cite: 33, 34]. Under the hood, Facebook developed the Entity Graph (Unicorn system) to map over 1 billion users and hundreds of billions of edges [cite: 35, 36]. While the consumer-facing Graph Search feature was eventually deprecated due to privacy and utility challenges, the underlying graph technology remains central to Facebook’s data infrastructure and ad targeting [cite: 17].
LinkedIn developed the Economic Graph, a digital representation of the global economy. It maps relationships between members, companies, jobs, skills, schools, and knowledge [cite: 37]. This graph enables LinkedIn to provide insights into labor market trends, skills gaps, and hiring patterns. By 2022, LinkedIn reported over 830 million members and 41,000 standardized skills in the graph [cite: 38].
In June 2018, Uber Engineering revealed their Food Knowledge Graph to improve the Uber Eats experience [cite: 15].
* The Problem: Users might search for “udon,” but if a restaurant lists “noodle soup,” a keyword search fails.
* The Solution: Uber modeled food entities (cuisines, dishes, ingredients) and their relationships. The graph allows the system to understand that “udon” is related to “Japanese cuisine” and “noodles,” enabling query expansion and better recommendations [cite: 15, 39] (a toy sketch of this idea follows the list).
* Technology: They utilized graph learning (GraphSAGE) to generate embeddings for users and dishes, predicting what a user might want to eat based on the structural properties of the graph [cite: 40, 41].
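Uber has not published its production code, but the query-expansion idea can be sketched with a toy graph; the edges and the expand_query helper below are illustrative assumptions, not Uber’s implementation.

```python
# Illustrative sketch of graph-based query expansion (not Uber's actual code);
# the edges below are assumed examples of a food knowledge graph.
FOOD_GRAPH = {
    "udon": {"noodles", "japanese cuisine"},
    "noodles": {"noodle soup", "ramen", "udon"},
    "japanese cuisine": {"udon", "ramen", "sushi"},
}

def expand_query(term: str, hops: int = 1) -> set[str]:
    """Expand a search term with related concepts within `hops` graph steps."""
    expanded, frontier = {term}, {term}
    for _ in range(hops):
        frontier = {n for node in frontier for n in FOOD_GRAPH.get(node, set())}
        expanded |= frontier
    return expanded

# A search for "udon" now also matches listings tagged "noodle soup".
print(expand_query("udon", hops=2))
```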
Airbnb announced its knowledge graph work in 2018 and 2019 to move beyond simple location-based search [cite: 14, 42, 43].
* Goal: To categorize inventory and provide travel context (e.g., “Is this home good for families?”, “Is this neighborhood known for arts?”).
* Structure: They built a taxonomy where nodes represented concepts (e.g., “Surfing”, “Sport”) and edges represented relationships (“Surfing is a Sport”). This allowed them to tag listings with attributes that weren’t explicitly stated by the host but were inferred through the graph [cite: 42].
* Infrastructure: Unlike some companies that used native graph databases, Airbnb initially built its graph storage on top of a relational database for reliability, using a query API to traverse the connections [cite: 44].
Amazon developed the Product Graph to handle the immense complexity of its catalog. Led by researchers like Luna Dong, Amazon shifted from a strict hierarchy to a graph structure to capture the relationships between products and real-world concepts [cite: 16, 45].
* Application: If a user searches for “summer barbecue,” the Product Graph can link this concept to disparate items like “instant-read thermometer,” “outdoor speakers,” and “aprons,” even if those items don’t share a traditional category [cite: 16].
* Scale: The graph integrates data from Amazon detail pages and the broader web, using machine learning to clean and link entities (e.g., resolving that “Tom Hanks” in a movie description is an “Actor”) [cite: 16].
eBay open-sourced its distributed knowledge graph store, Beam, in 2019 [cite: 46]. eBay’s graph was designed to support “probabilistic reasoning,” helping the platform understand user intent when queries are ambiguous (e.g., “eggplant iphone case” versus actual vegetables) [cite: 47].
The history of knowledge graphs is also a history of the underlying technology.
* RDF Stores (Triplestores): Born from the Semantic Web (e.g., Virtuoso, Stardog). These focus on interoperability, standards (SPARQL), and logical inference [cite: 48, 49].
* Labeled Property Graphs (LPG): Popularized by Neo4j (released 2007/2010). LPGs allow nodes and edges to have internal properties (key-value pairs). They are generally considered more developer-friendly for application building and traversal algorithms compared to the rigid triple structure of RDF [cite: 48, 50, 51] (the two models are contrasted in the sketch after this list).
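The structural difference can be sketched in plain Python (no graph database required); the entities and properties below are illustrative assumptions. The point is that an LPG edge carries its own key-value properties, whereas plain RDF needs additional triples (reification or RDF-star) to say anything about the edge itself.

```python
# Illustrative comparison of the two data models; names and values are assumed.
from dataclasses import dataclass, field

# RDF style: everything is a triple; metadata about an edge needs extra triples.
rdf_triples = [
    ("ex:Alice", "ex:worksFor", "ex:Acme"),
    ("ex:Alice", "rdfs:label", '"Alice"'),
]

# Labeled Property Graph style: nodes and edges carry properties directly.
@dataclass
class Node:
    label: str
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    source: Node
    target: Node
    label: str
    properties: dict = field(default_factory=dict)

alice = Node("Person", {"name": "Alice"})
acme = Node("Company", {"name": "Acme"})
employment = Edge(alice, acme, "WORKS_FOR", {"since": 2019})  # edge-level property
print(employment.label, employment.properties)
```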
In the late 2010s, the focus shifted toward integrating knowledge graphs with machine learning.
* Embeddings: Techniques were developed to translate graph structures into low-dimensional vector spaces (embeddings). This allows logical relationships to be used in mathematical models (a toy example follows this list).
* Graph Neural Networks (GNNs): Companies like Uber and Pinterest applied GNNs to their graphs to perform recommendation tasks at massive scale, predicting links between users and items based on graph topology [cite: 40, 41].
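As a toy illustration of the embedding idea, the sketch below scores a triple TransE-style: a fact (head, relation, tail) is considered plausible if head + relation lands near tail in vector space. The vectors here are random rather than trained, and the entities are assumed examples, not any production system described above.

```python
# Minimal TransE-style scoring sketch (toy, untrained vectors).
import numpy as np

rng = np.random.default_rng(0)
dim = 8
entities = {name: rng.normal(size=dim) for name in ["paris", "france", "berlin", "germany"]}
relations = {"capital_of": rng.normal(size=dim)}

def score(head: str, relation: str, tail: str) -> float:
    """Lower score = more plausible under the translation assumption head + relation ≈ tail."""
    return float(np.linalg.norm(entities[head] + relations[relation] - entities[tail]))

# With trained embeddings, score("paris", "capital_of", "france") would be
# lower than score("paris", "capital_of", "germany"); here the vectors are random.
print(score("paris", "capital_of", "france"))
print(score("paris", "capital_of", "germany"))
```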
The most recent chapter in this history involves the collision of Knowledge Graphs with Large Language Models (LLMs).
While LLMs (like GPT-4) are powerful, they suffer from “hallucinations”—generating plausible but incorrect facts. Knowledge Graphs provide the structured, factual grounding that LLMs lack.
Emerging around 2023-2024, GraphRAG represents a hybrid approach. Instead of relying solely on vector similarity search (which can miss multi-hop reasoning), GraphRAG systems retrieve sub-graphs of relevant facts from a knowledge graph to “prompt” the LLM. This combines the reasoning capability of the graph with the fluency of the LLM [cite: 18, 19].
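Implementations differ, but the core retrieve-then-prompt pattern can be sketched as follows; the toy graph, the hop-based retrieval, and the prompt format are illustrative assumptions rather than any particular GraphRAG system.

```python
# Minimal sketch of the GraphRAG pattern: retrieve a sub-graph of facts around
# an entity, then feed those facts to an LLM as grounding context.
KG = {
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "spouse", "Pierre Curie"),
    ("Pierre Curie", "won", "Nobel Prize in Physics"),
}

def retrieve_subgraph(entity: str, hops: int = 2) -> set[tuple]:
    """Collect triples reachable from the seed entity within `hops` steps."""
    seeds, facts = {entity}, set()
    for _ in range(hops):
        new = {t for t in KG if t[0] in seeds or t[2] in seeds}
        seeds |= {t[0] for t in new} | {t[2] for t in new}
        facts |= new
    return facts

def build_prompt(entity: str, question: str) -> str:
    facts = retrieve_subgraph(entity)
    context = "\n".join(f"{s} -- {p} --> {o}" for s, p, o in sorted(facts))
    return f"Answer the question using only the facts below.\nFacts:\n{context}\n\nQuestion: {question}"

# The resulting prompt would then be sent to an LLM of choice (call not shown).
print(build_prompt("Marie Curie", "Did Marie Curie's spouse also win a Nobel Prize?"))
```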
This represents the closing of the circle. Early AI (1960s-80s) was symbolic (rules and graphs). Modern AI (2010s) was neural (deep learning). The current frontier is Neuro-Symbolic AI, where knowledge graphs (symbols) and neural networks work in tandem to create systems that can both learn from data and reason logically [cite: 18].
Timeline

| Era | Key Event / Innovation | Key Figures/Entities |
| --- | --- | --- |
| 1956 | Semantic Nets for Machine Translation | Richard H. Richens |
| 1966-69 | Semantic Networks for Memory; “Retrieval Time” experiments | Ross Quillian, Allan Collins |
| 1972 | Coining the term “Knowledge Graph” (Instructional Systems) | Edgar W. Schneider |
| 1980s | “Knowledge Graphs” Project (Netherlands); Conceptual Graphs | Univ. of Groningen/Twente; John Sowa |
| 2001 | The Semantic Web Vision (RDF, OWL) | Tim Berners-Lee, W3C |
| 2007 | Launch of DBpedia and Freebase | Auer, Bizer, Metaweb |
| 2008 | Launch of YAGO | Max Planck Institute |
| 2012 | Google Knowledge Graph Launch (“Things, not strings”) | Google (Amit Singhal) |
| 2013 | Facebook Graph Search / Entity Graph | Facebook |
| 2018 | Uber Eats Food Graph; Amazon Product Graph | Uber, Amazon |
| 2019 | Airbnb Knowledge Graph; eBay Beam | Airbnb, eBay |
| 2023+ | GraphRAG & Neuro-Symbolic AI | Industry-wide |
The knowledge graph has evolved from a cognitive model of human memory to a pedagogical tool, then to a strict logical framework for the web, and finally to the backbone of the world’s largest data ecosystems. While Google’s 2012 launch was the catalyst for mass adoption, the technology rests on the shoulders of linguists like Schneider and cognitive psychologists like Quillian. Today, as AI seeks to become more reliable and “reasoning-capable,” the structured, interconnected nature of the knowledge graph remains more relevant than ever.
Sources: