* Conceptual Origins: The intellectual roots of knowledge graphs lie in the semantic networks of the 1950s and 1960s, developed by researchers like Richard H. Richens and M. Ross Quillian to model human memory and machine translation [cite: 1, 2, 3].
* Term Coinage (1972): Contrary to popular belief that Google invented the term, Edgar W. Schneider coined “knowledge graph” in 1972 within the context of modular instructional systems for linguistics and education [cite: 4, 5, 6].
* The Dutch School (1980s): In the 1980s, the University of Groningen and the University of Twente formally developed “Knowledge Graphs” as a specific system for representing natural language and expert knowledge, distinct from the later Semantic Web definition [cite: 4, 7, 8].
* The Semantic Web Era: The 2000s saw the rise of Linked Open Data, with foundational projects like DBpedia (2007), Freebase (2007), and YAGO (2008) creating massive, structured datasets from Wikipedia and other sources [cite: 9, 10, 11].
* Google’s Pivot (2012): The concept entered the mainstream when Google launched its Knowledge Graph in May 2012, shifting search from keyword matching (“strings”) to entity understanding (“things”) [cite: 12, 13].
* Enterprise Adoption: Following Google, major tech firms developed proprietary graphs: Facebook (Graph Search, 2013), LinkedIn (Economic Graph), Uber Eats (Food Graph, 2018), Airbnb (Travel Context, 2018), and Amazon (Product Graph) [cite: 14, 15, 16, 17].
* Modern Convergence: Today, knowledge graphs are merging with Large Language Models (LLMs) via techniques like GraphRAG and neuro-symbolic AI to provide factual grounding for generative AI [cite: 18, 19].
The history of the knowledge graph is a narrative of convergence, where cognitive psychology, linguistics, database theory, and artificial intelligence collided to solve a singular problem: how to represent knowledge in a way that machines can process and humans can understand. While the term “Knowledge Graph” is frequently associated with Google’s 2012 announcement, the architectural and theoretical foundations span over six decades. The trajectory moves from the early attempts to map human semantic memory in the 1960s, through the formal logic of the Semantic Web in the early 2000s, to the massive industrial-scale graphs that power modern commerce and search engines.
While modern knowledge graphs are digital, the underlying logic dates back centuries. The use of directed acyclic graphs as mnemonic tools and logical structures can be traced to the Tree of Porphyry in the 3rd century AD, a commentary on Aristotle’s categories [cite: 2]. These early structures established the fundamental principle of organizing concepts into hierarchical relationships (genus and differentia), a precursor to the “is-a” relationships found in modern ontologies.
The computational history of knowledge graphs begins not with databases, but with the attempt to model language and the human mind.
In 1956, Richard H. Richens of the Cambridge Language Research Unit implemented “Semantic Nets” for the propositional calculus. His work was intended as an “interlingua” for machine translation of natural languages [cite: 2, 3]. Richens realized that to translate text accurately, a computer needed to understand the semantic relationships between words, not just their syntactic placement. This early work laid the groundwork for representing concepts as nodes and their relationships as edges.
The most significant leap in the 1960s came from M. Ross Quillian and Allan M. Collins. In his 1966 PhD thesis at Carnegie Mellon University and subsequent papers, Quillian proposed the Semantic Network as a model of human long-term memory [cite: 1, 20]. Quillian’s model was explicitly designed to allow computers to explore the meaning of English words through relationships. His graphs featured:
* Nodes: Representing concepts or words (e.g., “Canary”, “Bird”).
* Associative Links: Representing relationships such as class membership (“is-a”), modification, conjunction, and disjunction [cite: 20].

In 1969, Collins and Quillian published the seminal paper “Retrieval Time from Semantic Memory” [cite: 21, 22]. They conducted experiments measuring how long it took human subjects to verify statements like “A canary is a bird” versus “A canary is an animal.” The results supported a hierarchical storage model where properties are stored at the highest applicable level of abstraction (cognitive economy). For instance, the property “can fly” is stored with “Bird,” not redundantly with “Canary” or “Robin.” This research provided the cognitive plausibility for the inheritance hierarchies used in object-oriented programming and modern knowledge graphs [cite: 1, 23].
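The principle of cognitive economy is easy to illustrate in code. The following is a minimal, illustrative sketch (not Quillian’s original program; the concepts and properties are assumed examples): each property is stored once at the most general node that holds it, and a lookup inherits it by walking up the “is-a” links.

```python
# Illustrative sketch of a Quillian-style inheritance hierarchy.
# Properties live at the most general applicable node ("cognitive economy")
# and are inherited by walking the "is-a" chain.

ISA = {"canary": "bird", "robin": "bird", "bird": "animal"}  # is-a edges
PROPERTIES = {
    "animal": {"breathes", "eats"},
    "bird":   {"can fly", "has feathers"},
    "canary": {"is yellow", "can sing"},
}

def has_property(concept: str, prop: str) -> bool:
    """Walk up the is-a chain until the property is found or the chain ends."""
    node = concept
    while node is not None:
        if prop in PROPERTIES.get(node, set()):
            return True
        node = ISA.get(node)
    return False

print(has_property("canary", "can fly"))   # True: inherited from "bird"
print(has_property("canary", "breathes"))  # True: inherited from "animal"
```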
A common misconception is that the term “Knowledge Graph” is a 21st-century invention. Historical records confirm its specific usage in academia decades prior.
The term “knowledge graph” was coined as early as 1972 by the Austrian linguist Edgar W. Schneider [cite: 4, 5, 6, 19]. Schneider used the term in the context of developing modular instructional systems for courses. His work focused on structuring information flows and dependencies in educational materials, effectively creating a graph of knowledge prerequisites and relationships. While his application was instructional rather than algorithmic AI, it established the nomenclature [cite: 6, 19].
In the late 1980s, a significant academic project explicitly titled “Knowledge Graphs” was initiated jointly by the University of Groningen and the University of Twente in the Netherlands [cite: 4, 7, 8]. Led by researchers such as C. Hoede and F.N. Stokman, this project sought to design semantic networks with a rigorous mathematical foundation. Unlike the broad, often loosely defined semantic networks of the 1960s, the Dutch Knowledge Graphs restricted edges to a limited set of relations to facilitate graph algebras [cite: 19, 24]. Their goal was to integrate knowledge from different sources to represent natural language and support expert systems. This work highlighted the tension between the expressive power of a graph and the computational complexity of reasoning over it—a trade-off that remains relevant in modern description logics [cite: 3, 25].
Parallel to the Dutch project, John F. Sowa introduced Conceptual Graphs in 1984 [cite: 8]. Sowa’s system was a logic-based knowledge representation formalism derived from Charles Sanders Peirce’s existential graphs. Conceptual graphs provided a way to map natural language to a logical system that a computer could process, serving as an intermediate step between the linguistic ambiguity of semantic networks and the rigid formalism of predicate logic.
While not using the term “knowledge graph,” the Cyc project, started by Doug Lenat in 1984, represents one of the most ambitious attempts to build a comprehensive knowledge base [cite: 4]. Cyc aimed to capture “common sense” knowledge—the millions of implicit facts humans know (e.g., “once people die, they stay dead”)—which are necessary for robust AI reasoning. Cyc’s ontology and inference engine prefigured many challenges in modern knowledge graph construction, particularly regarding manual curation versus automated extraction.
The turn of the millennium shifted the focus from isolated expert systems to the World Wide Web. Under the guidance of Tim Berners-Lee, the Semantic Web vision emerged: a web of data that machines could process as easily as humans process documents.
The W3C developed standards that became the technical backbone of knowledge graphs:
* RDF (Resource Description Framework): A standard for data interchange that structures information as triples (Subject-Predicate-Object) [cite: 5] (a short example follows this list).
* OWL (Web Ontology Language): A language for defining ontologies, allowing for rich semantic descriptions and inferencing [cite: 5].
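To make the triple model concrete, here is a minimal sketch using the open-source Python library rdflib; the example.org namespace and the facts about the Taj Mahal are purely illustrative.

```python
# Minimal RDF sketch using rdflib; the namespace and facts are illustrative.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/")
g = Graph()
g.bind("ex", EX)

# Each statement is a Subject-Predicate-Object triple.
g.add((EX.TajMahal, RDF.type, EX.Mausoleum))
g.add((EX.TajMahal, EX.locatedIn, EX.Agra))
g.add((EX.TajMahal, RDFS.label, Literal("Taj Mahal")))

# SPARQL, the W3C query language for RDF, traverses those triples.
query = """
PREFIX ex: <http://example.org/>
SELECT ?place WHERE { ex:TajMahal ex:locatedIn ?place }
"""
for row in g.query(query):
    print(row.place)  # http://example.org/Agra
```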
A pivotal moment in the history of open knowledge graphs was the launch of DBpedia in 2007 [cite: 9, 26]. Initiated by researchers at the Free University of Berlin and Leipzig University (including Sören Auer and Chris Bizer), DBpedia aimed to extract structured content from Wikipedia [cite: 27]. Wikipedia infoboxes contained semi-structured data (e.g., population of a city, birth date of a person). DBpedia parsed this information and published it as Linked Data using RDF. This created a massive, queryable dataset that served as the “nucleus” of the Linked Open Data (LOD) cloud, allowing disparate datasets to link to DBpedia entities as a central reference point [cite: 26, 28].
Also in 2007, the company Metaweb launched Freebase, described as a “collaborative knowledge base” [cite: 11]. Unlike DBpedia, which extracted from Wikipedia, Freebase allowed users to contribute data directly into a structured schema. It aimed to create a “global brain” of entities. Freebase introduced features like Freebase Suggest (autocomplete based on entities), which foreshadowed modern search experiences [cite: 11]. Google acquired Metaweb in 2010, and Freebase became the seed data for the Google Knowledge Graph.
YAGO (Yet Another Great Ontology), developed at the Max Planck Institute for Informatics, was released in 2008 [cite: 10]. YAGO distinguished itself by focusing on high precision (claimed 95% accuracy) and logical consistency. It achieved this by unifying the taxonomic structure of WordNet (a lexical database) with the encyclopedic knowledge of Wikipedia [cite: 29, 30]. YAGO enforced strict semantic constraints, making it highly suitable for logical reasoning tasks, unlike the noisier DBpedia [cite: 31].
On May 16, 2012, Google fundamentally changed the search industry and popularized the term “Knowledge Graph” globally [cite: 12, 13].
In a blog post titled “Introducing the Knowledge Graph: things, not strings,” Google executive Amit Singhal announced the shift [cite: 12]. For decades, search engines had operated on keyword matching (“strings”). If a user searched for “Taj Mahal,” the engine looked for pages containing those words. The Knowledge Graph allowed Google to understand “Taj Mahal” as an entity—specifically, to distinguish between the mausoleum in India, the Grammy Award-winning musician, and a casino in Atlantic City [cite: 12, 32].
* Scale at Launch: The graph launched with 500 million objects and 3.5 billion facts [cite: 12].
* Sources: It integrated data from Freebase, Wikipedia, and the CIA World Factbook [cite: 12, 13].
* User Experience: This powered the “Knowledge Panels” (infoboxes) on the right side of search results, providing direct answers rather than just links [cite: 13].

This launch marked the transition of knowledge graphs from academic research and the Semantic Web niche to critical internet infrastructure.
Following Google’s success, other technology giants realized that their data—users, products, locations, and skills—could be best represented as a graph. This led to a wave of “Enterprise Knowledge Graphs.”
In January 2013, Facebook announced Graph Search (Beta) [cite: 17, 33]. Unlike Google’s web search, Graph Search was designed to query the massive social graph of Facebook users. It allowed natural language queries like “Friends in New York who like Jay-Z” or “Restaurants my friends like” [cite: 33, 34]. Under the hood, Facebook developed the Entity Graph (Unicorn system) to map over 1 billion users and hundreds of billions of edges [cite: 35, 36]. While the consumer-facing Graph Search feature was eventually deprecated due to privacy and utility challenges, the underlying graph technology remains central to Facebook’s data infrastructure and ad targeting [cite: 17].
LinkedIn developed the Economic Graph, a digital representation of the global economy. It maps relationships between members, companies, jobs, skills, schools, and knowledge [cite: 37]. This graph enables LinkedIn to provide insights into labor market trends, skills gaps, and hiring patterns. By 2022, LinkedIn reported over 830 million members and 41,000 standardized skills in the graph [cite: 38].
In June 2018, Uber Engineering revealed their Food Knowledge Graph to improve the Uber Eats experience [cite: 15].
* The Problem: Users might search for “udon,” but if a restaurant lists “noodle soup,” a keyword search fails.
* The Solution: Uber modeled food entities (cuisines, dishes, ingredients) and their relationships. The graph allows the system to understand that “udon” is related to “Japanese cuisine” and “noodles,” enabling query expansion and better recommendations [cite: 15, 39] (a toy sketch of this idea follows the list).
* Technology: They utilized graph learning (GraphSAGE) to generate embeddings for users and dishes, predicting what a user might want to eat based on the structural properties of the graph [cite: 40, 41].
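Uber has not published its production code, but the query-expansion idea can be sketched with a toy graph; the edges and the expand_query helper below are illustrative assumptions, not Uber’s implementation.

```python
# Illustrative sketch of graph-based query expansion (not Uber's actual code);
# the edges below are assumed examples of a food knowledge graph.
FOOD_GRAPH = {
    "udon": {"noodles", "japanese cuisine"},
    "noodles": {"noodle soup", "ramen", "udon"},
    "japanese cuisine": {"udon", "ramen", "sushi"},
}

def expand_query(term: str, hops: int = 1) -> set[str]:
    """Expand a search term with related concepts within `hops` graph steps."""
    expanded, frontier = {term}, {term}
    for _ in range(hops):
        frontier = {n for node in frontier for n in FOOD_GRAPH.get(node, set())}
        expanded |= frontier
    return expanded

# A search for "udon" now also matches listings tagged "noodle soup".
print(expand_query("udon", hops=2))
```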
Airbnb announced its knowledge graph work in 2018 and 2019 to move beyond simple location-based search [cite: 14, 42, 43].
* Goal: To categorize inventory and provide travel context (e.g., “Is this home good for families?”, “Is this neighborhood known for arts?”).
* Structure: They built a taxonomy where nodes represented concepts (e.g., “Surfing”, “Sport”) and edges represented relationships (“Surfing is a Sport”). This allowed them to tag listings with attributes that weren’t explicitly stated by the host but were inferred through the graph [cite: 42].
* Infrastructure: Unlike some companies that used native graph databases, Airbnb initially built its graph storage on top of a relational database for reliability, using a query API to traverse the connections [cite: 44].
Amazon developed the Product Graph to handle the immense complexity of its catalog. Led by researchers like Luna Dong, Amazon shifted from a strict hierarchy to a graph structure to capture the relationships between products and real-world concepts [cite: 16, 45].
* Application: If a user searches for “summer barbecue,” the Product Graph can link this concept to disparate items like “instant-read thermometer,” “outdoor speakers,” and “aprons,” even if those items don’t share a traditional category [cite: 16].
* Scale: The graph integrates data from Amazon detail pages and the broader web, using machine learning to clean and link entities (e.g., resolving that “Tom Hanks” in a movie description is an “Actor”) [cite: 16].
eBay open-sourced its distributed knowledge graph store, Beam, in 2019 [cite: 46]. eBay’s graph was designed to support “probabilistic reasoning,” helping the platform understand user intent when queries are ambiguous (e.g., “eggplant iphone case” versus actual vegetables) [cite: 47].
The history of knowledge graphs is also a history of the underlying technology.
* RDF Stores (Triplestores): Born from the Semantic Web (e.g., Virtuoso, Stardog). These focus on interoperability, standards (SPARQL), and logical inference [cite: 48, 49].
* Labeled Property Graphs (LPG): Popularized by Neo4j (released 2007/2010). LPGs allow nodes and edges to have internal properties (key-value pairs). They are generally considered more developer-friendly for application building and traversal algorithms compared to the rigid triple structure of RDF [cite: 48, 50, 51] (the two models are contrasted in the sketch after this list).
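The structural difference can be sketched in plain Python (no graph database required); the entities and properties below are illustrative assumptions. The point is that an LPG edge carries its own key-value properties, whereas plain RDF needs additional triples (reification or RDF-star) to say anything about the edge itself.

```python
# Illustrative comparison of the two data models; names and values are assumed.
from dataclasses import dataclass, field

# RDF style: everything is a triple; metadata about an edge needs extra triples.
rdf_triples = [
    ("ex:Alice", "ex:worksFor", "ex:Acme"),
    ("ex:Alice", "rdfs:label", '"Alice"'),
]

# Labeled Property Graph style: nodes and edges carry properties directly.
@dataclass
class Node:
    label: str
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    source: Node
    target: Node
    label: str
    properties: dict = field(default_factory=dict)

alice = Node("Person", {"name": "Alice"})
acme = Node("Company", {"name": "Acme"})
employment = Edge(alice, acme, "WORKS_FOR", {"since": 2019})  # edge-level property
print(employment.label, employment.properties)
```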
In the late 2010s, the focus shifted toward integrating knowledge graphs with machine learning.
* Embeddings: Techniques were developed to translate graph structures into low-dimensional vector spaces (embeddings). This allows logical relationships to be used in mathematical models (a toy example follows this list).
* Graph Neural Networks (GNNs): Companies like Uber and Pinterest applied GNNs to their graphs to perform recommendation tasks at massive scale, predicting links between users and items based on graph topology [cite: 40, 41].
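As a toy illustration of the embedding idea, the sketch below scores a triple TransE-style: a fact (head, relation, tail) is considered plausible if head + relation lands near tail in vector space. The vectors here are random rather than trained, and the entities are assumed examples, not any production system described above.

```python
# Minimal TransE-style scoring sketch (toy, untrained vectors).
import numpy as np

rng = np.random.default_rng(0)
dim = 8
entities = {name: rng.normal(size=dim) for name in ["paris", "france", "berlin", "germany"]}
relations = {"capital_of": rng.normal(size=dim)}

def score(head: str, relation: str, tail: str) -> float:
    """Lower score = more plausible under the translation assumption head + relation ≈ tail."""
    return float(np.linalg.norm(entities[head] + relations[relation] - entities[tail]))

# With trained embeddings, score("paris", "capital_of", "france") would be
# lower than score("paris", "capital_of", "germany"); here the vectors are random.
print(score("paris", "capital_of", "france"))
print(score("paris", "capital_of", "germany"))
```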
The most recent chapter in this history involves the collision of Knowledge Graphs with Large Language Models (LLMs).
While LLMs (like GPT-4) are powerful, they suffer from “hallucinations”—generating plausible but incorrect facts. Knowledge Graphs provide the structured, factual grounding that LLMs lack.
Emerging around 2023-2024, GraphRAG represents a hybrid approach. Instead of relying solely on vector similarity search (which can miss multi-hop reasoning), GraphRAG systems retrieve sub-graphs of relevant facts from a knowledge graph to “prompt” the LLM. This combines the reasoning capability of the graph with the fluency of the LLM [cite: 18, 19].
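Implementations differ, but the core retrieve-then-prompt pattern can be sketched as follows; the toy graph, the hop-based retrieval, and the prompt format are illustrative assumptions rather than any particular GraphRAG system.

```python
# Minimal sketch of the GraphRAG pattern: retrieve a sub-graph of facts around
# an entity, then feed those facts to an LLM as grounding context.
KG = {
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "spouse", "Pierre Curie"),
    ("Pierre Curie", "won", "Nobel Prize in Physics"),
}

def retrieve_subgraph(entity: str, hops: int = 2) -> set[tuple]:
    """Collect triples reachable from the seed entity within `hops` steps."""
    seeds, facts = {entity}, set()
    for _ in range(hops):
        new = {t for t in KG if t[0] in seeds or t[2] in seeds}
        seeds |= {t[0] for t in new} | {t[2] for t in new}
        facts |= new
    return facts

def build_prompt(entity: str, question: str) -> str:
    facts = retrieve_subgraph(entity)
    context = "\n".join(f"{s} -- {p} --> {o}" for s, p, o in sorted(facts))
    return f"Answer the question using only the facts below.\nFacts:\n{context}\n\nQuestion: {question}"

# The resulting prompt would then be sent to an LLM of choice (call not shown).
print(build_prompt("Marie Curie", "Did Marie Curie's spouse also win a Nobel Prize?"))
```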
This represents the closing of the circle. Early AI (1960s-80s) was symbolic (rules and graphs). Modern AI (2010s) was neural (deep learning). The current frontier is Neuro-Symbolic AI, where knowledge graphs (symbols) and neural networks work in tandem to create systems that can both learn from data and reason logically [cite: 18].
Timeline

| Era | Key Event / Innovation | Key Figures/Entities |
| --- | --- | --- |
| 1956 | Semantic Nets for Machine Translation | Richard H. Richens |
| 1966-69 | Semantic Networks for Memory; “Retrieval Time” experiments | Ross Quillian, Allan Collins |
| 1972 | Coining the term “Knowledge Graph” (Instructional Systems) | Edgar W. Schneider |
| 1980s | “Knowledge Graphs” Project (Netherlands); Conceptual Graphs | Univ. of Groningen/Twente; John Sowa |
| 2001 | The Semantic Web Vision (RDF, OWL) | Tim Berners-Lee, W3C |
| 2007 | Launch of DBpedia and Freebase | Auer, Bizer, Metaweb |
| 2008 | Launch of YAGO | Max Planck Institute |
| 2012 | Google Knowledge Graph Launch (“Things, not strings”) | Google (Amit Singhal) |
| 2013 | Facebook Graph Search / Entity Graph | Facebook |
| 2018 | Uber Eats Food Graph; Amazon Product Graph | Uber, Amazon |
| 2019 | Airbnb Knowledge Graph; eBay Beam | Airbnb, eBay |
| 2023+ | GraphRAG & Neuro-Symbolic AI | Industry-wide |
The knowledge graph has evolved from a cognitive model of human memory to a pedagogical tool, then to a strict logical framework for the web, and finally to the backbone of the world’s largest data ecosystems. While Google’s 2012 launch was the catalyst for mass adoption, the technology rests on the shoulders of linguists like Schneider and cognitive psychologists like Quillian. Today, as AI seeks to become more reliable and “reasoning-capable,” the structured, interconnected nature of the knowledge graph remains more relevant than ever.
Sources: