From the course: Knowledge Graph Data Engineering for Generative AI Use Cases

Thinking about retrieval/traversal

- [Instructor] When an AI queries a graph, it uses the string it's given to retrieve the UID in the graph that relates to that string. So making sure entities, relations, and instances, for that matter, are disambiguated is a necessary first step. Merging and deduplicating the entities will make for cleaner and more dependable retrieval. We have established that we have UIDs and no duplication across the data. Well, except for that one customer; we're going to handle them later. But this is something I would recommend you always check in larger data sets. Make sure there is little to no duplication. And if there is duplication, understand what the ramifications will be if both of those entities are retrieved. The AI will have to pick which one it thinks is right. Is it going to pick correctly?

Retrieving information from your graph can be done in several ways: prompting an AI to query your graph via an endpoint, using a graph database that has native semantic search integrations, using semantic application frameworks like Apache Jena, or querying the graph itself using SPARQL for RDF-based graphs and Cypher or openCypher for property graphs.

Knowledge graphs have two unique ways they can be queried. The first is inference, which infers relations based on semantic rules. For instance, a transitive relation such as "relative of" means that if Sam is related to Jess and Jess is related to Raj, then Sam and Raj are also related. Even though that relation is not explicitly expressed, it's inferred. The second is called walking, or traversing, the graph. This is where graph algorithms such as shortest path come in. If you query for the shortest path connecting Sam and Raj, the query would retrieve Jess, since Jess lies on the shortest path connecting Sam to Raj in our little mini example. There are other graph algorithms, and they are all pretty useful for doing graph analytics across networks of data.
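To make the two query styles concrete, here is a minimal sketch in plain Python (standard library only) that runs both on the Sam, Jess, and Raj example. It assumes the graph sits in memory as a set of (subject, predicate, object) tuples; the predicate name `relativeOf` and the helpers `infer_transitive` and `shortest_path` are hypothetical illustrations, not the API of Apache Jena, a triple store, or a graph database. Inference is shown as a naive transitive-closure loop, and traversal as breadth-first search over edges treated as undirected.

```python
from collections import deque

# Hypothetical toy graph: only the explicitly asserted "relative of" edges,
# stored as (subject, predicate, object) triples.
TRIPLES = {
    ("Sam", "relativeOf", "Jess"),
    ("Jess", "relativeOf", "Raj"),
}


def infer_transitive(triples, predicate):
    """Return the asserted triples for `predicate` plus every triple implied
    by transitivity (a naive fixpoint loop; symmetry is not modeled)."""
    closure = {t for t in triples if t[1] == predicate}
    while True:
        implied = {
            (a, predicate, d)
            for (a, _, b) in closure
            for (c, _, d) in closure
            if b == c and a != d  # a->b and b->d imply a->d; skip self-loops
        }
        if implied <= closure:  # nothing new: the closure is complete
            return closure
        closure |= implied


def shortest_path(triples, start, goal):
    """Breadth-first search over the triples, treating each edge as undirected.
    Returns the nodes on one shortest path, or None if goal is unreachable."""
    neighbours = {}
    for s, _, o in triples:
        neighbours.setdefault(s, set()).add(o)
        neighbours.setdefault(o, set()).add(s)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Inference: the Sam -> Raj relation is never asserted, only implied.
print(infer_transitive(TRIPLES, "relativeOf") - TRIPLES)
# {('Sam', 'relativeOf', 'Raj')}

# Traversal: walking the asserted edges from Sam to Raj passes through Jess.
print(shortest_path(TRIPLES, "Sam", "Raj"))
# ['Sam', 'Jess', 'Raj']
```

Note that the traversal runs on the asserted triples only. If you ran it on the inferred closure instead, the new Sam to Raj edge would itself become the shortest path and Jess would no longer appear. In practice, a reasoner that honors `owl:TransitiveProperty`, or a database's built-in path functions, would typically do this work rather than hand-written loops.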
We won't go over those in this course, but I have added resources to the exercise files where you can check some of them out. When using your graph with AI, you can use standard information retrieval, entity resolution, inference, and graph algorithms, either at runtime or during offline analytics. Now that we have a sense of triples, nodes, and edges, let's talk about updating them and why that's critical in the ETL phase.
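Entity resolution, together with the duplication check recommended earlier, can be sketched in the same plain-Python style. This is a minimal illustration under stated assumptions: the customer records, the `cust:` UIDs, and the `normalize`, `build_index`, `ambiguous_keys`, and `merge_map` helpers are all hypothetical, and the match key (case-folded name with punctuation and extra whitespace removed) is deliberately simple. It flags any name string that a lookup would resolve to more than one UID, then maps each duplicate to a single canonical UID.

```python
import re
from collections import defaultdict

# Hypothetical customer records: cust:001 and cust:002 are the same person
# entered twice with different casing and spacing.
RECORDS = [
    {"uid": "cust:001", "name": "Jess Park"},
    {"uid": "cust:002", "name": "jess  park"},
    {"uid": "cust:003", "name": "Raj Patel"},
]


def normalize(name):
    """Deliberately simple match key: case-fold, drop punctuation,
    collapse runs of whitespace."""
    name = re.sub(r"[^\w\s]", "", name.casefold())
    return " ".join(name.split())


def build_index(records):
    """Map each normalized name to the set of UIDs carrying that name,
    i.e. what a string-based lookup would retrieve."""
    index = defaultdict(set)
    for rec in records:
        index[normalize(rec["name"])].add(rec["uid"])
    return dict(index)


def ambiguous_keys(index):
    """Names that would make retrieval return more than one UID."""
    return {key: uids for key, uids in index.items() if len(uids) > 1}


def merge_map(index):
    """Point every UID at one canonical UID per name
    (here simply the smallest UID string)."""
    mapping = {}
    for uids in index.values():
        canonical = min(uids)
        for uid in uids:
            mapping[uid] = canonical
    return mapping


index = build_index(RECORDS)
for name, uids in ambiguous_keys(index).items():
    print(name, sorted(uids))  # jess park ['cust:001', 'cust:002']

merged = merge_map(index)
print(merged["cust:002"])  # cust:001
```

Real entity resolution usually compares more than a name string (addresses, identifiers, or fuzzy similarity scores), but the goal is the same: one UID per real-world entity, so retrieval does not leave the AI choosing between duplicates.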
