Transformers¶
Transformers are classes in KGX that allow for you to
Transformer¶
The base class for all Transformers in KGX.
-
class
kgx.transformers.transformer.
Transformer
(source_graph: networkx.classes.multidigraph.MultiDiGraph = None)[source]¶ Bases:
object
Base class for performing a transformation.
A transformation can be either:
- from a source to an in-memory property graph (networkx.MultiDiGraph)
- from an in-memory property graph to a target format or database (Neo4j, CSV, RDF Triple Store, TTL)
-
static
dump
(g: networkx.classes.multidigraph.MultiDiGraph) → Dict[source]¶ Convert networkx.MultiDiGraph as a dictionary.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
- Returns
A dictionary
- Return type
dict
-
static
dump_to_file
(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None[source]¶ Serialize networkx.MultiDiGraph as JSON and write to file.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
filename (str) – File to write the JSON
-
is_empty
() → bool[source]¶ Check whether self.graph is empty.
- Returns
A boolean value asserting whether the graph is empty or not
- Return type
bool
-
merge_graphs
(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None[source]¶ Merge all graphs with self.graph.
If two nodes with the same ‘id’ exist in two graphs, the nodes are merged based on the ‘id’.
If two nodes with the same ‘id’ exist in two graphs and have conflicting values for a property, the value is overwritten from left to right.
If two edges with the same ‘key’ exist in two graphs, the edges are merged based on the ‘key’ property.
If two edges with the same ‘key’ exist in two graphs and have one or more conflicting values for a property, the value is overwritten from left to right.
- Parameters
graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph
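The merge rules above can be illustrated with plain dictionaries. This is a minimal sketch, not the KGX implementation: `merge_node_sets` is a hypothetical helper, each graph is modelled as a dict keyed by node ‘id’, the identifiers are made up, and "left to right" is assumed to mean that a graph later in the list overwrites conflicting values from an earlier one.

```python
from typing import Dict, List

Node = Dict[str, object]


def merge_node_sets(graphs: List[Dict[str, Node]]) -> Dict[str, Node]:
    """Merge node dictionaries keyed by 'id', processing graphs left to right."""
    merged: Dict[str, Node] = {}
    for graph in graphs:
        for node_id, properties in graph.items():
            # update() keeps non-conflicting keys and overwrites conflicting
            # ones, so a later graph wins over an earlier one (assumption).
            merged.setdefault(node_id, {}).update(properties)
    return merged


# Two hypothetical graphs that share one node id and disagree on 'name'.
g1 = {"A:1": {"id": "A:1", "name": "first name"}}
g2 = {"A:1": {"id": "A:1", "name": "second name", "category": "gene"}}

merged = merge_node_sets([g1, g2])
print(merged["A:1"])
```

The same pattern would apply to edges, with the edge ‘key’ taking the place of the node ‘id’.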
-
remap_edge_property
(type: str, old_property: str, new_property: str) → None[source]¶ Remap the value in an edge’s old_property attribute with the value from the edge’s new_property attribute.
- Parameters
type (string) – label referring to edges whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
-
remap_node_identifier
(type: str, new_property: str, prefix=None) → None[source]¶ Remap a node’s ‘id’ attribute with the value from the node’s new_property attribute.
- Parameters
type (string) – label referring to nodes whose ‘id’ needs to be remapped
new_property (string) – property name from which the new value is pulled
prefix (string) – signifies that the value for new_property is a list; the prefix indicates which value to pick from the list
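How a prefix can select one value out of a list is easiest to see on a single node dictionary. The following is a sketch under stated assumptions, not KGX code: `remap_identifier` is a hypothetical helper, the identifiers are invented, and the prefix is assumed to be matched against the start of each list entry.

```python
from typing import Dict, Optional


def remap_identifier(
    node: Dict[str, object], new_property: str, prefix: Optional[str] = None
) -> Dict[str, object]:
    """Return a copy of node whose 'id' is taken from new_property."""
    value = node.get(new_property)
    if isinstance(value, list):
        # Assumption: pick the first entry that starts with the prefix,
        # or simply the first entry when no prefix is given.
        candidates = [v for v in value if prefix is None or str(v).startswith(prefix)]
        value = candidates[0] if candidates else None
    if value is None:
        return dict(node)  # nothing suitable found, keep the original id
    remapped = dict(node)
    remapped["id"] = value
    return remapped


node = {"id": "n1", "xrefs": ["X:100", "Y:200"]}
by_prefix = remap_identifier(node, "xrefs", prefix="Y")
no_match = remap_identifier(node, "xrefs", prefix="Z")
print(by_prefix["id"], no_match["id"])
```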
-
remap_node_property
(type: str, old_property: str, new_property: str) → None[source]¶ Remap the value in a node’s old_property attribute with the value from the node’s new_property attribute.
- Parameters
type (string) – label referring to nodes whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
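A property remap of this kind can be sketched on a list of record dictionaries. The helper name `remap_property` and the sample records are hypothetical, and the sketch assumes that the `type` label is stored under a ‘category’ key, which the documentation above does not confirm.

```python
from typing import Dict, List


def remap_property(
    records: List[Dict[str, object]],
    record_type: str,
    old_property: str,
    new_property: str,
) -> None:
    """Overwrite old_property with the value of new_property, in place."""
    for record in records:
        # Assumption: the type label lives under the 'category' key.
        if record.get("category") != record_type:
            continue
        if new_property in record:
            record[old_property] = record[new_property]


nodes = [
    {"id": "A:1", "category": "gene", "name": "old", "symbol": "NEW"},
    {"id": "B:2", "category": "disease", "name": "kept", "symbol": "IGNORED"},
]
remap_property(nodes, "gene", "name", "symbol")
print(nodes[0]["name"], nodes[1]["name"])
```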
-
static
restore
(data: Dict) → networkx.classes.multidigraph.MultiDiGraph[source]¶ Deserialize a networkx.MultiDiGraph from a dictionary.
- Parameters
data (dict) – Dictionary containing nodes and edges
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
restore_from_file
(filename) → networkx.classes.multidigraph.MultiDiGraph[source]¶ Deserialize a networkx.MultiDiGraph from a JSON file.
- Parameters
filename (str) – File to read from
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
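The dump and restore helpers describe a round trip between a graph, a dictionary, and a JSON file. The sketch below shows only the dictionary-to-file half with the standard library. The function names and the `nodes`/`edges` layout of the dictionary are assumptions made for illustration, not the format KGX actually writes.

```python
import json
import os
import tempfile


def write_graph_dict(data: dict, filename: str) -> None:
    """Serialize a dictionary of nodes and edges as JSON."""
    with open(filename, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2)


def read_graph_dict(filename: str) -> dict:
    """Read the dictionary back from a JSON file."""
    with open(filename, encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical dictionary layout with invented identifiers.
graph_dict = {
    "nodes": [{"id": "A:1"}, {"id": "B:2"}],
    "edges": [{"subject": "A:1", "object": "B:2", "edge_label": "related_to"}],
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "graph.json")
    write_graph_dict(graph_dict, path)
    restored = read_graph_dict(path)

print(restored == graph_dict)
```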
-
set_filter
(key: str, value: Union[List[str], str]) → None[source]¶ Set a filter, defined by a key and value pair. These filters are used to reduce the search space.
- Parameters
key (str) – The key for a filter
value (Union[List[str], str]) – The value for a filter. Can be either a string or a list
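Because a filter value may be either a string or a list, a natural approach is to normalize both forms to a collection before matching. The class below is a hypothetical sketch of that idea, not the KGX implementation; how KGX applies its filters internally is not described here.

```python
from typing import Dict, List, Tuple, Union


class FilterStore:
    """Hypothetical container for key/value filters."""

    def __init__(self) -> None:
        self.filters: Dict[str, Tuple[str, ...]] = {}

    def set_filter(self, key: str, value: Union[List[str], str]) -> None:
        # Normalize a single string to a one-element tuple.
        values = [value] if isinstance(value, str) else list(value)
        self.filters[key] = tuple(values)

    def matches(self, record: Dict[str, object]) -> bool:
        # A record passes when every filtered key holds an allowed value.
        return all(
            record.get(key) in allowed for key, allowed in self.filters.items()
        )


store = FilterStore()
store.set_filter("category", ["gene", "disease"])
store.set_filter("provided_by", "source_a")

kept = store.matches({"category": "gene", "provided_by": "source_a"})
dropped = store.matches({"category": "gene", "provided_by": "source_b"})
print(kept, dropped)
```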
-
static
validate_edge
(edge: dict) → dict[source]¶ Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters
edge (dict) – An edge represented as a dict
- Returns
An edge represented as a dict, with default assumptions applied.
- Return type
dict
-
static
validate_node
(node: dict) → dict[source]¶ Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters
node (dict) – A node represented as a dict
- Returns
A node represented as a dict, with default assumptions applied.
- Return type
dict
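Validation of this kind usually means checking for required keys and then filling in defaults. The sketch below assumes the required properties named elsewhere in this reference (‘id’ for nodes; ‘subject’, ‘edge_label’ and ‘object’ for edges). The helper `check_required` and the default `category` value are hypothetical and only stand in for whatever default assumptions KGX applies.

```python
from typing import Dict, Iterable

REQUIRED_NODE_PROPERTIES = ("id",)
REQUIRED_EDGE_PROPERTIES = ("subject", "edge_label", "object")


def check_required(
    record: Dict[str, object],
    required: Iterable[str],
    defaults: Dict[str, object],
) -> Dict[str, object]:
    """Raise if a required property is missing, else return record with defaults."""
    missing = [name for name in required if name not in record]
    if missing:
        raise ValueError(f"missing required properties: {missing}")
    validated = dict(defaults)  # hypothetical defaults go in first
    validated.update(record)    # explicit values always win over defaults
    return validated


node = check_required({"id": "A:1"}, REQUIRED_NODE_PROPERTIES, {"category": "named_thing"})
print(node)

try:
    check_required({"subject": "A:1"}, REQUIRED_EDGE_PROPERTIES, {})
    edge_error = None
except ValueError as error:
    edge_error = str(error)
print(edge_error)
```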
NeoTransformer¶
-
class
kgx.transformers.neo_transformer.
NeoTransformer
(graph: networkx.classes.multidigraph.MultiDiGraph = None, uri: str = None, username: str = None, password: str = None)[source]¶ Bases:
kgx.transformers.transformer.Transformer
Transformer for reading from and writing to a Neo4j database.
-
__init__
(graph: networkx.classes.multidigraph.MultiDiGraph = None, uri: str = None, username: str = None, password: str = None)[source]¶ Initialize an instance of NeoTransformer.
-
categorize
()¶ Find and validate category for every node in self.graph
-
count
(is_directed: bool = True) → int[source]¶ Get the total count of records to be fetched from the Neo4j database.
- Parameters
is_directed (bool) – Are edges directed or undirected (True, by default, since edges in most cases are directed)
- Returns
The total count of records
- Return type
int
-
create_constraints
(categories: set) → None[source]¶ Create a unique constraint on node ‘id’ for all categories in Neo4j.
- Parameters
categories (set) – Set of categories
-
static
dump
(g: networkx.classes.multidigraph.MultiDiGraph) → Dict¶ Convert networkx.MultiDiGraph as a dictionary.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
- Returns
A dictionary
- Return type
dict
-
static
dump_to_file
(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None¶ Serialize networkx.MultiDiGraph as JSON and write to file.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
filename (str) – File to write the JSON
-
generate_unwind_edge_query
(edge_label: str) → str[source]¶ Generate UNWIND cypher query for saving edges into Neo4j.
The query uses self.DEFAULT_NODE_LABEL to quickly look up the required subject and object nodes.
- Parameters
edge_label (str) – Edge label as string
- Returns
The UNWIND cypher query
- Return type
str
-
generate_unwind_node_query
(category: str) → str[source]¶ Generate UNWIND cypher query for saving nodes into Neo4j.
There should be a CONSTRAINT in Neo4j for self.DEFAULT_NODE_LABEL. The query uses self.DEFAULT_NODE_LABEL as the node label to increase speed for adding nodes. The query also sets the label to self.DEFAULT_NODE_LABEL for every node to make sure that the CONSTRAINT applies.
- Parameters
category (str) – Node category
- Returns
The UNWIND cypher query
- Return type
str
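The two query generators build Cypher strings around the UNWIND clause, which turns a list parameter into one row per element. The sketch below shows what such queries might look like. It is not the text KGX generates: the function names, the `$nodes` and `$edges` parameter names, and the `named_thing` default label are all assumptions made for illustration.

```python
def build_unwind_node_query(category: str, default_label: str = "named_thing") -> str:
    """Build a Cypher query that merges a batch of nodes by 'id'."""
    return (
        "UNWIND $nodes AS node "
        # Matching on the default label lets a uniqueness constraint speed up MERGE.
        f"MERGE (n:`{default_label}` {{id: node.id}}) "
        # Copy all properties and add the category as an extra label.
        f"SET n += node, n:`{category}`"
    )


def build_unwind_edge_query(edge_label: str, default_label: str = "named_thing") -> str:
    """Build a Cypher query that merges a batch of edges between existing nodes."""
    return (
        "UNWIND $edges AS edge "
        f"MATCH (s:`{default_label}` {{id: edge.subject}}), "
        f"(o:`{default_label}` {{id: edge.object}}) "
        f"MERGE (s)-[r:`{edge_label}`]->(o) "
        "SET r += edge"
    )


node_query = build_unwind_node_query("gene")
edge_query = build_unwind_edge_query("related_to")
print(node_query)
print(edge_query)
```

Sending one query per category or edge label, each with a list of records as its parameter, avoids issuing a separate statement for every node or edge.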
-
get_edges
(skip: int = 0, limit: int = 0, is_directed: bool = True) → List[Tuple[neo4jrestclient.client.Node, neo4jrestclient.client.Relationship, neo4jrestclient.client.Node]][source]¶ Get a page of edges from the Neo4j database.
- Parameters
skip (int) – Records to skip
limit (int) – Total number of records to query for
is_directed (bool) – Are edges directed or undirected (True, by default, since edges in most cases are directed)
- Returns
A list of 3-tuples of the form (neo4jrestclient.client.Node, neo4jrestclient.client.Relationship, neo4jrestclient.client.Node)
- Return type
list
-
get_filter
(key: str) → str[source]¶ Get the value for the filter defined by key. This is a convenience method for generating cypher queries.
- Parameters
key (str) – Name of the filter
- Returns
Value corresponding to the given filter key, formatted for CQL
- Return type
str
-
get_nodes
(skip: int = 0, limit: int = 0) → List[neo4jrestclient.client.Node][source]¶ Get a page of nodes from the Neo4j database.
- Parameters
skip (int) – Records to skip
limit (int) – Total number of records to query for
- Returns
A list of neo4jrestclient.client.Node records
- Return type
list
-
get_pages
(query_function, start: int = 0, end: int = None, page_size: int = 10000, **kwargs) → list[source]¶ Get pages of size page_size from Neo4j. Returns an iterator of pages, where the number of pages is (end - start) / page_size.
- Parameters
query_function (func) – The function to use to fetch records. Usually this is self.get_nodes or self.get_edges
start (int) – Start for pagination
end (int) – End for pagination
page_size (int) – Size of each page (10000, by default)
**kwargs (dict) – Any additional arguments that might be relevant for query_function
- Returns
An iterator for a list of records from Neo4j. The size of the list is page_size
- Return type
list
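Skip/limit pagination of this kind can be sketched as a generator that keeps calling the query function until the range is exhausted. The code below is an illustration under assumptions, not the KGX method: `iterate_pages` and `fake_query` are hypothetical, and the query function is assumed to accept `skip` and `limit` keyword arguments as `get_nodes` and `get_edges` do.

```python
from typing import Callable, Iterator, List, Optional


def iterate_pages(
    query_function: Callable[..., List],
    start: int = 0,
    end: Optional[int] = None,
    page_size: int = 10000,
    **kwargs,
) -> Iterator[List]:
    """Yield successive pages of records between start and end."""
    skip = start
    while end is None or skip < end:
        # Never request records past 'end' when an end is given.
        limit = page_size if end is None else min(page_size, end - skip)
        page = query_function(skip=skip, limit=limit, **kwargs)
        if not page:
            break  # the source has no more records
        yield page
        if len(page) < limit:
            break  # a short page means the source is exhausted
        skip += limit


# A stand-in for a database query, backed by an in-memory list.
RECORDS = list(range(25))


def fake_query(skip: int, limit: int) -> List[int]:
    return RECORDS[skip:skip + limit]


pages = list(iterate_pages(fake_query, page_size=10))
print([len(page) for page in pages])
```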
-
is_empty
() → bool¶ Check whether self.graph is empty.
- Returns
A boolean value asserting whether the graph is empty or not
- Return type
bool
-
load
(start: int = 0, end: int = None, is_directed: bool = True) → None[source]¶ Read nodes and edges from a Neo4j database and create a networkx.MultiDiGraph
- Parameters
start (int) – Start for pagination
end (int) – End for pagination
is_directed (bool) – Are edges directed or undirected (True, by default, since edges in most cases are directed)
-
load_edge
(edge: neo4jrestclient.client.Relationship) → None[source]¶ Load an edge from neo4jrestclient.client.Relationship into networkx.MultiDiGraph
- Parameters
edge (neo4jrestclient.client.Relationship) – An edge
-
load_edges
(edges: List) → None[source]¶ Load edges into networkx.MultiDiGraph
- Parameters
edges (List) – A list of edge records
-
load_node
(node: neo4jrestclient.client.Node) → None[source]¶ Load node from neo4jrestclient.client.Node into networkx.MultiDiGraph
- Parameters
node (neo4jrestclient.client.Node) – A node
-
load_nodes
(nodes: List[neo4jrestclient.client.Node]) → None[source]¶ Load nodes into networkx.MultiDiGraph
- Parameters
nodes (List[neo4jrestclient.client.Node]) – A list of node records
-
merge_graphs
(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None¶ Merge all graphs with self.graph.
If two nodes with the same ‘id’ exist in two graphs, the nodes are merged based on the ‘id’.
If two nodes with the same ‘id’ exist in two graphs and have conflicting values for a property, the value is overwritten from left to right.
If two edges with the same ‘key’ exist in two graphs, the edges are merged based on the ‘key’ property.
If two edges with the same ‘key’ exist in two graphs and have one or more conflicting values for a property, the value is overwritten from left to right.
- Parameters
graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph
-
neo4j_report
() → None[source]¶ Give a summary on the number of nodes and edges in the Neo4j database.
-
remap_edge_property
(type: str, old_property: str, new_property: str) → None¶ Remap the value in an edge’s old_property attribute with the value from the edge’s new_property attribute.
- Parameters
type (string) – label referring to edges whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
-
remap_node_identifier
(type: str, new_property: str, prefix=None) → None¶ Remap a node’s ‘id’ attribute with the value from the node’s new_property attribute.
- Parameters
type (string) – label referring to nodes whose ‘id’ needs to be remapped
new_property (string) – property name from which the new value is pulled
prefix (string) – signifies that the value for new_property is a list; the prefix indicates which value to pick from the list
-
remap_node_property
(type: str, old_property: str, new_property: str) → None¶ Remap the value in a node’s old_property attribute with the value from the node’s new_property attribute.
- Parameters
type (string) – label referring to nodes whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
-
report
() → None¶ Print a summary report about self.graph
-
static
restore
(data: Dict) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a dictionary.
- Parameters
data (dict) – Dictionary containing nodes and edges
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
restore_from_file
(filename) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a JSON file.
- Parameters
filename (str) – File to read from
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
save
() → None[source]¶ Save all nodes and edges from networkx.MultiDiGraph into Neo4j.
TODO: To be deprecated.
-
save_edge
(obj: dict) → None[source]¶ Load an edge into Neo4j.
TODO: To be deprecated.
- Parameters
obj (dict) – A dictionary that represents an edge and its properties. The edge must have ‘subject’, ‘edge_label’ and ‘object’ properties. For all other necessary properties, refer to the BioLink Model.
-
save_edge_unwind
(edges_by_edge_label: Dict[str, list]) → None[source]¶ Save all edges into Neo4j using the UNWIND cypher clause.
- Parameters
edges_by_edge_label (dict) – A dictionary where edge label is the key and the value is a list of edges with that edge label
-
save_node
(obj: dict) → None[source]¶ Load a node into Neo4j.
TODO: To be deprecated.
- Parameters
obj (dict) – A dictionary that represents a node and its properties. The node must have ‘id’ property. For all other necessary properties, refer to the BioLink Model.
-
save_node_unwind
(nodes_by_category: Dict[str, list]) → None[source]¶ Save all nodes into Neo4j using the UNWIND cypher clause.
- Parameters
nodes_by_category (Dict[str, list]) – A dictionary where node category is the key and the value is a list of nodes of that category
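The `nodes_by_category` argument is a dictionary of lists, which can be built from a flat list of nodes by grouping. The sketch below is hypothetical: `group_nodes_by_category` is not a KGX function, each node is assumed to carry a single string under ‘category’, and the `named_thing` fallback is an invented default.

```python
from collections import defaultdict
from typing import Dict, List


def group_nodes_by_category(
    nodes: List[Dict[str, object]], default_category: str = "named_thing"
) -> Dict[str, List[Dict[str, object]]]:
    """Group node dictionaries under their category."""
    grouped: Dict[str, List[Dict[str, object]]] = defaultdict(list)
    for node in nodes:
        # Nodes without a category fall back to the (assumed) default.
        category = str(node.get("category", default_category))
        grouped[category].append(node)
    return dict(grouped)


nodes = [
    {"id": "A:1", "category": "gene"},
    {"id": "A:2", "category": "gene"},
    {"id": "B:1", "category": "disease"},
    {"id": "C:1"},
]
nodes_by_category = group_nodes_by_category(nodes)
print({category: len(group) for category, group in nodes_by_category.items()})
```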
-
save_with_unwind
() → None[source]¶ Save all nodes and edges from networkx.MultiDiGraph into Neo4j using the UNWIND cypher clause.
-
set_filter
(key: str, value: Union[List[str], str]) → None¶ Set a filter, defined by a key and value pair. These filters are used to reduce the search space.
- Parameters
key (str) – The key for a filter
value (Union[List[str], str]) – The value for a filter. Can be either a string or a list
-
static
validate_edge
(edge: dict) → dict¶ Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters
edge (dict) – An edge represented as a dict
- Returns
An edge represented as a dict, with default assumptions applied.
- Return type
dict
-
static
validate_node
(node: dict) → dict¶ Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters
node (dict) – A node represented as a dict
- Returns
A node represented as a dict, with default assumptions applied.
- Return type
dict
-
PandasTransformer¶
-
class
kgx.transformers.pandas_transformer.
PandasTransformer
(source_graph: networkx.classes.multidigraph.MultiDiGraph = None)[source]¶ Bases:
kgx.transformers.transformer.Transformer
Transformer that parses a pandas.DataFrame, and loads nodes and edges into a networkx.MultiDiGraph
-
categorize
()¶ Find and validate category for every node in self.graph
-
static
dump
(g: networkx.classes.multidigraph.MultiDiGraph) → Dict¶ Convert networkx.MultiDiGraph as a dictionary.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
- Returns
A dictionary
- Return type
dict
-
static
dump_to_file
(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None¶ Serialize networkx.MultiDiGraph as JSON and write to file.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
filename (str) – File to write the JSON
-
export_edges
() → pandas.core.frame.DataFrame[source]¶ Export edges from networkx.MultiDiGraph as a pandas.DataFrame
- Returns
A Dataframe where each record corresponds to an edge from the networkx.MultiDiGraph
- Return type
pandas.DataFrame
-
export_nodes
() → pandas.core.frame.DataFrame[source]¶ Export nodes from networkx.MultiDiGraph as a pandas.DataFrame
- Returns
A Dataframe where each record corresponds to a node from the networkx.MultiDiGraph
- Return type
pandas.DataFrame
-
is_empty
() → bool¶ Check whether self.graph is empty.
- Returns
A boolean value asserting whether the graph is empty or not
- Return type
bool
-
load
(df: pandas.core.frame.DataFrame) → None[source]¶ Load a pandas.DataFrame, containing either nodes or edges, into a networkx.MultiDiGraph
- Parameters
df (pandas.DataFrame) – Dataframe containing records that represent nodes or edges
-
load_edge
(edge: Dict) → None[source]¶ Load an edge into a networkx.MultiDiGraph
- Parameters
edge (dict) – An edge
-
load_edges
(df: pandas.core.frame.DataFrame) → None[source]¶ Load edges from pandas.DataFrame into a networkx.MultiDiGraph
- Parameters
df (pandas.DataFrame) – Dataframe containing records that represent edges
-
load_node
(node: Dict) → None[source]¶ Load a node into a networkx.MultiDiGraph
- Parameters
node (dict) – A node
-
load_nodes
(df: pandas.core.frame.DataFrame) → None[source]¶ Load nodes from pandas.DataFrame into a networkx.MultiDiGraph
- Parameters
df (pandas.DataFrame) – Dataframe containing records that represent nodes
-
merge_graphs
(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None¶ Merge all graphs with self.graph.
If two nodes with the same ‘id’ exist in two graphs, the nodes are merged based on the ‘id’.
If two nodes with the same ‘id’ exist in two graphs and have conflicting values for a property, the value is overwritten from left to right.
If two edges with the same ‘key’ exist in two graphs, the edges are merged based on the ‘key’ property.
If two edges with the same ‘key’ exist in two graphs and have one or more conflicting values for a property, the value is overwritten from left to right.
- Parameters
graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph
-
parse
(filename: str, input_format: str = 'csv', provided_by: str = None, **kwargs) → None[source]¶ Parse a CSV/TSV (or plain text) file.
The file can represent either nodes (nodes.csv) or edges (edges.csv) or both (data.tar), where the tar archive contains nodes.csv and edges.csv
The file can also be data.tar.gz or data.tar.bz2
- Parameters
filename (str) – File to read from
input_format (str) – The input file format (csv, by default)
provided_by (str) – Define the source providing the input file
kwargs (Dict) – Any additional arguments
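Reading nodes.csv and edges.csv out of a tar archive, compressed or not, can be done with the standard library alone. The sketch below builds a small archive in memory and reads it back. The helper names and the sample rows are hypothetical, and the real method loads the rows into a graph via pandas rather than returning dictionaries.

```python
import csv
import io
import tarfile
from typing import Dict, List


def build_archive(files: Dict[str, str]) -> bytes:
    """Pack text files into an uncompressed in-memory tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, text in files.items():
            payload = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_archive(data: bytes) -> Dict[str, List[Dict[str, str]]]:
    """Read every CSV member of a tar archive into a list of row dicts."""
    tables: Dict[str, List[Dict[str, str]]] = {}
    # "r:*" lets tarfile detect gzip or bzip2 compression transparently.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as archive:
        for member in archive.getmembers():
            handle = archive.extractfile(member)
            if handle is None:
                continue  # skip directories and other non-file members
            text = handle.read().decode("utf-8")
            tables[member.name] = list(csv.DictReader(io.StringIO(text)))
    return tables


archive_bytes = build_archive({
    "nodes.csv": "id,name\nA:1,first\nB:2,second\n",
    "edges.csv": "subject,edge_label,object\nA:1,related_to,B:2\n",
})
tables = read_archive(archive_bytes)
print(tables["nodes.csv"][0], tables["edges.csv"][0])
```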
-
remap_edge_property
(type: str, old_property: str, new_property: str) → None¶ Remap the value in an edge’s old_property attribute with the value from the edge’s new_property attribute.
- Parameters
type (string) – label referring to edges whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
-
remap_node_identifier
(type: str, new_property: str, prefix=None) → None¶ Remap a node’s ‘id’ attribute with the value from the node’s new_property attribute.
- Parameters
type (string) – label referring to nodes whose ‘id’ needs to be remapped
new_property (string) – property name from which the new value is pulled
prefix (string) – signifies that the value for new_property is a list; the prefix indicates which value to pick from the list
-
remap_node_property
(type: str, old_property: str, new_property: str) → None¶ Remap the value in a node’s old_property attribute with the value from the node’s new_property attribute.
- Parameters
type (string) – label referring to nodes whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
-
report
() → None¶ Print a summary report about self.graph
-
static
restore
(data: Dict) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a dictionary.
- Parameters
data (dict) – Dictionary containing nodes and edges
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
restore_from_file
(filename) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a JSON file.
- Parameters
filename (str) – File to read from
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
save
(filename: str, extension: str = 'csv', mode: str = 'w', **kwargs) → str[source]¶ Write two files representing the node set and edge set of a networkx.MultiDiGraph, and add them to a .tar archive.
- Parameters
filename (str) – Name of tar archive file to create
extension (str) – The output file format (csv, by default)
mode (str) – Form of compression to use (w, by default, signifies no compression)
kwargs (dict) – Any additional arguments
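The `mode` values resemble the write modes of Python's `tarfile.open` (`w` for no compression, `w:gz` for gzip, `w:bz2` for bzip2); that correspondence is an assumption here, not something this reference states. The sketch below writes a node table and an edge table into a gzip-compressed archive. `write_archive` and the sample rows are hypothetical.

```python
import csv
import io
import os
import tarfile
import tempfile
from typing import Dict, List


def write_archive(path: str, tables: Dict[str, List[Dict[str, str]]], mode: str = "w") -> str:
    """Write each non-empty table as a CSV member of a tar archive."""
    with tarfile.open(path, mode=mode) as archive:
        for name, rows in tables.items():
            buffer = io.StringIO()
            # Column order is taken from the first row (rows must be non-empty).
            writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
            payload = buffer.getvalue().encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return path


tables = {
    "nodes.csv": [{"id": "A:1", "name": "first"}, {"id": "B:2", "name": "second"}],
    "edges.csv": [{"subject": "A:1", "edge_label": "related_to", "object": "B:2"}],
}

with tempfile.TemporaryDirectory() as tmp:
    archive_path = write_archive(os.path.join(tmp, "data.tar.gz"), tables, mode="w:gz")
    with tarfile.open(archive_path, mode="r:gz") as archive:
        member_names = archive.getnames()

print(member_names)
```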
-
set_filter
(key: str, value: Union[List[str], str]) → None¶ Set a filter, defined by a key and value pair. These filters are used to reduce the search space.
- Parameters
key (str) – The key for a filter
value (Union[List[str], str]) – The value for a filter. Can be either a string or a list
-
static
validate_edge
(edge: dict) → dict¶ Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters
edge (dict) – An edge represented as a dict
- Returns
An edge represented as a dict, with default assumptions applied.
- Return type
dict
-
static
validate_node
(node: dict) → dict¶ Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters
node (dict) – A node represented as a dict
- Returns
A node represented as a dict, with default assumptions applied.
- Return type
dict
-
JsonTransformer¶
-
class
kgx.transformers.json_transformer.
JsonTransformer
(source_graph: networkx.classes.multidigraph.MultiDiGraph = None)[source]¶ Bases:
kgx.transformers.pandas_transformer.PandasTransformer
Transformer that parses a JSON, and loads nodes and edges into a networkx.MultiDiGraph
-
categorize
()¶ Find and validate category for every node in self.graph
-
static
dump
(g: networkx.classes.multidigraph.MultiDiGraph) → Dict¶ Convert networkx.MultiDiGraph as a dictionary.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
- Returns
A dictionary
- Return type
dict
-
static
dump_to_file
(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None¶ Serialize networkx.MultiDiGraph as JSON and write to file.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
filename (str) – File to write the JSON
-
export
() → Dict[source]¶ Export networkx.MultiDiGraph as a dictionary.
- Returns
A dictionary with a list of nodes and a list of edges
- Return type
dict
-
export_edges
() → pandas.core.frame.DataFrame¶ Export edges from networkx.MultiDiGraph as a pandas.DataFrame
- Returns
A Dataframe where each record corresponds to an edge from the networkx.MultiDiGraph
- Return type
pandas.DataFrame
-
export_nodes
() → pandas.core.frame.DataFrame¶ Export nodes from networkx.MultiDiGraph as a pandas.DataFrame
- Returns
A Dataframe where each record corresponds to a node from the networkx.MultiDiGraph
- Return type
pandas.DataFrame
-
is_empty
() → bool¶ Check whether self.graph is empty.
- Returns
A boolean value asserting whether the graph is empty or not
- Return type
bool
-
load
(obj: Dict[str, List]) → None[source]¶ Load a JSON object, containing nodes and edges, into a networkx.MultiDiGraph
- Parameters
obj (dict) – JSON Object with all nodes and edges
-
load_edge
(edge: Dict) → None¶ Load an edge into a networkx.MultiDiGraph
- Parameters
edge (dict) – An edge
-
load_edges
(edges: List[Dict]) → None[source]¶ Load a list of edges into a networkx.MultiDiGraph
- Parameters
edges (list) – List of edges
-
load_node
(node: Dict) → None¶ Load a node into a networkx.MultiDiGraph
- Parameters
node (dict) – A node
-
load_nodes
(nodes: List[Dict]) → None[source]¶ Load a list of nodes into a networkx.MultiDiGraph
- Parameters
nodes (list) – List of nodes
-
merge_graphs
(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None¶ Merge all graphs with self.graph.
If two nodes with the same ‘id’ exist in two graphs, the nodes are merged based on the ‘id’.
If two nodes with the same ‘id’ exist in two graphs and have conflicting values for a property, the value is overwritten from left to right.
If two edges with the same ‘key’ exist in two graphs, the edges are merged based on the ‘key’ property.
If two edges with the same ‘key’ exist in two graphs and have one or more conflicting values for a property, the value is overwritten from left to right.
- Parameters
graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph
-
parse
(filename: str, input_format: str = 'json', provided_by: str = None, **kwargs) → None[source]¶ Parse a JSON file of the format:
{
    "nodes": [...],
    "edges": [...]
}
- Parameters
filename (str) – JSON file to read from
input_format (str) – The input file format (json, by default)
provided_by (str) – Define the source providing the input file
kwargs (dict) – Any additional arguments
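A document in the nodes/edges layout shown above can be split into its two lists with the standard `json` module. The sketch below is only an illustration: `load_graph_json` is a hypothetical helper, the sample identifiers are invented, and it returns plain Python structures where the real method populates a networkx.MultiDiGraph.

```python
import json
from typing import Dict, List, Tuple


def load_graph_json(text: str) -> Tuple[Dict[str, dict], List[Tuple[str, str, dict]]]:
    """Split a nodes/edges JSON document into an id index and an edge list."""
    document = json.loads(text)
    # Index nodes by 'id' so that edges can refer to them.
    nodes = {node["id"]: node for node in document.get("nodes", [])}
    # Represent each edge as (subject, object, properties).
    edges = [
        (edge["subject"], edge["object"], edge)
        for edge in document.get("edges", [])
    ]
    return nodes, edges


sample = """
{
    "nodes": [{"id": "A:1", "name": "first"}, {"id": "B:2", "name": "second"}],
    "edges": [{"subject": "A:1", "edge_label": "related_to", "object": "B:2"}]
}
"""
nodes, edges = load_graph_json(sample)
print(sorted(nodes), edges[0][:2])
```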
-
remap_edge_property
(type: str, old_property: str, new_property: str) → None¶ Remap the value in an edge’s old_property attribute with the value from the edge’s new_property attribute.
- Parameters
type (string) – label referring to edges whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
-
remap_node_identifier
(type: str, new_property: str, prefix=None) → None¶ Remap a node’s ‘id’ attribute with the value from the node’s new_property attribute.
- Parameters
type (string) – label referring to nodes whose ‘id’ needs to be remapped
new_property (string) – property name from which the new value is pulled
prefix (string) – signifies that the value for new_property is a list; the prefix indicates which value to pick from the list
-
remap_node_property
(type: str, old_property: str, new_property: str) → None¶ Remap the value in a node’s old_property attribute with the value from the node’s new_property attribute.
- Parameters
type (string) – label referring to nodes whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
-
report
() → None¶ Print a summary report about self.graph
-
static
restore
(data: Dict) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a dictionary.
- Parameters
data (dict) – Dictionary containing nodes and edges
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
restore_from_file
(filename) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a JSON file.
- Parameters
filename (str) – File to read from
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
save
(filename: str, **kwargs) → None[source]¶ Write networkx.MultiDiGraph to a file as JSON.
- Parameters
filename (str) – Filename to write to
kwargs (dict) – Any additional arguments
-
set_filter
(key: str, value: Union[List[str], str]) → None¶ Set a filter, defined by a key and value pair. These filters are used to reduce the search space.
- Parameters
key (str) – The key for a filter
value (Union[List[str], str]) – The value for a filter. Can be either a string or a list
-
static
validate_edge
(edge: dict) → dict¶ Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters
edge (dict) – An edge represented as a dict
- Returns
An edge represented as a dict, with default assumptions applied.
- Return type
dict
-
static
validate_node
(node: dict) → dict¶ Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters
node (dict) – A node represented as a dict
- Returns
A node represented as a dict, with default assumptions applied.
- Return type
dict
-
LogicTermTransformer¶
-
class
kgx.transformers.logicterm_transformer.
LogicTermTransformer
(source: Union[kgx.transformers.transformer.Transformer, networkx.classes.multidigraph.MultiDiGraph] = None, output_format=None, **args)[source]¶ Bases:
kgx.transformers.transformer.Transformer
TODO: Motivation for LogicTermTransformer?
-
categorize
()¶ Find and validate category for every node in self.graph
-
static
dump
(g: networkx.classes.multidigraph.MultiDiGraph) → Dict¶ Convert networkx.MultiDiGraph as a dictionary.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
- Returns
A dictionary
- Return type
dict
-
static
dump_to_file
(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None¶ Serialize networkx.MultiDiGraph as JSON and write to file.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
filename (str) – File to write the JSON
-
is_empty
() → bool¶ Check whether self.graph is empty.
- Returns
A boolean value asserting whether the graph is empty or not
- Return type
bool
-
merge_graphs
(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None¶ Merge all graphs with self.graph.
If two nodes with the same ‘id’ exist in two graphs, the nodes are merged based on the ‘id’.
If two nodes with the same ‘id’ exist in two graphs and have conflicting values for a property, the value is overwritten from left to right.
If two edges with the same ‘key’ exist in two graphs, the edges are merged based on the ‘key’ property.
If two edges with the same ‘key’ exist in two graphs and have one or more conflicting values for a property, the value is overwritten from left to right.
- Parameters
graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph
-
remap_edge_property
(type: str, old_property: str, new_property: str) → None¶ Remap the value in an edge’s old_property attribute with the value from the edge’s new_property attribute.
- Parameters
type (string) – label referring to edges whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
-
remap_node_identifier
(type: str, new_property: str, prefix=None) → None¶ Remap a node’s ‘id’ attribute with the value from the node’s new_property attribute.
- Parameters
type (string) – label referring to nodes whose ‘id’ needs to be remapped
new_property (string) – property name from which the new value is pulled
prefix (string) – signifies that the value for new_property is a list; the prefix indicates which value to pick from the list
-
remap_node_property
(type: str, old_property: str, new_property: str) → None¶ Remap the value in a node’s old_property attribute with the value from the node’s new_property attribute.
- Parameters
type (string) – label referring to nodes whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
-
report
() → None¶ Print a summary report about self.graph
-
static
restore
(data: Dict) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a dictionary.
- Parameters
data (dict) – Dictionary containing nodes and edges
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
restore_from_file
(filename) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a JSON file.
- Parameters
filename (str) – File to read from
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
set_filter
(key: str, value: Union[List[str], str]) → None¶ Set a filter, defined by a key and value pair. These filters are used to reduce the search space.
- Parameters
key (str) – The key for a filter
value (Union[List[str], str]) – The value for a filter. Can be either a string or a list
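One way to picture key/value filters is a mapping from each key to its set of allowed values. The FilterStore class and its matches() method below are hypothetical and only illustrate the idea; whether KGX accumulates or replaces values for a repeated key is not stated in this documentation, and accumulation is assumed here.

```python
from typing import Dict, List, Set, Union


class FilterStore:
    """Hypothetical holder for key/value filters (not a KGX class)."""

    def __init__(self) -> None:
        self.filters: Dict[str, Set[str]] = {}

    def set_filter(self, key: str, value: Union[List[str], str]) -> None:
        # Normalise a single string to a one-element list.
        values = [value] if isinstance(value, str) else value
        self.filters.setdefault(key, set()).update(values)

    def matches(self, record: dict) -> bool:
        """True when the record satisfies every filter that has been set."""
        return all(record.get(key) in allowed for key, allowed in self.filters.items())


store = FilterStore()
store.set_filter("category", "gene")
store.set_filter("edge_label", ["related_to", "interacts_with"])
```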
-
static
validate_edge
(edge: dict) → dict¶ Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters
edge (dict) – An edge represented as a dict
- Returns
An edge represented as a dict, with default assumptions applied.
- Return type
dict
-
static
validate_node
(node: dict) → dict¶ Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters
node (dict) – A node represented as a dict
- Returns
A node represented as a dict, with default assumptions applied.
- Return type
dict
-
NxTransformer¶
-
class
kgx.transformers.nx_transformer.
GraphMLTransformer
(source_graph: networkx.classes.multidigraph.MultiDiGraph = None)[source]¶ Bases:
kgx.transformers.nx_transformer.NetworkxTransformer
I/O for graphml TODO: do we need to support GraphML
-
categorize
()¶ Find and validate category for every node in self.graph
-
static
dump
(g: networkx.classes.multidigraph.MultiDiGraph) → Dict¶ Convert networkx.MultiDiGraph as a dictionary.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
- Returns
A dictionary
- Return type
dict
-
static
dump_to_file
(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None¶ Serialize networkx.MultiDiGraph as JSON and write to file.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
filename (str) – File to write the JSON
-
is_empty
() → bool¶ Check whether self.graph is empty.
- Returns
A boolean value asserting whether the graph is empty or not
- Return type
bool
-
merge_graphs
(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None¶ Merge all graphs with
self.graph
If two nodes with same ‘id’ exist in two graphs, the nodes will be merged based on the ‘id’
If two nodes with the same ‘id’ exist in two graphs and they have conflicting values for a property, then the value is overwritten from left to right
If two edges with the same ‘key’ exist in two graphs, the edges will be merged based on the ‘key’ property
If two edges with the same ‘key’ exist in two graphs and they have one or more conflicting values for a property, then the value is overwritten from left to right
- Parameters
graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph
-
remap_edge_property
(type: str, old_property: str, new_property: str) → None¶ Remap the value in an edge’s old_property attribute with the value from the edge’s new_property attribute.
- Parameters
type (string) – label referring to edges whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
-
remap_node_identifier
(type: str, new_property: str, prefix=None) → None¶ Remap a node’s ‘id’ attribute with the value from the node’s new_property attribute.
- Parameters
type (string) – label referring to nodes whose ‘id’ needs to be remapped
new_property (string) – property name from which the new value is pulled
prefix (string) – signifies that the value for new_property is a list; the prefix indicates which value to pick from the list
-
remap_node_property
(type: str, old_property: str, new_property: str) → None¶ Remap the value in a node’s old_property attribute with the value from the node’s new_property attribute.
- Parameters
type (string) – label referring to nodes whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
-
report
() → None¶ Print a summary report about self.graph
-
static
restore
(data: Dict) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a dictionary.
- Parameters
data (dict) – Dictionary containing nodes and edges
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
restore_from_file
(filename) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a JSON file.
- Parameters
filename (str) – File to read from
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
set_filter
(key: str, value: Union[List[str], str]) → None¶ Set a filter, defined by a key and value pair. These filters are used to reduce the search space.
- Parameters
key (str) – The key for a filter
value (Union[List[str], str]) – The value for a filter. Can be either a string or a list
-
static
validate_edge
(edge: dict) → dict¶ Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters
edge (dict) – An edge represented as a dict
- Returns
An edge represented as a dict, with default assumptions applied.
- Return type
dict
-
static
validate_node
(node: dict) → dict¶ Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters
node (dict) – A node represented as a dict
- Returns
A node represented as a dict, with default assumptions applied.
- Return type
dict
-
-
class
kgx.transformers.nx_transformer.
NetworkxTransformer
(source_graph: networkx.classes.multidigraph.MultiDiGraph = None)[source]¶ Bases:
kgx.transformers.transformer.Transformer
Base class for networkx transforms TODO: use case for this class
-
categorize
()¶ Find and validate category for every node in self.graph
-
static
dump
(g: networkx.classes.multidigraph.MultiDiGraph) → Dict¶ Convert networkx.MultiDiGraph as a dictionary.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
- Returns
A dictionary
- Return type
dict
-
static
dump_to_file
(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None¶ Serialize networkx.MultiDiGraph as JSON and write to file.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
filename (str) – File to write the JSON
-
is_empty
() → bool¶ Check whether self.graph is empty.
- Returns
A boolean value asserting whether the graph is empty or not
- Return type
bool
-
merge_graphs
(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None¶ Merge all graphs with
self.graph
If two nodes with same ‘id’ exist in two graphs, the nodes will be merged based on the ‘id’
If two nodes with the same ‘id’ exist in two graphs and they have conflicting values for a property, then the value is overwritten from left to right
If two edges with the same ‘key’ exist in two graphs, the edges will be merged based on the ‘key’ property
If two edges with the same ‘key’ exist in two graphs and they have one or more conflicting values for a property, then the value is overwritten from left to right
- Parameters
graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph
-
remap_edge_property
(type: str, old_property: str, new_property: str) → None¶ Remap the value in an edge’s old_property attribute with the value from the edge’s new_property attribute.
- Parameters
type (string) – label referring to edges whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
-
remap_node_identifier
(type: str, new_property: str, prefix=None) → None¶ Remap a node’s ‘id’ attribute with the value from the node’s new_property attribute.
- Parameters
type (string) – label referring to nodes whose ‘id’ needs to be remapped
new_property (string) – property name from which the new value is pulled
prefix (string) – signifies that the value for new_property is a list; the prefix indicates which value to pick from the list
-
remap_node_property
(type: str, old_property: str, new_property: str) → None¶ Remap the value in a node’s old_property attribute with the value from the node’s new_property attribute.
- Parameters
type (string) – label referring to nodes whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
-
report
() → None¶ Print a summary report about self.graph
-
static
restore
(data: Dict) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a dictionary.
- Parameters
data (dict) – Dictionary containing nodes and edges
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
restore_from_file
(filename) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a JSON file.
- Parameters
filename (str) – File to read from
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
set_filter
(key: str, value: Union[List[str], str]) → None¶ Set a filter, defined by a key and value pair. These filters are used to reduce the search space.
- Parameters
key (str) – The key for a filter
value (Union[List[str], str]) – The value for a filter. Can be either a string or a list
-
static
validate_edge
(edge: dict) → dict¶ Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters
edge (dict) – An edge represented as a dict
- Returns
An edge represented as a dict, with default assumptions applied.
- Return type
dict
-
static
validate_node
(node: dict) → dict¶ Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters
node (dict) – A node represented as a dict
- Returns
A node represented as a dict, with default assumptions applied.
- Return type
dict
-
RdfGraphMixin¶
A mixin for handling operations on RDF-stores.
-
class
kgx.transformers.rdf_graph_mixin.
RdfGraphMixin
(source_graph: networkx.classes.multidigraph.MultiDiGraph = None)[source]¶ Bases:
object
- A mixin that defines the following methods,
load_networkx_graph(): template method that all deriving classes should implement
add_node(): method to add a node from an RDF form to property graph form
add_node_attribute(): method to add a node attribute from an RDF form to property graph form
add_edge(): method to add an edge from an RDF form to property graph form
add_edge_attribute(): method to add an edge attribute from an RDF form to property graph form
-
add_edge
(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef) → Tuple[str, str, str][source]¶ This method should be used by all derived classes when adding an edge to the networkx.MultiDiGraph. This ensures that the subject and object identifiers are CURIEs, and that edge_label is in the correct form.
Returns the CURIE identifiers used for the subject and object in the networkx.MultiDiGraph, and the processed edge_label.
- Parameters
subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple
object_iri (rdflib.URIRef) – Object IRI for the object in a triple
predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple
- Returns
A 3-nary tuple (of the form subject, object, predicate) that represents the edge
- Return type
Tuple[str, str, str]
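The IRI-to-CURIE step that add_edge() guarantees can be sketched with a prefix map. The PREFIX_MAP contents and the helper names contract and edge_tuple are hypothetical, and any further processing of the edge_label is left out because it is not described here.

```python
from typing import Dict, Tuple

# Hypothetical prefix map; a real setup would load a curated one.
PREFIX_MAP: Dict[str, str] = {
    "http://purl.obolibrary.org/obo/MONDO_": "MONDO",
    "http://purl.obolibrary.org/obo/RO_": "RO",
    "http://identifiers.org/hgnc/": "HGNC",
}


def contract(iri: str) -> str:
    """Contract an IRI to a CURIE, or return it unchanged if no namespace matches."""
    # Try the longest namespaces first so the most specific one wins.
    for namespace in sorted(PREFIX_MAP, key=len, reverse=True):
        if iri.startswith(namespace):
            return f"{PREFIX_MAP[namespace]}:{iri[len(namespace):]}"
    return iri


def edge_tuple(subject_iri: str, object_iri: str, predicate_iri: str) -> Tuple[str, str, str]:
    """Return (subject, object, predicate) in the order documented for add_edge()."""
    return contract(subject_iri), contract(object_iri), contract(predicate_iri)


edge = edge_tuple(
    "http://identifiers.org/hgnc/1100",
    "http://purl.obolibrary.org/obo/MONDO_0007254",
    "http://purl.obolibrary.org/obo/RO_0002200",
)
```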
-
add_edge_attribute
(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → None[source]¶ Adds an attribute to an edge, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
If the nodes in the edge do not exist then they will be created using subject_iri and object_iri.
If the edge itself does not exist then it will be created using subject_iri, object_iri and predicate_iri.
- Parameters
subject_iri (Union[rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph
object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph
predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (str) – The value of the attribute
-
add_node
(iri: rdflib.term.URIRef) → str[source]¶ This method should be used by all derived classes when adding a node to the networkx.MultiDiGraph. This ensures that a node’s identifier is a CURIE, and that its iri property is set.
Returns the CURIE identifier for the node in the networkx.MultiDiGraph
- Parameters
iri (rdflib.URIRef) – IRI of a node
- Returns
The CURIE identifier of a node
- Return type
str
-
add_node_attribute
(iri: Union[rdflib.term.URIRef, str], key: str, value: str) → None[source]¶ Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
If the node does not exist then it is created using the given iri.
- Parameters
iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (str) – The value of the attribute
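The multi-valued behaviour described above, where list-valued properties never hold duplicates, can be sketched on a plain dictionary. The MULTI_VALUED set and the add_attribute helper are assumptions for illustration; which properties KGX actually treats as multi-valued is not listed in this section.

```python
from typing import Any, Dict, Set

# Hypothetical set of property names that hold lists of values.
MULTI_VALUED: Set[str] = {"synonym", "xrefs"}


def add_attribute(node: Dict[str, Any], key: str, value: str) -> None:
    """Add a value to a node, appending without duplicates for multi-valued keys."""
    if key in MULTI_VALUED:
        values = node.setdefault(key, [])
        if value not in values:
            values.append(value)
    else:
        # Single-valued property: the latest value replaces any earlier one.
        node[key] = value


node: Dict[str, Any] = {"id": "HGNC:1100"}
add_attribute(node, "synonym", "BRCA1")
add_attribute(node, "synonym", "RNF53")
add_attribute(node, "synonym", "BRCA1")  # duplicate, ignored
add_attribute(node, "name", "BRCA1 DNA repair associated")
```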
-
load_networkx_graph
(rdfgraph: rdflib.graph.Graph = None, predicates: Set[rdflib.term.URIRef] = None, **kwargs) → None[source]¶ This method should be overridden and implemented by the derived class, and should load all desired nodes and edges from rdflib.Graph into networkx.MultiDiGraph.
It is preferred that this method does not use the networkx API directly when adding nodes, edges, and their attributes.
- Instead, use the following methods,
add_node()
add_node_attribute()
add_edge()
add_edge_attribute()
to ensure that nodes, edges, and their attributes are added in conformance with the BioLink Model, and that URIRefs are translated into CURIEs or BioLink Model elements whenever appropriate.
- Parameters
rdfgraph (rdflib.Graph) – Graph containing nodes and edges
predicates (list) – A list of rdflib.URIRef representing predicates to be loaded
kwargs (dict) – Any additional arguments
RdfTransformer¶
-
class
kgx.transformers.rdf_transformer.
ObanRdfTransformer
(source_graph: networkx.classes.multidigraph.MultiDiGraph = None)[source]¶ Bases:
kgx.transformers.rdf_transformer.RdfTransformer
Transformer that parses a ‘turtle’ file and loads triples, as nodes and edges, into a networkx.MultiDiGraph
This Transformer supports the OBAN style of modeling, where
- it dereifies OBAN.association triples into a property graph form
- it reifies a property graph into OBAN.association triples
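Dereification can be sketched on plain tuples: an association node that carries a subject, a predicate and an object is collapsed into one edge. This is an illustration only and not the ObanRdfTransformer code; the helper name dereify is hypothetical, and the OBAN property IRIs below are written from the OBAN vocabulary as an assumption that should be checked against the ontology.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

OBAN = "http://purl.org/oban/"
HAS_SUBJECT = OBAN + "association_has_subject"
HAS_PREDICATE = OBAN + "association_has_predicate"
HAS_OBJECT = OBAN + "association_has_object"

Triple = Tuple[str, str, str]


def dereify(triples: Iterable[Triple]) -> List[Triple]:
    """Collapse each complete association into a (subject, predicate, object) edge."""
    slots: Dict[str, Dict[str, str]] = defaultdict(dict)
    for assoc, predicate, value in triples:
        if predicate in (HAS_SUBJECT, HAS_PREDICATE, HAS_OBJECT):
            slots[assoc][predicate] = value
    edges = []
    for parts in slots.values():
        # Skip associations that are missing any of the three slots.
        if len(parts) == 3:
            edges.append((parts[HAS_SUBJECT], parts[HAS_PREDICATE], parts[HAS_OBJECT]))
    return edges


triples = [
    ("_:a1", HAS_SUBJECT, "HGNC:1100"),
    ("_:a1", HAS_PREDICATE, "RO:0002200"),
    ("_:a1", HAS_OBJECT, "MONDO:0007254"),
    ("_:a2", HAS_SUBJECT, "HGNC:1"),  # incomplete association, dropped
]
edges = dereify(triples)
```

Reification is the reverse direction: each edge becomes an association node with the same three slots.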
-
add_edge
(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef) → Tuple[str, str, str]¶ This method should be used by all derived classes when adding an edge to the networkx.MultiDiGraph. This ensures that the subject and object identifiers are CURIEs, and that edge_label is in the correct form.
Returns the CURIE identifiers used for the subject and object in the networkx.MultiDiGraph, and the processed edge_label.
- Parameters
subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple
object_iri (rdflib.URIRef) – Object IRI for the object in a triple
predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple
- Returns
A 3-nary tuple (of the form subject, object, predicate) that represents the edge
- Return type
Tuple[str, str, str]
-
add_edge_attribute
(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → None¶ Adds an attribute to an edge, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
If the nodes in the edge do not exist then they will be created using subject_iri and object_iri.
If the edge itself does not exist then it will be created using subject_iri, object_iri and predicate_iri.
- Parameters
subject_iri (Union[rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph
object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph
predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (str) – The value of the attribute
-
add_node
(iri: rdflib.term.URIRef) → str¶ This method should be used by all derived classes when adding a node to the networkx.MultiDiGraph. This ensures that a node’s identifier is a CURIE, and that its iri property is set.
Returns the CURIE identifier for the node in the networkx.MultiDiGraph
- Parameters
iri (rdflib.URIRef) – IRI of a node
- Returns
The CURIE identifier of a node
- Return type
str
-
add_node_attribute
(iri: Union[rdflib.term.URIRef, str], key: str, value: str) → None¶ Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
If the node does not exist then it is created using the given iri.
- Parameters
iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (str) – The value of the attribute
-
add_ontology
(file: str) → None¶ Load an OWL ontology into an rdflib.Graph. TODO: is there a better way of pre-loading required ontologies?
-
categorize
()¶ Find and validate category for every node in self.graph
-
static
dump
(g: networkx.classes.multidigraph.MultiDiGraph) → Dict¶ Convert networkx.MultiDiGraph as a dictionary.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
- Returns
A dictionary
- Return type
dict
-
static
dump_to_file
(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None¶ Serialize networkx.MultiDiGraph as JSON and write to file.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
filename (str) – File to write the JSON
-
is_empty
() → bool¶ Check whether self.graph is empty.
- Returns
A boolean value asserting whether the graph is empty or not
- Return type
bool
-
load_networkx_graph
(rdfgraph: rdflib.graph.Graph = None, predicates: Set[rdflib.term.URIRef] = None, **kwargs) → None[source]¶ Walk through the rdflib.Graph and load all triples into networkx.MultiDiGraph
- Parameters
rdfgraph (rdflib.Graph) – Graph containing nodes and edges
predicates (list) – A list of rdflib.URIRef representing predicates to be loaded
kwargs (dict) – Any additional arguments
-
load_node_attributes
(rdfgraph: rdflib.graph.Graph) → None¶ This method loads the properties of nodes into networkx.MultiDiGraph. As there can be many values for a single key, all properties are lists by default.
This method assumes that RdfTransformer.load_edges() has been called, and that all nodes have their IRI set as an attribute.
- Parameters
rdfgraph (rdflib.Graph) – Graph containing nodes and edges
-
merge_graphs
(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None¶ Merge all graphs with
self.graph
If two nodes with same ‘id’ exist in two graphs, the nodes will be merged based on the ‘id’
If two nodes with the same ‘id’ exist in two graphs and they have conflicting values for a property, then the value is overwritten from left to right
If two edges with the same ‘key’ exist in two graphs, the edges will be merged based on the ‘key’ property
If two edges with the same ‘key’ exist in two graphs and they have one or more conflicting values for a property, then the value is overwritten from left to right
- Parameters
graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph
-
parse
(filename: str = None, input_format: str = None, provided_by: str = None, predicates: Set[rdflib.term.URIRef] = None) → None¶ Parse a file, containing triples, into a rdflib.Graph
The file can be either a ‘turtle’ file or any other format supported by rdflib.
- Parameters
filename (str) – File to read from.
input_format (str) – The input file format. If None is provided then the format is guessed using rdflib.util.guess_format()
provided_by (str) – Define the source providing the input file.
-
remap_edge_property
(type: str, old_property: str, new_property: str) → None¶ Remap the value in an edge’s old_property attribute with the value from the edge’s new_property attribute.
- Parameters
type (string) – label referring to edges whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
-
remap_node_identifier
(type: str, new_property: str, prefix=None) → None¶ Remap a node’s ‘id’ attribute with the value from the node’s new_property attribute.
- Parameters
type (string) – label referring to nodes whose ‘id’ needs to be remapped
new_property (string) – property name from which the new value is pulled
prefix (string) – signifies that the value for new_property is a list; the prefix indicates which value to pick from the list
-
remap_node_property
(type: str, old_property: str, new_property: str) → None¶ Remap the value in a node’s old_property attribute with the value from the node’s new_property attribute.
- Parameters
type (string) – label referring to nodes whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
-
report
() → None¶ Print a summary report about self.graph
-
static
restore
(data: Dict) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a dictionary.
- Parameters
data (dict) – Dictionary containing nodes and edges
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
restore_from_file
(filename) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a JSON file.
- Parameters
filename (str) – File to read from
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
save
(filename: str = None, output_format: str = 'turtle', **kwargs) → None[source]¶ Transform networkx.MultiDiGraph into an rdflib.Graph that follows OBAN-style reification and export this graph as a file (turtle, by default).
- Parameters
filename (str) – Filename to write to
output_format (str) – The output format; default: turtle
kwargs (dict) – Any additional arguments
-
save_attribute
(rdfgraph: rdflib.graph.Graph, object_iri: rdflib.term.URIRef, key: str, value: Union[List[str], str]) → None[source]¶ Saves a node or edge attributes from networkx.MultiDiGraph into rdflib.Graph
Intended to be used within ObanRdfTransformer.save().
- Parameters
rdfgraph (rdflib.Graph) – Graph containing nodes and edges
object_iri (rdflib.URIRef) – IRI of an object in the graph
key (str) – The name of the attribute
value (Union[List[str], str]) – The value of the attribute; Can be either a List or just a string
-
set_filter
(key: str, value: Union[List[str], str]) → None¶ Set a filter, defined by a key and value pair. These filters are used to reduce the search space.
- Parameters
key (str) – The key for a filter
value (Union[List[str], str]) – The value for a filter. Can be either a string or a list
-
uriref
(identifier: str) → rdflib.term.URIRef[source]¶ Generate a rdflib.URIRef for a given string.
- Parameters
identifier (str) – Identifier as string.
- Returns
URIRef form of the input identifier
- Return type
rdflib.URIRef
-
static
validate_edge
(edge: dict) → dict¶ Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters
edge (dict) – An edge represented as a dict
- Returns
An edge represented as a dict, with default assumptions applied.
- Return type
dict
-
static
validate_node
(node: dict) → dict¶ Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters
node (dict) – A node represented as a dict
- Returns
A node represented as a dict, with default assumptions applied.
- Return type
dict
-
-
class
kgx.transformers.rdf_transformer.
RdfOwlTransformer
(source_graph: networkx.classes.multidigraph.MultiDiGraph = None)[source]¶ Bases:
kgx.transformers.rdf_transformer.RdfTransformer
Transformer that parses an OWL ontology in RDF, while retaining class-class relationships.
-
add_edge
(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef) → Tuple[str, str, str]¶ This method should be used by all derived classes when adding an edge to the networkx.MultiDiGraph. This ensures that the subject and object identifiers are CURIEs, and that edge_label is in the correct form.
Returns the CURIE identifiers used for the subject and object in the networkx.MultiDiGraph, and the processed edge_label.
- Parameters
subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple
object_iri (rdflib.URIRef) – Object IRI for the object in a triple
predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple
- Returns
A 3-nary tuple (of the form subject, object, predicate) that represents the edge
- Return type
Tuple[str, str, str]
-
add_edge_attribute
(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → None¶ Adds an attribute to an edge, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
If the nodes in the edge do not exist then they will be created using subject_iri and object_iri.
If the edge itself does not exist then it will be created using subject_iri, object_iri and predicate_iri.
- Parameters
subject_iri (Union[rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph
object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph
predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (str) – The value of the attribute
-
add_node
(iri: rdflib.term.URIRef) → str¶ This method should be used by all derived classes when adding a node to the networkx.MultiDiGraph. This ensures that a node’s identifier is a CURIE, and that its iri property is set.
Returns the CURIE identifier for the node in the networkx.MultiDiGraph
- Parameters
iri (rdflib.URIRef) – IRI of a node
- Returns
The CURIE identifier of a node
- Return type
str
-
add_node_attribute
(iri: Union[rdflib.term.URIRef, str], key: str, value: str) → None¶ Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
If the node does not exist then it is created using the given iri.
- Parameters
iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (str) – The value of the attribute
-
add_ontology
(file: str) → None¶ Load an OWL ontology into an rdflib.Graph. TODO: is there a better way of pre-loading required ontologies?
-
categorize
()¶ Find and validate category for every node in self.graph
-
static
dump
(g: networkx.classes.multidigraph.MultiDiGraph) → Dict¶ Convert networkx.MultiDiGraph as a dictionary.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
- Returns
A dictionary
- Return type
dict
-
static
dump_to_file
(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None¶ Serialize networkx.MultiDiGraph as JSON and write to file.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
filename (str) – File to write the JSON
-
is_empty
() → bool¶ Check whether self.graph is empty.
- Returns
A boolean value asserting whether the graph is empty or not
- Return type
bool
-
load_networkx_graph
(rdfgraph: rdflib.graph.Graph = None, predicates: Set[rdflib.term.URIRef] = None, **kwargs) → None[source]¶ Walk through the rdflib.Graph and load all triples into networkx.MultiDiGraph
- Parameters
rdfgraph (rdflib.Graph) – Graph containing nodes and edges
predicates (list) – A list of rdflib.URIRef representing predicates to be loaded
kwargs (dict) – Any additional arguments
-
load_node_attributes
(rdfgraph: rdflib.graph.Graph) → None¶ This method loads the properties of nodes into networkx.MultiDiGraph. As there can be many values for a single key, all properties are lists by default.
This method assumes that RdfTransformer.load_edges() has been called, and that all nodes have their IRI set as an attribute.
- Parameters
rdfgraph (rdflib.Graph) – Graph containing nodes and edges
-
merge_graphs
(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None¶ Merge all graphs with
self.graph
If two nodes with same ‘id’ exist in two graphs, the nodes will be merged based on the ‘id’
If two nodes with the same ‘id’ exist in two graphs and they have conflicting values for a property, then the value is overwritten from left to right
If two edges with the same ‘key’ exist in two graphs, the edges will be merged based on the ‘key’ property
If two edges with the same ‘key’ exist in two graphs and they have one or more conflicting values for a property, then the value is overwritten from left to right
- Parameters
graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph
-
parse
(filename: str = None, input_format: str = None, provided_by: str = None, predicates: Set[rdflib.term.URIRef] = None) → None¶ Parse a file, containing triples, into a rdflib.Graph
The file can be either a ‘turtle’ file or any other format supported by rdflib.
- Parameters
filename (str) – File to read from.
input_format (str) – The input file format. If None is provided then the format is guessed using rdflib.util.guess_format()
provided_by (str) – Define the source providing the input file.
-
remap_edge_property
(type: str, old_property: str, new_property: str) → None¶ Remap the value in an edge’s old_property attribute with the value from the edge’s new_property attribute.
- Parameters
type (string) – label referring to edges whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
-
remap_node_identifier
(type: str, new_property: str, prefix=None) → None¶ Remap a node’s ‘id’ attribute with the value from the node’s new_property attribute.
- Parameters
type (string) – label referring to nodes whose ‘id’ needs to be remapped
new_property (string) – property name from which the new value is pulled
prefix (string) – signifies that the value for new_property is a list and the prefix indicates which value to pick from the list
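The role of prefix can be illustrated with a small helper. This is a hedged sketch: pick_identifier is hypothetical, the identifiers are invented, and matching on the CURIE prefix followed by a colon is an assumption about how "which value to pick" is decided.

```python
from typing import Dict, Optional


def pick_identifier(node: Dict, new_property: str,
                    prefix: Optional[str] = None) -> Optional[str]:
    """Return the replacement identifier, or None if nothing suitable exists."""
    value = node.get(new_property)
    if isinstance(value, str):
        return value
    if isinstance(value, list) and prefix is not None:
        for candidate in value:
            # Assumption: entries are CURIEs of the form 'PREFIX:local_id'.
            if candidate.startswith(prefix + ":"):
                return candidate
    return None


node = {"id": "EX:1", "xrefs": ["OTHER:7", "ALT:42"]}
new_id = pick_identifier(node, "xrefs", prefix="ALT")
if new_id is not None:
    node["id"] = new_id
```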
-
remap_node_property
(type: str, old_property: str, new_property: str) → None¶ Remap the value in the node old_property attribute with the value from the node new_property attribute.
- Parameters
type (string) – label referring to nodes whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
-
report
() → None¶ Print a summary report about self.graph
-
static
restore
(data: Dict) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a dictionary.
- Parameters
data (dict) – Dictionary containing nodes and edges
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
restore_from_file
(filename) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a JSON file.
- Parameters
filename (str) – File to read from
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
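The JSON round trip behind the file-based dump and restore methods can be sketched with the standard library alone. The helpers write_json and read_json are hypothetical, and the 'nodes'/'edges' layout of the dictionary is an assumption; the real methods additionally convert to and from networkx.MultiDiGraph.

```python
import json
import os
import tempfile


def write_json(data: dict, filename: str) -> None:
    """Serialize a dictionary as JSON and write it to a file."""
    with open(filename, "w", encoding="utf-8") as handle:
        json.dump(data, handle)


def read_json(filename: str) -> dict:
    """Read a JSON file back into a dictionary."""
    with open(filename, encoding="utf-8") as handle:
        return json.load(handle)


# Assumed layout: a dictionary containing nodes and edges.
graph_dict = {"nodes": [{"id": "EX:1"}, {"id": "EX:2"}], "edges": []}
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "graph.json")
    write_json(graph_dict, path)
    restored = read_json(path)
```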
-
set_filter
(key: str, value: Union[List[str], str]) → None¶ Set a filter, defined by a key and value pair. These filters are used to reduce the search space.
- Parameters
key (str) – The key for a filter
value (Union[List[str], str]) – The value for a filter. Can be either a string or a list
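One way to picture how key and value filters reduce a search space is shown below. FilterStore is a hypothetical class, and the matching rule (a record passes if its value equals the filter string or is one of the listed strings) is an assumption, not documented KGX behavior.

```python
from typing import Dict, List, Union

FilterValue = Union[List[str], str]


class FilterStore:
    """Holds key/value filters and tests records against them."""

    def __init__(self) -> None:
        self.filters: Dict[str, FilterValue] = {}

    def set_filter(self, key: str, value: FilterValue) -> None:
        self.filters[key] = value

    def matches(self, record: Dict[str, str]) -> bool:
        for key, expected in self.filters.items():
            # Normalize a single string to a one-element list.
            allowed = expected if isinstance(expected, list) else [expected]
            if record.get(key) not in allowed:
                return False
        return True


store = FilterStore()
store.set_filter("category", ["gene", "disease"])
store.set_filter("provided_by", "example_source")
```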
-
static
validate_edge
(edge: dict) → dict¶ Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters
edge (dict) – An edge represented as a dict
- Returns
An edge represented as a dict, with default assumptions applied.
- Return type
dict
-
static
validate_node
(node: dict) → dict¶ Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters
node (dict) – A node represented as a dict
- Returns
A node represented as a dict, with default assumptions applied.
- Return type
dict
-
-
class
kgx.transformers.rdf_transformer.
RdfTransformer
(source_graph: networkx.classes.multidigraph.MultiDiGraph = None)[source]¶ Bases:
kgx.transformers.rdf_graph_mixin.RdfGraphMixin
,kgx.transformers.transformer.Transformer
Transformer that parses RDF and loads triples, as nodes and edges, into a networkx.MultiDiGraph
This is the base class which is used to implement other RDF-based transformers.
-
add_edge
(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef) → Tuple[str, str, str]¶ This method should be used by all derived classes when adding an edge to the networkx.MultiDiGraph. This ensures that the subject and object identifiers are CURIEs, and that edge_label is in the correct form.
Returns the CURIE identifiers used for the subject and object in the networkx.MultiDiGraph, and the processed edge_label.
- Parameters
subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple
object_iri (rdflib.URIRef) – Object IRI for the object in a triple
predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple
- Returns
A 3-tuple (of the form subject, object, predicate) that represents the edge
- Return type
Tuple[str, str, str]
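The IRI-to-CURIE step that add_edge relies on can be sketched as a prefix-map lookup. This is an illustration under assumptions: PREFIX_MAP, contract, and edge_tuple are hypothetical, the example.org namespace is invented, and any further processing KGX applies to edge_label is not reproduced here.

```python
from typing import Dict, Tuple

# Hypothetical prefix map; KGX resolves prefixes through its own configuration.
PREFIX_MAP: Dict[str, str] = {
    "http://purl.obolibrary.org/obo/RO_": "RO",
    "http://example.org/gene/": "EX",
}


def contract(iri: str, prefix_map: Dict[str, str] = PREFIX_MAP) -> str:
    """Contract an IRI to a CURIE; return the IRI unchanged if no prefix fits."""
    # Try longer namespaces first so the most specific one wins.
    for namespace in sorted(prefix_map, key=len, reverse=True):
        if iri.startswith(namespace):
            return f"{prefix_map[namespace]}:{iri[len(namespace):]}"
    return iri


def edge_tuple(subject_iri: str, object_iri: str, predicate_iri: str) -> Tuple[str, str, str]:
    """Return (subject, object, predicate) in contracted form."""
    return contract(subject_iri), contract(object_iri), contract(predicate_iri)
```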
-
add_edge_attribute
(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → None¶ Adds an attribute to an edge, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
If the nodes in the edge do not exist then they will be created using subject_iri and object_iri.
If the edge itself does not exist then it will be created using subject_iri, object_iri and predicate_iri.
- Parameters
subject_iri ([rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph
object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph
predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (str) – The value of the attribute
-
add_node
(iri: rdflib.term.URIRef) → str¶ This method should be used by all derived classes when adding a node to the networkx.MultiDiGraph. This ensures that a node’s identifier is a CURIE, and that its iri property is set.
Returns the CURIE identifier for the node in the networkx.MultiDiGraph
- Parameters
iri (rdflib.URIRef) – IRI of a node
- Returns
The CURIE identifier of a node
- Return type
str
-
add_node_attribute
(iri: Union[rdflib.term.URIRef, str], key: str, value: str) → None¶ Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
If the node does not exist then it is created using the given iri.
- Parameters
iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (str) – The value of the attribute
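The multi-valued behavior described above can be sketched as follows. The helper add_attribute and the MULTI_VALUED set are hypothetical; which properties KGX actually treats as multi-valued is not stated here.

```python
from typing import Dict, Set

# Hypothetical set of properties that hold lists of values.
MULTI_VALUED: Set[str] = {"synonym", "xrefs"}


def add_attribute(record: Dict, key: str, value: str) -> None:
    """Set a single-valued property, or append to a list without duplicates."""
    if key in MULTI_VALUED:
        values = record.setdefault(key, [])
        if value not in values:
            values.append(value)
    else:
        record[key] = value


node: Dict = {"id": "EX:1"}
add_attribute(node, "synonym", "alpha")
add_attribute(node, "synonym", "alpha")  # duplicate, ignored
add_attribute(node, "synonym", "beta")
add_attribute(node, "name", "first")
add_attribute(node, "name", "second")  # single-valued, replaced
```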
-
add_ontology
(file: str) → None[source]¶ Load an OWL ontology into a rdflib.Graph.
TODO: is there a better way of pre-loading required ontologies?
-
categorize
()¶ Find and validate category for every node in self.graph
-
static
dump
(g: networkx.classes.multidigraph.MultiDiGraph) → Dict¶ Convert networkx.MultiDiGraph as a dictionary.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
- Returns
A dictionary
- Return type
dict
-
static
dump_to_file
(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None¶ Serialize networkx.MultiDiGraph as JSON and write to file.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
filename (str) – File to write the JSON
-
is_empty
() → bool¶ Check whether self.graph is empty.
- Returns
A boolean value asserting whether the graph is empty or not
- Return type
bool
-
load_networkx_graph
(rdfgraph: rdflib.graph.Graph = None, predicates: Set[rdflib.term.URIRef] = None, **kwargs) → None[source]¶ Walk through the rdflib.Graph and load all required triples into networkx.MultiDiGraph
- By default this method loads the following predicates,
RDFS.subClassOf
OWL.sameAs
OWL.equivalentClass
is_about (IAO:0000136)
has_subsequence (RO:0002524)
is_subsequence_of (RO:0002525)
This behavior can be overridden by providing a list of rdflib.URIRef that ought to be loaded via the predicates parameter.
- Parameters
rdfgraph (rdflib.Graph) – Graph containing nodes and edges
predicates (list) – A list of rdflib.URIRef representing predicates to be loaded
kwargs (dict) – Any additional arguments
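The predicate selection can be sketched with triples held as plain string tuples. This is an illustration, not the KGX implementation: select_triples is a hypothetical helper, the default set simply spells out the six predicates listed above as full IRIs (using the standard OBO PURL expansion for the IAO and RO terms), and the example.org triples are invented.

```python
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

DEFAULT_PREDICATES: Set[str] = {
    "http://www.w3.org/2000/01/rdf-schema#subClassOf",
    "http://www.w3.org/2002/07/owl#sameAs",
    "http://www.w3.org/2002/07/owl#equivalentClass",
    "http://purl.obolibrary.org/obo/IAO_0000136",  # is_about
    "http://purl.obolibrary.org/obo/RO_0002524",   # has_subsequence
    "http://purl.obolibrary.org/obo/RO_0002525",   # is_subsequence_of
}


def select_triples(triples: Iterable[Triple],
                   predicates: Optional[Set[str]] = None) -> List[Triple]:
    """Keep triples whose predicate is in the given set, or in the defaults."""
    wanted = DEFAULT_PREDICATES if predicates is None else predicates
    return [triple for triple in triples if triple[1] in wanted]


triples = [
    ("http://example.org/a", "http://www.w3.org/2000/01/rdf-schema#subClassOf", "http://example.org/b"),
    ("http://example.org/a", "http://example.org/custom", "http://example.org/c"),
]
default_selection = select_triples(triples)
custom_selection = select_triples(triples, {"http://example.org/custom"})
```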
-
load_node_attributes
(rdfgraph: rdflib.graph.Graph) → None[source]¶ This method loads the properties of nodes into networkx.MultiDiGraph. As there can be many values for a single key, all properties are lists by default.
This method assumes that RdfTransformer.load_edges() has been called, and that all nodes already have their IRI set as an attribute.
- Parameters
rdfgraph (rdflib.Graph) – Graph containing nodes and edges
-
merge_graphs
(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None¶ Merge all graphs with self.graph
If two nodes with the same ‘id’ exist in two graphs, the nodes will be merged based on the ‘id’
If two nodes with the same ‘id’ exist in two graphs and they have conflicting values for a property, then the value is overwritten from left to right
If two edges with the same ‘key’ exist in two graphs, the edges will be merged based on the ‘key’ property
If two edges with the same ‘key’ exist in two graphs and they have one or more conflicting values for a property, then the value is overwritten from left to right
- Parameters
graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph
-
parse
(filename: str = None, input_format: str = None, provided_by: str = None, predicates: Set[rdflib.term.URIRef] = None) → None[source]¶ Parse a file, containing triples, into a rdflib.Graph
The file can be either a ‘turtle’ file or any other format supported by rdflib.
- Parameters
filename (str) – File to read from.
input_format (str) – The input file format. If None is provided then the format is guessed using rdflib.util.guess_format()
provided_by (str) – Define the source providing the input file.
-
remap_edge_property
(type: str, old_property: str, new_property: str) → None¶ Remap the value in the edge old_property attribute with the value from the edge new_property attribute.
- Parameters
type (string) – label referring to edges whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
-
remap_node_identifier
(type: str, new_property: str, prefix=None) → None¶ Remap a node’s ‘id’ attribute with the value from the node’s new_property attribute.
- Parameters
type (string) – label referring to nodes whose ‘id’ needs to be remapped
new_property (string) – property name from which the new value is pulled
prefix (string) – signifies that the value for new_property is a list and the prefix indicates which value to pick from the list
-
remap_node_property
(type: str, old_property: str, new_property: str) → None¶ Remap the value in the node old_property attribute with the value from the node new_property attribute.
- Parameters
type (string) – label referring to nodes whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
-
report
() → None¶ Print a summary report about self.graph
-
static
restore
(data: Dict) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a dictionary.
- Parameters
data (dict) – Dictionary containing nodes and edges
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
restore_from_file
(filename) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a JSON file.
- Parameters
filename (str) – File to read from
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
set_filter
(key: str, value: Union[List[str], str]) → None¶ Set a filter, defined by a key and value pair. These filters are used to reduce the search space.
- Parameters
key (str) – The key for a filter
value (Union[List[str], str]) – The value for a filter. Can be either a string or a list
-
static
validate_edge
(edge: dict) → dict¶ Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters
edge (dict) – An edge represented as a dict
- Returns
An edge represented as a dict, with default assumptions applied.
- Return type
dict
-
static
validate_node
(node: dict) → dict¶ Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters
node (dict) – A node represented as a dict
- Returns
A node represented as a dict, with default assumptions applied.
- Return type
dict
-
SparqlTransformer¶
-
class
kgx.transformers.sparql_transformer.
MonarchSparqlTransformer
(source_graph: networkx.classes.multidigraph.MultiDiGraph = None)[source]¶ Bases:
kgx.transformers.sparql_transformer.SparqlTransformer
see neo_transformer for discussion
-
add_edge
(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef) → Tuple[str, str, str]¶ This method should be used by all derived classes when adding an edge to the networkx.MultiDiGraph. This ensures that the subject and object identifiers are CURIEs, and that edge_label is in the correct form.
Returns the CURIE identifiers used for the subject and object in the networkx.MultiDiGraph, and the processed edge_label.
- Parameters
subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple
object_iri (rdflib.URIRef) – Object IRI for the object in a triple
predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple
- Returns
A 3-tuple (of the form subject, object, predicate) that represents the edge
- Return type
Tuple[str, str, str]
-
add_edge_attribute
(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → None¶ Adds an attribute to an edge, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
If the nodes in the edge do not exist then they will be created using subject_iri and object_iri.
If the edge itself does not exist then it will be created using subject_iri, object_iri and predicate_iri.
- Parameters
subject_iri ([rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph
object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph
predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (str) – The value of the attribute
-
add_node
(iri: rdflib.term.URIRef) → str¶ This method should be used by all derived classes when adding a node to the networkx.MultiDiGraph. This ensures that a node’s identifier is a CURIE, and that its iri property is set.
Returns the CURIE identifier for the node in the networkx.MultiDiGraph
- Parameters
iri (rdflib.URIRef) – IRI of a node
- Returns
The CURIE identifier of a node
- Return type
str
-
add_node_attribute
(iri: Union[rdflib.term.URIRef, str], key: str, value: str) → None¶ Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
If the node does not exist then it is created using the given iri.
- Parameters
iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (str) – The value of the attribute
-
categorize
()¶ Find and validate category for every node in self.graph
-
static
dump
(g: networkx.classes.multidigraph.MultiDiGraph) → Dict¶ Convert networkx.MultiDiGraph as a dictionary.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
- Returns
A dictionary
- Return type
dict
-
static
dump_to_file
(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None¶ Serialize networkx.MultiDiGraph as JSON and write to file.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
filename (str) – File to write the JSON
-
get_filters
() → Dict¶ Gets the current filter map, transforming if necessary.
- Returns
Returns a dictionary with all filters
- Return type
dict
-
is_empty
() → bool¶ Check whether self.graph is empty.
- Returns
A boolean value asserting whether the graph is empty or not
- Return type
bool
-
load_networkx_graph
(rdfgraph: rdflib.graph.Graph = None, predicates: Set[rdflib.term.URIRef] = None, **kwargs) → None¶ Fetch triples from the SPARQL endpoint and load them as edges.
- Parameters
rdfgraph (rdflib.Graph) – A rdflib Graph (unused)
predicates (set) – A set containing predicates in rdflib.URIRef form
kwargs (dict) – Any additional arguments.
-
merge_graphs
(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None¶ Merge all graphs with self.graph
If two nodes with the same ‘id’ exist in two graphs, the nodes will be merged based on the ‘id’
If two nodes with the same ‘id’ exist in two graphs and they have conflicting values for a property, then the value is overwritten from left to right
If two edges with the same ‘key’ exist in two graphs, the edges will be merged based on the ‘key’ property
If two edges with the same ‘key’ exist in two graphs and they have one or more conflicting values for a property, then the value is overwritten from left to right
- Parameters
graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph
-
query
(q: str) → Dict¶ Query a SPARQL endpoint.
- Parameters
q (str) – The query string
- Returns
A dictionary containing results from the query
- Return type
dict
-
remap_edge_property
(type: str, old_property: str, new_property: str) → None¶ Remap the value in the edge old_property attribute with the value from the edge new_property attribute.
- Parameters
type (string) – label referring to edges whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
-
remap_node_identifier
(type: str, new_property: str, prefix=None) → None¶ Remap a node’s ‘id’ attribute with the value from the node’s new_property attribute.
- Parameters
type (string) – label referring to nodes whose ‘id’ needs to be remapped
new_property (string) – property name from which the new value is pulled
prefix (string) – signifies that the value for new_property is a list and the prefix indicates which value to pick from the list
-
remap_node_property
(type: str, old_property: str, new_property: str) → None¶ Remap the value in the node old_property attribute with the value from the node new_property attribute.
- Parameters
type (string) – label referring to nodes whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
-
report
() → None¶ Print a summary report about self.graph
-
static
restore
(data: Dict) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a dictionary.
- Parameters
data (dict) – Dictionary containing nodes and edges
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
restore_from_file
(filename) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a JSON file.
- Parameters
filename (str) – File to read from
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
set_filter
(key: str, value: Union[List[str], str]) → None¶ Set a filter, defined by a key and value pair. These filters are used to reduce the search space.
- Parameters
key (str) – The key for a filter
value (Union[List[str], str]) – The value for a filter. Can be either a string or a list
-
static
validate_edge
(edge: dict) → dict¶ Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters
edge (dict) – An edge represented as a dict
- Returns
An edge represented as a dict, with default assumptions applied.
- Return type
dict
-
static
validate_node
(node: dict) → dict¶ Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters
node (dict) – A node represented as a dict
- Returns
A node represented as a dict, with default assumptions applied.
- Return type
dict
-
-
class
kgx.transformers.sparql_transformer.
RedSparqlTransformer
(source_graph: networkx.classes.multidigraph.MultiDiGraph = None, url: str = 'http://graphdb.dumontierlab.com/repositories/ncats-red-kg')[source]¶ Bases:
kgx.transformers.sparql_transformer.SparqlTransformer
Transformer for communicating with Data2Services Knowledge Graph, a.k.a. Translator Red KG.
-
add_edge
(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef) → Tuple[str, str, str]¶ This method should be used by all derived classes when adding an edge to the networkx.MultiDiGraph. This ensures that the subject and object identifiers are CURIEs, and that edge_label is in the correct form.
Returns the CURIE identifiers used for the subject and object in the networkx.MultiDiGraph, and the processed edge_label.
- Parameters
subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple
object_iri (rdflib.URIRef) – Object IRI for the object in a triple
predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple
- Returns
A 3-tuple (of the form subject, object, predicate) that represents the edge
- Return type
Tuple[str, str, str]
-
add_edge_attribute
(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → None¶ Adds an attribute to an edge, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
If the nodes in the edge do not exist then they will be created using subject_iri and object_iri.
If the edge itself does not exist then it will be created using subject_iri, object_iri and predicate_iri.
- Parameters
subject_iri ([rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph
object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph
predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (str) – The value of the attribute
-
add_node
(iri: rdflib.term.URIRef) → str¶ This method should be used by all derived classes when adding a node to the networkx.MultiDiGraph. This ensures that a node’s identifier is a CURIE, and that its iri property is set.
Returns the CURIE identifier for the node in the networkx.MultiDiGraph
- Parameters
iri (rdflib.URIRef) – IRI of a node
- Returns
The CURIE identifier of a node
- Return type
str
-
add_node_attribute
(iri: Union[rdflib.term.URIRef, str], key: str, value: str) → None¶ Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
If the node does not exist then it is created using the given iri.
- Parameters
iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (str) – The value of the attribute
-
categorize
() → None[source]¶ Checks for a node’s category property and assigns a category from BioLink Model. TODO: categorize for edges?
-
static
dump
(g: networkx.classes.multidigraph.MultiDiGraph) → Dict¶ Convert networkx.MultiDiGraph as a dictionary.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
- Returns
A dictionary
- Return type
dict
-
static
dump_to_file
(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None¶ Serialize networkx.MultiDiGraph as JSON and write to file.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
filename (str) – File to write the JSON
-
get_filters
() → Dict¶ Gets the current filter map, transforming if necessary.
- Returns
Returns a dictionary with all filters
- Return type
dict
-
is_empty
() → bool¶ Check whether self.graph is empty.
- Returns
A boolean value asserting whether the graph is empty or not
- Return type
bool
-
load_networkx_graph
(rdfgraph: rdflib.graph.Graph = None, predicates: Set[rdflib.term.URIRef] = None, **kwargs: Dict) → None[source]¶ Fetch all triples using the specified predicates and add them to networkx.MultiDiGraph.
- Parameters
rdfgraph (rdflib.Graph) – A rdflib Graph (unused)
predicates (set) – A set containing predicates in rdflib.URIRef form
kwargs (dict) – Any additional arguments. Ex: specifying ‘limit’ argument will limit the number of triples fetched.
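How a predicate set and an optional 'limit' could shape the request is sketched below. The query text KGX actually sends is not shown in this reference, so build_query is a hypothetical helper that only demonstrates the SPARQL 1.1 VALUES and LIMIT constructs.

```python
from typing import Optional, Set


def build_query(predicates: Set[str], limit: Optional[int] = None) -> str:
    """Build a SELECT query restricted to the given predicate IRIs."""
    # Sort for a deterministic query string.
    values = " ".join(f"<{predicate}>" for predicate in sorted(predicates))
    query = f"SELECT ?s ?p ?o WHERE {{ VALUES ?p {{ {values} }} ?s ?p ?o }}"
    if limit is not None:
        query += f" LIMIT {int(limit)}"
    return query


limited = build_query({"http://www.w3.org/2000/01/rdf-schema#subClassOf"}, limit=5)
unlimited = build_query({"http://www.w3.org/2000/01/rdf-schema#subClassOf"})
```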
-
load_nodes
(node_set: Set) → None[source]¶ Load nodes into networkx.MultiDiGraph.
This method queries the SPARQL endpoint for all triples where a node in node_set is the subject.
- Parameters
node_set (set) – A set of node CURIEs
-
merge_graphs
(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None¶ Merge all graphs with self.graph
If two nodes with the same ‘id’ exist in two graphs, the nodes will be merged based on the ‘id’
If two nodes with the same ‘id’ exist in two graphs and they have conflicting values for a property, then the value is overwritten from left to right
If two edges with the same ‘key’ exist in two graphs, the edges will be merged based on the ‘key’ property
If two edges with the same ‘key’ exist in two graphs and they have one or more conflicting values for a property, then the value is overwritten from left to right
- Parameters
graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph
-
query
(q: str) → Dict¶ Query a SPARQL endpoint.
- Parameters
q (str) – The query string
- Returns
A dictionary containing results from the query
- Return type
dict
-
remap_edge_property
(type: str, old_property: str, new_property: str) → None¶ Remap the value in the edge old_property attribute with the value from the edge new_property attribute.
- Parameters
type (string) – label referring to edges whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
-
remap_node_identifier
(type: str, new_property: str, prefix=None) → None¶ Remap a node’s ‘id’ attribute with the value from the node’s new_property attribute.
- Parameters
type (string) – label referring to nodes whose ‘id’ needs to be remapped
new_property (string) – property name from which the new value is pulled
prefix (string) – signifies that the value for new_property is a list and the prefix indicates which value to pick from the list
-
remap_node_property
(type: str, old_property: str, new_property: str) → None¶ Remap the value in the node old_property attribute with the value from the node new_property attribute.
- Parameters
type (string) – label referring to nodes whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
-
report
() → None¶ Print a summary report about self.graph
-
static
restore
(data: Dict) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a dictionary.
- Parameters
data (dict) – Dictionary containing nodes and edges
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
restore_from_file
(filename) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a JSON file.
- Parameters
filename (str) – File to read from
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
set_filter
(key: str, value: Union[List[str], str]) → None¶ Set a filter, defined by a key and value pair. These filters are used to reduce the search space.
- Parameters
key (str) – The key for a filter
value (Union[List[str], str]) – The value for a filter. Can be either a string or a list
-
static
validate_edge
(edge: dict) → dict¶ Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters
edge (dict) – An edge represented as a dict
- Returns
An edge represented as a dict, with default assumptions applied.
- Return type
dict
-
static
validate_node
(node: dict) → dict¶ Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters
node (dict) – A node represented as a dict
- Returns
A node represented as a dict, with default assumptions applied.
- Return type
dict
-
-
class
kgx.transformers.sparql_transformer.
SparqlTransformer
(source_graph: networkx.classes.multidigraph.MultiDiGraph = None, url: str = None)[source]¶ Bases:
kgx.transformers.rdf_graph_mixin.RdfGraphMixin
,kgx.transformers.transformer.Transformer
Transformer for communicating with a SPARQL endpoint.
-
add_edge
(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef) → Tuple[str, str, str]¶ This method should be used by all derived classes when adding an edge to the networkx.MultiDiGraph. This ensures that the subject and object identifiers are CURIEs, and that edge_label is in the correct form.
Returns the CURIE identifiers used for the subject and object in the networkx.MultiDiGraph, and the processed edge_label.
- Parameters
subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple
object_iri (rdflib.URIRef) – Object IRI for the object in a triple
predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple
- Returns
A 3-tuple (of the form subject, object, predicate) that represents the edge
- Return type
Tuple[str, str, str]
-
add_edge_attribute
(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → None¶ Adds an attribute to an edge, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
If the nodes in the edge do not exist then they will be created using subject_iri and object_iri.
If the edge itself does not exist then it will be created using subject_iri, object_iri and predicate_iri.
- Parameters
subject_iri ([rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph
object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph
predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (str) – The value of the attribute
-
add_node
(iri: rdflib.term.URIRef) → str¶ This method should be used by all derived classes when adding a node to the networkx.MultiDiGraph. This ensures that a node’s identifier is a CURIE, and that its iri property is set.
Returns the CURIE identifier for the node in the networkx.MultiDiGraph
- Parameters
iri (rdflib.URIRef) – IRI of a node
- Returns
The CURIE identifier of a node
- Return type
str
-
add_node_attribute
(iri: Union[rdflib.term.URIRef, str], key: str, value: str) → None¶ Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.
The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
If the node does not exist then it is created using the given iri.
- Parameters
iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph
key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string
value (str) – The value of the attribute
-
categorize
()¶ Find and validate category for every node in self.graph
-
static
dump
(g: networkx.classes.multidigraph.MultiDiGraph) → Dict¶ Convert networkx.MultiDiGraph as a dictionary.
- Parameters
g (networkx.MultiDiGraph) – Graph to convert as a dictionary
- Returns
A dictionary
- Return type
dict
-
static
dump_to_file
(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None¶ Serialize a networkx.MultiDiGraph as JSON and write it to a file.
- Parameters
g (networkx.MultiDiGraph) – Graph to serialize
filename (str) – File to write the JSON to
-
get_filters
() → Dict[source]¶ Gets the current filter map, transforming if necessary.
- Returns
A dictionary with all filters
- Return type
dict
-
is_empty
() → bool¶ Check whether self.graph is empty.
- Returns
A boolean value asserting whether the graph is empty or not
- Return type
bool
-
load_networkx_graph
(rdfgraph: rdflib.graph.Graph = None, predicates: Set[rdflib.term.URIRef] = None, **kwargs) → None[source]¶ Fetch triples from the SPARQL endpoint and load them as edges.
- Parameters
rdfgraph (rdflib.Graph) – A rdflib Graph (unused)
predicates (set) – A set containing predicates in rdflib.URIRef form
kwargs (dict) – Any additional arguments.
-
merge_graphs
(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None¶ Merge all graphs with self.graph.
If two nodes with the same ‘id’ exist in two graphs, the nodes will be merged based on the ‘id’.
If two nodes with the same ‘id’ exist in two graphs and they have conflicting values for a property, then the value is overwritten from left to right.
If two edges with the same ‘key’ exist in two graphs, the edges will be merged based on the ‘key’ property.
If two edges with the same ‘key’ exist in two graphs and they have one or more conflicting values for a property, then the value is overwritten from left to right.
- Parameters
graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph
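The node half of these merge rules can be illustrated on plain dictionaries. The sketch reads "overwritten from left to right" as "graphs later in the list win", which is an interpretation rather than something the description confirms, and `merge_nodes` is a hypothetical helper, not part of KGX.

```python
from typing import Dict, List

NodeMap = Dict[str, dict]  # node 'id' -> node properties


def merge_nodes(target: NodeMap, others: List[NodeMap]) -> None:
    """Merge node maps into ``target``, processing ``others`` in list order."""
    for nodes in others:
        for node_id, attrs in nodes.items():
            # Same 'id' means same node; on conflict the later value replaces the earlier one.
            target.setdefault(node_id, {}).update(attrs)
```

Edges would follow the same pattern with the edge 'key' in place of the node 'id'.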
-
query
(q: str) → Dict[source]¶ Query a SPARQL endpoint.
- Parameters
q (str) – The query string
- Returns
A dictionary containing results from the query
- Return type
dict
-
remap_edge_property
(type: str, old_property: str, new_property: str) → None¶ Remap the value of an edge’s old_property attribute with the value from the edge’s new_property attribute.
- Parameters
type (string) – label referring to edges whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
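A rough sketch of this remapping on a list of edge dictionaries follows. The `edge_label` attribute used to match `type`, and the choice to skip edges that lack `new_property`, are assumptions made for the example.

```python
from typing import Dict, List


def remap_edge_property(edges: List[Dict], type: str,
                        old_property: str, new_property: str) -> None:
    """Copy ``new_property`` over ``old_property`` on edges labelled ``type``."""
    # ``type`` shadows the builtin here only to mirror the documented signature.
    for edge in edges:
        # Assumption: 'edge_label' is the attribute that ``type`` is compared with.
        if edge.get("edge_label") == type and new_property in edge:
            edge[old_property] = edge[new_property]
```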
-
remap_node_identifier
(type: str, new_property: str, prefix=None) → None¶ Remap a node’s ‘id’ attribute with the value from the node’s new_property attribute.
- Parameters
type (string) – label referring to nodes whose ‘id’ needs to be remapped
new_property (string) – property name from which the new value is pulled
prefix (string) – signifies that the value for new_property is a list; the prefix indicates which value to pick from the list
-
remap_node_property
(type: str, old_property: str, new_property: str) → None¶ Remap the value of a node’s old_property attribute with the value from the node’s new_property attribute.
- Parameters
type (string) – label referring to nodes whose property needs to be remapped
old_property (string) – old property name whose value needs to be replaced
new_property (string) – new property name from which the value is pulled
-
report
() → None¶ Print a summary report about self.graph
-
static
restore
(data: Dict) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a dictionary.
- Parameters
data (dict) – Dictionary containing nodes and edges
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
-
static
restore_from_file
(filename) → networkx.classes.multidigraph.MultiDiGraph¶ Deserialize a networkx.MultiDiGraph from a JSON file.
- Parameters
filename (str) – File to read from
- Returns
A networkx.MultiDiGraph representation
- Return type
networkx.MultiDiGraph
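The file-based pair (dump_to_file and restore_from_file) amounts to a JSON round trip of the dictionary form. The sketch below shows that round trip using only the standard library; the `write_json` and `read_json` helpers and the exact shape of the payload are hypothetical, since the description only says the dictionary contains nodes and edges.

```python
import json
import os
import tempfile
from typing import Dict


def write_json(data: Dict, filename: str) -> None:
    """Serialize the dictionary form of a graph as JSON."""
    with open(filename, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)


def read_json(filename: str) -> Dict:
    """Read the dictionary form of a graph back from a JSON file."""
    with open(filename, encoding="utf-8") as fh:
        return json.load(fh)


# Hypothetical payload shape: a dictionary containing nodes and edges.
graph_dict = {
    "nodes": [{"id": "A:1"}, {"id": "B:2"}],
    "edges": [{"source": "A:1", "target": "B:2", "key": "A:1-related_to-B:2"}],
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "graph.json")
    write_json(graph_dict, path)
    restored = read_json(path)
```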
-
set_filter
(key: str, value: Union[List[str], str]) → None¶ Set a filter, defined by a key and value pair. These filters are used to reduce the search space.
- Parameters
key (str) – The key for a filter
value (Union[List[str], str]) – The value for a filter. Can be either a string or a list
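As an illustration of how a key and value pair could narrow a search, the sketch below stores filters and checks records against them. `FilterStore` and its `matches` method are hypothetical; how KGX applies filters to a query is not described here.

```python
from typing import Dict, List, Union

FilterValue = Union[List[str], str]


class FilterStore:
    """Minimal holder for key/value filters (illustrative only)."""

    def __init__(self) -> None:
        self.filters: Dict[str, FilterValue] = {}

    def set_filter(self, key: str, value: FilterValue) -> None:
        # A later call with the same key replaces the earlier value.
        self.filters[key] = value

    def matches(self, record: dict) -> bool:
        """Hypothetical check: every filter key must hold one of the allowed values."""
        for key, wanted in self.filters.items():
            allowed = [wanted] if isinstance(wanted, str) else wanted
            if record.get(key) not in allowed:
                return False
        return True
```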
-
static
validate_edge
(edge: dict) → dict¶ Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters
edge (dict) – An edge represented as a dict
- Returns
An edge represented as a dict, with default assumptions applied.
- Return type
dict
-
static
validate_node
(node: dict) → dict¶ Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters
node (dict) – A node represented as a dict
- Returns
A node represented as a dict, with default assumptions applied.
- Return type
dict
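The "check required properties, then apply defaults" pattern shared by validate_edge and validate_node could be sketched as follows. Treating 'id' as the required property and using a default category are assumptions for the example, and `apply_node_defaults` is a hypothetical name; consult the KGX source for the actual rules.

```python
from typing import Dict

# Hypothetical default; the value KGX applies may differ.
DEFAULT_CATEGORY = "named thing"


def apply_node_defaults(node: Dict) -> Dict:
    """Return a copy of ``node`` with defaults filled in; raise if 'id' is missing."""
    if "id" not in node:
        raise KeyError(f"node is missing required property 'id': {node}")
    checked = dict(node)  # leave the caller's dictionary untouched
    checked.setdefault("category", [DEFAULT_CATEGORY])
    return checked
```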
-