Transformers

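Transformers are classes in KGX that allow you to read a source into an in-memory property graph (networkx.MultiDiGraph) and to write that graph to a target format or database.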

Transformer

The base class for all Transformers in KGX.

class kgx.transformers.transformer.Transformer(source_graph: networkx.classes.multidigraph.MultiDiGraph = None)[source]

Bases: object

Base class for performing a transformation.

This can be either:
  • from a source to an in-memory property graph (networkx.MultiDiGraph)

  • from an in-memory property graph to a target format or database (Neo4j, CSV, RDF Triple Store, TTL)

categorize()[source]

Find and validate category for every node in self.graph

static dump(g: networkx.classes.multidigraph.MultiDiGraph) → Dict[source]

Convert a networkx.MultiDiGraph to a dictionary.

Parameters

g (networkx.MultiDiGraph) – Graph to convert to a dictionary

Returns

A dictionary

Return type

dict

static dump_to_file(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None[source]

Serialize a networkx.MultiDiGraph as JSON and write it to a file.

Parameters
  • g (networkx.MultiDiGraph) – Graph to serialize

  • filename (str) – File to write the JSON to

is_empty() → bool[source]

Check whether self.graph is empty.

Returns

A boolean value asserting whether the graph is empty or not

Return type

bool

merge_graphs(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None[source]

Merge all graphs with self.graph

  • If two nodes with the same ‘id’ exist in two graphs, the nodes are merged based on the ‘id’

  • If two nodes with the same ‘id’ exist in two graphs and they have conflicting values for a property, then the value is overwritten from left to right

  • If two edges with the same ‘key’ exist in two graphs, the edges are merged based on the ‘key’ property

  • If two edges with the same ‘key’ exist in two graphs and they have one or more conflicting values for a property, then the value is overwritten from left to right

Parameters

graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph
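
The merge rules can be illustrated with plain dictionaries. This is a minimal sketch, not KGX's implementation: merge_by_id is a hypothetical helper, graphs are modelled as dictionaries keyed by ‘id’, and it assumes that "left to right" means a graph later in the list overwrites a conflicting value from an earlier one.

```python
from typing import Dict, List

Properties = Dict[str, object]


def merge_by_id(graphs: List[Dict[str, Properties]]) -> Dict[str, Properties]:
    """Merge records keyed by 'id' (hypothetical helper, not part of KGX)."""
    merged: Dict[str, Properties] = {}
    for graph in graphs:
        for record_id, properties in graph.items():
            # Records that share an 'id' collapse into a single entry.
            # Assumption: on conflict, the graph further right in the list wins.
            merged.setdefault(record_id, {}).update(properties)
    return merged


# Made-up identifiers and values, for illustration only.
first = {"EX:1": {"name": "alpha", "category": "gene"}}
second = {"EX:1": {"name": "beta", "taxon": "EX:taxon"}}
merged = merge_by_id([first, second])
```

Here the two records collapse into one: ‘name’ takes the value from the second graph, while the non-conflicting ‘category’ and ‘taxon’ properties are both kept. The same idea would apply to edges keyed by ‘key’.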

remap_edge_property(type: str, old_property: str, new_property: str) → None[source]

Remap the value in edge old_property attribute with value from edge new_property attribute.

Parameters
  • type (string) – label referring to edges whose property needs to be remapped

  • old_property (string) – old property name whose value needs to be replaced

  • new_property (string) – new property name from which the value is pulled

remap_node_identifier(type: str, new_property: str, prefix=None) → None[source]

Remap a node’s ‘id’ attribute with value from a node’s new_property attribute.

Parameters
  • type (string) – label referring to nodes whose ‘id’ needs to be remapped

  • new_property (string) – property name from which the new value is pulled

  • prefix (string) – signifies that the value for new_property is a list and the prefix indicates which value to pick from the list
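
A minimal sketch of how the prefix parameter might select one value from a list. The helper pick_identifier and the ‘xrefs’ property are hypothetical, and the matching rule (first list entry that starts with the prefix) is an assumption; the actual selection logic in KGX may differ.

```python
from typing import Dict, Optional


def pick_identifier(
    node: Dict[str, object], new_property: str, prefix: Optional[str] = None
) -> Optional[str]:
    """Choose the value that would become the node's new 'id' (hypothetical helper)."""
    value = node.get(new_property)
    if isinstance(value, list):
        # Assumption: the prefix selects the first list entry that starts with it.
        for candidate in value:
            if prefix is not None and str(candidate).startswith(prefix):
                return str(candidate)
        return None
    # A scalar value is used as-is; a missing property yields None.
    return None if value is None else str(value)


# Made-up identifiers, for illustration only.
node = {"id": "EX:1", "xrefs": ["EXA:100", "EXB:200"]}
new_id = pick_identifier(node, "xrefs", prefix="EXB")
if new_id is not None:
    node["id"] = new_id
```

Returning None when nothing matches lets the caller keep the existing ‘id’ instead of overwriting it with an empty value.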

remap_node_property(type: str, old_property: str, new_property: str) → None[source]

Remap the value in node old_property attribute with value from node new_property attribute.

Parameters
  • type (string) – label referring to nodes whose property needs to be remapped

  • old_property (string) – old property name whose value needs to be replaced

  • new_property (string) – new property name from which the value is pulled
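
The remapping described above can be sketched on a list of node dictionaries. This is illustrative only: remap_property is a hypothetical helper, and it assumes that a node's labels are stored as a list under ‘category’.

```python
from typing import Any, Dict, List


def remap_property(
    nodes: List[Dict[str, Any]], label: str, old_property: str, new_property: str
) -> None:
    """Copy new_property's value into old_property for matching nodes (hypothetical helper)."""
    for node in nodes:
        # Assumption: a node's labels are stored as a list under 'category'.
        if label in node.get("category", []) and new_property in node:
            node[old_property] = node[new_property]


# Made-up records, for illustration only.
nodes = [
    {"id": "EX:1", "category": ["gene"], "name": "old name", "symbol": "NEW"},
    {"id": "EX:2", "category": ["disease"], "name": "unchanged", "symbol": "X"},
]
remap_property(nodes, "gene", "name", "symbol")
```

Only the node labelled ‘gene’ is touched; the other node keeps its original ‘name’.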

report() → None[source]

Print a summary report about self.graph

static restore(data: Dict) → networkx.classes.multidigraph.MultiDiGraph[source]

Deserialize a networkx.MultiDiGraph from a dictionary.

Parameters

data (dict) – Dictionary containing nodes and edges

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph

static restore_from_file(filename) → networkx.classes.multidigraph.MultiDiGraph[source]

Deserialize a networkx.MultiDiGraph from a JSON file.

Parameters

filename (str) – File to read from

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph

set_filter(key: str, value: Union[List[str], str]) → None[source]

Set a filter, defined by a key and value pair. These filters are used to reduce the search space.

Parameters
  • key (str) – The key for a filter

  • value (Union[List[str], str]) – The value for a filter. Can be either a string or a list
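
One way to picture how key and value filters reduce a search space is shown below. FilterSketch is a hypothetical stand-in and does not reflect how KGX stores or applies filters; the filter keys are also made up.

```python
from typing import Dict, List, Union


class FilterSketch:
    """Hypothetical stand-in for a transformer's filter store."""

    def __init__(self) -> None:
        self.filters: Dict[str, List[str]] = {}

    def set_filter(self, key: str, value: Union[List[str], str]) -> None:
        # Normalise a single string to a one-element list so both input
        # forms can be handled the same way later.
        self.filters[key] = [value] if isinstance(value, str) else list(value)

    def matches(self, record: Dict[str, str]) -> bool:
        # A record passes only if it satisfies every filter that was set.
        return all(record.get(key) in allowed for key, allowed in self.filters.items())


sketch = FilterSketch()
sketch.set_filter("category", "gene")
sketch.set_filter("provided_by", ["source_a", "source_b"])

# Made-up records, for illustration only.
records = [
    {"id": "EX:1", "category": "gene", "provided_by": "source_a"},
    {"id": "EX:2", "category": "disease", "provided_by": "source_a"},
    {"id": "EX:3", "category": "gene", "provided_by": "source_c"},
]
kept = [record["id"] for record in records if sketch.matches(record)]
```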

static validate_edge(edge: dict) → dict[source]

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters

edge (dict) – An edge represented as a dict

Returns

An edge represented as a dict, with default assumptions applied.

Return type

dict

static validate_node(node: dict) → dict[source]

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters

node (dict) – A node represented as a dict

Returns

A node represented as a dict, with default assumptions applied.

Return type

dict
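
A minimal sketch of a required-property check. The required names come from this reference (‘id’ for nodes; ‘subject’, ‘edge_label’ and ‘object’ for edges). Everything else is an assumption: check_node and check_edge are hypothetical helpers, the default category is illustrative, and raising KeyError is only one possible reaction to a missing property.

```python
from typing import Any, Dict, Tuple

REQUIRED_NODE_PROPERTIES = ("id",)
REQUIRED_EDGE_PROPERTIES = ("subject", "edge_label", "object")
DEFAULT_CATEGORY = "named thing"  # assumption: illustrative default only


def check_required(record: Dict[str, Any], required: Tuple[str, ...]) -> None:
    missing = [name for name in required if name not in record]
    if missing:
        # Assumption: raising is one possible reaction; KGX may behave differently.
        raise KeyError(f"missing required properties: {missing}")


def check_node(node: Dict[str, Any]) -> Dict[str, Any]:
    check_required(node, REQUIRED_NODE_PROPERTIES)
    validated = dict(node)  # copy, so the caller's dictionary is left untouched
    validated.setdefault("category", [DEFAULT_CATEGORY])
    return validated


def check_edge(edge: Dict[str, Any]) -> Dict[str, Any]:
    check_required(edge, REQUIRED_EDGE_PROPERTIES)
    return dict(edge)


validated = check_node({"id": "EX:1"})
```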

NeoTransformer

class kgx.transformers.neo_transformer.NeoTransformer(graph: networkx.classes.multidigraph.MultiDiGraph = None, uri: str = None, username: str = None, password: str = None)[source]

Bases: kgx.transformers.transformer.Transformer

Transformer for reading from and writing to a Neo4j database.

__init__(graph: networkx.classes.multidigraph.MultiDiGraph = None, uri: str = None, username: str = None, password: str = None)[source]

Initialize an instance of NeoTransformer.

categorize()

Find and validate category for every node in self.graph

count(is_directed: bool = True) → int[source]

Get the total count of records to be fetched from the Neo4j database.

Parameters

is_directed (bool) – Whether edges are directed (True by default, since edges are directed in most cases)

Returns

The total count of records

Return type

int

create_constraints(categories: set) → None[source]

Create a unique constraint on node ‘id’ for all categories in Neo4j.

Parameters

categories (set) – Set of categories

static dump(g: networkx.classes.multidigraph.MultiDiGraph) → Dict

Convert a networkx.MultiDiGraph to a dictionary.

Parameters

g (networkx.MultiDiGraph) – Graph to convert to a dictionary

Returns

A dictionary

Return type

dict

static dump_to_file(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None

Serialize a networkx.MultiDiGraph as JSON and write it to a file.

Parameters
  • g (networkx.MultiDiGraph) – Graph to serialize

  • filename (str) – File to write the JSON to

generate_unwind_edge_query(edge_label: str) → str[source]

Generate UNWIND cypher query for saving edges into Neo4j.

The query uses self.DEFAULT_NODE_LABEL to quickly look up the required subject and object nodes.

Parameters

edge_label (str) – Edge label as string

Returns

The UNWIND cypher query

Return type

str

generate_unwind_node_query(category: str) → str[source]

Generate UNWIND cypher query for saving nodes into Neo4j.

There should be a CONSTRAINT in Neo4j for self.DEFAULT_NODE_LABEL. The query uses self.DEFAULT_NODE_LABEL as the node label to increase speed for adding nodes. The query also sets label to self.DEFAULT_NODE_LABEL for any node to make sure that the CONSTRAINT applies.

Parameters

category (str) – Node category

Returns

The UNWIND cypher query

Return type

str
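
For orientation, an UNWIND query of the kind described above might look like the string built below. This is a sketch under assumptions, not the query KGX generates: unwind_node_query_sketch is a hypothetical helper, the value of DEFAULT_NODE_LABEL is a placeholder, and the $nodes parameter syntax depends on the Neo4j version in use.

```python
DEFAULT_NODE_LABEL = "node"  # assumption: placeholder for self.DEFAULT_NODE_LABEL


def unwind_node_query_sketch(category: str) -> str:
    """Build an UNWIND query string for one node category (hypothetical helper)."""
    return (
        # Iterate over the list of node dictionaries passed as a parameter.
        "UNWIND $nodes AS node\n"
        # Match or create each node by 'id' under the constrained default label.
        f"MERGE (n:`{DEFAULT_NODE_LABEL}` {{id: node.id}})\n"
        # Copy all properties onto the node, then add the category label.
        "SET n += node\n"
        f"SET n:`{category}`"
    )


query = unwind_node_query_sketch("gene")
print(query)
```

Because the category is interpolated into the query text rather than passed as a parameter, it should come from a trusted vocabulary.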

get_edges(skip: int = 0, limit: int = 0, is_directed: bool = True) → List[Tuple[neo4jrestclient.client.Node, neo4jrestclient.client.Relationship, neo4jrestclient.client.Node]][source]

Get a page of edges from the Neo4j database.

Parameters
  • skip (int) – Records to skip

  • limit (int) – Total number of records to query for

  • is_directed (bool) – Whether edges are directed (True by default, since edges are directed in most cases)

Returns

A list of 3-tuples of the form (neo4jrestclient.client.Node, neo4jrestclient.client.Relationship, neo4jrestclient.client.Node)

Return type

list

get_filter(key: str) → str[source]

Get the value for filter as defined by key. This is used as a convenience method for generating cypher queries.

Parameters

key (str) – Name of the filter

Returns

Value corresponding to the given filter key, formatted for CQL

Return type

str

get_nodes(skip: int = 0, limit: int = 0) → List[neo4jrestclient.client.Node][source]

Get a page of nodes from the Neo4j database.

Parameters
  • skip (int) – Records to skip

  • limit (int) – Total number of records to query for

Returns

A list of neo4jrestclient.client.Node records

Return type

list

get_pages(query_function, start: int = 0, end: int = None, page_size: int = 10000, **kwargs) → list[source]

Get pages of size page_size from Neo4j. Returns an iterator of pages, where the number of pages is (end - start) / page_size.

Parameters
  • query_function (func) – The function to use to fetch records. Usually this is self.get_nodes or self.get_edges

  • start (int) – Start for pagination

  • end (int) – End for pagination

  • page_size (int) – Size of each page (10000, by default)

  • **kwargs (dict) – Any additional arguments that might be relevant for query_function

Returns

An iterator over lists of records from Neo4j, where each list has size page_size

Return type

list
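
The pagination loop can be sketched as a generator. This is an illustration under assumptions rather than KGX's code: iter_pages and fetch are hypothetical names, the query function is assumed to accept skip and limit keyword arguments (as get_nodes and get_edges do), and the last page is allowed to be shorter than page_size.

```python
from typing import Any, Callable, Iterator, List, Optional


def iter_pages(
    query_function: Callable[..., List[Any]],
    start: int = 0,
    end: Optional[int] = None,
    page_size: int = 10000,
    **kwargs: Any,
) -> Iterator[List[Any]]:
    """Yield pages of records (hypothetical sketch of the pagination loop)."""
    offset = start
    while end is None or offset < end:
        # Never request records beyond `end` when an end is given.
        limit = page_size if end is None else min(page_size, end - offset)
        page = query_function(skip=offset, limit=limit, **kwargs)
        if not page:
            break  # the source has no more records
        yield page
        offset += limit


records = list(range(25))


def fetch(skip: int, limit: int) -> List[int]:
    # Stand-in for self.get_nodes or self.get_edges.
    return records[skip:skip + limit]


pages = list(iter_pages(fetch, start=0, end=25, page_size=10))
```

With 25 records and a page size of 10, the sketch yields three pages, so (end - start) / page_size is effectively rounded up.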

is_empty() → bool

Check whether self.graph is empty.

Returns

A boolean value asserting whether the graph is empty or not

Return type

bool

load(start: int = 0, end: int = None, is_directed: bool = True) → None[source]

Read nodes and edges from a Neo4j database and create a networkx.MultiDiGraph

Parameters
  • start (int) – Start for pagination

  • end (int) – End for pagination

  • is_directed (bool) – Whether edges are directed (True by default, since edges are directed in most cases)

load_edge(edge: neo4jrestclient.client.Relationship) → None[source]

Load an edge from neo4jrestclient.client.Relationship into networkx.MultiDiGraph

Parameters

edge (neo4jrestclient.client.Relationship) – An edge

load_edges(edges: List) → None[source]

Load edges into networkx.MultiDiGraph

Parameters

edges (List) – A list of edge records

load_node(node: neo4jrestclient.client.Node) → None[source]

Load node from neo4jrestclient.client.Node into networkx.MultiDiGraph

Parameters

node (neo4jrestclient.client.Node) – A node

load_nodes(nodes: List[neo4jrestclient.client.Node]) → None[source]

Load nodes into networkx.MultiDiGraph

Parameters

nodes (List[neo4jrestclient.client.Node]) – A list of node records

merge_graphs(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None

Merge all graphs with self.graph

  • If two nodes with the same ‘id’ exist in two graphs, the nodes are merged based on the ‘id’

  • If two nodes with the same ‘id’ exist in two graphs and they have conflicting values for a property, then the value is overwritten from left to right

  • If two edges with the same ‘key’ exist in two graphs, the edges are merged based on the ‘key’ property

  • If two edges with the same ‘key’ exist in two graphs and they have one or more conflicting values for a property, then the value is overwritten from left to right

Parameters

graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph

neo4j_report() → None[source]

Give a summary on the number of nodes and edges in the Neo4j database.

remap_edge_property(type: str, old_property: str, new_property: str) → None

Remap the value in edge old_property attribute with value from edge new_property attribute.

Parameters
  • type (string) – label referring to edges whose property needs to be remapped

  • old_property (string) – old property name whose value needs to be replaced

  • new_property (string) – new property name from which the value is pulled

remap_node_identifier(type: str, new_property: str, prefix=None) → None

Remap a node’s ‘id’ attribute with value from a node’s new_property attribute.

Parameters
  • type (string) – label referring to nodes whose ‘id’ needs to be remapped

  • new_property (string) – property name from which the new value is pulled

  • prefix (string) – signifies that the value for new_property is a list and the prefix indicates which value to pick from the list

remap_node_property(type: str, old_property: str, new_property: str) → None

Remap the value in node old_property attribute with value from node new_property attribute.

Parameters
  • type (string) – label referring to nodes whose property needs to be remapped

  • old_property (string) – old property name whose value needs to be replaced

  • new_property (string) – new property name from which the value is pulled

report() → None

Print a summary report about self.graph

static restore(data: Dict) → networkx.classes.multidigraph.MultiDiGraph

Deserialize a networkx.MultiDiGraph from a dictionary.

Parameters

data (dict) – Dictionary containing nodes and edges

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph

static restore_from_file(filename) → networkx.classes.multidigraph.MultiDiGraph

Deserialize a networkx.MultiDiGraph from a JSON file.

Parameters

filename (str) – File to read from

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph

save() → None[source]

Save all nodes and edges from networkx.MultiDiGraph into Neo4j.

TODO: To be deprecated.

save_edge(obj: dict) → None[source]

Load an edge into Neo4j.

TODO: To be deprecated.

Parameters

obj (dict) – A dictionary that represents an edge and its properties. The edge must have ‘subject’, ‘edge_label’ and ‘object’ properties. For all other necessary properties, refer to the BioLink Model.

save_edge_unwind(edges_by_edge_label: Dict[str, list]) → None[source]

Save all edges into Neo4j using the UNWIND cypher clause.

Parameters

edges_by_edge_label (dict) – A dictionary where edge label is the key and the value is a list of edges with that edge label

save_node(obj: dict) → None[source]

Load a node into Neo4j.

TODO: To be deprecated.

Parameters

obj (dict) – A dictionary that represents a node and its properties. The node must have ‘id’ property. For all other necessary properties, refer to the BioLink Model.

save_node_unwind(nodes_by_category: Dict[str, list]) → None[source]

Save all nodes into Neo4j using the UNWIND cypher clause.

Parameters

nodes_by_category (Dict[str, list]) – A dictionary where node category is the key and the value is a list of nodes of that category

save_with_unwind() → None[source]

Save all nodes and edges from networkx.MultiDiGraph into Neo4j using the UNWIND cypher clause.

set_filter(key: str, value: Union[List[str], str]) → None

Set a filter, defined by a key and value pair. These filters are used to reduce the search space.

Parameters
  • key (str) – The key for a filter

  • value (Union[List[str], str]) – The value for a filter. Can be either a string or a list

static validate_edge(edge: dict) → dict

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters

edge (dict) – An edge represented as a dict

Returns

An edge represented as a dict, with default assumptions applied.

Return type

dict

static validate_node(node: dict) → dict

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters

node (dict) – A node represented as a dict

Returns

A node represented as a dict, with default assumptions applied.

Return type

dict

PandasTransformer

class kgx.transformers.pandas_transformer.PandasTransformer(source_graph: networkx.classes.multidigraph.MultiDiGraph = None)[source]

Bases: kgx.transformers.transformer.Transformer

Transformer that parses a pandas.DataFrame, and loads nodes and edges into a networkx.MultiDiGraph

categorize()

Find and validate category for every node in self.graph

static dump(g: networkx.classes.multidigraph.MultiDiGraph) → Dict

Convert a networkx.MultiDiGraph to a dictionary.

Parameters

g (networkx.MultiDiGraph) – Graph to convert to a dictionary

Returns

A dictionary

Return type

dict

static dump_to_file(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None

Serialize a networkx.MultiDiGraph as JSON and write it to a file.

Parameters
  • g (networkx.MultiDiGraph) – Graph to serialize

  • filename (str) – File to write the JSON to

export_edges() → pandas.core.frame.DataFrame[source]

Export edges from networkx.MultiDiGraph as a pandas.DataFrame

Returns

A Dataframe where each record corresponds to an edge from the networkx.MultiDiGraph

Return type

pandas.DataFrame

export_nodes() → pandas.core.frame.DataFrame[source]

Export nodes from networkx.MultiDiGraph as a pandas.DataFrame

Returns

A Dataframe where each record corresponds to a node from the networkx.MultiDiGraph

Return type

pandas.DataFrame

is_empty() → bool

Check whether self.graph is empty.

Returns

A boolean value asserting whether the graph is empty or not

Return type

bool

load(df: pandas.core.frame.DataFrame) → None[source]

Load a pandas.DataFrame, containing either nodes or edges, into a networkx.MultiDiGraph

Parameters

df (pandas.DataFrame) – Dataframe containing records that represent nodes or edges

load_edge(edge: Dict) → None[source]

Load an edge into a networkx.MultiDiGraph

Parameters

edge (dict) – An edge

load_edges(df: pandas.core.frame.DataFrame) → None[source]

Load edges from pandas.DataFrame into a networkx.MultiDiGraph

Parameters

df (pandas.DataFrame) – Dataframe containing records that represent edges

load_node(node: Dict) → None[source]

Load a node into a networkx.MultiDiGraph

Parameters

node (dict) – A node

load_nodes(df: pandas.core.frame.DataFrame) → None[source]

Load nodes from pandas.DataFrame into a networkx.MultiDiGraph

Parameters

df (pandas.DataFrame) – Dataframe containing records that represent nodes

merge_graphs(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None

Merge all graphs with self.graph

  • If two nodes with the same ‘id’ exist in two graphs, the nodes are merged based on the ‘id’

  • If two nodes with the same ‘id’ exist in two graphs and they have conflicting values for a property, then the value is overwritten from left to right

  • If two edges with the same ‘key’ exist in two graphs, the edges are merged based on the ‘key’ property

  • If two edges with the same ‘key’ exist in two graphs and they have one or more conflicting values for a property, then the value is overwritten from left to right

Parameters

graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph

parse(filename: str, input_format: str = 'csv', provided_by: str = None, **kwargs) → None[source]

Parse a CSV/TSV (or plain text) file.

The file can represent nodes (nodes.csv), edges (edges.csv), or both (data.tar, a tar archive that contains nodes.csv and edges.csv).

The tar archive can also be compressed, as in data.tar.gz or data.tar.bz2.

Parameters
  • filename (str) – File to read from

  • input_format (str) – The input file format (csv, by default)

  • provided_by (str) – Define the source providing the input file

  • kwargs (Dict) – Any additional arguments

remap_edge_property(type: str, old_property: str, new_property: str) → None

Remap the value in edge old_property attribute with value from edge new_property attribute.

Parameters
  • type (string) – label referring to edges whose property needs to be remapped

  • old_property (string) – old property name whose value needs to be replaced

  • new_property (string) – new property name from which the value is pulled

remap_node_identifier(type: str, new_property: str, prefix=None) → None

Remap a node’s ‘id’ attribute with value from a node’s new_property attribute.

Parameters
  • type (string) – label referring to nodes whose ‘id’ needs to be remapped

  • new_property (string) – property name from which the new value is pulled

  • prefix (string) – signifies that the value for new_property is a list and the prefix indicates which value to pick from the list

remap_node_property(type: str, old_property: str, new_property: str) → None

Remap the value in node old_property attribute with value from node new_property attribute.

Parameters
  • type (string) – label referring to nodes whose property needs to be remapped

  • old_property (string) – old property name whose value needs to be replaced

  • new_property (string) – new property name from which the value is pulled

report() → None

Print a summary report about self.graph

static restore(data: Dict) → networkx.classes.multidigraph.MultiDiGraph

Deserialize a networkx.MultiDiGraph from a dictionary.

Parameters

data (dict) – Dictionary containing nodes and edges

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph

static restore_from_file(filename) → networkx.classes.multidigraph.MultiDiGraph

Deserialize a networkx.MultiDiGraph from a JSON file.

Parameters

filename (str) – File to read from

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph

save(filename: str, extension: str = 'csv', mode: str = 'w', **kwargs) → str[source]

Write two files representing the node set and edge set of a networkx.MultiDiGraph, and add them to a .tar archive.

Parameters
  • filename (str) – Name of tar archive file to create

  • extension (str) – The output file format (csv, by default)

  • mode (str) – Form of compression to use (w, by default, signifies no compression)

  • kwargs (dict) – Any additional arguments
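
The archive layout described above (nodes.csv and edges.csv inside a tar file) can be reproduced with the standard library alone. This sketch does not use pandas or KGX: write_archive is a hypothetical helper, and it assumes that the mode parameter follows the mode strings of Python's tarfile module ("w" for no compression, "w:gz" or "w:bz2" for compressed archives).

```python
import csv
import io
import os
import tarfile
import tempfile
from typing import Dict, List


def write_archive(path: str, tables: Dict[str, List[Dict[str, str]]], mode: str = "w") -> None:
    """Write each table as a CSV member of a tar archive (hypothetical helper)."""
    with tarfile.open(path, mode) as archive:
        for member_name, rows in tables.items():
            buffer = io.StringIO()
            # Assumption: every table has at least one row, whose keys give the header.
            writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
            payload = buffer.getvalue().encode("utf-8")
            info = tarfile.TarInfo(name=member_name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))


# Made-up records, for illustration only.
tables = {
    "nodes.csv": [{"id": "EX:1", "name": "alpha"}, {"id": "EX:2", "name": "beta"}],
    "edges.csv": [{"subject": "EX:1", "edge_label": "related_to", "object": "EX:2"}],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    archive_path = os.path.join(tmp_dir, "data.tar.gz")
    write_archive(archive_path, tables, mode="w:gz")
    # "r:*" lets tarfile detect the compression when reading the archive back.
    with tarfile.open(archive_path, "r:*") as archive:
        member_names = sorted(archive.getnames())
        nodes_text = archive.extractfile("nodes.csv").read().decode("utf-8")
```

Writing each table to an in-memory buffer avoids leaving intermediate CSV files on disk before they are added to the archive.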

set_filter(key: str, value: Union[List[str], str]) → None

Set a filter, defined by a key and value pair. These filters are used to reduce the search space.

Parameters
  • key (str) – The key for a filter

  • value (Union[List[str], str]) – The value for a filter. Can be either a string or a list

static validate_edge(edge: dict) → dict

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters

edge (dict) – An edge represented as a dict

Returns

An edge represented as a dict, with default assumptions applied.

Return type

dict

static validate_node(node: dict) → dict

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters

node (dict) – A node represented as a dict

Returns

A node represented as a dict, with default assumptions applied.

Return type

dict

JsonTransformer

class kgx.transformers.json_transformer.JsonTransformer(source_graph: networkx.classes.multidigraph.MultiDiGraph = None)[source]

Bases: kgx.transformers.pandas_transformer.PandasTransformer

Transformer that parses a JSON file and loads nodes and edges into a networkx.MultiDiGraph

categorize()

Find and validate category for every node in self.graph

static dump(g: networkx.classes.multidigraph.MultiDiGraph) → Dict

Convert a networkx.MultiDiGraph to a dictionary.

Parameters

g (networkx.MultiDiGraph) – Graph to convert to a dictionary

Returns

A dictionary

Return type

dict

static dump_to_file(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None

Serialize a networkx.MultiDiGraph as JSON and write it to a file.

Parameters
  • g (networkx.MultiDiGraph) – Graph to serialize

  • filename (str) – File to write the JSON to

export() → Dict[source]

Export networkx.MultiDiGraph as a dictionary.

Returns

A dictionary with a list of nodes and a list of edges

Return type

dict

export_edges() → pandas.core.frame.DataFrame

Export edges from networkx.MultiDiGraph as a pandas.DataFrame

Returns

A Dataframe where each record corresponds to an edge from the networkx.MultiDiGraph

Return type

pandas.DataFrame

export_nodes() → pandas.core.frame.DataFrame

Export nodes from networkx.MultiDiGraph as a pandas.DataFrame

Returns

A Dataframe where each record corresponds to a node from the networkx.MultiDiGraph

Return type

pandas.DataFrame

is_empty() → bool

Check whether self.graph is empty.

Returns

A boolean value asserting whether the graph is empty or not

Return type

bool

load(obj: Dict[str, List]) → None[source]

Load a JSON object, containing nodes and edges, into a networkx.MultiDiGraph

Parameters

obj (dict) – JSON Object with all nodes and edges

load_edge(edge: Dict) → None

Load an edge into a networkx.MultiDiGraph

Parameters

edge (dict) – An edge

load_edges(edges: List[Dict]) → None[source]

Load a list of edges into a networkx.MultiDiGraph

Parameters

edges (list) – List of edges

load_node(node: Dict) → None

Load a node into a networkx.MultiDiGraph

Parameters

node (dict) – A node

load_nodes(nodes: List[Dict]) → None[source]

Load a list of nodes into a networkx.MultiDiGraph

Parameters

nodes (list) – List of nodes

merge_graphs(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None

Merge all graphs with self.graph

  • If two nodes with the same ‘id’ exist in two graphs, the nodes are merged based on the ‘id’

  • If two nodes with the same ‘id’ exist in two graphs and they have conflicting values for a property, then the value is overwritten from left to right

  • If two edges with the same ‘key’ exist in two graphs, the edges are merged based on the ‘key’ property

  • If two edges with the same ‘key’ exist in two graphs and they have one or more conflicting values for a property, then the value is overwritten from left to right

Parameters

graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph

parse(filename: str, input_format: str = 'json', provided_by: str = None, **kwargs) → None[source]

Parse a JSON file of the format:

    {
        "nodes": […],
        "edges": […]
    }

Parameters
  • filename (str) – JSON file to read from

  • input_format (str) – The input file format (json, by default)

  • provided_by (str) – Define the source providing the input file

  • kwargs (dict) – Any additional arguments
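
A document in this format can be inspected with the standard json module before it is handed to a transformer. The sketch below is illustrative and does not call KGX; the records are made up, and the edge properties (‘subject’, ‘edge_label’, ‘object’) follow the required edge properties named in this reference.

```python
import json

# Made-up document in the nodes/edges layout, for illustration only.
document = """
{
    "nodes": [{"id": "EX:1"}, {"id": "EX:2"}],
    "edges": [{"subject": "EX:1", "edge_label": "related_to", "object": "EX:2"}]
}
"""

data = json.loads(document)
nodes = data.get("nodes", [])
edges = data.get("edges", [])

# Collect edges that point at nodes missing from the same document.
node_ids = {node["id"] for node in nodes}
dangling = [
    edge for edge in edges
    if edge["subject"] not in node_ids or edge["object"] not in node_ids
]
```

An empty dangling list means that every edge refers to nodes defined in the same file.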

remap_edge_property(type: str, old_property: str, new_property: str) → None

Remap the value in edge old_property attribute with value from edge new_property attribute.

Parameters
  • type (string) – label referring to edges whose property needs to be remapped

  • old_property (string) – old property name whose value needs to be replaced

  • new_property (string) – new property name from which the value is pulled

remap_node_identifier(type: str, new_property: str, prefix=None) → None

Remap a node’s ‘id’ attribute with value from a node’s new_property attribute.

Parameters
  • type (string) – label referring to nodes whose ‘id’ needs to be remapped

  • new_property (string) – property name from which the new value is pulled

  • prefix (string) – signifies that the value for new_property is a list and the prefix indicates which value to pick from the list

remap_node_property(type: str, old_property: str, new_property: str) → None

Remap the value in node old_property attribute with value from node new_property attribute.

Parameters
  • type (string) – label referring to nodes whose property needs to be remapped

  • old_property (string) – old property name whose value needs to be replaced

  • new_property (string) – new property name from which the value is pulled

report() → None

Print a summary report about self.graph

static restore(data: Dict) → networkx.classes.multidigraph.MultiDiGraph

Deserialize a networkx.MultiDiGraph from a dictionary.

Parameters

data (dict) – Dictionary containing nodes and edges

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph

static restore_from_file(filename) → networkx.classes.multidigraph.MultiDiGraph

Deserialize a networkx.MultiDiGraph from a JSON file.

Parameters

filename (str) – File to read from

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph

save(filename: str, **kwargs) → None[source]

Write networkx.MultiDiGraph to a file as JSON.

Parameters
  • filename (str) – Filename to write to

  • kwargs (dict) – Any additional arguments

set_filter(key: str, value: Union[List[str], str]) → None

Set a filter, defined by a key and value pair. These filters are used to reduce the search space.

Parameters
  • key (str) – The key for a filter

  • value (Union[List[str], str]) – The value for a filter. Can be either a string or a list

static validate_edge(edge: dict) → dict

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters

edge (dict) – An edge represented as a dict

Returns

An edge represented as a dict, with default assumptions applied.

Return type

dict

static validate_node(node: dict) → dict

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters

node (dict) – A node represented as a dict

Returns

A node represented as a dict, with default assumptions applied.

Return type

dict

LogicTermTransformer

class kgx.transformers.logicterm_transformer.LogicTermTransformer(source: Union[kgx.transformers.transformer.Transformer, networkx.classes.multidigraph.MultiDiGraph] = None, output_format=None, **args)[source]

Bases: kgx.transformers.transformer.Transformer

TODO: Motivation for LogicTermTransformer?

categorize()

Find and validate category for every node in self.graph

static dump(g: networkx.classes.multidigraph.MultiDiGraph) → Dict

Convert a networkx.MultiDiGraph to a dictionary.

Parameters

g (networkx.MultiDiGraph) – Graph to convert to a dictionary

Returns

A dictionary

Return type

dict

static dump_to_file(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None

Serialize a networkx.MultiDiGraph as JSON and write it to a file.

Parameters
  • g (networkx.MultiDiGraph) – Graph to serialize

  • filename (str) – File to write the JSON to

is_empty() → bool

Check whether self.graph is empty.

Returns

A boolean value asserting whether the graph is empty or not

Return type

bool

merge_graphs(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None

Merge all graphs with self.graph

  • If two nodes with the same ‘id’ exist in two graphs, the nodes are merged based on the ‘id’

  • If two nodes with the same ‘id’ exist in two graphs and they have conflicting values for a property, then the value is overwritten from left to right

  • If two edges with the same ‘key’ exist in two graphs, the edges are merged based on the ‘key’ property

  • If two edges with the same ‘key’ exist in two graphs and they have one or more conflicting values for a property, then the value is overwritten from left to right

Parameters

graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph

remap_edge_property(type: str, old_property: str, new_property: str) → None

Remap the value in edge old_property attribute with value from edge new_property attribute.

Parameters
  • type (string) – label referring to edges whose property needs to be remapped

  • old_property (string) – old property name whose value needs to be replaced

  • new_property (string) – new property name from which the value is pulled

remap_node_identifier(type: str, new_property: str, prefix=None) → None

Remap a node’s ‘id’ attribute with value from a node’s new_property attribute.

Parameters
  • type (string) – label referring to nodes whose ‘id’ needs to be remapped

  • new_property (string) – property name from which the new value is pulled

  • prefix (string) – signifies that the value for new_property is a list and the prefix indicates which value to pick from the list

remap_node_property(type: str, old_property: str, new_property: str) → None

Replace the value of a node’s old_property attribute with the value of its new_property attribute.

Parameters
  • type (string) – label referring to nodes whose property needs to be remapped

  • old_property (string) – old property name whose value needs to be replaced

  • new_property (string) – new property name from which the value is pulled
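The remapping of a property value can be sketched on plain dictionaries. The helper remap_property is hypothetical, and matching the type label against a ‘category’ list is an assumption; the sketch only shows the "copy new_property over old_property" behaviour described above.

```python
from typing import Dict, Iterable


def remap_property(records: Iterable[Dict[str, object]], type_label: str,
                   old_property: str, new_property: str) -> None:
    """Overwrite old_property with the value of new_property, in place."""
    for record in records:
        # Assumption: the type label is matched against a 'category' list.
        if type_label not in record.get("category", []):
            continue
        if new_property in record:
            record[old_property] = record[new_property]


nodes = [
    {"id": "HGNC:11603", "category": ["gene"], "name": "11603", "symbol": "TBX4"},
    {"id": "MONDO:0005002", "category": ["disease"], "name": "COPD"},
]
remap_property(nodes, "gene", "name", "symbol")
print(nodes[0]["name"])  # TBX4
```

Records of other types, and records that lack new_property, are left unchanged.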

report() → None

Print a summary report about self.graph

static restore(data: Dict) → networkx.classes.multidigraph.MultiDiGraph

Deserialize a networkx.MultiDiGraph from a dictionary.

Parameters

data (dict) – Dictionary containing nodes and edges

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph

static restore_from_file(filename) → networkx.classes.multidigraph.MultiDiGraph

Deserialize a networkx.MultiDiGraph from a JSON file.

Parameters

filename (str) – File to read from

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph
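The dump/restore pair amounts to a JSON round trip. The sketch below uses only the standard library; the ‘nodes’/‘edges’ layout of the dictionary and the helper names write_graph_json and read_graph_json are assumptions for illustration, not the format KGX writes.

```python
import json
import os
import tempfile
from typing import Dict

# Assumed layout: a plain dict with 'nodes' and 'edges' lists.
graph_as_dict: Dict[str, list] = {
    "nodes": [{"id": "HGNC:11603", "name": "TBX4"}],
    "edges": [{"subject": "HGNC:11603", "object": "MONDO:0005002",
               "edge_label": "related_to"}],
}


def write_graph_json(data: Dict[str, list], filename: str) -> None:
    """Serialize the dictionary as JSON and write it to filename."""
    with open(filename, "w") as handle:
        json.dump(data, handle, indent=2)


def read_graph_json(filename: str) -> Dict[str, list]:
    """Read the JSON file back into a dictionary."""
    with open(filename) as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "graph.json")
    write_graph_json(graph_as_dict, path)
    restored = read_graph_json(path)

print(restored == graph_as_dict)  # True
```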

save(filename: str, format='sxpr', zipmode='w', **kwargs)[source]

set_filter(key: str, value: Union[List[str], str]) → None

Set a filter, defined by a key and value pair. These filters are used to reduce the search space.

Parameters
  • key (str) – The key for a filter

  • value (Union[List[str], str]) – The value for a filter. Can be either a string or a list
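One way to picture a key/value filter that accepts either a string or a list. FilterHolder and its matches() method are hypothetical; whether KGX accumulates or replaces values for a repeated key is not stated here, so accumulation is an assumption of this sketch.

```python
from typing import Dict, List, Set, Union


class FilterHolder:
    """Hypothetical stand-in that only stores and applies filters."""

    def __init__(self) -> None:
        self.filters: Dict[str, Set[str]] = {}

    def set_filter(self, key: str, value: Union[List[str], str]) -> None:
        # Normalise a single string to a one-element list.
        values = [value] if isinstance(value, str) else value
        self.filters.setdefault(key, set()).update(values)

    def matches(self, record: Dict[str, str]) -> bool:
        # A record passes when every filtered key holds an allowed value.
        return all(record.get(key) in allowed
                   for key, allowed in self.filters.items())


holder = FilterHolder()
holder.set_filter("category", ["gene", "disease"])
holder.set_filter("provided_by", "hgnc")

records = [
    {"id": "HGNC:11603", "category": "gene", "provided_by": "hgnc"},
    {"id": "X:1", "category": "gene", "provided_by": "other"},
]
kept = [r["id"] for r in records if holder.matches(r)]
print(kept)  # ['HGNC:11603']
```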

static validate_edge(edge: dict) → dict

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters

edge (dict) – An edge represented as a dict

Returns

An edge represented as a dict, with default assumptions applied.

Return type

dict

static validate_node(node: dict) → dict

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters

node (dict) – A node represented as a dict

Returns

A node represented as a dict, with default assumptions applied.

Return type

dict
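A sketch of what "check for required properties and apply default assumptions" could look like. The required ‘id’ property, the default category value, and raising ValueError are all assumptions chosen for illustration; consult the KGX source for the actual rules.

```python
from typing import Dict

DEFAULT_CATEGORY = ["named thing"]  # assumed default, for illustration only


def validate_node_sketch(node: Dict[str, object]) -> Dict[str, object]:
    """Return a copy of node with defaults applied; fail if 'id' is missing."""
    if "id" not in node:
        raise ValueError("node is missing the required 'id' property")
    validated = dict(node)  # copy so the caller's dict is untouched
    validated.setdefault("category", list(DEFAULT_CATEGORY))
    return validated


print(validate_node_sketch({"id": "HGNC:11603"}))
# {'id': 'HGNC:11603', 'category': ['named thing']}
```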

NxTransformer

class kgx.transformers.nx_transformer.GraphMLTransformer(source_graph: networkx.classes.multidigraph.MultiDiGraph = None)[source]

Bases: kgx.transformers.nx_transformer.NetworkxTransformer

I/O for GraphML. TODO: do we need to support GraphML?

categorize()

Find and validate category for every node in self.graph

static dump(g: networkx.classes.multidigraph.MultiDiGraph) → Dict

Convert a networkx.MultiDiGraph to a dictionary.

Parameters

g (networkx.MultiDiGraph) – Graph to convert to a dictionary

Returns

A dictionary

Return type

dict

static dump_to_file(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None

Serialize a networkx.MultiDiGraph as JSON and write it to a file.

Parameters
  • g (networkx.MultiDiGraph) – Graph to serialize

  • filename (str) – File to write the JSON

is_empty() → bool

Check whether self.graph is empty.

Returns

A boolean value asserting whether the graph is empty or not

Return type

bool

merge_graphs(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None

Merge all graphs with self.graph

  • If two nodes with the same ‘id’ exist in two graphs, the nodes are merged based on the ‘id’

  • If two nodes with the same ‘id’ exist in two graphs and have conflicting values for a property, the value is overwritten from left to right

  • If two edges with the same ‘key’ exist in two graphs, the edges are merged based on the ‘key’ property

  • If two edges with the same ‘key’ exist in two graphs and have one or more conflicting values for a property, the value is overwritten from left to right

Parameters

graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph

remap_edge_property(type: str, old_property: str, new_property: str) → None

Replace the value of an edge’s old_property attribute with the value of its new_property attribute.

Parameters
  • type (string) – label referring to edges whose property needs to be remapped

  • old_property (string) – old property name whose value needs to be replaced

  • new_property (string) – new property name from which the value is pulled

remap_node_identifier(type: str, new_property: str, prefix=None) → None

Replace a node’s ‘id’ attribute with the value of the node’s new_property attribute.

Parameters
  • type (string) – label referring to nodes whose ‘id’ needs to be remapped

  • new_property (string) – property name from which the new value is pulled

  • prefix (string) – signifies that the value for new_property is a list; the prefix indicates which value to pick from the list

remap_node_property(type: str, old_property: str, new_property: str) → None

Replace the value of a node’s old_property attribute with the value of its new_property attribute.

Parameters
  • type (string) – label referring to nodes whose property needs to be remapped

  • old_property (string) – old property name whose value needs to be replaced

  • new_property (string) – new property name from which the value is pulled

report() → None

Print a summary report about self.graph

static restore(data: Dict) → networkx.classes.multidigraph.MultiDiGraph

Deserialize a networkx.MultiDiGraph from a dictionary.

Parameters

data (dict) – Dictionary containing nodes and edges

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph

static restore_from_file(filename) → networkx.classes.multidigraph.MultiDiGraph

Deserialize a networkx.MultiDiGraph from a JSON file.

Parameters

filename (str) – File to read from

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph

set_filter(key: str, value: Union[List[str], str]) → None

Set a filter, defined by a key and value pair. These filters are used to reduce the search space.

Parameters
  • key (str) – The key for a filter

  • value (Union[List[str], str]) – The value for a filter. Can be either a string or a list

static validate_edge(edge: dict) → dict

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters

edge (dict) – An edge represented as a dict

Returns

An edge represented as a dict, with default assumptions applied.

Return type

dict

static validate_node(node: dict) → dict

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters

node (dict) – A node represented as a dict

Returns

A node represented as a dict, with default assumptions applied.

Return type

dict

class kgx.transformers.nx_transformer.NetworkxTransformer(source_graph: networkx.classes.multidigraph.MultiDiGraph = None)[source]

Bases: kgx.transformers.transformer.Transformer

Base class for networkx transforms. TODO: use case for this class.

categorize()

Find and validate category for every node in self.graph

static dump(g: networkx.classes.multidigraph.MultiDiGraph) → Dict

Convert a networkx.MultiDiGraph to a dictionary.

Parameters

g (networkx.MultiDiGraph) – Graph to convert to a dictionary

Returns

A dictionary

Return type

dict

static dump_to_file(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None

Serialize a networkx.MultiDiGraph as JSON and write it to a file.

Parameters
  • g (networkx.MultiDiGraph) – Graph to serialize

  • filename (str) – File to write the JSON

is_empty() → bool

Check whether self.graph is empty.

Returns

A boolean value asserting whether the graph is empty or not

Return type

bool

merge_graphs(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None

Merge all graphs with self.graph

  • If two nodes with the same ‘id’ exist in two graphs, the nodes are merged based on the ‘id’

  • If two nodes with the same ‘id’ exist in two graphs and have conflicting values for a property, the value is overwritten from left to right

  • If two edges with the same ‘key’ exist in two graphs, the edges are merged based on the ‘key’ property

  • If two edges with the same ‘key’ exist in two graphs and have one or more conflicting values for a property, the value is overwritten from left to right

Parameters

graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph

remap_edge_property(type: str, old_property: str, new_property: str) → None

Replace the value of an edge’s old_property attribute with the value of its new_property attribute.

Parameters
  • type (string) – label referring to edges whose property needs to be remapped

  • old_property (string) – old property name whose value needs to be replaced

  • new_property (string) – new property name from which the value is pulled

remap_node_identifier(type: str, new_property: str, prefix=None) → None

Replace a node’s ‘id’ attribute with the value of the node’s new_property attribute.

Parameters
  • type (string) – label referring to nodes whose ‘id’ needs to be remapped

  • new_property (string) – property name from which the new value is pulled

  • prefix (string) – signifies that the value for new_property is a list; the prefix indicates which value to pick from the list

remap_node_property(type: str, old_property: str, new_property: str) → None

Replace the value of a node’s old_property attribute with the value of its new_property attribute.

Parameters
  • type (string) – label referring to nodes whose property needs to be remapped

  • old_property (string) – old property name whose value needs to be replaced

  • new_property (string) – new property name from which the value is pulled

report() → None

Print a summary report about self.graph

static restore(data: Dict) → networkx.classes.multidigraph.MultiDiGraph

Deserialize a networkx.MultiDiGraph from a dictionary.

Parameters

data (dict) – Dictionary containing nodes and edges

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph

static restore_from_file(filename) → networkx.classes.multidigraph.MultiDiGraph

Deserialize a networkx.MultiDiGraph from a JSON file.

Parameters

filename (str) – File to read from

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph

set_filter(key: str, value: Union[List[str], str]) → None

Set a filter, defined by a key and value pair. These filters are used to reduce the search space.

Parameters
  • key (str) – The key for a filter

  • value (Union[List[str], str]) – The value for a filter. Can be either a string or a list

static validate_edge(edge: dict) → dict

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters

edge (dict) – An edge represented as a dict

Returns

An edge represented as a dict, with default assumptions applied.

Return type

dict

static validate_node(node: dict) → dict

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters

node (dict) – A node represented as a dict

Returns

A node represented as a dict, with default assumptions applied.

Return type

dict

RdfGraphMixin

A mixin for handling operations on RDF-stores.

class kgx.transformers.rdf_graph_mixin.RdfGraphMixin(source_graph: networkx.classes.multidigraph.MultiDiGraph = None)[source]

Bases: object

A mixin that defines the following methods,
  • load_networkx_graph(): template method that all deriving classes should implement

  • add_node(): method to add a node from an RDF form to property graph form

  • add_node_attribute(): method to add a node attribute from an RDF form to property graph form

  • add_edge(): method to add an edge from an RDF form to property graph form

  • add_edge_attribute(): method to add an edge attribute from an RDF form to property graph form

add_edge(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef) → Tuple[str, str, str][source]

This method should be used by all derived classes when adding an edge to the networkx.MultiDiGraph. This ensures that the subject and object identifiers are CURIEs, and that edge_label is in the correct form.

Returns the CURIE identifiers used for the subject and object in the networkx.MultiDiGraph, and the processed edge_label.

Parameters
  • subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple

  • object_iri (rdflib.URIRef) – Object IRI for the object in a triple

  • predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple

Returns

A 3-tuple (of the form subject, object, predicate) that represents the edge

Return type

Tuple[str, str, str]
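The IRI-to-CURIE step can be sketched with a prefix map. PREFIX_MAP, contract, and edge_tuple are hypothetical names, and contracting the predicate like a node identifier is a simplification of whatever processing add_edge() applies to edge_label.

```python
from typing import Dict, Tuple

# Hypothetical prefix map; real deployments load a much larger one.
PREFIX_MAP: Dict[str, str] = {
    "HGNC": "http://identifiers.org/hgnc/",
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
    "RO": "http://purl.obolibrary.org/obo/RO_",
}


def contract(iri: str) -> str:
    """Contract an IRI to a CURIE when its namespace is known."""
    for prefix, namespace in PREFIX_MAP.items():
        if iri.startswith(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    return iri  # unknown namespace: leave the IRI unchanged


def edge_tuple(subject_iri: str, object_iri: str,
               predicate_iri: str) -> Tuple[str, str, str]:
    """Return (subject, object, predicate), mirroring the documented order."""
    return contract(subject_iri), contract(object_iri), contract(predicate_iri)


print(edge_tuple(
    "http://identifiers.org/hgnc/11603",
    "http://purl.obolibrary.org/obo/MONDO_0005002",
    "http://purl.obolibrary.org/obo/RO_0002200",
))  # ('HGNC:11603', 'MONDO:0005002', 'RO:0002200')
```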

add_edge_attribute(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → None[source]

Add an attribute to an edge, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If the nodes in the edge do not exist, they are created using subject_iri and object_iri.

If the edge itself does not exist, it is created using subject_iri, object_iri and predicate_iri.

Parameters
  • subject_iri (Union[rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph

  • object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph

  • predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph

  • key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string

  • value (str) – The value of the attribute

add_node(iri: rdflib.term.URIRef) → str[source]

This method should be used by all derived classes when adding a node to the networkx.MultiDiGraph. This ensures that a node’s identifier is a CURIE, and that its iri property is set.

Returns the CURIE identifier for the node in the networkx.MultiDiGraph

Parameters

iri (rdflib.URIRef) – IRI of a node

Returns

The CURIE identifier of a node

Return type

str

add_node_attribute(iri: Union[rdflib.term.URIRef, str], key: str, value: str) → None[source]

Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If the node does not exist then it is created using the given iri.

Parameters
  • iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph

  • key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string

  • value (str) – The value of the attribute
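The multi-valued rule, which applies to node and edge attributes alike, can be sketched as follows. The MULTIVALUED set and add_attribute helper are hypothetical; which properties KGX actually treats as multi-valued is not specified here.

```python
from typing import Dict, List, Union

# Hypothetical set of property names that are treated as multi-valued.
MULTIVALUED = {"category", "synonym"}

Properties = Dict[str, Union[str, List[str]]]


def add_attribute(properties: Properties, key: str, value: str) -> None:
    """Add value under key, keeping multi-valued properties duplicate-free."""
    if key in MULTIVALUED:
        values = properties.setdefault(key, [])
        if value not in values:  # skip values that are already present
            values.append(value)
    else:
        properties[key] = value  # single-valued: last write wins


node: Properties = {}
add_attribute(node, "synonym", "TBX4")
add_attribute(node, "synonym", "TBX4")
add_attribute(node, "name", "T-box 4")
print(node)  # {'synonym': ['TBX4'], 'name': 'T-box 4'}
```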

load_networkx_graph(rdfgraph: rdflib.graph.Graph = None, predicates: Set[rdflib.term.URIRef] = None, **kwargs) → None[source]

This method should be overridden and implemented by the derived class, and should load all desired nodes and edges from rdflib.Graph into networkx.MultiDiGraph.

It is preferred that this method does not use the networkx API directly when adding nodes, edges, and their attributes.

Instead, use the following methods,
  • add_node()

  • add_node_attribute()

  • add_edge()

  • add_edge_attribute()

to ensure that nodes, edges, and their attributes are added in conformance with the BioLink Model, and that URIRefs are translated into CURIEs or BioLink Model elements whenever appropriate.

Parameters
  • rdfgraph (rdflib.Graph) – Graph containing nodes and edges

  • predicates (Set[rdflib.URIRef]) – A set of rdflib.URIRef representing predicates to be loaded

  • kwargs (dict) – Any additional arguments
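The template-method pattern described above can be sketched without rdflib or networkx. GraphMixinSketch and SimpleLoader are hypothetical stand-ins: triples are plain string tuples, the graph is kept in plain containers, and taking the last path segment is a toy substitute for CURIE contraction.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class GraphMixinSketch:
    """Hypothetical stand-in for the mixin; stores a graph in plain containers."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, str]] = {}
        self.edges: List[Triple] = []

    def add_node(self, iri: str) -> str:
        node_id = iri.rsplit("/", 1)[-1]  # toy stand-in for CURIE contraction
        self.nodes.setdefault(node_id, {"iri": iri})
        return node_id

    def add_edge(self, subject_iri: str, object_iri: str,
                 predicate_iri: str) -> Triple:
        edge = (self.add_node(subject_iri), self.add_node(object_iri),
                predicate_iri.rsplit("/", 1)[-1])
        self.edges.append(edge)
        return edge

    def load_networkx_graph(self, triples=None, predicates=None, **kwargs) -> None:
        raise NotImplementedError("derived classes implement this")


class SimpleLoader(GraphMixinSketch):
    def load_networkx_graph(self, triples: Iterable[Triple] = (),
                            predicates: Optional[Set[str]] = None,
                            **kwargs) -> None:
        for s, p, o in triples:
            # Load only the requested predicates; None means load everything.
            if predicates is None or p in predicates:
                self.add_edge(s, o, p)  # go through the helper, not the store


loader = SimpleLoader()
loader.load_networkx_graph(
    triples=[
        ("http://example.org/a", "http://example.org/related_to", "http://example.org/b"),
        ("http://example.org/a", "http://example.org/label", "http://example.org/c"),
    ],
    predicates={"http://example.org/related_to"},
)
print(loader.edges)  # [('a', 'b', 'related_to')]
```

Routing every insertion through add_node() and add_edge() keeps identifier handling in one place, which is the reason given above for avoiding the graph API directly.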

RdfTransformer

class kgx.transformers.rdf_transformer.ObanRdfTransformer(source_graph: networkx.classes.multidigraph.MultiDiGraph = None)[source]

Bases: kgx.transformers.rdf_transformer.RdfTransformer

Transformer that parses a ‘turtle’ file and loads triples, as nodes and edges, into a networkx.MultiDiGraph

This Transformer supports the OBAN style of modeling where,
  • it dereifies OBAN.association triples into a property graph form

  • it reifies property graph into OBAN.association triples
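Dereification can be pictured as collapsing an association node and its subject, predicate, and object triples into one edge. The sketch below works on plain string triples; the dereify helper is hypothetical, and the OBAN property IRIs are written out on the assumption that they live under http://purl.org/oban/.

```python
from typing import Dict, Iterable, List, Tuple

OBAN = "http://purl.org/oban/"  # assumed OBAN namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
Triple = Tuple[str, str, str]


def dereify(triples: Iterable[Triple]) -> List[Triple]:
    """Collapse OBAN associations into (subject, predicate, object) edges."""
    triples = list(triples)
    # First pass: find every node typed as an OBAN association.
    associations: Dict[str, Dict[str, str]] = {
        s: {} for s, p, o in triples
        if p == RDF_TYPE and o == OBAN + "association"
    }
    # Second pass: collect the subject/predicate/object of each association.
    marker = OBAN + "association_has_"
    for s, p, o in triples:
        if s in associations and p.startswith(marker):
            associations[s][p[len(marker):]] = o
    # Emit an edge only for associations that have all three parts.
    return [(a["subject"], a["predicate"], a["object"])
            for a in associations.values()
            if {"subject", "predicate", "object"} <= a.keys()]


triples = [
    ("_:a1", RDF_TYPE, OBAN + "association"),
    ("_:a1", OBAN + "association_has_subject", "HGNC:11603"),
    ("_:a1", OBAN + "association_has_predicate", "RO:0002200"),
    ("_:a1", OBAN + "association_has_object", "HP:0000001"),
]
print(dereify(triples))  # [('HGNC:11603', 'RO:0002200', 'HP:0000001')]
```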

add_edge(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef) → Tuple[str, str, str]

This method should be used by all derived classes when adding an edge to the networkx.MultiDiGraph. This ensures that the subject and object identifiers are CURIEs, and that edge_label is in the correct form.

Returns the CURIE identifiers used for the subject and object in the networkx.MultiDiGraph, and the processed edge_label.

Parameters
  • subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple

  • object_iri (rdflib.URIRef) – Object IRI for the object in a triple

  • predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple

Returns

A 3-tuple (of the form subject, object, predicate) that represents the edge

Return type

Tuple[str, str, str]

add_edge_attribute(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → None

Add an attribute to an edge, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If the nodes in the edge do not exist, they are created using subject_iri and object_iri.

If the edge itself does not exist, it is created using subject_iri, object_iri and predicate_iri.

Parameters
  • subject_iri (Union[rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph

  • object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph

  • predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph

  • key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string

  • value (str) – The value of the attribute

add_node(iri: rdflib.term.URIRef) → str

This method should be used by all derived classes when adding a node to the networkx.MultiDiGraph. This ensures that a node’s identifier is a CURIE, and that its iri property is set.

Returns the CURIE identifier for the node in the networkx.MultiDiGraph

Parameters

iri (rdflib.URIRef) – IRI of a node

Returns

The CURIE identifier of a node

Return type

str

add_node_attribute(iri: Union[rdflib.term.URIRef, str], key: str, value: str) → None

Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If the node does not exist then it is created using the given iri.

Parameters
  • iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph

  • key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string

  • value (str) – The value of the attribute

add_ontology(file: str) → None

Load an OWL ontology into an rdflib.Graph. TODO: is there a better way of pre-loading required ontologies?

categorize()

Find and validate category for every node in self.graph

static dump(g: networkx.classes.multidigraph.MultiDiGraph) → Dict

Convert a networkx.MultiDiGraph to a dictionary.

Parameters

g (networkx.MultiDiGraph) – Graph to convert to a dictionary

Returns

A dictionary

Return type

dict

static dump_to_file(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None

Serialize a networkx.MultiDiGraph as JSON and write it to a file.

Parameters
  • g (networkx.MultiDiGraph) – Graph to serialize

  • filename (str) – File to write the JSON

is_empty() → bool

Check whether self.graph is empty.

Returns

A boolean value asserting whether the graph is empty or not

Return type

bool

load_networkx_graph(rdfgraph: rdflib.graph.Graph = None, predicates: Set[rdflib.term.URIRef] = None, **kwargs) → None[source]

Walk through the rdflib.Graph and load all triples into networkx.MultiDiGraph

Parameters
  • rdfgraph (rdflib.Graph) – Graph containing nodes and edges

  • predicates (Set[rdflib.URIRef]) – A set of rdflib.URIRef representing predicates to be loaded

  • kwargs (dict) – Any additional arguments

load_node_attributes(rdfgraph: rdflib.graph.Graph) → None

This method loads the properties of nodes into networkx.MultiDiGraph. As there can be many values for a single key, all properties are lists by default.

This method assumes that RdfTransformer.load_edges() has been called, and that all nodes have their IRI set as an attribute.

Parameters

rdfgraph (rdflib.Graph) – Graph containing nodes and edges

merge_graphs(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None

Merge all graphs with self.graph

  • If two nodes with the same ‘id’ exist in two graphs, the nodes are merged based on the ‘id’

  • If two nodes with the same ‘id’ exist in two graphs and have conflicting values for a property, the value is overwritten from left to right

  • If two edges with the same ‘key’ exist in two graphs, the edges are merged based on the ‘key’ property

  • If two edges with the same ‘key’ exist in two graphs and have one or more conflicting values for a property, the value is overwritten from left to right

Parameters

graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph

parse(filename: str = None, input_format: str = None, provided_by: str = None, predicates: Set[rdflib.term.URIRef] = None) → None

Parse a file, containing triples, into a rdflib.Graph

The file can be either a ‘turtle’ file or any other format supported by rdflib.

Parameters
  • filename (str) – File to read from.

  • input_format (str) – The input file format. If None is provided then the format is guessed using rdflib.util.guess_format()

  • provided_by (str) – Define the source providing the input file.

remap_edge_property(type: str, old_property: str, new_property: str) → None

Replace the value of an edge’s old_property attribute with the value of its new_property attribute.

Parameters
  • type (string) – label referring to edges whose property needs to be remapped

  • old_property (string) – old property name whose value needs to be replaced

  • new_property (string) – new property name from which the value is pulled

remap_node_identifier(type: str, new_property: str, prefix=None) → None

Replace a node’s ‘id’ attribute with the value of the node’s new_property attribute.

Parameters
  • type (string) – label referring to nodes whose ‘id’ needs to be remapped

  • new_property (string) – property name from which the new value is pulled

  • prefix (string) – signifies that the value for new_property is a list; the prefix indicates which value to pick from the list

remap_node_property(type: str, old_property: str, new_property: str) → None

Replace the value of a node’s old_property attribute with the value of its new_property attribute.

Parameters
  • type (string) – label referring to nodes whose property needs to be remapped

  • old_property (string) – old property name whose value needs to be replaced

  • new_property (string) – new property name from which the value is pulled

report() → None

Print a summary report about self.graph

static restore(data: Dict) → networkx.classes.multidigraph.MultiDiGraph

Deserialize a networkx.MultiDiGraph from a dictionary.

Parameters

data (dict) – Dictionary containing nodes and edges

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph

static restore_from_file(filename) → networkx.classes.multidigraph.MultiDiGraph

Deserialize a networkx.MultiDiGraph from a JSON file.

Parameters

filename (str) – File to read from

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph

save(filename: str = None, output_format: str = 'turtle', **kwargs) → None[source]

Transform networkx.MultiDiGraph into an rdflib.Graph that follows OBAN-style reification and export this graph as a file (turtle, by default).

Parameters
  • filename (str) – Filename to write to

  • output_format (str) – The output format; default: turtle

  • kwargs (dict) – Any additional arguments
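OBAN-style reification turns each edge into an association node with separate subject, predicate, and object triples. The sketch below emits plain string triples; the reify helper and the ‘_:association’ labels are hypothetical, and the OBAN IRIs assume the http://purl.org/oban/ namespace.

```python
from typing import List, Tuple

OBAN = "http://purl.org/oban/"  # assumed OBAN namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
Triple = Tuple[str, str, str]


def reify(edges: List[Triple]) -> List[Triple]:
    """Turn (subject, predicate, object) edges into association triples."""
    triples: List[Triple] = []
    for index, (subject, predicate, obj) in enumerate(edges):
        association = f"_:association{index}"  # hypothetical blank-node label
        triples.extend([
            (association, RDF_TYPE, OBAN + "association"),
            (association, OBAN + "association_has_subject", subject),
            (association, OBAN + "association_has_predicate", predicate),
            (association, OBAN + "association_has_object", obj),
        ])
    return triples


reified = reify([("HGNC:11603", "RO:0002200", "HP:0000001")])
print(len(reified))  # 4
```

Edge properties could then be attached to the association node, which is what makes the reified form useful for statements about a relationship.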

save_attribute(rdfgraph: rdflib.graph.Graph, object_iri: rdflib.term.URIRef, key: str, value: Union[List[str], str]) → None[source]

Save a node or edge attribute from networkx.MultiDiGraph into rdflib.Graph.

Intended to be used within ObanRdfTransformer.save().

Parameters
  • rdfgraph (rdflib.Graph) – Graph containing nodes and edges

  • object_iri (rdflib.URIRef) – IRI of an object in the graph

  • key (str) – The name of the attribute

  • value (Union[List[str], str]) – The value of the attribute; can be either a list or a single string

set_filter(key: str, value: Union[List[str], str]) → None

Set a filter, defined by a key and value pair. These filters are used to reduce the search space.

Parameters
  • key (str) – The key for a filter

  • value (Union[List[str], str]) – The value for a filter. Can be either a string or a list

uriref(identifier: str) → rdflib.term.URIRef[source]

Generate a rdflib.URIRef for a given string.

Parameters

identifier (str) – Identifier as string.

Returns

URIRef form of the input identifier

Return type

rdflib.URIRef
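The string handling behind such a conversion is typically CURIE expansion. The sketch below returns a plain string rather than a rdflib.URIRef so that it has no dependencies; PREFIX_MAP and expand are hypothetical, and the real method may resolve prefixes differently.

```python
from typing import Dict

# Hypothetical prefix map used only for this illustration.
PREFIX_MAP: Dict[str, str] = {
    "HGNC": "http://identifiers.org/hgnc/",
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
}


def expand(identifier: str) -> str:
    """Expand a CURIE to a full IRI; pass other strings through unchanged."""
    prefix, separator, local_id = identifier.partition(":")
    if separator and prefix in PREFIX_MAP:
        return PREFIX_MAP[prefix] + local_id
    return identifier  # already an IRI, or an unknown prefix


print(expand("MONDO:0005002"))  # http://purl.obolibrary.org/obo/MONDO_0005002
```

Wrapping the result in rdflib.URIRef would give the documented return type.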

static validate_edge(edge: dict) → dict

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters

edge (dict) – An edge represented as a dict

Returns

An edge represented as a dict, with default assumptions applied.

Return type

dict

static validate_node(node: dict) → dict

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters

node (dict) – A node represented as a dict

Returns

A node represented as a dict, with default assumptions applied.

Return type

dict

class kgx.transformers.rdf_transformer.RdfOwlTransformer(source_graph: networkx.classes.multidigraph.MultiDiGraph = None)[source]

Bases: kgx.transformers.rdf_transformer.RdfTransformer

Transformer that parses an OWL ontology in RDF, while retaining class-class relationships.

add_edge(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef) → Tuple[str, str, str]

This method should be used by all derived classes when adding an edge to the networkx.MultiDiGraph. This ensures that the subject and object identifiers are CURIEs, and that edge_label is in the correct form.

Returns the CURIE identifiers used for the subject and object in the networkx.MultiDiGraph, and the processed edge_label.

Parameters
  • subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple

  • object_iri (rdflib.URIRef) – Object IRI for the object in a triple

  • predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple

Returns

A 3-tuple (of the form subject, object, predicate) that represents the edge

Return type

Tuple[str, str, str]

add_edge_attribute(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → None

Add an attribute to an edge, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If the nodes in the edge do not exist, they are created using subject_iri and object_iri.

If the edge itself does not exist, it is created using subject_iri, object_iri and predicate_iri.

Parameters
  • subject_iri (Union[rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph

  • object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph

  • predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph

  • key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string

  • value (str) – The value of the attribute

add_node(iri: rdflib.term.URIRef) → str

This method should be used by all derived classes when adding a node to the networkx.MultiDiGraph. This ensures that a node’s identifier is a CURIE, and that its iri property is set.

Returns the CURIE identifier for the node in the networkx.MultiDiGraph

Parameters

iri (rdflib.URIRef) – IRI of a node

Returns

The CURIE identifier of a node

Return type

str

add_node_attribute(iri: Union[rdflib.term.URIRef, str], key: str, value: str) → None

Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be a rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If the node does not exist then it is created using the given iri.

Parameters
  • iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph

  • key (str) – The name of the attribute. Can be a rdflib.URIRef or URI string

  • value (str) – The value of the attribute

add_ontology(file: str) → None

Load an OWL ontology into an rdflib.Graph. TODO: is there a better way of pre-loading required ontologies?

categorize()

Find and validate category for every node in self.graph

static dump(g: networkx.classes.multidigraph.MultiDiGraph) → Dict

Convert a networkx.MultiDiGraph to a dictionary.

Parameters

g (networkx.MultiDiGraph) – Graph to convert to a dictionary

Returns

A dictionary

Return type

dict

static dump_to_file(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None

Serialize a networkx.MultiDiGraph as JSON and write it to a file.

Parameters
  • g (networkx.MultiDiGraph) – Graph to serialize

  • filename (str) – File to write the JSON

is_empty() → bool

Check whether self.graph is empty.

Returns

A boolean value asserting whether the graph is empty or not

Return type

bool

load_networkx_graph(rdfgraph: rdflib.graph.Graph = None, predicates: Set[rdflib.term.URIRef] = None, **kwargs) → None[source]

Walk through the rdflib.Graph and load all triples into networkx.MultiDiGraph

Parameters
  • rdfgraph (rdflib.Graph) – Graph containing nodes and edges

  • predicates (Set[rdflib.URIRef]) – A set of rdflib.URIRef representing predicates to be loaded

  • kwargs (dict) – Any additional arguments

load_node_attributes(rdfgraph: rdflib.graph.Graph) → None

This method loads the properties of nodes into networkx.MultiDiGraph. As there can be many values for a single key, all properties are lists by default.

This method assumes that RdfTransformer.load_edges() has been called, and that all nodes have their IRI set as an attribute.

Parameters

rdfgraph (rdflib.Graph) – Graph containing nodes and edges

merge_graphs(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None

Merge all graphs with self.graph

  • If two nodes with the same ‘id’ exist in two graphs, the nodes are merged based on the ‘id’

  • If two nodes with the same ‘id’ exist in two graphs and have conflicting values for a property, the value is overwritten from left to right

  • If two edges with the same ‘key’ exist in two graphs, the edges are merged based on the ‘key’ property

  • If two edges with the same ‘key’ exist in two graphs and have one or more conflicting values for a property, the value is overwritten from left to right

Parameters

graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph

parse(filename: str = None, input_format: str = None, provided_by: str = None, predicates: Set[rdflib.term.URIRef] = None) → None

Parse a file containing triples into an rdflib.Graph.

The file can be in Turtle or any other format supported by rdflib.

Parameters
  • filename (str) – File to read from.

  • input_format (str) – The input file format. If None is provided then the format is guessed using rdflib.util.guess_format()

  • provided_by (str) – Define the source providing the input file.
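
The fallback described for input_format can be sketched without rdflib. The helper resolve_input_format and the EXTENSION_FORMATS table below are hypothetical; the table is an illustrative subset and not the mapping that rdflib.util.guess_format() actually uses.

```python
import os
from typing import Optional

# Illustrative subset of extension-to-format names (an assumption, not rdflib's table).
EXTENSION_FORMATS = {".ttl": "turtle", ".nt": "nt", ".rdf": "xml", ".owl": "xml"}


def resolve_input_format(filename: str, input_format: Optional[str] = None) -> Optional[str]:
    """Return the explicit format if one is given, else guess from the extension."""
    if input_format is not None:
        return input_format
    _, extension = os.path.splitext(filename)
    # None signals that no format could be guessed.
    return EXTENSION_FORMATS.get(extension.lower())


print(resolve_input_format("ontology.owl"))
print(resolve_input_format("data.txt", input_format="turtle"))
```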

remap_edge_property(type: str, old_property: str, new_property: str) → None

Replace the value of an edge’s old_property attribute with the value of its new_property attribute.

Parameters
  • type (string) – label referring to edges whose property needs to be remapped

  • old_property (string) – name of the property whose value is to be replaced

  • new_property (string) – name of the property from which the new value is taken

remap_node_identifier(type: str, new_property: str, prefix=None) → None

Replace a node’s ‘id’ attribute with the value of the node’s new_property attribute.

Parameters
  • type (string) – label referring to nodes whose ‘id’ needs to be remapped

  • new_property (string) – name of the property from which the new value is taken

  • prefix (string) – indicates that the value of new_property is a list; the prefix determines which value to pick from the list
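
How a prefix might select one value from a list can be sketched as follows. pick_identifier is a hypothetical helper, and matching on the CURIE prefix followed by a colon is an assumption about how the selection works.

```python
from typing import List, Optional, Union


def pick_identifier(value: Union[str, List[str]], prefix: Optional[str] = None) -> Optional[str]:
    """Choose a new node 'id' from a property value."""
    if prefix is None:
        # Without a prefix the value is assumed to be a single identifier.
        return value if isinstance(value, str) else None
    candidates = [value] if isinstance(value, str) else value
    for candidate in candidates:
        if candidate.startswith(prefix + ":"):
            return candidate
    # No candidate carries the prefix; a caller would keep the existing 'id'.
    return None


print(pick_identifier(["AAA:123", "BBB:456"], prefix="BBB"))
```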

remap_node_property(type: str, old_property: str, new_property: str) → None

Replace the value of a node’s old_property attribute with the value of its new_property attribute.

Parameters
  • type (string) – label referring to nodes whose property needs to be remapped

  • old_property (string) – name of the property whose value is to be replaced

  • new_property (string) – name of the property from which the new value is taken
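
A minimal sketch of the remapping on plain dictionaries, assuming that the label is matched against a list stored under a ‘category’ key. The helper remap_property and the record layout are hypothetical.

```python
from typing import Dict


def remap_property(records: Dict[str, dict], label: str, old_property: str, new_property: str) -> None:
    """Copy new_property into old_property for every record carrying the label."""
    for attrs in records.values():
        # Records without the label, or without new_property, are left untouched.
        if label in attrs.get("category", []) and new_property in attrs:
            attrs[old_property] = attrs[new_property]


nodes = {
    "EX:1": {"category": ["gene"], "name": "old name", "symbol": "G1"},
    "EX:2": {"category": ["disease"], "name": "D2", "symbol": "ignored"},
}
remap_property(nodes, "gene", "name", "symbol")
print(nodes["EX:1"]["name"])
```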

report() → None

Print a summary report about self.graph

static restore(data: Dict) → networkx.classes.multidigraph.MultiDiGraph

Deserialize a networkx.MultiDiGraph from a dictionary.

Parameters

data (dict) – Dictionary containing nodes and edges

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph

static restore_from_file(filename) → networkx.classes.multidigraph.MultiDiGraph

Deserialize a networkx.MultiDiGraph from a JSON file.

Parameters

filename (str) – File to read from

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph

set_filter(key: str, value: Union[List[str], str]) → None

Set a filter, defined by a key and value pair. These filters are used to reduce the search space.

Parameters
  • key (str) – The key for a filter

  • value (Union[List[str], str]) – The value for a filter. Can be either a string or a list
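
One way to picture a key and value filter is a small store that normalizes every value to a list. FilterStore and its matches() method are hypothetical and only illustrate how a string or a list of strings could be handled uniformly.

```python
from typing import Dict, List, Union


class FilterStore:
    """Hold filters as key -> list of accepted values."""

    def __init__(self) -> None:
        self.filters: Dict[str, List[str]] = {}

    def set_filter(self, key: str, value: Union[List[str], str]) -> None:
        # A single string is wrapped so that both input forms are stored alike.
        self.filters[key] = [value] if isinstance(value, str) else list(value)

    def matches(self, record: dict) -> bool:
        # A record passes only if it satisfies every filter that has been set.
        return all(record.get(key) in allowed for key, allowed in self.filters.items())


store = FilterStore()
store.set_filter("category", ["gene", "disease"])
store.set_filter("provided_by", "source_a")
print(store.matches({"category": "gene", "provided_by": "source_a"}))
```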

static validate_edge(edge: dict) → dict

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters

edge (dict) – An edge represented as a dict

Returns

An edge represented as a dict, with default assumptions applied.

Return type

dict

static validate_node(node: dict) → dict

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters

node (dict) – A node represented as a dict

Returns

A node represented as a dict, with default assumptions applied.

Return type

dict
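
The pattern of checking required properties and applying defaults can be sketched as below. Which properties KGX treats as required, and which defaults it applies, is not stated here; the ‘id’ requirement and the default category in this sketch are assumptions, and validate_node_sketch is a hypothetical name.

```python
def validate_node_sketch(node: dict) -> dict:
    """Check a required property and fill in an assumed default."""
    if "id" not in node:
        # Assumption: 'id' is treated as required.
        raise KeyError("node is missing required property 'id'")
    validated = dict(node)  # copy so that the caller's dictionary is not modified
    # Assumption: a missing category falls back to a generic placeholder.
    validated.setdefault("category", ["named thing"])
    return validated


print(validate_node_sketch({"id": "EX:1"}))
```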

class kgx.transformers.rdf_transformer.RdfTransformer(source_graph: networkx.classes.multidigraph.MultiDiGraph = None)

Bases: kgx.transformers.rdf_graph_mixin.RdfGraphMixin, kgx.transformers.transformer.Transformer

Transformer that parses RDF and loads triples, as nodes and edges, into a networkx.MultiDiGraph

This is the base class which is used to implement other RDF-based transformers.

add_edge(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef) → Tuple[str, str, str]

This method should be used by all derived classes when adding an edge to the networkx.MultiDiGraph. This ensures that the subject and object identifiers are CURIEs, and that edge_label is in the correct form.

Returns the CURIE identifiers used for the subject and object in the networkx.MultiDiGraph, and the processed edge_label.

Parameters
  • subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple

  • object_iri (rdflib.URIRef) – Object IRI for the object in a triple

  • predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple

Returns

A 3-tuple of the form (subject, object, predicate) that represents the edge

Return type

Tuple[str, str, str]
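
Turning IRIs into CURIEs can be sketched with a prefix map. The PREFIX_MAP entries and the helpers contract and edge_tuple are hypothetical; the sketch also returns the predicate as a CURIE, whereas the method described above returns a processed edge_label.

```python
from typing import Dict, Tuple

# Hypothetical prefix map; real namespaces would come from the project's configuration.
PREFIX_MAP: Dict[str, str] = {
    "EX": "http://example.org/",
    "EXV": "http://example.org/vocab/",
}


def contract(iri: str, prefix_map: Dict[str, str] = PREFIX_MAP) -> str:
    """Return a CURIE for the IRI, preferring the longest matching namespace."""
    best_prefix, best_namespace = None, ""
    for prefix, namespace in prefix_map.items():
        if iri.startswith(namespace) and len(namespace) > len(best_namespace):
            best_prefix, best_namespace = prefix, namespace
    if best_prefix is None:
        return iri  # no known namespace: leave the IRI unchanged
    return f"{best_prefix}:{iri[len(best_namespace):]}"


def edge_tuple(subject_iri: str, object_iri: str, predicate_iri: str) -> Tuple[str, str, str]:
    """Return (subject, object, predicate) in contracted form."""
    return contract(subject_iri), contract(object_iri), contract(predicate_iri)


print(edge_tuple("http://example.org/a", "http://example.org/b", "http://example.org/vocab/related_to"))
```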

add_edge_attribute(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → None

Add an attribute to an edge, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If the nodes of the edge do not exist, they are created using subject_iri and object_iri.

If the edge itself does not exist, it is created using subject_iri, object_iri and predicate_iri.

Parameters
  • subject_iri (Union[rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph

  • object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph

  • predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph

  • key (str) – The name of the attribute. Can be an rdflib.URIRef or URI string

  • value (str) – The value of the attribute

add_node(iri: rdflib.term.URIRef) → str

This method should be used by all derived classes when adding a node to the networkx.MultiDiGraph. This ensures that a node’s identifier is a CURIE, and that its iri property is set.

Returns the CURIE identifier for the node in the networkx.MultiDiGraph

Parameters

iri (rdflib.URIRef) – IRI of a node

Returns

The CURIE identifier of a node

Return type

str

add_node_attribute(iri: Union[rdflib.term.URIRef, str], key: str, value: str) → None

Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If the node does not exist then it is created using the given iri.

Parameters
  • iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph

  • key (str) – The name of the attribute. Can be an rdflib.URIRef or URI string

  • value (str) – The value of the attribute
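
The handling of multi-valued attributes can be sketched on a plain dictionary. Which properties count as multi-valued is an assumption here (the MULTIVALUED set is made up), and add_attribute is a hypothetical helper.

```python
from typing import Set

# Assumption: these property names are multi-valued; the real set is defined elsewhere.
MULTIVALUED: Set[str] = {"synonym", "xref"}


def add_attribute(record: dict, key: str, value: str) -> None:
    """Add a value, keeping multi-valued properties as duplicate-free lists."""
    if key in MULTIVALUED:
        values = record.setdefault(key, [])
        if value not in values:  # skip duplicates
            values.append(value)
    else:
        record[key] = value  # single-valued: the latest value is stored


node = {}
add_attribute(node, "synonym", "alpha")
add_attribute(node, "synonym", "alpha")
add_attribute(node, "synonym", "beta")
add_attribute(node, "name", "Example")
print(node)
```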

add_ontology(file: str) → None

Load an OWL ontology into an rdflib.Graph.

TODO: is there a better way of pre-loading required ontologies?

categorize()

Find and validate category for every node in self.graph

static dump(g: networkx.classes.multidigraph.MultiDiGraph) → Dict

Convert a networkx.MultiDiGraph to a dictionary.

Parameters

g (networkx.MultiDiGraph) – Graph to convert to a dictionary

Returns

A dictionary

Return type

dict

static dump_to_file(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None

Serialize a networkx.MultiDiGraph as JSON and write it to a file.

Parameters
  • g (networkx.MultiDiGraph) – Graph to serialize

  • filename (str) – File to write the JSON to

is_empty() → bool

Check whether self.graph is empty.

Returns

A boolean value asserting whether the graph is empty or not

Return type

bool

load_networkx_graph(rdfgraph: rdflib.graph.Graph = None, predicates: Set[rdflib.term.URIRef] = None, **kwargs) → None

Walk through the rdflib.Graph and load all required triples into a networkx.MultiDiGraph.

By default this method loads the following predicates:
  • RDFS.subClassOf

  • OWL.sameAs

  • OWL.equivalentClass

  • is_about (IAO:0000136)

  • has_subsequence (RO:0002524)

  • is_subsequence_of (RO:0002525)

This behavior can be overridden by passing the rdflib.URIRef predicates that ought to be loaded via the predicates parameter.

Parameters
  • rdfgraph (rdflib.Graph) – Graph containing nodes and edges

  • predicates (Set[rdflib.URIRef]) – A set of rdflib.URIRef representing predicates to be loaded

  • kwargs (dict) – Any additional arguments
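
The default-versus-override behavior of the predicates parameter can be sketched with string triples. The CURIE-like strings below stand in for rdflib.URIRef values, and select_triples is a hypothetical helper, not part of KGX.

```python
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Stand-ins for the default predicates listed above.
DEFAULT_PREDICATES: Set[str] = {
    "rdfs:subClassOf", "owl:sameAs", "owl:equivalentClass",
    "IAO:0000136", "RO:0002524", "RO:0002525",
}


def select_triples(triples: Iterable[Triple], predicates: Optional[Set[str]] = None) -> List[Triple]:
    """Keep only triples whose predicate is in the requested set."""
    wanted = DEFAULT_PREDICATES if predicates is None else predicates
    return [triple for triple in triples if triple[1] in wanted]


triples = [
    ("EX:a", "rdfs:subClassOf", "EX:b"),
    ("EX:a", "EX:custom", "EX:c"),
]
print(select_triples(triples))
print(select_triples(triples, predicates={"EX:custom"}))
```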

load_node_attributes(rdfgraph: rdflib.graph.Graph) → None

Load the properties of nodes into the networkx.MultiDiGraph. As there can be many values for a single key, all properties are lists by default.

This method assumes that RdfTransformer.load_edges() has been called and that every node has its IRI set as an attribute.

Parameters

rdfgraph (rdflib.Graph) – Graph containing nodes and edges

merge_graphs(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None

Merge all graphs with self.graph

  • If two nodes with the same ‘id’ exist in two graphs, the nodes are merged based on the ‘id’

  • If two nodes with the same ‘id’ exist in two graphs and have conflicting values for a property, then the value is overwritten from left to right

  • If two edges with the same ‘key’ exist in two graphs, the edges are merged based on the ‘key’ property

  • If two edges with the same ‘key’ exist in two graphs and have one or more conflicting values for a property, then the value is overwritten from left to right

Parameters

graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph

parse(filename: str = None, input_format: str = None, provided_by: str = None, predicates: Set[rdflib.term.URIRef] = None) → None

Parse a file containing triples into an rdflib.Graph.

The file can be in Turtle or any other format supported by rdflib.

Parameters
  • filename (str) – File to read from.

  • input_format (str) – The input file format. If None is provided then the format is guessed using rdflib.util.guess_format()

  • provided_by (str) – Define the source providing the input file.

remap_edge_property(type: str, old_property: str, new_property: str) → None

Replace the value of an edge’s old_property attribute with the value of its new_property attribute.

Parameters
  • type (string) – label referring to edges whose property needs to be remapped

  • old_property (string) – name of the property whose value is to be replaced

  • new_property (string) – name of the property from which the new value is taken

remap_node_identifier(type: str, new_property: str, prefix=None) → None

Replace a node’s ‘id’ attribute with the value of the node’s new_property attribute.

Parameters
  • type (string) – label referring to nodes whose ‘id’ needs to be remapped

  • new_property (string) – name of the property from which the new value is taken

  • prefix (string) – indicates that the value of new_property is a list; the prefix determines which value to pick from the list

remap_node_property(type: str, old_property: str, new_property: str) → None

Replace the value of a node’s old_property attribute with the value of its new_property attribute.

Parameters
  • type (string) – label referring to nodes whose property needs to be remapped

  • old_property (string) – name of the property whose value is to be replaced

  • new_property (string) – name of the property from which the new value is taken

report() → None

Print a summary report about self.graph

static restore(data: Dict) → networkx.classes.multidigraph.MultiDiGraph

Deserialize a networkx.MultiDiGraph from a dictionary.

Parameters

data (dict) – Dictionary containing nodes and edges

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph

static restore_from_file(filename) → networkx.classes.multidigraph.MultiDiGraph

Deserialize a networkx.MultiDiGraph from a JSON file.

Parameters

filename (str) – File to read from

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph
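
The file-based pair (dump_to_file and restore_from_file) amounts to a JSON round trip. The sketch below works on a plain dictionary instead of a networkx.MultiDiGraph, and the "nodes"/"edges" layout is an assumption about the serialized form; both helper names are hypothetical.

```python
import json
import os
import tempfile


def dump_to_file_sketch(data: dict, filename: str) -> None:
    """Write a dictionary representation of a graph as JSON."""
    with open(filename, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2)


def restore_from_file_sketch(filename: str) -> dict:
    """Read the dictionary representation back from a JSON file."""
    with open(filename, encoding="utf-8") as handle:
        return json.load(handle)


# Assumed layout: a list of node records and a list of edge records.
graph_data = {
    "nodes": [{"id": "EX:1"}, {"id": "EX:2"}],
    "edges": [{"subject": "EX:1", "object": "EX:2", "edge_label": "related_to"}],
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "graph.json")
    dump_to_file_sketch(graph_data, path)
    restored = restore_from_file_sketch(path)
print(restored == graph_data)
```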

set_filter(key: str, value: Union[List[str], str]) → None

Set a filter, defined by a key and value pair. These filters are used to reduce the search space.

Parameters
  • key (str) – The key for a filter

  • value (Union[List[str], str]) – The value for a filter. Can be either a string or a list

static validate_edge(edge: dict) → dict

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters

edge (dict) – An edge represented as a dict

Returns

An edge represented as a dict, with default assumptions applied.

Return type

dict

static validate_node(node: dict) → dict

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters

node (dict) – A node represented as a dict

Returns

A node represented as a dict, with default assumptions applied.

Return type

dict

SparqlTransformer

class kgx.transformers.sparql_transformer.MonarchSparqlTransformer(source_graph: networkx.classes.multidigraph.MultiDiGraph = None)

Bases: kgx.transformers.sparql_transformer.SparqlTransformer

See neo_transformer for discussion.

add_edge(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef) → Tuple[str, str, str]

This method should be used by all derived classes when adding an edge to the networkx.MultiDiGraph. This ensures that the subject and object identifiers are CURIEs, and that edge_label is in the correct form.

Returns the CURIE identifiers used for the subject and object in the networkx.MultiDiGraph, and the processed edge_label.

Parameters
  • subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple

  • object_iri (rdflib.URIRef) – Object IRI for the object in a triple

  • predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple

Returns

A 3-tuple of the form (subject, object, predicate) that represents the edge

Return type

Tuple[str, str, str]

add_edge_attribute(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → None

Add an attribute to an edge, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If the nodes of the edge do not exist, they are created using subject_iri and object_iri.

If the edge itself does not exist, it is created using subject_iri, object_iri and predicate_iri.

Parameters
  • subject_iri (Union[rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph

  • object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph

  • predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph

  • key (str) – The name of the attribute. Can be an rdflib.URIRef or URI string

  • value (str) – The value of the attribute

add_node(iri: rdflib.term.URIRef) → str

This method should be used by all derived classes when adding a node to the networkx.MultiDiGraph. This ensures that a node’s identifier is a CURIE, and that its iri property is set.

Returns the CURIE identifier for the node in the networkx.MultiDiGraph

Parameters

iri (rdflib.URIRef) – IRI of a node

Returns

The CURIE identifier of a node

Return type

str

add_node_attribute(iri: Union[rdflib.term.URIRef, str], key: str, value: str) → None

Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If the node does not exist then it is created using the given iri.

Parameters
  • iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph

  • key (str) – The name of the attribute. Can be an rdflib.URIRef or URI string

  • value (str) – The value of the attribute

categorize()

Find and validate category for every node in self.graph

static dump(g: networkx.classes.multidigraph.MultiDiGraph) → Dict

Convert a networkx.MultiDiGraph to a dictionary.

Parameters

g (networkx.MultiDiGraph) – Graph to convert to a dictionary

Returns

A dictionary

Return type

dict

static dump_to_file(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None

Serialize a networkx.MultiDiGraph as JSON and write it to a file.

Parameters
  • g (networkx.MultiDiGraph) – Graph to serialize

  • filename (str) – File to write the JSON to

get_filters() → Dict

Gets the current filter map, transforming if necessary.

Returns

A dictionary with all filters

Return type

dict

is_empty() → bool

Check whether self.graph is empty.

Returns

A boolean value asserting whether the graph is empty or not

Return type

bool

load_networkx_graph(rdfgraph: rdflib.graph.Graph = None, predicates: Set[rdflib.term.URIRef] = None, **kwargs) → None

Fetch triples from the SPARQL endpoint and load them as edges.

Parameters
  • rdfgraph (rdflib.Graph) – An rdflib.Graph (unused)

  • predicates (set) – A set containing predicates in rdflib.URIRef form

  • kwargs (dict) – Any additional arguments.

merge_graphs(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None

Merge all graphs with self.graph

  • If two nodes with the same ‘id’ exist in two graphs, the nodes are merged based on the ‘id’

  • If two nodes with the same ‘id’ exist in two graphs and have conflicting values for a property, then the value is overwritten from left to right

  • If two edges with the same ‘key’ exist in two graphs, the edges are merged based on the ‘key’ property

  • If two edges with the same ‘key’ exist in two graphs and have one or more conflicting values for a property, then the value is overwritten from left to right

Parameters

graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph

query(q: str) → Dict

Query a SPARQL endpoint.

Parameters

q (str) – The query string

Returns

A dictionary containing results from the query

Return type

dict
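
If the returned dictionary follows the W3C SPARQL 1.1 Query Results JSON Format (an assumption; the exact shape is not stated here), its bindings can be flattened into simple rows. flatten_bindings is a hypothetical helper and the sample results are made up.

```python
from typing import Dict, List


def flatten_bindings(results: dict) -> List[Dict[str, str]]:
    """Reduce each binding to a mapping of variable name -> value."""
    rows = []
    for binding in results["results"]["bindings"]:
        rows.append({variable: cell["value"] for variable, cell in binding.items()})
    return rows


# Hand-written sample in the SPARQL JSON results layout.
sample = {
    "head": {"vars": ["s", "o"]},
    "results": {
        "bindings": [
            {
                "s": {"type": "uri", "value": "http://example.org/a"},
                "o": {"type": "literal", "value": "label A"},
            }
        ]
    },
}
print(flatten_bindings(sample))
```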

remap_edge_property(type: str, old_property: str, new_property: str) → None

Replace the value of an edge’s old_property attribute with the value of its new_property attribute.

Parameters
  • type (string) – label referring to edges whose property needs to be remapped

  • old_property (string) – name of the property whose value is to be replaced

  • new_property (string) – name of the property from which the new value is taken

remap_node_identifier(type: str, new_property: str, prefix=None) → None

Replace a node’s ‘id’ attribute with the value of the node’s new_property attribute.

Parameters
  • type (string) – label referring to nodes whose ‘id’ needs to be remapped

  • new_property (string) – name of the property from which the new value is taken

  • prefix (string) – indicates that the value of new_property is a list; the prefix determines which value to pick from the list

remap_node_property(type: str, old_property: str, new_property: str) → None

Replace the value of a node’s old_property attribute with the value of its new_property attribute.

Parameters
  • type (string) – label referring to nodes whose property needs to be remapped

  • old_property (string) – name of the property whose value is to be replaced

  • new_property (string) – name of the property from which the new value is taken

report() → None

Print a summary report about self.graph

static restore(data: Dict) → networkx.classes.multidigraph.MultiDiGraph

Deserialize a networkx.MultiDiGraph from a dictionary.

Parameters

data (dict) – Dictionary containing nodes and edges

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph

static restore_from_file(filename) → networkx.classes.multidigraph.MultiDiGraph

Deserialize a networkx.MultiDiGraph from a JSON file.

Parameters

filename (str) – File to read from

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph

set_filter(key: str, value: Union[List[str], str]) → None

Set a filter, defined by a key and value pair. These filters are used to reduce the search space.

Parameters
  • key (str) – The key for a filter

  • value (Union[List[str], str]) – The value for a filter. Can be either a string or a list

static validate_edge(edge: dict) → dict

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters

edge (dict) – An edge represented as a dict

Returns

An edge represented as a dict, with default assumptions applied.

Return type

dict

static validate_node(node: dict) → dict

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters

node (dict) – A node represented as a dict

Returns

A node represented as a dict, with default assumptions applied.

Return type

dict

class kgx.transformers.sparql_transformer.RedSparqlTransformer(source_graph: networkx.classes.multidigraph.MultiDiGraph = None, url: str = 'http://graphdb.dumontierlab.com/repositories/ncats-red-kg')

Bases: kgx.transformers.sparql_transformer.SparqlTransformer

Transformer for communicating with Data2Services Knowledge Graph, a.k.a. Translator Red KG.

add_edge(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef) → Tuple[str, str, str]

This method should be used by all derived classes when adding an edge to the networkx.MultiDiGraph. This ensures that the subject and object identifiers are CURIEs, and that edge_label is in the correct form.

Returns the CURIE identifiers used for the subject and object in the networkx.MultiDiGraph, and the processed edge_label.

Parameters
  • subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple

  • object_iri (rdflib.URIRef) – Object IRI for the object in a triple

  • predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple

Returns

A 3-tuple of the form (subject, object, predicate) that represents the edge

Return type

Tuple[str, str, str]

add_edge_attribute(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → None

Add an attribute to an edge, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If the nodes of the edge do not exist, they are created using subject_iri and object_iri.

If the edge itself does not exist, it is created using subject_iri, object_iri and predicate_iri.

Parameters
  • subject_iri (Union[rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph

  • object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph

  • predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph

  • key (str) – The name of the attribute. Can be an rdflib.URIRef or URI string

  • value (str) – The value of the attribute

add_node(iri: rdflib.term.URIRef) → str

This method should be used by all derived classes when adding a node to the networkx.MultiDiGraph. This ensures that a node’s identifier is a CURIE, and that its iri property is set.

Returns the CURIE identifier for the node in the networkx.MultiDiGraph

Parameters

iri (rdflib.URIRef) – IRI of a node

Returns

The CURIE identifier of a node

Return type

str

add_node_attribute(iri: Union[rdflib.term.URIRef, str], key: str, value: str) → None

Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If the node does not exist then it is created using the given iri.

Parameters
  • iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph

  • key (str) – The name of the attribute. Can be an rdflib.URIRef or URI string

  • value (str) – The value of the attribute

categorize() → None

Check for each node’s category property and assign a category from the BioLink Model.

TODO: categorize for edges?

static dump(g: networkx.classes.multidigraph.MultiDiGraph) → Dict

Convert a networkx.MultiDiGraph to a dictionary.

Parameters

g (networkx.MultiDiGraph) – Graph to convert to a dictionary

Returns

A dictionary

Return type

dict

static dump_to_file(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None

Serialize a networkx.MultiDiGraph as JSON and write it to a file.

Parameters
  • g (networkx.MultiDiGraph) – Graph to serialize

  • filename (str) – File to write the JSON to

get_filters() → Dict

Gets the current filter map, transforming if necessary.

Returns

A dictionary with all filters

Return type

dict

is_empty() → bool

Check whether self.graph is empty.

Returns

A boolean value asserting whether the graph is empty or not

Return type

bool

load_networkx_graph(rdfgraph: rdflib.graph.Graph = None, predicates: Set[rdflib.term.URIRef] = None, **kwargs: Dict) → None

Fetch all triples using the specified predicates and add them to the networkx.MultiDiGraph.

Parameters
  • rdfgraph (rdflib.Graph) – An rdflib.Graph (unused)

  • predicates (set) – A set containing predicates in rdflib.URIRef form

  • kwargs (dict) – Any additional arguments. For example, specifying the ‘limit’ argument limits the number of triples fetched.
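
How a set of predicates and an optional ‘limit’ might translate into a SPARQL query can be sketched as below. The query shape and the build_query helper are assumptions for illustration; the query that the transformer actually sends is not shown in this reference.

```python
from typing import Optional, Set


def build_query(predicates: Set[str], limit: Optional[int] = None) -> str:
    """Build a SELECT query restricted to the given predicate IRIs."""
    # Sorting keeps the generated query stable across runs.
    values = " ".join(f"<{predicate}>" for predicate in sorted(predicates))
    query = f"SELECT ?s ?p ?o WHERE {{ VALUES ?p {{ {values} }} ?s ?p ?o }}"
    if limit is not None:
        query += f" LIMIT {int(limit)}"
    return query


print(build_query({"http://www.w3.org/2000/01/rdf-schema#subClassOf"}, limit=10))
```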

load_nodes(node_set: Set) → None

Load nodes into the networkx.MultiDiGraph.

This method queries the SPARQL endpoint for all triples in which a node from node_set is the subject.

Parameters

node_set (Set) – A set of node CURIEs

merge_graphs(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None

Merge all graphs with self.graph

  • If two nodes with the same ‘id’ exist in two graphs, the nodes are merged based on the ‘id’

  • If two nodes with the same ‘id’ exist in two graphs and have conflicting values for a property, then the value is overwritten from left to right

  • If two edges with the same ‘key’ exist in two graphs, the edges are merged based on the ‘key’ property

  • If two edges with the same ‘key’ exist in two graphs and have one or more conflicting values for a property, then the value is overwritten from left to right

Parameters

graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph

query(q: str) → Dict

Query a SPARQL endpoint.

Parameters

q (str) – The query string

Returns

A dictionary containing results from the query

Return type

dict

remap_edge_property(type: str, old_property: str, new_property: str) → None

Replace the value of an edge’s old_property attribute with the value of its new_property attribute.

Parameters
  • type (string) – label referring to edges whose property needs to be remapped

  • old_property (string) – name of the property whose value is to be replaced

  • new_property (string) – name of the property from which the new value is taken

remap_node_identifier(type: str, new_property: str, prefix=None) → None

Replace a node’s ‘id’ attribute with the value of the node’s new_property attribute.

Parameters
  • type (string) – label referring to nodes whose ‘id’ needs to be remapped

  • new_property (string) – name of the property from which the new value is taken

  • prefix (string) – indicates that the value of new_property is a list; the prefix determines which value to pick from the list

remap_node_property(type: str, old_property: str, new_property: str) → None

Replace the value of a node’s old_property attribute with the value of its new_property attribute.

Parameters
  • type (string) – label referring to nodes whose property needs to be remapped

  • old_property (string) – name of the property whose value is to be replaced

  • new_property (string) – name of the property from which the new value is taken

report() → None

Print a summary report about self.graph

static restore(data: Dict) → networkx.classes.multidigraph.MultiDiGraph

Deserialize a networkx.MultiDiGraph from a dictionary.

Parameters

data (dict) – Dictionary containing nodes and edges

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph

static restore_from_file(filename) → networkx.classes.multidigraph.MultiDiGraph

Deserialize a networkx.MultiDiGraph from a JSON file.

Parameters

filename (str) – File to read from

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph

set_filter(key: str, value: Union[List[str], str]) → None

Set a filter, defined by a key and value pair. These filters are used to reduce the search space.

Parameters
  • key (str) – The key for a filter

  • value (Union[List[str], str]) – The value for a filter. Can be either a string or a list

static validate_edge(edge: dict) → dict

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters

edge (dict) – An edge represented as a dict

Returns

An edge represented as a dict, with default assumptions applied.

Return type

dict

static validate_node(node: dict) → dict

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters

node (dict) – A node represented as a dict

Returns

A node represented as a dict, with default assumptions applied.

Return type

dict

class kgx.transformers.sparql_transformer.SparqlTransformer(source_graph: networkx.classes.multidigraph.MultiDiGraph = None, url: str = None)

Bases: kgx.transformers.rdf_graph_mixin.RdfGraphMixin, kgx.transformers.transformer.Transformer

Transformer for communicating with a SPARQL endpoint.

add_edge(subject_iri: rdflib.term.URIRef, object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef) → Tuple[str, str, str]

This method should be used by all derived classes when adding an edge to the networkx.MultiDiGraph. This ensures that the subject and object identifiers are CURIEs, and that edge_label is in the correct form.

Returns the CURIE identifiers used for the subject and object in the networkx.MultiDiGraph, and the processed edge_label.

Parameters
  • subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple

  • object_iri (rdflib.URIRef) – Object IRI for the object in a triple

  • predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple

Returns

A 3-tuple of the form (subject, object, predicate) that represents the edge

Return type

Tuple[str, str, str]

add_edge_attribute(subject_iri: Union[rdflib.term.URIRef, str], object_iri: rdflib.term.URIRef, predicate_iri: rdflib.term.URIRef, key: str, value: str) → None

Add an attribute to an edge, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If the nodes of the edge do not exist, they are created using subject_iri and object_iri.

If the edge itself does not exist, it is created using subject_iri, object_iri and predicate_iri.

Parameters
  • subject_iri (Union[rdflib.URIRef, str]) – The IRI of the subject node of an edge in rdflib.Graph

  • object_iri (rdflib.URIRef) – The IRI of the object node of an edge in rdflib.Graph

  • predicate_iri (rdflib.URIRef) – The IRI of the predicate representing an edge in rdflib.Graph

  • key (str) – The name of the attribute. Can be an rdflib.URIRef or URI string

  • value (str) – The value of the attribute
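
The multi-valued behaviour can be sketched without KGX or networkx. In this illustration a plain dictionary stands in for the graph, the MULTI_VALUED set is a hypothetical list of property names, and the rule that a single-valued property keeps the latest value is an assumption, not something the documentation states.

```python
from typing import Dict, List, Set, Tuple, Union

# Hypothetical set of property names treated as multi-valued.
MULTI_VALUED: Set[str] = {"publications", "provided_by"}

EdgeKey = Tuple[str, str, str]  # (subject, object, predicate)
edges: Dict[EdgeKey, Dict[str, Union[str, List[str]]]] = {}


def add_edge_attribute(subject: str, obj: str, predicate: str, key: str, value: str) -> None:
    # Create the edge record on first use, as the text describes.
    attrs = edges.setdefault((subject, obj, predicate), {})
    if key in MULTI_VALUED:
        values = attrs.setdefault(key, [])
        if value not in values:  # multi-valued properties hold no duplicates
            values.append(value)
    else:
        attrs[key] = value  # assumption: for single values the latest one wins


add_edge_attribute("HGNC:1", "MONDO:2", "related_to", "publications", "PMID:1")
add_edge_attribute("HGNC:1", "MONDO:2", "related_to", "publications", "PMID:1")
add_edge_attribute("HGNC:1", "MONDO:2", "related_to", "publications", "PMID:2")
add_edge_attribute("HGNC:1", "MONDO:2", "related_to", "relation", "RO:0002200")
```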

add_node(iri: rdflib.term.URIRef) → str

This method should be used by all derived classes when adding a node to the networkx.MultiDiGraph. This ensures that a node’s identifier is a CURIE, and that its iri property is set.

Returns the CURIE identifier for the node in the networkx.MultiDiGraph

Parameters

iri (rdflib.URIRef) – IRI of a node

Returns

The CURIE identifier of a node

Return type

str

add_node_attribute(iri: Union[rdflib.term.URIRef, str], key: str, value: str) → None

Add an attribute to a node, while taking into account whether the attribute should be multi-valued. Multi-valued properties will not contain duplicates.

The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

If the node does not exist then it is created using the given iri.

Parameters
  • iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph

  • key (str) – The name of the attribute. Can be an rdflib.URIRef or a URI string

  • value (str) – The value of the attribute
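
How a URI key might resolve to a property name is sketched below. The PROPERTY_MAPPING dictionary is a hypothetical stand-in for rdf_utils.property_mapping (its real contents are not shown in this documentation), and passing unmapped keys through unchanged is an assumption.

```python
from typing import Dict

# Hypothetical stand-in for rdf_utils.property_mapping: URI -> property name.
PROPERTY_MAPPING: Dict[str, str] = {
    "http://www.w3.org/2000/01/rdf-schema#label": "name",
    "http://purl.org/dc/elements/1.1/description": "description",
}

nodes: Dict[str, Dict[str, str]] = {}


def resolve_key(key: str) -> str:
    """Map a URI key to its property name; pass other keys through unchanged."""
    return PROPERTY_MAPPING.get(str(key), str(key))


def add_node_attribute(node_id: str, key: str, value: str) -> None:
    # The node is created on first use if it does not exist yet.
    nodes.setdefault(node_id, {"id": node_id})[resolve_key(key)] = value


add_node_attribute("HGNC:11603", "http://www.w3.org/2000/01/rdf-schema#label", "TBX4")
```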

categorize()

Find and validate the category of every node in self.graph

static dump(g: networkx.classes.multidigraph.MultiDiGraph) → Dict

Convert a networkx.MultiDiGraph to a dictionary.

Parameters

g (networkx.MultiDiGraph) – Graph to convert to a dictionary

Returns

A dictionary

Return type

dict

static dump_to_file(g: networkx.classes.multidigraph.MultiDiGraph, filename: str) → None

Serialize a networkx.MultiDiGraph as JSON and write it to a file.

Parameters
  • g (networkx.MultiDiGraph) – Graph to serialize

  • filename (str) – File to write the JSON to

get_filters() → Dict

Gets the current filter map, transforming if necessary.

Returns

A dictionary with all filters

Return type

dict

is_empty() → bool

Check whether self.graph is empty.

Returns

A boolean value indicating whether the graph is empty

Return type

bool

load_networkx_graph(rdfgraph: rdflib.graph.Graph = None, predicates: Set[rdflib.term.URIRef] = None, **kwargs) → None

Fetch triples from the SPARQL endpoint and load them as edges.

Parameters
  • rdfgraph (rdflib.Graph) – An rdflib.Graph (unused)

  • predicates (set) – A set containing predicates in rdflib.URIRef form

  • kwargs (dict) – Any additional arguments.
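
One way a set of predicates could restrict the triples fetched from an endpoint is a SPARQL 1.1 VALUES clause. The sketch below only builds the query string; build_query and its limit parameter are hypothetical, and the query KGX actually sends may look different.

```python
from typing import Iterable


def build_query(predicates: Iterable[str], limit: int = 100) -> str:
    """Build a SELECT query restricted to the given predicate IRIs."""
    # Sort so the generated query is deterministic for a given set.
    values = " ".join(f"<{p}>" for p in sorted(predicates))
    return (
        "SELECT ?s ?p ?o WHERE { "
        f"VALUES ?p {{ {values} }} "
        "?s ?p ?o . "
        f"}} LIMIT {limit}"
    )


q = build_query({"http://purl.obolibrary.org/obo/RO_0002200"}, limit=10)
```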

merge_graphs(graphs: List[networkx.classes.multidigraph.MultiDiGraph]) → None

Merge all graphs with self.graph

  • If two nodes with the same ‘id’ exist in two graphs, the nodes are merged based on the ‘id’

  • If two nodes with the same ‘id’ exist in two graphs and have conflicting values for a property, the value is overwritten from left to right

  • If two edges with the same ‘key’ exist in two graphs, the edges are merged based on the ‘key’ property

  • If two edges with the same ‘key’ exist in two graphs and have one or more conflicting values for a property, the value is overwritten from left to right

Parameters

graphs (List[networkx.MultiDiGraph]) – List of graphs that are to be merged with self.graph
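
The merge rules above can be sketched with plain dictionaries in place of networkx graphs. This is an illustration, not the KGX implementation: the Graph layout is hypothetical, and "overwritten from left to right" is read here as meaning that a graph later in the list wins over an earlier one.

```python
from typing import Dict, List

Graph = Dict[str, Dict[str, dict]]  # {"nodes": {id: attrs}, "edges": {key: attrs}}


def merge_graphs(target: Graph, graphs: List[Graph]) -> None:
    """Merge `graphs` into `target`, later graphs overwriting earlier values."""
    for graph in graphs:  # processed left to right
        for kind in ("nodes", "edges"):
            for ident, attrs in graph[kind].items():
                # Same id/key: merge attributes; conflicting values are overwritten.
                target[kind].setdefault(ident, {}).update(attrs)


g1: Graph = {"nodes": {"A": {"id": "A", "name": "old"}}, "edges": {}}
g2: Graph = {"nodes": {"A": {"id": "A", "name": "new", "category": "gene"}}, "edges": {}}
merged: Graph = {"nodes": {}, "edges": {}}
merge_graphs(merged, [g1, g2])
```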

query(q: str) → Dict

Query a SPARQL endpoint.

Parameters

q (str) – The query string

Returns

A dictionary containing results from the query

Return type

dict
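
SPARQL endpoints commonly answer SELECT queries in the W3C SPARQL 1.1 Query Results JSON Format. Assuming the dictionary returned by query follows that format (the documentation does not say so explicitly), triples could be pulled out of it as sketched below. The sample response is hand-written and the triples helper is hypothetical.

```python
from typing import Dict, List, Tuple

# A hand-written response in the SPARQL 1.1 Query Results JSON Format.
sample: Dict = {
    "head": {"vars": ["s", "p", "o"]},
    "results": {
        "bindings": [
            {
                "s": {"type": "uri", "value": "http://example.org/a"},
                "p": {"type": "uri", "value": "http://example.org/rel"},
                "o": {"type": "uri", "value": "http://example.org/b"},
            }
        ]
    },
}


def triples(result: Dict) -> List[Tuple[str, str, str]]:
    """Extract (s, p, o) value triples from a SPARQL JSON result."""
    return [
        (row["s"]["value"], row["p"]["value"], row["o"]["value"])
        for row in result["results"]["bindings"]
    ]
```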

remap_edge_property(type: str, old_property: str, new_property: str) → None

Replace the value of an edge’s old_property attribute with the value of its new_property attribute.

Parameters
  • type (string) – label referring to edges whose property needs to be remapped

  • old_property (string) – old property name whose value needs to be replaced

  • new_property (string) – name of the property from which the new value is pulled

remap_node_identifier(type: str, new_property: str, prefix=None) → None

Replace a node’s ‘id’ attribute with the value of the node’s new_property attribute.

Parameters
  • type (string) – label referring to nodes whose ‘id’ needs to be remapped

  • new_property (string) – name of the property from which the new value is pulled

  • prefix (string) – if given, indicates that the value of new_property is a list; the prefix determines which value to pick from the list
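
The role of prefix can be sketched as follows, again on plain dictionaries. All names here are hypothetical, the fallback rules (first list element when no prefix is given, old identifier kept when nothing matches) are assumptions, and a real implementation would also have to update the edges that refer to a renamed node.

```python
from typing import Dict, List, Optional, Union


def pick_identifier(value: Union[str, List[str]], prefix: Optional[str] = None) -> Optional[str]:
    """Choose the new identifier from a property value."""
    if isinstance(value, list):
        if prefix is None:
            return value[0] if value else None
        # Pick the first entry that carries the requested CURIE prefix.
        return next((v for v in value if v.startswith(f"{prefix}:")), None)
    return value


def remap_node_identifier(nodes: Dict[str, dict], category: str, new_property: str,
                          prefix: Optional[str] = None) -> Dict[str, dict]:
    """Return a new node map in which matching nodes are re-keyed."""
    remapped: Dict[str, dict] = {}
    for node_id, attrs in nodes.items():
        new_id = None
        if attrs.get("category") == category and new_property in attrs:
            new_id = pick_identifier(attrs[new_property], prefix)
        final_id = new_id or node_id  # keep the old id when nothing matches
        remapped[final_id] = {**attrs, "id": final_id}
    return remapped


nodes = {"NCBIGene:1": {"id": "NCBIGene:1", "category": "gene",
                        "xrefs": ["ENSEMBL:ENSG1", "HGNC:5"]}}
result = remap_node_identifier(nodes, "gene", "xrefs", prefix="HGNC")
```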

remap_node_property(type: str, old_property: str, new_property: str) → None

Replace the value of a node’s old_property attribute with the value of its new_property attribute.

Parameters
  • type (string) – label referring to nodes whose property needs to be remapped

  • old_property (string) – old property name whose value needs to be replaced

  • new_property (string) – name of the property from which the new value is pulled
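
A minimal sketch of the property remapping on a plain node dictionary. It assumes that type is matched against a single-valued category attribute and that nodes lacking new_property are left untouched; neither detail is confirmed by the documentation, and the function below is not the KGX implementation.

```python
from typing import Dict


def remap_node_property(nodes: Dict[str, dict], category: str,
                        old_property: str, new_property: str) -> None:
    """Overwrite old_property with the value of new_property, in place."""
    for attrs in nodes.values():
        if attrs.get("category") != category:
            continue  # only nodes of the requested type are touched
        if new_property in attrs:
            attrs[old_property] = attrs[new_property]


nodes = {
    "HGNC:5": {"category": "gene", "name": "A1BG", "symbol": "a1bg-symbol"},
    "MONDO:1": {"category": "disease", "name": "x", "symbol": "y"},
}
remap_node_property(nodes, "gene", "name", "symbol")
```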

report() → None

Print a summary report about self.graph

static restore(data: Dict) → networkx.classes.multidigraph.MultiDiGraph

Deserialize a networkx.MultiDiGraph from a dictionary.

Parameters

data (dict) – Dictionary containing nodes and edges

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph

static restore_from_file(filename) → networkx.classes.multidigraph.MultiDiGraph

Deserialize a networkx.MultiDiGraph from a JSON file.

Parameters

filename (str) – File to read from

Returns

A networkx.MultiDiGraph representation

Return type

networkx.MultiDiGraph
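
The dump_to_file and restore_from_file pair amounts to a JSON round trip. The sketch below shows that round trip with the standard library only; the dictionary layout (lists under "nodes" and "edges") is a hypothetical example, and the keys KGX actually emits may differ.

```python
import json
import os
import tempfile
from typing import Dict


def dump_to_file(data: Dict, filename: str) -> None:
    """Write a graph dictionary to `filename` as JSON."""
    with open(filename, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)


def restore_from_file(filename: str) -> Dict:
    """Read a graph dictionary back from a JSON file."""
    with open(filename, encoding="utf-8") as fh:
        return json.load(fh)


# Hypothetical dictionary layout; the keys KGX emits may differ.
graph = {
    "nodes": [{"id": "HGNC:5", "name": "A1BG"}],
    "edges": [{"subject": "HGNC:5", "object": "MONDO:1", "edge_label": "related_to"}],
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "graph.json")
    dump_to_file(graph, path)
    restored = restore_from_file(path)
```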

set_filter(key: str, value: Union[List[str], str]) → None

Set a filter, defined by a key and value pair. These filters are used to reduce the search space.

Parameters
  • key (str) – The key for a filter

  • value (Union[List[str], str]) – The value for a filter. Can be either a string or a list
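
How set_filter and get_filters could fit together is sketched below. FilterStore is a hypothetical class, not the KGX implementation, and presenting every value as a set is only one possible reading of "transforming if necessary".

```python
from typing import Dict, List, Set, Union


class FilterStore:
    """Hypothetical holder for filters; not the KGX implementation."""

    def __init__(self) -> None:
        self._filters: Dict[str, Union[str, Set[str]]] = {}

    def set_filter(self, key: str, value: Union[List[str], str]) -> None:
        # Lists are stored as sets so repeated values collapse.
        self._filters[key] = set(value) if isinstance(value, list) else value

    def get_filters(self) -> Dict[str, Set[str]]:
        # Assumed transformation: present every value as a set.
        return {k: v if isinstance(v, set) else {v} for k, v in self._filters.items()}


store = FilterStore()
store.set_filter("category", ["gene", "disease", "gene"])
store.set_filter("edge_label", "related_to")
filters = store.get_filters()
```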

static validate_edge(edge: dict) → dict

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters

edge (dict) – An edge represented as a dict

Returns

An edge represented as a dict, with default assumptions applied.

Return type

dict

static validate_node(node: dict) → dict

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters

node (dict) – A node represented as a dict

Returns

A node represented as a dict, with default assumptions applied.

Return type

dict
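
A sketch of what "check for required properties" and "default assumptions" could mean for a node. The required property ('id'), the default category value, and raising KeyError are all assumptions made for illustration; the rules the library actually applies are not listed in this documentation.

```python
from typing import Dict, List

DEFAULT_CATEGORY: List[str] = ["named_thing"]  # hypothetical default


def validate_node(node: Dict) -> Dict:
    """Check required properties and fill in defaults where possible."""
    if "id" not in node:
        # An identifier cannot be guessed, so this is treated as an error.
        raise KeyError(f"node does not have an 'id' property: {node}")
    if "category" not in node:
        node["category"] = list(DEFAULT_CATEGORY)  # copy, so the default stays intact
    return node


validated = validate_node({"id": "HGNC:5"})
```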