Rule Materialization

The c_clause.RulesHandler calculates the materialization of a given set of input rules together with their statistics, that is, the support and the number of predictions (body groundings in the data). The rules do not need to be loaded with the loader; they are passed directly to the handler. The loader only needs to load data, and the materialization is based on the data argument of the loader.

from c_clause import Loader
from clause import Options

data = [
    ("anna", "livesIn", "london"),
    ("anna", "learns", "english"),
    ("bernd", "speaks", "french")
]

opts = Options()
loader = Loader(opts.get("loader"))
loader.load_data(data=data)
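To make the idea of materialization concrete, the following pure-Python sketch grounds rules with a single body atom against the toy data above and collects the predicted head triples. It is not how c_clause works internally and does not use the library. The helper names (parse_atom, is_variable, materialize), the convention that single uppercase letters are variables, and the reading of support as "predictions already present in the data" are assumptions made for this illustration. The library's actual output may differ, for example in ordering or filtering.

```python
import re

# Toy knowledge graph from the example above.
data = [
    ("anna", "livesIn", "london"),
    ("anna", "learns", "english"),
    ("bernd", "speaks", "french"),
]

rules = [
    "speaks(X,Y) <= learns(X,Y)",
    "speaks(X,english) <= livesIn(X,london)",
    "speaks(X,english) <= speaks(X,A)",
]

ATOM = re.compile(r"(\w+)\((\w+),(\w+)\)")


def parse_atom(text):
    """Split 'rel(s,o)' into (rel, s, o)."""
    return ATOM.fullmatch(text.strip()).groups()


def is_variable(term):
    # Assumption of this sketch: single uppercase letters are variables.
    return len(term) == 1 and term.isupper()


def materialize(rule, triples):
    """Return the sorted head triples predicted by a rule with one body atom."""
    head_text, body_text = rule.split("<=")
    h_rel, h_sub, h_obj = parse_atom(head_text)
    b_rel, b_sub, b_obj = parse_atom(body_text)
    predictions = set()
    for sub, rel, obj in triples:
        if rel != b_rel:
            continue
        binding = {}
        matched = True
        for term, value in ((b_sub, sub), (b_obj, obj)):
            if is_variable(term):
                # A repeated variable must bind to the same entity.
                if binding.setdefault(term, value) != value:
                    matched = False
            elif term != value:
                # A constant in the body must match the triple exactly.
                matched = False
        if matched:
            # Substitute bound variables in the head; constants stay as they are.
            predictions.add(
                (binding.get(h_sub, h_sub), h_rel, binding.get(h_obj, h_obj))
            )
    return sorted(predictions)


known = set(data)
results = {}
for rule in rules:
    predictions = materialize(rule, data)
    # Support (as assumed here): predictions that already exist in the data.
    support = sum(1 for triple in predictions if triple in known)
    results[rule] = (predictions, support)
    print(rule, predictions, support)
```

In this toy graph none of the predicted triples is already present in the data, so the support computed by the sketch is zero for every rule.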

Materialize a rule set

First, make sure the handler option rules_handler.collect_predictions is turned on (default: True), then create the handler. Other handler options are listed in config-default.yaml.

Note

Calculating the materialization of large rule sets, especially for cyclical rules, can be expensive in both memory and runtime. If you only want to calculate statistics (see next section), turn off rules_handler.collect_predictions to reduce the memory footprint and ensure that all threads are used (default: all).

from c_clause import Loader, RulesHandler
from clause import Options
opts.set("rules_handler.collect_predictions", True)
rh = RulesHandler(options=opts.get("rules_handler"))

Materialization is performed with RulesHandler.calculate_predictions(rules, loader). The calculations are based on the data argument of the loader. The argument rules is either a list of rule strings or a file path. If a file path is given, each line of the file contains either a plain rule string or a tab-separated entry in the standard rule file syntax, which includes statistics. In the latter case, the statistics have no effect.

# input from python
rules = [
    "speaks(X,Y) <= learns(X,Y)",
    "speaks(X,english) <= livesIn(X,london)",
    "speaks(X,english) <= speaks(X,A)"
]
# alternatively, input from file (uncomment to replace the list above)
# each line contains (no spaces)
# either 'num_preds\tsupport\tconf\trulestring' or just 'rulestring'
# rules = "my-rules.txt"

rh.calculate_predictions(rules=rules, loader=loader)
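The two accepted rule file layouts can be sketched with the standard library alone. The helpers write_rule_file and read_rule_strings, the file names, and the placeholder statistics (0, 0, 0.0) are hypothetical and exist only for this illustration; the statistics columns are placeholders because, as described above, they have no effect on materialization.

```python
import os
import tempfile

rules = [
    "speaks(X,Y) <= learns(X,Y)",
    "speaks(X,english) <= livesIn(X,london)",
    "speaks(X,english) <= speaks(X,A)",
]


def write_rule_file(path, rule_strings, stats=None):
    """Write one rule per line.

    Without stats, each line is just the rule string. With stats, each line
    is 'num_preds<TAB>support<TAB>conf<TAB>rulestring'.
    """
    with open(path, "w", encoding="utf-8") as handle:
        for i, rule in enumerate(rule_strings):
            if stats is None:
                handle.write(rule + "\n")
            else:
                num_preds, support, conf = stats[i]
                handle.write(f"{num_preds}\t{support}\t{conf}\t{rule}\n")


def read_rule_strings(path):
    """Return the rule string of every line, ignoring leading statistics."""
    with open(path, encoding="utf-8") as handle:
        # The rule string is always the last tab-separated field.
        return [line.rstrip("\n").split("\t")[-1] for line in handle if line.strip()]


with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = os.path.join(tmp_dir, "rules-plain.txt")
    stats_path = os.path.join(tmp_dir, "rules-stats.txt")

    write_rule_file(plain_path, rules)
    placeholder_stats = [(0, 0, 0.0)] * len(rules)  # placeholder values only
    write_rule_file(stats_path, rules, stats=placeholder_stats)

    plain_rules = read_rule_strings(plain_path)
    stats_rules = read_rule_strings(stats_path)

print(plain_rules == stats_rules)
```

Both files describe the same rule set, so either path could be passed as the rules argument in place of the Python list.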

Obtaining the outputs

Outputs can be retrieved directly:

# obtain directly as string
preds_str = rh.get_predictions(as_string=True)
# obtain directly as idx
preds_idx = rh.get_predictions(as_string=False)

Here, preds_str[i] and preds_idx[i] are lists containing the materialized triples for the i-th input rule.

Outputs can also be written to a file:

# write to file as flat KG
# the file is a standard tab-separated file containing triples
# duplicates are removed, so the file can be loaded directly as an input set with the loader
rh.write_predictions(path="mat.txt", flat=True, as_string=True)

# write to file in jsonl format, each line can be parsed with the Python json module
# each dict contains the rule string as key and the materialized triples as value
# a separate path is used so the flat file written before is not overwritten
rh.write_predictions(path="mat.jsonl", flat=False, as_string=True)
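Reading the two output layouts back could look roughly like the sketch below. It does not call the library: the sample files are hand-written stand-ins that follow the description above, and the exact encoding of a triple inside the jsonl file (assumed here to be a list of three strings) should be checked against a real output file. The helper names read_flat_triples and read_rule_predictions and the file names are hypothetical.

```python
import json
import os
import tempfile


def read_flat_triples(path):
    """Read a tab-separated triple file into a list of (head, relation, tail)."""
    with open(path, encoding="utf-8") as handle:
        return [tuple(line.rstrip("\n").split("\t")) for line in handle if line.strip()]


def read_rule_predictions(path):
    """Read a jsonl file whose lines map a rule string to its triples."""
    per_rule = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            for rule, triples in json.loads(line).items():
                # Assumption: each triple is stored as a list of three strings.
                per_rule[rule] = [tuple(triple) for triple in triples]
    return per_rule


with tempfile.TemporaryDirectory() as tmp_dir:
    # Hand-written stand-ins for the two output files.
    flat_path = os.path.join(tmp_dir, "sample-flat.txt")
    with open(flat_path, "w", encoding="utf-8") as handle:
        handle.write("anna\tspeaks\tenglish\n")
        handle.write("bernd\tspeaks\tenglish\n")

    jsonl_path = os.path.join(tmp_dir, "sample-per-rule.jsonl")
    with open(jsonl_path, "w", encoding="utf-8") as handle:
        sample = {"speaks(X,Y) <= learns(X,Y)": [["anna", "speaks", "english"]]}
        handle.write(json.dumps(sample) + "\n")

    flat_triples = read_flat_triples(flat_path)
    per_rule = read_rule_predictions(jsonl_path)

print(flat_triples)
print(per_rule)
```

The flat layout loses the link between a triple and the rule that produced it, while the per-rule layout keeps that link at the cost of possible duplicates across rules.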