Ranking Generation

The c_clause.RankingHandler creates full rankings in the context of knowledge graph completion. Unlike the other handlers, it does not perform rule application on the fly. Instead, the target KG has to be loaded into the loader as well, using the target argument. The ranking is created for the target KG by grounding the rules on the data argument of the loader.

A complete ranking for the target KG is defined as follows. For each triple (head, relation, tail) from target, two queries are formed: (head, relation, ?) and (?, relation, tail). For each of these queries, the ranking contains a list of candidate proposals, sorted according to a heuristic based on the specified aggregation function.
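
The query definition above can be illustrated with a short, library-independent sketch. The helper form_queries is a hypothetical name introduced only for this illustration; it is not part of c_clause and says nothing about how the handler builds queries internally.

```python
# Illustrative only: mirrors the definition above, not the c_clause internals.
def form_queries(target):
    """Return (head_queries, tail_queries) for a list of (head, relation, tail) triples."""
    # Head direction: the head is unknown, the tail is given.
    head_queries = [("?", rel, tail) for head, rel, tail in target]
    # Tail direction: the tail is unknown, the head is given.
    tail_queries = [(head, rel, "?") for head, rel, tail in target]
    return head_queries, tail_queries


target = [
    ("marta", "citizenOf", "italy"),
    ("bernd", "teaches", "french"),
]
head_queries, tail_queries = form_queries(target)
print(head_queries)
print(tail_queries)
```

Every target triple therefore contributes exactly one query per direction.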

First we define data and rules.

from c_clause import RankingHandler, Loader
from clause import Options

train = [
    ("marta", "bornIn", "rome"),
    ("italy", "hasCapital", "rome"),
    ("bernd", "speaks", "french"),
    ("marta", "speaks", "english"),
    ("marta", "speaks", "italian"),
    ("bernd", "teaches", "english"),
    ("enrico", "bornIn", "rome"),
]

valid = [
    ("english", "languageOf", "england"),
    ("australia", "hasCapital", "canberra"),
    ("enrico", "citizenOf", "italy")
]

test = [
    ("marta", "citizenOf", "italy"),
    ("bernd", "teaches", "french")
]

rules = [
    "citizenOf(X,Y) <= bornIn(X,A), hasCapital(Y,A)",
    "teaches(X,french) <= teaches(X, english)",
]

stats = [
    [20, 5],
    [21, 1],
]

Loading Data and Rules

To calculate rankings, all three arguments of the loader's load_data(data, filter, target) function have to be used. The loader can load a filter set (commonly the valid split of a KG). If it is not empty, proposed candidates are always filtered against this set. In our example, enrico will not be returned as an answer to the query (?, citizenOf, italy), even if it is predicted by one or more rules. See also the other filter options for data and target in config-default.yaml.

opts = Options()
loader = Loader(opts.get("loader"))
# set filter to "" if not required
loader.load_data(data=train, filter=valid, target=test)
loader.load_rules(rules=rules, stats=stats)
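
The effect of the filter set on the query (?, citizenOf, italy) can be sketched in plain Python. This only illustrates the semantics described above; the filtering itself happens inside the library. The function filter_head_candidates and the list of raw candidates are assumptions made for this example.

```python
# Illustrative only: shows what filtering means, not how c_clause implements it.
def filter_head_candidates(candidates, relation, tail, filter_set):
    """Drop head candidates whose completed triple is contained in filter_set."""
    return [c for c in candidates if (c, relation, tail) not in filter_set]


filter_triples = {
    ("english", "languageOf", "england"),
    ("australia", "hasCapital", "canberra"),
    ("enrico", "citizenOf", "italy"),
}

# Assumed raw candidates for (?, citizenOf, italy): both entities are born in
# rome, the capital of italy, so the citizenOf rule can predict both of them.
raw_candidates = ["marta", "enrico"]

kept = filter_head_candidates(raw_candidates, "citizenOf", "italy", filter_triples)
print(kept)
```

Here enrico is removed because (enrico, citizenOf, italy) is part of the filter set, while marta remains a candidate.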

Calculating and Retrieving Rankings

The ranking handler calculates a ranking with the RankingHandler.calculate_ranking(loader) function. The ranking is created for the KG specified with target, while the rules are applied on data. The results are cached until the function is invoked again.

ranker = RankingHandler(opts.get("ranking_handler"))
ranker.calculate_ranking(loader=loader)

The ranking can be retrieved in Python, separately for the two query directions.

head_ranking = ranker.get_ranking(direction="head", as_string=False)
tail_ranking = ranker.get_ranking(direction="tail", as_string=True)

Here, head_ranking is a dict and head_ranking[i][j] corresponds to the query (?, i, j). Note that the relations form the first key of the dict, so that relation-wise rankings can be retrieved easily. head_ranking[i][j] returns a sorted list of tuples (cand, score) with head candidate proposals for the query. The tail direction works identically, and the dicts are always accessed with [rel][source-entity].
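
As a small sketch of how the nested layout can be used, the following looks up the position of a known answer in a candidate list. The dict below is a hand-written stand-in with made-up scores in the [rel][source-entity] layout; it is not output produced by the library, and rank_of is a hypothetical helper.

```python
# Hand-written stand-in for a head ranking (string form); scores are made up.
head_ranking = {
    "citizenOf": {
        "italy": [("marta", 0.25), ("bernd", 0.10)],
    },
}


def rank_of(ranking, relation, entity, answer):
    """Return the 1-based position of answer for the given query, or None if absent."""
    candidates = ranking.get(relation, {}).get(entity, [])
    for position, (cand, _score) in enumerate(candidates, start=1):
        if cand == answer:
            return position
    return None


marta_rank = rank_of(head_ranking, "citizenOf", "italy", "marta")
missing_rank = rank_of(head_ranking, "citizenOf", "italy", "enrico")
print(marta_rank, missing_rank)
```

Because the relation is the outer key, all queries of one relation can be processed by iterating over head_ranking[rel].items().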

The complete ranking can also be written to a file. The output format is the same as the AnyBURL ranking files. This function only supports string outputs.

# out is the path of the output file
ranker.write_ranking(path=out, loader=loader)

Retrieving Rule Features

The ranker can also cache and output, for each candidate of every query, the rules that predicted the candidate. For this, the option "ranking_handler.collect_rules" must be set to True (default: False) before the ranking is calculated.

# obtain rule features for every query
head_rules = ranker.get_rules(direction="head", as_string=True)
tail_rules = ranker.get_rules(direction="tail", as_string=True)

# write rule features
ranker.write_rules(path="rule-feats_head.txt", loader=loader, direction="head", as_string=False)
ranker.write_rules(path="rule-feats_tail.txt", loader=loader, direction="tail", as_string=False)


from clause.util.utils import read_jsonl
# read the written rule features back in; returns a list of dicts
read_jsonl("rule-feats_tail.txt")

Here, head_rules[rel][source] returns a dict for query (?, rel, source) and head_rules[rel][source][cand] returns the sorted list of predicting rules.

The output files are in JSONL format. Each line contains one JSON object:

{ "query": [rel, source ], "answers": [ list of candidates], "rules": [[rules_cand_0], [rules_cand_1],...]}

However, note that the list of candidates is not sorted, i.e., it does not match the ordering of head_ranking[rel][source] from above.
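
Since the candidates in a line are unsorted, it can help to turn the record into a mapping from candidate to rules and then reorder it with a ranking list. The sketch below uses only the standard json module. The example line, the scores, and the helper names are assumptions for illustration; rules are written as strings for readability, whereas a file written with as_string=False may contain a different representation.

```python
import json

# Hypothetical line in the documented format (tail direction, query (marta, citizenOf, ?)).
line = (
    '{"query": ["citizenOf", "marta"], '
    '"answers": ["france", "italy"], '
    '"rules": [[], ["citizenOf(X,Y) <= bornIn(X,A), hasCapital(Y,A)"]]}'
)


def rules_by_candidate(json_line):
    """Parse one line and map each candidate to the rules that predicted it."""
    record = json.loads(json_line)
    rel, source = record["query"]
    # answers and rules are parallel lists: rules[k] belongs to answers[k].
    return (rel, source), dict(zip(record["answers"], record["rules"]))


def order_like_ranking(cand_rules, ranked):
    """Reorder candidate rules to follow a sorted list of (cand, score) tuples."""
    return [(cand, cand_rules.get(cand, [])) for cand, _score in ranked]


query, cand_rules = rules_by_candidate(line)
# Made-up ranking list for the same query, sorted by score.
ranked = [("italy", 0.25), ("france", 0.0)]
ordered = order_like_ranking(cand_rules, ranked)
print(query, [cand for cand, _rules in ordered])
```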