Query Answering

The c_clause.QAHandler answers head queries of the form (?, rel, source_entity) and tail queries of the form (source_entity, rel, ?). For each candidate entity, it can also output the rules that predicted the candidate.

In the following, we assume that a loader has been created and that data and rules have been loaded:

from c_clause import QAHandler, Loader
from clause import Options

data = [
    ("anna", "livesIn", "london"),
    ("anna", "learns", "english"),
    ("bernd", "speaks", "french")
]
rules = [
    "speaks(X,Y) <= learns(X,Y)",
    "speaks(X,english) <= livesIn(X,london)",
    "speaks(X,english) <= speaks(X,A)"
]
stats = [
    [20, 10],
    [40, 35],
    [50, 5],
]
opts = Options()
loader = Loader(opts.get("loader"))
loader.load_data(data=data)
loader.load_rules(rules=rules, stats=stats)

Calculating Candidate Answers

The handler is initialized with options, and answers are calculated with c_clause.QAHandler.calculate_answers(..). The function takes the queries, the loader, and one of the two possible query directions, “head” or “tail”.

qa = QAHandler(options=opts.get("qa_handler"))
queries = [("anna", "speaks"), ("bernd", "speaks")]
# direction == "head" or "tail"
qa.calculate_answers(queries=queries, loader=loader, direction="tail")

Query input types

1. From Python as strings: queries can be a list of string queries (tuples or lists) that contain the source entity at the first position and the relation at the second position (see above).

2. From file as strings: queries can be a file path where every line contains a tab-separated query, e.g., my-queries.txt:

anna    speaks
bernd   speaks

Note

Queries are always specified with the source entity first, followed by the relation, even when direction is set to “head”.

3. From Python as idx’s: queries can be a list or a 2d np.array of idx queries. This requires the user to know the entity and relation indices, either by retrieving them from the loader after data is loaded or by setting them before data is loaded. See also the section Loading Data. The following example sets the entity and relation index before loading data and rules.

from c_clause import QAHandler, Loader
from clause import Options
import numpy as np

data = [
    ("anna", "livesIn", "london"),
    ("anna", "learns", "english"),
    ("bernd", "speaks", "french")
]
rules = [
    "speaks(X,Y) <= learns(X,Y)",
    "speaks(X,english) <= livesIn(X,london)",
    "speaks(X,english) <= speaks(X,A)"
]
stats = [
    [20, 10],
    [40, 35],
    [50, 5],
]

opts = Options()
loader = Loader(opts.get("loader"))

# 0:anna, 1:bernd 2:london ...
entity_index = ["anna", "bernd", "london", "english", "french"]
relation_index = ["speaks", "livesIn", "learns"]

# set index before loading data and rules
loader.set_entity_index(index=entity_index)
loader.set_relation_index(index=relation_index)

loader.load_data(data=data)
loader.load_rules(rules=rules, stats=stats)

queries = np.array([(0,0), (1,0)])
qa = QAHandler(options=opts.get("qa_handler"))
qa.calculate_answers(queries=queries, loader=loader, direction="tail")
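
The string and idx forms of a query are related by position in the index lists, and the file form is simply one tab-separated query per line. The following is a minimal sketch that uses only the standard library. The helpers to_idx_queries and write_query_file are hypothetical names introduced for illustration; they are not part of c_clause.

```python
import os
import tempfile

# Same index lists as in the example above: the position in the list is the idx.
entity_index = ["anna", "bernd", "london", "english", "french"]
relation_index = ["speaks", "livesIn", "learns"]


def to_idx_queries(queries, entity_index, relation_index):
    """Map (source_entity, relation) string pairs to (entity_idx, relation_idx) pairs."""
    ent_to_idx = {name: i for i, name in enumerate(entity_index)}
    rel_to_idx = {name: i for i, name in enumerate(relation_index)}
    return [(ent_to_idx[ent], rel_to_idx[rel]) for ent, rel in queries]


def write_query_file(path, queries):
    """Write one tab-separated query per line: source entity first, then relation."""
    with open(path, "w", encoding="utf-8") as f:
        for source_entity, relation in queries:
            f.write(f"{source_entity}\t{relation}\n")


string_queries = [("anna", "speaks"), ("bernd", "speaks")]

idx_queries = to_idx_queries(string_queries, entity_index, relation_index)
print(idx_queries)  # [(0, 0), (1, 0)]

# Hypothetical location for the query file (a temporary directory).
query_path = os.path.join(tempfile.mkdtemp(), "my-queries.txt")
write_query_file(query_path, string_queries)
```

Either idx_queries (optionally wrapped in np.array(..)) or query_path can then be given as the queries argument of calculate_answers(..), as described in the list above.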

Retrieving Results

The handler caches the results until calculate_answers(..) is invoked again. The QAHandler can output the calculated candidates and their aggregated scores, which depend on the aggregation function selected with qa_handler.aggregation_function. It can also output, for each candidate answer, the rules that predicted the candidate.

Independent of how data was loaded and how queries were defined (strings or idx’s), outputs can be written to a file or obtained in Python, and they can be formatted as idx’s or as strings.

Outputting candidates and scores

...
qa.calculate_answers(queries=queries, loader=loader, direction="tail")

# output strings
answers_str = qa.get_answers(as_string=True)
# output idx's
answers_idx = qa.get_answers(as_string=False)
# write to file as string
qa.write_answers(path="tail-answers_str.jsonl", as_string=True)
# write to file as idx
qa.write_answers(path="tail-answers_idx.jsonl", as_string=False)

Here answers_str and answers_idx are lists where answers_str[i] returns an ordered list of tuples for query i. Each tuple contains the candidate entity (str or idx) as its first element and the aggregated prediction score as its second element.
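
As a rough sketch of how such a result can be post-processed, the snippet below works on a hand-written list with the shape described above. The candidates and scores are made up for illustration and are not actual output of the QAHandler. The helpers top_k and as_score_dicts are hypothetical and not part of c_clause.

```python
# Hand-written stand-in for the result of qa.get_answers(as_string=True).
# The candidates and scores below are invented for illustration only.
answers_str = [
    [("english", 0.9), ("french", 0.1)],  # candidates for query 0
    [("english", 0.1)],                   # candidates for query 1
]


def top_k(answers, k):
    """Keep the first k candidates per query (the lists are already ordered)."""
    return [candidates[:k] for candidates in answers]


def as_score_dicts(answers):
    """Turn each per-query list of (candidate, score) pairs into a dict."""
    return [dict(candidates) for candidates in answers]


best = top_k(answers_str, k=1)
scores = as_score_dicts(answers_str)
print(best[0])              # [('english', 0.9)]
print(scores[0]["french"])  # 0.1
```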

The files are in jsonl format (each line is valid JSON), where each line corresponds to one query. They can be read line-wise, and each line can be parsed with the Python json module.
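
Reading such a file line by line might look like the following sketch. The helper read_jsonl is a hypothetical name, and the demo file contains placeholder records only, since the exact layout of each written line is not shown here.

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Parse a jsonl file into a list with one Python object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Placeholder file so the sketch runs on its own; the records are NOT in the
# format produced by write_answers(..), they only demonstrate line-wise parsing.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.jsonl")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write(json.dumps({"line": 0}) + "\n")
    f.write(json.dumps({"line": 1}) + "\n")

records = read_jsonl(demo_path)
print(len(records))  # 2
```

With a file written by write_answers(..), records[i] would then hold the parsed line for query i.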

Outputting predicting rules

To output the predicting rules for every candidate, first set the qa_handler.collect_rules option to True.

...
opts.set("qa_handler.collect_rules", True)
qa.set_options(options=opts.get("qa_handler"))
qa.calculate_answers(queries=queries, loader=loader, direction="tail")

# output strings
rules_str = qa.get_rules(as_string=True)
# output idx's
rules_idx = qa.get_rules(as_string=False)
# write to file as string (separate file name so the answer files are not overwritten)
qa.write_rules(path="tail-rules_str.jsonl", as_string=True)
# write to file as idx
qa.write_rules(path="tail-rules_idx.jsonl", as_string=False)

Here rules_str and rules_idx are nested lists where rules_str[i][j][k] returns the k’th rule (idx or str, sorted by confidence) for the j’th candidate answer of the i’th query. Note that the lengths of the candidate and rule lists are not identical.
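
The three index levels can be illustrated with a small hand-written structure. The nesting follows the description above, but the concrete content is an assumption made for illustration and not actual output of get_rules(..). The helper rules_per_candidate is hypothetical.

```python
# Hand-written stand-in for qa.get_rules(as_string=True):
# rules_str[i][j][k] = k'th rule for the j'th candidate of the i'th query.
# The assignment of rules to candidates below is invented for illustration.
rules_str = [
    [  # query 0
        [  # candidate 0 of query 0
            "speaks(X,english) <= livesIn(X,london)",
            "speaks(X,Y) <= learns(X,Y)",
        ],
    ],
    [  # query 1
        [  # candidate 0 of query 1
            "speaks(X,english) <= speaks(X,A)",
        ],
    ],
]


def rules_per_candidate(rules):
    """Count the predicting rules of every candidate, query by query."""
    return [[len(cand_rules) for cand_rules in per_query] for per_query in rules]


counts = rules_per_candidate(rules_str)
print(counts)              # [[2], [1]]
print(rules_str[1][0][0])  # speaks(X,english) <= speaks(X,A)
```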

When the rules are returned with their idx, the mapping can be retrieved from the loader with loader.rule_index().

The files are in jsonl format and can be read as described above.

Filtering

The QAHandler can be configured with various options, as described in config-default.yaml. The option qa_handler.filter_w_data suppresses any candidate that forms a true answer in the base dataset (data) of the loader. If the loader was additionally given the filter argument while loading, that filter set is likewise used for filtering candidates.

...
# define data1, data2, etc..
...

opts = Options()
loader = Loader(opts.get("loader"))
# filters with data2 automatically when "filter" is specified
loader.load_data(data=data1, filter=data2)
...
# filter with data1 when qa_handler option is activated; default: True
opts.set("qa_handler.filter_w_data", True)
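
Conceptually, filtering removes every candidate whose completed triple is already known. The following pure-Python sketch illustrates that idea under stated assumptions; it is not the library's implementation, and filter_candidates is a hypothetical helper.

```python
data = [
    ("anna", "livesIn", "london"),
    ("anna", "learns", "english"),
    ("bernd", "speaks", "french"),
]


def filter_candidates(query, candidates, known_triples, direction="tail"):
    """Drop candidates that complete the query to a triple in known_triples.

    query is (source_entity, relation); for direction "tail" the candidate
    fills the tail position, for "head" it fills the head position.
    """
    source, relation = query
    known = set(known_triples)
    kept = []
    for cand in candidates:
        if direction == "tail":
            triple = (source, relation, cand)
        else:
            triple = (cand, relation, source)
        if triple not in known:
            kept.append(cand)
    return kept


# ("bernd", "speaks", "french") is in data, so "french" is suppressed.
kept = filter_candidates(("bernd", "speaks"), ["french", "english"], data)
print(kept)  # ['english']
```

Passing the union of several triple lists as known_triples would mimic filtering with both the base data and an additional filter set.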