Concept Learning
This guide shows how to use a concept learner to generate hypotheses for a target concept in an ontology. In this guide we cover the following concept learners of the Ontolearn library:
EvoLearner
CELOE
OCEL
The other concept learners are not covered here in detail, but we have provided examples for them. Check the Jupyter notebook files as well as the other example scripts for the corresponding learner inside the examples folder (direct links are given at the end of this guide).
It is worth mentioning that NCES2 and NERO are not yet implemented in Ontolearn, but they will be soon.
Expressiveness
EvoLearner → ALCQ(D)
DRILL → ALC
NCES → ALC
NCES2 → ALCHIQ(D)
NERO → ALC
CLIP → ALC
CELOE and OCEL → ALC
The three algorithms that we mentioned in the beginning are similar in execution, and for that reason we describe them in a general manner. To test them separately, see Quick try-out. Each algorithm has its own set of configuration parameters; however, at minimum, they all require a knowledge base and a learning problem.
Let’s see the prerequisites needed to run the concept learners:
Prerequisites
Before configuring and running an algorithm, we recommend storing the dataset path (which ends with .owl) and the IRIs of the learning problem instances, as strings, in a json file as shown below. The learning problem is further divided into positive and negative examples. This saves us some hardcoded lines, since we can simply access everything by loading the json file. Below is an example file, which we name synthetic_problems.json, showing how it should look:
{
  "data_path": "../KGs/Family/family-benchmark_rich_background.owl",
  "learning_problem": {
    "positive_examples": [
      "http://www.benchmark.org/family#F2F28",
      "http://www.benchmark.org/family#F2F36",
      "http://www.benchmark.org/family#F3F52"
    ],
    "negative_examples": [
      "http://www.benchmark.org/family#F6M69",
      "http://www.benchmark.org/family#F6M100",
      "http://www.benchmark.org/family#F2F30"
    ]
  }
}
We assume that you are running this script inside the examples folder, which is why we have stored the ontology path as a relative path.
Note: The KGs directory contains the datasets and is not part of the project. The datasets have to be downloaded first; see Download External Files. You can also download some ready-to-use learning problem json files by clicking here.
Configuring Input Parameters
Before starting with the configuration, you can enable logging to see logs that give insight into the main processes of the algorithm:
from ontolearn.utils import setup_logging
setup_logging()
We then start by loading synthetic_problems.json, which stores the knowledge base path and the learning problem, into the variable settings:
import json
with open('synthetic_problems.json') as json_file:
settings = json.load(json_file)
Load the ontology
Load the ontology by simply creating an instance of the class KnowledgeBase and passing the ontology path stored under the data_path property of settings:
from ontolearn.knowledge_base import KnowledgeBase
kb = KnowledgeBase(path=settings['data_path'])
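To quickly check that the knowledge base was loaded correctly, you can, for example, print how many individuals it contains (a small sanity check; the snippet assumes the individuals_count helper of KnowledgeBase):
# number of individuals in the loaded ontology (assumed helper of KnowledgeBase)
print(kb.individuals_count())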
Configure the Learning Problem
The structured machine learning implemented in the Ontolearn library is a form of supervised learning. Therefore, one of the first things to do after loading the ontology into a KnowledgeBase object is to define the learning problem for which the learning algorithm will try to generate hypotheses (class expressions).
First and foremost, load the learning problem examples from the json file into sets as shown below:
positive_examples = set(settings['learning_problem']['positive_examples'])
negative_examples = set(settings['learning_problem']['negative_examples'])
In Ontolearn you represent the learning problem as an object of the class PosNegLPStandard, which has two parameters, pos and neg, for the positive and negative examples respectively. These parameters are of type set[OWLNamedIndividual]. We create these sets by mapping each individual (stored as a string) from the sets positive_examples and negative_examples to an OWLNamedIndividual:
from ontolearn.learning_problem import PosNegLPStandard
from owlapy.owl_individual import IRI, OWLNamedIndividual
typed_pos = set(map(OWLNamedIndividual, map(IRI.create, positive_examples)))
typed_neg = set(map(OWLNamedIndividual, map(IRI.create, negative_examples)))
lp = PosNegLPStandard(pos=typed_pos, neg=typed_neg)
To construct an OWLNamedIndividual object, an IRI is required as input. You can simply create an IRI object by calling the static method create and passing the IRI as a string.
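For example, a single individual can be constructed from one of the IRIs used in the json file above:
# construct one OWLNamedIndividual from an IRI given as a string
example_individual = OWLNamedIndividual(IRI.create("http://www.benchmark.org/family#F2F28"))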
Configuring & Executing a Concept Learner
To learn class expressions we need to build a model of the concept learner that we want to use. It can be either EvoLearner, CELOE or OCEL. Depending on the algorithm you choose, there are different initialization parameters, which you can check here. Let's start by setting a quality function.
Quality metrics
There is a default quality function to evaluate the quality of the found expressions, but different concept learners have different default quality functions. Therefore, you may want to set it explicitly. The following quality functions are available: F1 Score, Predictive Accuracy, Precision, and Recall. To use a quality function, first create an instance of its class:
from ontolearn.metrics import Accuracy
pred_acc = Accuracy()
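The other quality functions can be instantiated in the same way; the sketch below assumes that F1, Precision and Recall are exposed under ontolearn.metrics just like Accuracy:
# alternative quality functions (assumed to live in ontolearn.metrics)
from ontolearn.metrics import F1, Precision, Recall
f1 = F1()
precision = Precision()
recall = Recall()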
In the following example we build a model of OCEL and specify some of the parameters that can be set for OCEL.
(Optional) If you have target concepts that you want to ignore, check how to ignore concepts.
Create a model
from ontolearn.concept_learner import OCEL
model = OCEL(knowledge_base=kb,
             quality_func=pred_acc,
             max_runtime=600,
             max_num_of_concepts_tested=10_000_000_000,
             iter_bound=10_000_000_000)
The parameter knowledge_base, which is the only required parameter, specifies the knowledge base that is used to learn and test concepts.
The following parameters are optional (a configuration sketch for CELOE is given after this list):
quality_func - function to evaluate the quality of solution concepts. (Default value = F1())
max_runtime - runtime limit in seconds. (Default value = 5)
max_num_of_concepts_tested - limit to stop the algorithm after n concepts tested. (Default value = 10_000)
iter_bound - limit to stop the algorithm after n refinement steps are done. (Default value = 10_000)
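CELOE and EvoLearner are configured in the same style. The following is only a sketch; it assumes that CELOE accepts the same core parameters as OCEL shown above (check the initialization parameters of each learner for the exact signature):
# a minimal, hypothetical CELOE configuration reusing kb and pred_acc from above
from ontolearn.concept_learner import CELOE
celoe_model = CELOE(knowledge_base=kb,
                    quality_func=pred_acc,
                    max_runtime=600)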
Execute and fetch the results
After creating the model you can fit the learning problem to it, and it will find the hypotheses that explain the positive and negative examples. You can do that by calling the method fit:
model.fit(lp)
The hypotheses can be saved:
model.save_best_hypothesis(n=3, path='Predictions')
The save_best_hypothesis method creates a .owl file in RDF/XML format containing the generated (learned) hypotheses. The number of hypotheses is specified by the parameter n, and the path parameter specifies the name of the file.
If you want to print the hypotheses, you can use the method best_hypotheses, where n is the number of hypotheses you want returned. It returns the n best hypotheses together with some insights such as the quality value, length, tree length and tree depth of each hypothesis, and the number of individuals that each of them covers.
hypotheses = model.best_hypotheses(n=3)
[print(hypothesis) for hypothesis in hypotheses]
You can also create a binary classification for the specified individuals by using the
predict
method as below:
binary_classification = model.predict(individuals=list(typed_pos | typed_neg), hypotheses=hypotheses)
Here we are classifying the positive and negative individuals using the generated hypotheses. This returns a data frame where 1 means True and 0 means False.
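Since the result is a data frame, you can inspect it directly, for example:
# show the 0/1 classification for every individual against each hypothesis
print(binary_classification)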
Verbalization
You can also verbalize, i.e. visualize, the generated hypotheses as images by using the static method verbalize. This functionality requires two external dependencies that are not part of Ontolearn's required packages: deeponto and graphviz.
Install deeponto:
pip install deeponto
plus further requirements like a JDK, etc. Check https://krr-oxford.github.io/DeepOnto/ for full instructions.
Install graphviz from https://graphviz.org/download/.
After you are done with that you can simply verbalize predictions:
model.verbalize('Predictions.owl')
This will create, for each class expression inside Predictions.owl, a .png image that contains the tree representation of that class expression.
Use Triplestore Knowledge Base
Instead of traversing nodes locally with expensive computational resources, you can use the efficient approach of querying a triplestore via SPARQL queries. We have brought this functionality to Ontolearn for our learning algorithms, and we take care of the conversion part behind the scenes. Let's see what it takes to make use of it.
First of all, you need a server that hosts the triplestore for your ontology. If you don't already have one, see Loading and Launching a Triplestore below.
Now you can simply initialize a TripleStoreKnowledgeBase object that will serve as input for your desired concept learner as follows:
from ontolearn.triple_store import TripleStoreKnowledgeBase
kb = TripleStoreKnowledgeBase("http://your_domain/some_path/sparql")
Notice that the triplestore endpoint is the only argument that you need to pass. Also keep in mind that this knowledge base contains a TripleStoreOntology and a TripleStoreReasoner, which means that every query made during concept learning now goes through the triplestore.
Important notice: The performance of a concept learner may differ when using a triplestore. This happens because some SPARQL queries may not yield exactly the same results as the local querying methods.
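Once created, the triplestore knowledge base can be plugged into a concept learner exactly like the local KnowledgeBase shown earlier. A minimal sketch, reusing the learning problem lp and the OCEL import from above:
# the triplestore-backed knowledge base is used as a drop-in replacement
model = OCEL(knowledge_base=kb)
model.fit(lp)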
Loading and Launching a Triplestore
We will show a simple approach to load and launch a triplestore on a local server. For this, we will use apache-jena and apache-jena-fuseki. As a prerequisite you need JDK 11 or higher, and if you are on Windows, you need Cygwin. In case of issues or for further reference, please visit the official page of Apache Jena and check the documentation under "Triple Store".
Having that said, let us now load and launch a triplestore on the “Father” ontology:
Open a terminal window and make sure you are in the project's root directory. Create a directory to store the files for the Fuseki server:
mkdir Fuseki && cd Fuseki
Install apache-jena and apache-jena-fuseki. We will use version 4.7.0.
# install Jena
wget https://archive.apache.org/dist/jena/binaries/apache-jena-4.7.0.tar.gz
# install Jena-Fuseki
wget https://archive.apache.org/dist/jena/binaries/apache-jena-fuseki-4.7.0.tar.gz
Extract the archives:
tar -xzf apache-jena-fuseki-4.7.0.tar.gz
tar -xzf apache-jena-4.7.0.tar.gz
Make a directory for our ‘father’ database inside jena-fuseki:
mkdir -p apache-jena-fuseki-4.7.0/databases/father/
Now just load the ‘father’ ontology using the following commands:
cd ..
Fuseki/apache-jena-4.7.0/bin/tdb2.tdbloader --loader=parallel --loc Fuseki/apache-jena-fuseki-4.7.0/databases/father/ KGs/Family/father.owl
Launch the server, and it will be waiting eagerly for your queries.
cd Fuseki/apache-jena-fuseki-4.7.0
java -Xmx4G -jar fuseki-server.jar --tdb2 --loc=databases/father /father
Notice that we serve the database found in Fuseki/apache-jena-fuseki-4.7.0/databases/father under the path /father.
By default, jena-fuseki runs on port 3030, so the full URL would be http://localhost:3030/father. When you pass this URL to the triplestore_address argument, you have to add the /sparql sub-path, indicating to the server that we are querying via SPARQL. The full path should now look like http://localhost:3030/father/sparql.
You can now create a triplestore knowledge base or a reasoner that uses this URL for its operations.
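For example, reusing the TripleStoreKnowledgeBase class shown earlier, now pointing at the Fuseki endpoint we just launched:
from ontolearn.triple_store import TripleStoreKnowledgeBase
# triplestore-backed knowledge base for the 'father' dataset served by Fuseki
father_kb = TripleStoreKnowledgeBase("http://localhost:3030/father/sparql")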