# AI metadata enrichment

### In short:

* The objective of this building block will be to develop a machine learning classification model capable of identifying the skills, professions or training areas associated with an educational resource, an educational activity or a training course.

### Timeline

Start date: T0 + 12 months (T0 expected in Q1 2023)

End date: T0 + 36 months

Duration (in months): 24

#### Where we are now

* Inokufu has trained three prototype ML models. They are available on [Hugging Face](https://huggingface.co/inokufu).
* Work will start in Q1 2024. Want to learn more and join the effort? Join [here](/fundamentals/working-groups.md)!

### Objectives and Expected Outcomes

The objective of this building block is to improve the quality of datasets describing educational resources, instructional activities and training courses by adding or correcting their associated competency data (including jobs and/or domains).

There are currently several standards for expressing the metadata of educational resources, pedagogical activities and training courses (e.g. LHEO, LOM), as well as several competency frameworks (e.g. ROME codes, Formacode, ESCO, EQF, ECF). The competency data in these datasets are not always clearly identified.

This greatly limits the usefulness of these datasets, especially when the catalogs feed search engines: users then get results they do not consider relevant (e.g. on Mon Compte Formation).

To address this, the building block will develop a machine learning classification model that identifies the skills, professions or training areas associated with an educational resource, an educational activity or a training course.

This will make the datasets more interoperable in use cases involving competency data, dataset combination and impact studies.
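
As a rough illustration of the input/output contract such a model would have (a free-text description in, zero or more competency labels out), the sketch below scores a description against short label descriptions and keeps every label above a threshold, so one course can map to several competencies. It is a toy, not the project's method: the bag-of-words `embed` function stands in for a BERT-style sentence embedding, and the labels, their descriptions and the 0.2 threshold are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical labels and descriptions, for illustration only; a real deployment
# would use entries from a competency framework such as ESCO or ROME.
LABELS = {
    "data-analysis": "data analysis statistics python spreadsheets visualisation",
    "project-management": "project management planning agile scrum budget team",
    "web-development": "web development html css javascript frontend backend",
}


def embed(text: str) -> Counter:
    """Bag-of-words vector; a stand-in for a BERT-style sentence embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(description: str, threshold: float = 0.2) -> list[tuple[str, float]]:
    """Return every label whose similarity reaches the threshold, best first."""
    query = embed(description)
    scores = [(label, cosine(query, embed(text))) for label, text in LABELS.items()]
    return sorted((s for s in scores if s[1] >= threshold), key=lambda s: s[1], reverse=True)
```

For example, `classify("Agile project management for web development teams")` returns `project-management` first and `web-development` second, while a description with no overlapping terms returns an empty list. Replacing `embed` with a real embedding model would keep the same similarity-and-threshold structure.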

### Scope

* Identification and inventory of datasets that can be used to train the model
* Development of natural language processing (NLP) strategies based on existing models (e.g. BERT, OpenAI models) and/or retraining of their last layers
* Training tests and optimization of model parameters for each of the identified strategies
* Packaging of the best model(s) so that they are easy to deploy and usable through an API
* Data expression in JSON-LD format to ensure interoperability with other data spaces
* API testing with sample datasets provided by volunteer Prometheus-X partners (see support list)
* Deployment of a managed version of the service on one of the partner cloud providers
* Development of automated deployment scripts for multi-cloud use at partner cloud providers (infrastructure as code, e.g. Terraform)
* Drafting, hosting and online publication of the public documentation
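
The page does not say which vocabulary the JSON-LD output will use. Purely as an assumption, the sketch below expresses an enriched resource with schema.org terms: a `LearningResource` whose `teaches` property lists each detected competency as a `DefinedTerm`, with `termCode` carrying the framework code and `inDefinedTermSet` pointing at the framework. The function name, resource ID, competency code and framework URL are hypothetical placeholders.

```python
import json


def to_jsonld(resource_id: str, title: str, competencies: list[dict]) -> str:
    """Serialize an enriched resource as JSON-LD (schema.org vocabulary assumed)."""
    doc = {
        "@context": "https://schema.org",
        "@type": "LearningResource",
        "@id": resource_id,
        "name": title,
        # One DefinedTerm per competency detected by the classifier.
        "teaches": [
            {
                "@type": "DefinedTerm",
                "name": c["name"],
                "termCode": c["code"],
                "inDefinedTermSet": c["framework"],
            }
            for c in competencies
        ],
    }
    return json.dumps(doc, indent=2, ensure_ascii=False)


# Hypothetical example values; not taken from a real catalog or framework.
print(to_jsonld(
    "https://example.org/resources/42",
    "Introduction to data analysis",
    [{"name": "Data analysis", "code": "DEMO-001",
      "framework": "https://example.org/frameworks/demo"}],
))
```

Because `@context` maps every key to a published vocabulary, a consumer in another data space can interpret `teaches` without knowing the service's internals.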

### Deliverables

| #     | Availability   | Deliverable                                                |
| ----- | -------------- | ---------------------------------------------------------- |
| 3.2.1 | T0 + 16 months | Documents: specification and state of the art              |
| 3.2.2 | T0 + 24 months | Beta version (v0) of the service                           |
| 3.2.3 | T0 + 32 months | QA test report                                             |
| 3.2.4 | T0 + 36 months | Final version (v1) of the service and public documentation |

