πŸͺ„ AI metadata enrichment

Interoperability: AI-based automatic metadata enrichment

In short:

  • The objective of this building block will be to develop a machine learning classification model capable of identifying the skills, professions or training areas associated with an educational resource, an educational activity or a training course.

Timeline

Start date: T0 + 12 months (T0 = expected: Q1 2023)

End date: T0 + 36 months

Duration (in months): 24

Where we are now

  • Inokufu has trained three prototype ML-models. They are available on huggingface here.

  • The work will start Q1 of 2024. Want to learn more and join the effort: join here!

Objectives and Expected Outcomes

The objective of this building block is to improve the quality of educational resource, instructional activity, and training datasets by adding or correcting associated competency data (including job and/or domain).

There are currently several standards for expressing the metadata associated with educational resources, pedagogical activities and training (e.g. LHEO, LOM), as well as several competency frameworks (e.g. ROME code, Formacode, ESCO, EQF, ECF). The competency data in these datasets are not always well identified.

This greatly limits the usefulness of these datasets, especially when these catalogs are used in search engines (e.g. Mon Compte Formation), leading to results that users do not consider relevant.

The objective of this building block will be to develop a machine learning classification model capable of identifying the skills, professions or training areas associated with an educational resource, an educational activity or a training course.

This will make the datasets more interoperable, supporting use cases such as cross-referencing jurisdictional data, combining datasets, and conducting impact studies.
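To make the classification task concrete, the sketch below trains a minimal text classifier that maps course descriptions to competency areas. It is an illustrative baseline only (TF-IDF features with logistic regression, tiny hand-written examples, made-up labels); the actual building block targets transformer-based NLP models as described in the Scope section.

```python
# Minimal baseline: classify course descriptions into competency areas.
# Labels and training examples are illustrative, not from a real framework.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training set: (description, competency area)
train = [
    ("Introduction to Python programming and algorithms", "software development"),
    ("Building REST APIs with Flask and databases", "software development"),
    ("Patient care fundamentals for nursing assistants", "healthcare"),
    ("Clinical hygiene and hospital safety procedures", "healthcare"),
    ("Double-entry bookkeeping and financial statements", "accounting"),
    ("Preparing balance sheets and payroll records", "accounting"),
]
texts, labels = zip(*train)

# TF-IDF features + logistic regression: a standard text-classification baseline
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

def predict_competency(description: str) -> str:
    """Return the most likely competency area for a course description."""
    return model.predict([description])[0]

print(predict_competency("An online course on web programming with Python"))
```

A production version would replace this pipeline with a fine-tuned transformer (see the Scope item on BERT-style models) and predict labels from a real framework such as ESCO or ROME.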

Scope

  • Identification and inventory of data sets that can be used to train the model

  • Development of automatic natural language processing (NLP) strategies based on existing models (BERT, OpenAI, etc.) and/or re-training of the last layers

  • Training tests and optimization of the model parameters by machine learning according to the different strategies identified

  • Deployment of the best model(s) to make them easily deployable and usable in API

  • Data expression in JSON-LD format to ensure interoperability with other data spaces

  • API testing with model datasets provided by Prometheus volunteer partners (see support list)

  • Deployment of the service in a managed version in one of the partner cloud providers

  • Development of automated service deployment scripts for multi-cloud use (infrastructure as code e.g. Terraform) at partner cloud providers

  • Drafting of public documentation, hosting and putting it online

Deliverables

| # | Availability | Deliverable |
| --- | --- | --- |
| 3.2.1 | T0 + 16 | Documents: specification and state of the art |
| 3.2.2 | T0 + 24 | Development of the service in beta version (v0) |
| 3.2.3 | T0 + 32 | QA test report |
| 3.2.4 | T0 + 36 | Final version (v1) of the service and public documentation |
