AI metadata enrichment
Interoperability: AI-based automatic metadata enrichment
- The objective of this building block will be to develop a machine learning classification model capable of identifying the skills, professions or training areas associated with an educational resource, an educational activity or a training course.
Start date: T0 + 12 months (T0 = expected: Q1 2023)
End date: T0 + 36 months
Duration (in months): 24
The objective of this building block is to improve the quality of datasets describing educational resources, educational activities, and training courses by adding or correcting the associated competency data (including occupation and/or domain).
Several standards currently exist for expressing the metadata associated with educational resources, pedagogical activities, and training courses (e.g. LHEO, LOM), as well as several competency frameworks (e.g. ROME codes, Formacode, ESCO, EQF, ECF). However, the competency data in these datasets is not always clearly identified.
This greatly limits the usefulness of these datasets, especially when the catalogs are used in search engines, leading to results that users do not consider relevant (e.g. on Mon Compte Formation).
To address this, the objective of this commons will be to develop a machine learning classification model capable of identifying the skills, professions, or training areas associated with an educational resource, an educational activity, or a training course.
This will make the datasets more interoperable, enabling use cases such as working with jurisdictional data, combining datasets, and conducting impact studies.
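As a rough illustration of the classification task described above, the sketch below maps free-text resource descriptions to competency labels using a simple keyword baseline. The label set, keyword lists, and `classify` function are illustrative assumptions, not project deliverables; the actual model would be a fine-tuned pretrained transformer such as BERT.

```python
# Minimal keyword baseline for the competency classification task.
# Labels and keyword lists are illustrative assumptions; a production
# system would replace this with a fine-tuned transformer (e.g. BERT).

KEYWORDS = {
    "software development": ["python", "programming", "software", "code"],
    "data analysis": ["statistics", "data", "machine learning", "analysis"],
    "project management": ["agile", "planning", "stakeholder", "scrum"],
}

def classify(description: str, threshold: int = 1) -> list[str]:
    """Return competency labels whose keyword-hit count meets the threshold."""
    text = description.lower()
    scores = {
        label: sum(word in text for word in words)
        for label, words in KEYWORDS.items()
    }
    return sorted(label for label, s in scores.items() if s >= threshold)

labels = classify("An introductory Python programming course with hands-on code labs")
print(labels)  # → ['software development']
```

A real model would output confidence scores over a controlled vocabulary (e.g. ESCO or ROME concepts) rather than matching keywords, but the input/output contract is the same: free text in, competency labels out.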
- Identification and inventory of data sets that can be used to train the model
- Development of natural language processing (NLP) strategies based on existing pretrained models (BERT, OpenAI models, etc.) and/or fine-tuning of their last layers
- Training experiments and optimization of model parameters across the different strategies identified
- Packaging of the best model(s) so that they can be easily deployed and consumed through an API
- Data expression in JSON-LD format to ensure interoperability with other data spaces
- API testing with sample datasets provided by volunteer Prometheus partners (see support list)
- Deployment of a managed version of the service with one of the partner cloud providers
- Development of automated deployment scripts for multi-cloud use (infrastructure as code, e.g. Terraform) with the partner cloud providers
- Drafting of public documentation and hosting it online
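For the JSON-LD expression step above, the sketch below shows how an enriched record might be serialized. The `@context` mapping and the ESCO prefix/identifier are assumptions chosen for the example, not a normative schema; the project would align on the agreed vocabularies (LHEO, LOM, ESCO, ROME, ...).

```python
import json

# Illustrative JSON-LD serialization of an enriched training record.
# The @context and the ESCO concept identifier are example assumptions.
record = {
    "@context": {
        "schema": "https://schema.org/",
        "esco": "http://data.europa.eu/esco/skill/",
        "name": "schema:name",
        "teaches": {"@id": "schema:teaches", "@type": "@id"},
    },
    "@type": "schema:Course",
    "name": "Introduction to data analysis",
    # Competency labels predicted by the classification model,
    # expressed as compact IRIs under the esco: prefix:
    "teaches": ["esco:data-analysis"],
}

doc = json.dumps(record, indent=2)
print(doc)
```

Because JSON-LD is plain JSON plus a `@context`, consumers that ignore linked-data semantics can still parse the record, while data-space participants can expand the compact IRIs to full concept URIs.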
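The API packaging step could look like the following stdlib-only sketch, which exposes a hypothetical `classify` stub over HTTP. The endpoint behavior, payload shape, and stub prediction are assumptions; a real deployment would serve the trained model behind a production web framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def classify(text: str) -> list[str]:
    """Stub standing in for the trained model's prediction step."""
    # Assumption: real inference would call the deployed ML model here.
    return ["data analysis"] if "data" in text.lower() else []

class EnrichHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expect a JSON body like {"description": "..."}.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        labels = classify(payload.get("description", ""))
        body = json.dumps({"competencies": labels}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

server = HTTPServer(("127.0.0.1", 0), EnrichHandler)  # port 0: pick a free port
print("would listen on port", server.server_address[1])
# server.serve_forever()  # blocking; started like this in the deployed service
server.server_close()
```

Keeping the HTTP layer this thin makes the service easy to containerize and redeploy identically across the partner cloud providers via infrastructure-as-code scripts.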