AI metadata enrichment
Interoperability: AI-based automatic metadata enrichment
In short:
The objective of this building block will be to develop a machine learning classification model capable of identifying the skills, professions or training areas associated with an educational resource, an educational activity or a training course.
Timeline
Start date: T0 + 12 months (T0 = expected: Q1 2023)
End date: T0 + 36 months
Duration (in months): 24
Where we are now
Inokufu has trained three prototype ML models. They are available on Hugging Face here.
Work will start in Q1 2024. Want to learn more and join the effort? Join here!
Objectives and Expected Outcomes
The objective of this building block is to improve the quality of educational resource, instructional activity, and training datasets by adding or correcting associated competency data (including job and/or domain).
Several standards currently exist for expressing metadata about educational resources, pedagogical activities and training (e.g. LHEO, LOM, etc.), as well as several competency frameworks (e.g. ROME code, Formacode, ESCO, EQF, ECF, etc.). The competency data in these datasets is not always well identified.
This greatly limits the usefulness of these datasets, especially when the catalogs are used in search engines, where it leads to results that users do not consider relevant (e.g. MyTrainingCount).
This building block will therefore develop a machine learning classification model capable of identifying the skills, professions or training areas associated with an educational resource, an educational activity or a training course.
This will make datasets more interoperable in use cases related to jurisdictional data, combining datasets, and conducting impact studies.
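To make the classification task concrete, here is a minimal sketch of what "identifying the skills associated with a resource" means in terms of inputs and outputs. It is not the project's model: it swaps in a plain bag-of-words cosine similarity as a stand-in baseline, and the label names, label descriptions and the `rank_skills` function are all hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical skill labels with short descriptions (illustrative only,
# not taken from ESCO, ROME or any other real framework).
SKILL_LABELS = {
    "data analysis": "analyse data statistics spreadsheet charts python",
    "project management": "plan project schedule budget team stakeholders",
    "web development": "build website html css javascript frontend",
}


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_skills(description: str, top_k: int = 2) -> list[tuple[str, float]]:
    """Return the top_k labels most similar to the resource description."""
    doc = tokenize(description)
    scores = [
        (label, cosine(doc, tokenize(text)))
        for label, text in SKILL_LABELS.items()
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Hypothetical course description used as input.
course = "Learn to build a website with HTML, CSS and JavaScript"
predictions = rank_skills(course)
print(predictions)
```

A trained model would replace the similarity function with learned representations, but the contract would likely stay similar: free text in, a ranked list of framework labels with scores out.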
Scope
Identification and inventory of data sets that can be used to train the model
Development of natural language processing (NLP) strategies based on existing models (BERT, OpenAI, etc.) and/or on re-training their last layers
Training tests and optimization of the model parameters for each of the strategies identified
Packaging of the best model(s) so that they are easy to deploy and use through an API
Data expression in JSON-LD format to ensure interoperability with other data spaces
API testing with model datasets provided by Prometheus volunteer partners (see support list)
Deployment of the service in a managed version in one of the partner cloud providers
Development of automated service deployment scripts for multi-cloud use (infrastructure as code e.g. Terraform) at partner cloud providers
Drafting, hosting and online publication of the public documentation
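As an illustration of the JSON-LD item in the scope above, the sketch below shows one way an enriched record could be expressed. The vocabulary is an assumption: it uses the schema.org types `LearningResource` and `DefinedTerm` with the `teaches` property, whereas the project may settle on a different context. The `to_json_ld` helper, the URL and the skill labels are hypothetical.

```python
import json


def to_json_ld(resource_url: str, title: str, skills: list[tuple[str, str]]) -> dict:
    """Build a JSON-LD document for a resource and its predicted skills.

    `skills` is a list of (label, framework name) pairs. The schema.org
    vocabulary used here is an assumption, not the project's specification.
    """
    return {
        "@context": "https://schema.org",
        "@type": "LearningResource",
        "@id": resource_url,
        "name": title,
        "teaches": [
            {
                "@type": "DefinedTerm",
                "name": label,
                "inDefinedTermSet": framework,
            }
            for label, framework in skills
        ],
    }


# Hypothetical resource and predicted skills, for illustration only.
document = to_json_ld(
    "https://example.org/courses/intro-web",
    "Introduction to web development",
    [("web development", "ESCO"), ("web programming", "ESCO")],
)
serialized = json.dumps(document, indent=2)
print(serialized)
```

Keeping the framework name next to each predicted label lets a consumer from another data space tell which competency framework a term belongs to without guessing.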
Deliverables
# | Availability | Deliverable |
3.2.1 | T0 + 16 | Documents: specification and state of the art |
3.2.2 | T0 + 24 | Development of the service in beta version (v0) |
3.2.3 | T0 + 32 | QA test report |
3.2.4 | T0 + 36 | Final version (v1) of the service and public documentation |