Neutopia is an online education and publishing platform that offers a new type of learning experience. It caters to organizations that promote their resources and courses and to learners who explore the subjects that interest them. Users can find the most relevant content and build playlists of their favourite educational materials, organized into categories such as articles, books, events, courses, videos and websites, as well as people with expertise in the fields they follow. The materials can be gathered from different sources, tagged, rated, reviewed and shared. Users can also create and manage their own influencer profiles and add comments and ratings to other profiles. Content can be added manually or embedded from YouTube, Vimeo, Amazon, SlideShare, MeetUp, EventBrite, Facebook and other integrated web services. The platform was built by Selleo with Ruby on Rails and Angular.
SoftKraft was hired to provide specialized services and extend the existing platform with machine learning-based recommendation systems, automatic content classification and user experience personalization, including personalized content recommendations. The functionality was to be implemented with Apache Spark, MLlib and the MongoDB database.
In the preliminary project phase we familiarized ourselves with the client's business domain and identified the data sets available in the existing system that could feed the recommendation engine. Since the volume of data available at the time was insufficient to develop an automatic content classification mechanism, we decided to augment it with a Twitter stream: a data mining tool built for this purpose used a Twitter stream API to make up for the missing data.
In the implementation phase, we relied on the data mining tool to expand the available data set, and we selected, adapted and fine-tuned the machine learning algorithm best suited to classifying content based on the data available. Measuring the accuracy of the machine learning model provided feedback that guided further refinement of the model.
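To illustrate the kind of accuracy measurement that can drive such refinement, here is a minimal sketch in plain Java. It is not the project's code: the class name `AccuracySketch`, the method `accuracy` and the sample labels are hypothetical, and plain accuracy (the share of correct predictions) is assumed as the metric, which the project may or may not have used.

```java
import java.util.List;

// Hypothetical helper, not taken from the project: compares predicted
// category labels with the known labels of a held-out test set.
final class AccuracySketch {

    /** Returns the fraction of predictions that match the true labels. */
    public static double accuracy(List<String> actual, List<String> predicted) {
        if (actual.isEmpty() || actual.size() != predicted.size()) {
            throw new IllegalArgumentException("label lists must be non-empty and of equal size");
        }
        int correct = 0;
        for (int i = 0; i < actual.size(); i++) {
            if (actual.get(i).equals(predicted.get(i))) {
                correct++;
            }
        }
        return (double) correct / actual.size();
    }

    public static void main(String[] args) {
        // Made-up labels for illustration only.
        List<String> actual = List.of("course", "video", "book", "event");
        List<String> predicted = List.of("course", "video", "article", "event");
        System.out.println("accuracy = " + accuracy(actual, predicted));
    }
}
```

Tracking this number across retraining runs on the same held-out set shows whether a change to the features or the algorithm actually improved the classifier.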
As for the technologies used, the service is built on the Spring Boot framework; it exposes a secure REST API and schedules ML-related tasks (e.g. model retraining). We also used Apache Spark to allow horizontal scaling and thus prepare the system to support significant data volumes in the future. It was likewise in preparation for high loads that we built the platform using an asynchronous approach. The recommendation engine was developed with the collaborative filtering technique, following the recommendations in the associated Spark documentation; we then adjusted the solution with some improvements to better fit the specific context of the project.
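To make the idea of collaborative filtering concrete, here is a small, self-contained Java sketch. It is deliberately not the Spark-based engine described above: it swaps in a simple item-based variant with cosine similarity on an in-memory ratings matrix so that it runs without any dependencies. The class name `CollaborativeFilteringSketch`, its methods and the ratings are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified item-based collaborative filtering (an illustrative stand-in,
// not the project's Spark implementation). ratings[user][item] holds a
// score, and 0 means "not rated".
final class CollaborativeFilteringSketch {

    /** Cosine similarity between two item columns; unrated cells count as 0. */
    public static double cosine(double[][] ratings, int itemA, int itemB) {
        double dot = 0, normA = 0, normB = 0;
        for (double[] userRow : ratings) {
            dot += userRow[itemA] * userRow[itemB];
            normA += userRow[itemA] * userRow[itemA];
            normB += userRow[itemB] * userRow[itemB];
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    /** Predicted score: similarity-weighted average of the user's own ratings. */
    public static double predict(double[][] ratings, int user, int item) {
        double weighted = 0, weights = 0;
        for (int other = 0; other < ratings[user].length; other++) {
            double rating = ratings[user][other];
            if (other == item || rating == 0) {
                continue; // skip the target item and anything the user has not rated
            }
            double similarity = cosine(ratings, item, other);
            weighted += similarity * rating;
            weights += similarity;
        }
        return weights == 0 ? 0 : weighted / weights;
    }

    /** Up to k items the user has not rated, highest predicted score first. */
    public static List<Integer> recommend(double[][] ratings, int user, int k) {
        List<Integer> candidates = new ArrayList<>();
        for (int item = 0; item < ratings[user].length; item++) {
            if (ratings[user][item] == 0) {
                candidates.add(item);
            }
        }
        candidates.sort(Comparator.comparingDouble(
                (Integer item) -> -predict(ratings, user, item)));
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }

    public static void main(String[] args) {
        // Made-up ratings: user 0 liked item 0, disliked item 1, and has not rated items 2 and 3.
        double[][] ratings = {
            {5, 1, 0, 0},
            {5, 1, 5, 1},
            {1, 5, 1, 5},
            {4, 2, 4, 2},
        };
        System.out.println(recommend(ratings, 0, 2));
    }
}
```

In this toy data, item 2 is rated much like item 0 by the other users, so it ranks ahead of item 3 for user 0. A production system on Spark would more likely rely on a distributed, model-based approach, which scales to far larger and sparser rating sets than this in-memory sketch.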