Learning ML

 · Konstantinos Tsoumas

The journey of learning machine learning often begins with well-known datasets, such as the Titanic survival dataset. One of the biggest challenges of teaching (or learning) machine learning, though, lies in its applications. Because solutions to these problems are so abundant, ML practitioners often aren't challenged enough along their problem-solving journey. That same ease of finding solutions can also fill learners with a sense of achievement.

Think about it for a sec, though: how am I supposed to learn when the task provided is way above my capabilities? Right. As you know by now, the internet is full of blogs, articles, YouTube videos and courses on how to learn ML. Lots of these resources can be used even by a beginner to implement a full image recognition model. That's hugely motivational, hands down. However, it might turn into a blocker when advancing in the field.

A real-world problem in which students can unleash their inner creativity engine might benefit them a lot. We all know this (even for ourselves), but we tend to always find something that is either super difficult to solve (complex, needs a lot of time, goes beyond our knowledge pack) or not aligned with our interests. The whole trade-off lies in finding a balance between complexity and interest, avoiding projects that are either too daunting or misaligned with personal interests.

Experimenting in Applied Learning

Recent research by Müller et al. (2022) investigated teaching ML to molecular biologists and concluded that choosing the 'right' dataset is important. As ML is by nature an interdisciplinary field, Dogan (2023) explored the impact of real research questions on the learning process of graduate and undergraduate students taking a course on AI, giving them real research questions that used datasets from the university's different departments. The course's objective was to produce 'publishable' papers from the interdisciplinary project results, which served as a link to research and helped the students improve their writing skills. Students were responsible for the whole data science lifecycle - from cleaning and preparing data to applying algorithms, monitoring various ML algorithms and sharing the results in the research paper they were about to write. Throughout this experience, each group's faculty mentor explained the problem statement and helped with any specific domain knowledge needed. The course instructor also monitored and assisted every team across the data science lifecycle (cleaning, coding, testing, deploying). The provided code had to compile, run and produce output. Projects ranged from analysing the factors that affect divers' balance in their dives, using force plate data, to more complex interdisciplinary studies.

How is this helpful?

The findings give us a glance, through the eyes of new students working in the field, at how it feels to learn ML again. That's a unique viewpoint. The most challenging part for the students turned out to be the data preparation phase, as real-world data has a lot of missing values, among other issues. Additionally, both student groups were introduced to the Housing Prediction dataset (another famous competition dataset) in class and therefore tried to reuse that code to solve the real-world problem they were given, but failed. They needed help writing code for the new real-world scenario, as they found it difficult to apply any AI algorithms to it. Only after the code was working did writing the paper seem easier than the rest of the tasks, since they had been provided with sample papers and templates.
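To make the data preparation point a bit more concrete, here is a minimal sketch of the kind of first look at missing values that a messy dataset usually calls for. It is not taken from the course described above: the sample data, the column names, the list of "missing" markers and the helper functions are all made up for illustration, and median imputation is just one simple option among many. It uses only the Python standard library; in practice a data frame library would likely do the same job with less code.

```python
import csv
import io
from statistics import median

# Hypothetical sample: a tiny CSV with gaps, standing in for a messy real-world file.
RAW = """age,fare,port
22,7.25,S
,71.28,C
26,,S
35,53.10,
"""

# Assumed markers for a missing value; real datasets often use their own.
MISSING = {"", "NA", "N/A", "null"}


def load_rows(text):
    """Parse CSV text into a list of dicts, mapping missing markers to None."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: (None if value.strip() in MISSING else value.strip())
         for key, value in row.items()}
        for row in reader
    ]


def missing_report(rows):
    """Count missing values per column so gaps are visible before modelling."""
    return {col: sum(row[col] is None for row in rows) for col in rows[0]}


def impute_median(rows, column):
    """Fill gaps in a numeric column with the median of the observed values."""
    observed = [float(row[column]) for row in rows if row[column] is not None]
    fill = median(observed)
    return [
        {**row, column: float(row[column]) if row[column] is not None else fill}
        for row in rows
    ]


rows = load_rows(RAW)
report = missing_report(rows)          # how many gaps each column has
cleaned = impute_median(rows, "age")   # only the "age" column is filled here

print(report)
print([row["age"] for row in cleaned])
```

Code written for a tidy competition dataset tends to skip this step entirely, which may be part of why reusing it on a new dataset is harder than it looks.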

I am by no means saying that academia is the only way to learn ML, but there are quite a few remarkable initiatives. Palazzo et al. (2022) collaborated with local authorities to obtain locally based datasets, and Kazmi (2022) used end-to-end projects for energy engineers to foster a more engaged community.

The takeaway is to meticulously select interdisciplinary projects and invest time in understanding the data at hand.

  • Make sure you have worked through a well-known problem (e.g., the Titanic competition) in order to gain confidence working on different problems. Try to understand the difference between the popular ML problems and the real-world problems out there. This will make you stand out in the long run! :)

Happy solving.

For those interested in diving deeper, here are some valuable references:

Dogan, G. (2023, December). Teaching Machine Learning with Applied Interdisciplinary Real World Projects. In The Third Teaching Machine Learning and Artificial Intelligence Workshop (pp. 12-15). PMLR.

Müller, R., Fasemore, A. M., Elhossary, M., & Förstner, K. U. (2022, March). A lesson for teaching fundamental Machine Learning concepts and skills to molecular biologists. In Proceedings of the Second Teaching Machine Learning and Artificial Intelligence Workshop (pp. 68-72). PMLR.

Kazmi, H. (2022, March). Teaching machine learning through end-to-end decision making. In Proceedings of the Second Teaching Machine Learning and Artificial Intelligence Workshop (pp. 10-14). PMLR.

Palazzo, M., Velazquez, A., Breda, M., Callara, M., & Aguirre, N. (2022, March). Teaching machine learning in Argentina: the clusterAI pipeline. In Proceedings of the Second Teaching Machine Learning and Artificial Intelligence Workshop (pp. 83-87). PMLR.