BECS 2022 – 2nd International Workshop on Big data driven Edge Cloud Services


Web Engineering with Human-in-the-Loop

Dmitry Ustalov, Nikita Pavlichenko, Boris Tseytlin, Daria Baidakova and Alexey Drutsa


Modern Web applications employ sophisticated Machine Learning models to rank news, posts, products, and other items presented to the users or contributed by them. To keep these models useful, one has to constantly train, evaluate, and monitor these models using freshly annotated data, which can be done using crowdsourcing. In this tutorial we will present a portion of our six-year experience in solving real-world tasks with human-in-the-loop pipelines that combine efforts made by humans and machines. We will introduce data labeling via public crowdsourcing marketplaces and present the critical components of efficient data labeling. Then, we will run a practical session, where participants address a challenging real-world Information Retrieval for e-Commerce task, experiment with selecting settings for the labeling process, and launch their label collection project on real crowds within the tutorial session. We will present useful quality control techniques and provide the attendees with an opportunity to discuss their annotation ideas. Methods and techniques described in this tutorial can be applied to any crowdsourced data and are not bound to any specific crowdsourcing platform.