Machine learning and Very Large Data Sets

The international scientific conference Machine Learning and Very Large Data Sets took place at the Yandex School of Data Analysis from September 27 through October 2. Participants discussed current problems in machine learning and in algorithms for processing large volumes of data.

Leading specialists from Britain, Germany, Russia, the United States and France took part in the conference, delivering more than 20 talks and lectures and participating in five discussion sessions. Speakers included internationally renowned researchers, Yandex staff, and students and graduates of the School of Data Analysis. Many of the issues raised aroused great interest among the audience, as evidenced by discussions that ran past their scheduled end. Topics included not only machine learning methods but also fundamental problems in the theory of knowledge, which until recently belonged to philosophy alone. As conference organisers, we pursued several goals:

  • to focus the scientific community's attention on the deep interrelationship between three areas of high technology: machine learning, the analysis of large volumes of data, and the development of an intelligent internet;
  • to discuss the key ideas of these three areas and the questions that arise as they develop: the fundamental capabilities of machines, intelligent collaboration between humans and machines, and the division of tasks between them;
  • to consider together whether it is possible to create a machine that can communicate with us in human language, and to identify the components of this problem that must and can be solved in the near term;
  • to understand why analysing large volumes of data requires the ability to detect hidden regularities that have very few manifestations.

Conference participants shared their views on which directions to pursue and what to look for and develop. We were able to attract world-class specialists and give them a closer look at what Yandex is doing. This gives impetus to the development of the School's educational programs and broadens our developers’ collaboration with their foreign counterparts.

Conference participants were also taken on a fascinating excursion to the dacha-museum of the great mathematician Andrey Nikolaevich Kolmogorov, organised by one of his former students, Albert Nikolaevich Shiryaev.


Conference program

Day 1 (September 27)
Main program
Conference opening ceremony
Arkady Volozh
Yandex CEO, Russia
Measures of Complexity in the Theory of Machine Learning
Alexey Chervonenkis
Russia, UK
Combinatorial Theory of Overfitting
Konstantin Vorontsov
Validity and Efficiency of Set Predictors
Vladimir Vovk
Conformal Prediction and Its Applications
Alexander Gammerman
Learning Hierarchies of Invariant Features
Yann LeCun
Short talks
Algorithmic sufficient statistics
Nikolay Vereshchagin
Model assessment and selection for human activity recognition
Mikhail Pikhletsky
Ilia Safonov
Oleg Tishutin
Marc James Ashton Bailey
Active learning to rank
Vladimir Gulin
Day 2 (September 28)
Main program
Counterfactual Reasoning and Computational Advertising
Leon Bottou
Inference of Causal Direction and Its Application to Machine Learning
Bernhard Schölkopf
YT – Distributed Data Processing Revisited
Maxim Babenko
Mergeable Summaries
tutorial, Graham Cormode
Sketch Data Structures and Concentration Bounds
Graham Cormode
Short talks
Inverted multiindex for large-scale nearest neighbor search
Artem Babenko
YT infrastructure application for the oil and gas exploration industry
Maxim Ryabinsky
Yandex Data Factory: friendly machine learning pipeline
Oleg Yuhno
Day 4 (September 30)
Main program
Explaining AdaBoost
Robert Schapire
MatrixNet is Yandex's implementation of the Gradient Boosted Decision Tree algorithm (GBRT)
Andrey Gulin
Optimal Stopping Rules and Their Applications
Albert Shiryaev
Sequential Detection/Isolation of Abrupt Changes with Some Applications
Igor Nikiforov
Filling Algorithm for Weighted Graph
Evgeny Shchepin
Short talks
Review of trend models in time-series analysis
Evgeny Burnaev
Detection of change-points in polynomial trend of time-series with seasonal components
Evgeny Burnaev
Andrey Lokot
Quality-biased Ranking for Queries with Commercial Intent
Alexander Shishkin
Polina Zhinalieva
Kirill Nikolaev
Graph-based Detection of Malware Distributors
Andrei Venzhega
Ad relevance prediction
Sergey Kacher
Day 5 (October 1)
Main program
Big Data Analysis with Topic Models: Human Interaction, Streaming Computation, and Social Science Applications
Mikhail Roytberg
Skeptical, Scalable Topic Models: Models for Interactive Exploration of Diverse Data
Jordan Boyd-Graber
Combining Machine Learning with Formal Linguistic Semantics
Kathleen Dahlgren
Following the Thread of Dialogue with Linguistic Semantics
Kathleen Dahlgren
Using Hierarchical Ontologies for Interpretation of Activities and Texts
Boris Mirkin
Russia, UK
Short talks
Studying structure and evolution of search engine query flow
Alexander Kukushkin
Alexey Tikhonov
Yandex computer vision technologies
Anton Slesarev
Developing user-behavior based algorithm for detecting adult videos
Boris Okun
Short text classification
Alexei Dral
Emeli Mbaykodzhi
Three-dimensional reconstruction using cluster analysis
Sergey Arkhangelsky
Day 6 (October 2)
Main program
Web-Graph Models and Applications
tutorial, Andrey Raigorodsky
Critical Sample Size for Bayesian Inference in Gaussian Process Regression
Evgeny Burnaev
Alexey Zaytsev
Vladimir Spokoiny
Regularization, Sparsity, and Logistic Regression
Alexander Genkin
Convex Optimization and Structural Clustering
Evgeny Bauman
Short talks
Text-based Online Advertising: L1 Regularization for Feature Selection
Ilya Trofimov
Use of browser toolbar logs in web search
Gleb Gusev
Nested processes on a hyper-link graph
Maxim Zhukovski
Timely crawling of high-quality ephemeral new content
Ludmila Ostroumova

Speakers

Maxim Babenko (Russia)
Evgeny Bauman (USA)
Leon Bottou (USA)
Jordan Boyd-Graber (USA)
Alexey Chervonenkis (Russia, UK)
Graham Cormode (USA)
Kathleen Dahlgren (USA)
Alexander Gammerman (UK)
Alexander Genkin (USA)
Andrey Gulin (Russia)
Yann LeCun (USA)
Boris Mirkin (Russia, UK)
Igor Nikiforov (France)
Andrey Raigorodsky (Russia)
Mikhail Roytberg (Russia)
Robert Schapire (USA)
Bernhard Schölkopf (Germany)
Evgeny Shchepin (Russia)
Albert Shiryaev (Russia)
Sargur Srihari (USA)
Konstantin Vorontsov (Russia)
Vladimir Vovk (UK)

Program committee

Maxim Babenko
Victor Buchshtaber
Alexey Chervonenkis
Andrey Raigorodsky
Ilya Segalovich
Albert Shiryaev
Konstantin Vorontsov

Organising committee

Artem Babenko
Elena Bunina
Ilya Muchnik
Elena Shiryaeva
Arkady Volozh