Conference

Machine Learning and Very Large Data Sets

The international scientific conference Machine Learning and Very Large Data Sets took place at the Yandex School of Data Analysis from September 27 through October 2. Participants discussed current issues in machine learning and in algorithms for processing large volumes of data.

Leading specialists from Britain, Germany, Russia, the United States and France took part in the conference, delivering more than 20 talks and lectures and participating in five discussion sessions. Speakers included internationally renowned researchers, Yandex staff, and students and graduates of the School of Data Analysis. Many of the issues raised by the speakers aroused great interest among the audience, as shown by discussions that continued past their scheduled finishing time. Topics included not only machine learning methods but also fundamental problems in the theory of knowledge, which were previously confined to philosophy. As conference organisers, we pursued several goals:

  • to focus the attention of the scientific community on the deep interrelationship between three areas of high technology: machine learning, the analysis of large volumes of information, and the development of an intelligent internet;
  • to discuss key ideas of these three areas and the questions that arise in their development: the fundamental capabilities of machines, intellectual collaboration between humans and machines, and the division of tasks between them;
  • to consider together whether it is possible to create a machine that can communicate with us in a human language, and to identify which parts of this problem must, and can, be solved quickly;
  • to understand why analysing large volumes of data requires the ability to detect hidden regularities that have very few manifestations.

Conference participants shared their thoughts on which directions to pursue and what to look for and develop. We were able to attract world-class specialists and acquaint them more closely with Yandex's work. This gives impetus to the development of the School's educational programs and widens our developers’ collaboration with their foreign counterparts.

Conference participants were also taken on a fascinating excursion to the dacha-museum of the great mathematician Andrey Nikolaevich Kolmogorov, organised by one of his former students, Albert Nikolaevich Shiryaev.


Conference program

Day 1 (September 27)
Main program
Conference opening ceremony
Arkady Volozh
Yandex CEO, Russia
Measures of Complexity in the Theory of Machine Learning
Alexey Chervonenkis
Russia, UK
Combinatorial Theory of Overfitting
Konstantin Vorontsov
Russia
Validity and Efficiency of Set Predictors
Vladimir Vovk
UK
Conformal Prediction and Its Applications
Alexander Gammerman
UK
Learning Hierarchies of Invariant Features
Yann LeCun
USA
Short talks
Algorithmic sufficient statistics
Nikolay Vereshchagin
Model assessment and selection for human activity recognition
Mikhail Pikhletsky
Ilia Safonov
Oleg Tishutin
Marc James Ashton Bailey
Active learning to rank
Vladimir Gulin
Day 2 (September 28)
Main program
Counterfactual Reasoning and Computational Advertisement
Leon Bottou
USA
Inference of Causal Direction and Its Application to Machine Learning
Bernhard Schölkopf
Germany
YT – Distributed Data Processing Revisited
Maxim Babenko
Russia
Mergeable Summaries
tutorial, Graham Cormode
USA
Sketch Data Structures and Concentration Bounds
Graham Cormode
USA
Short talks
Inverted multiindex for large-scale nearest neighbor search
Artem Babenko
YT infrastructure application for oil and gas exploration industry
Maxim Ryabinsky
Yandex Data Factory: friendly machine learning pipeline
Oleg Yuhno
Day 4 (September 30)
Main program
Explaining AdaBoost
Robert Schapire
USA
MatrixNet is Yandex's implementation of the Gradient Boosted Decision Tree algorithm (GBRT)
Andrey Gulin
Russia
Optimal Stopping Rules and Their Applications
Albert Shiryaev
Russia
Sequential Detection/Isolation of Abrupt Changes with Some Applications
Igor Nikiforov
France
Filling Algorithm for Weighted Graph
Evgeny Shchepin
Russia
Short talks
Review of trend models in time-series analysis
Evgeny Burnaev
Detection of change-points in polynomial trend of time-series with seasonal components
Evgeny Burnaev
Andrey Lokot
Quality-biased Ranking for Queries with Commercial Intent
Alexander Shishkin
Polina Zhinalieva
Kirill Nikolaev
Graph-based Malware Distributors Detection
Andrei Venzhega
Ad relevance prediction
Sergey Kacher
Day 5 (October 1)
Main program
Big Data Analysis with Topic Models: Human Interaction, Streaming Computation, and Social Science Applications
Mikhail Roytberg
Russia
Skeptical, Scalable Topic Models: Models for Interactive Exploration of Diverse Data
Jordan Boyd-Graber
USA
Combining Machine Learning with Formal Linguistic Semantics
Kathleen Dahlgren
USA
Following the Thread of Dialogue with Linguistic Semantics
Kathleen Dahlgren
USA
Using Hierarchical Ontologies for Interpretation of Activities and Texts
Boris Mirkin
Russia, UK
Short talks
Studying structure and evolution of search engine query flow
Alexander Kukushkin
Alexey Tikhonov
Yandex computer vision technologies
Anton Slesarev
Developing user-behavior based algorithm for detecting adult videos
Boris Okun
Short text classification
Alexei Dral
Emeli Mbaykodzhi
Three-dimensional reconstruction using cluster analysis
Sergey Arkhangelsky
Day 6 (October 2)
Main program
Web-Graph Models and Applications
tutorial, Andrey Raigorodsky
Russia
Critical Sample Size for Bayesian Inference in Gaussian Process Regression
Evgeny Burnaev
Russia
Alexey Zaytsev
Russia
Vladimir Spokoiny
Russia
Regularization, Sparsity, and Logistic Regression
Alexander Genkin
USA
Convex Optimization and Structural Clustering
Evgeny Bauman
USA
Short talks
Text-based Online Advertising: L1 Regularization for Feature Selection
Ilya Trofimov
Use of browser toolbar logs in web search
Gleb Gusev
Nested processes on a hyper-link graph
Maxim Zhukovski
Timely crawling of high-quality ephemeral new content
Ludmila Ostroumova

Speakers

Maxim Babenko (Russia)
Evgeny Bauman (USA)
Leon Bottou (USA)
Jordan Boyd-Graber (USA)
Alexey Chervonenkis (Russia, UK)
Graham Cormode (USA)
Kathleen Dahlgren (USA)
Alexander Gammerman (UK)
Alexander Genkin (USA)
Andrey Gulin (Russia)
Yann LeCun (USA)
Boris Mirkin (Russia, UK)
Igor Nikiforov (France)
Andrey Raigorodsky (Russia)
Mikhail Roytberg (Russia)
Robert Schapire (USA)
Bernhard Schölkopf (Germany)
Evgeny Shchepin (Russia)
Albert Shiryaev (Russia)
Sargur Srihari (USA)
Konstantin Vorontsov (Russia)
Vladimir Vovk (UK)

Program committee

Maxim Babenko
Victor Buchstaber
Alexey Chervonenkis
Andrey Raigorodsky
Ilya Segalovich
Albert Shiryaev
Konstantin Vorontsov

Organising committee

Artem Babenko
Elena Bunina
Ilya Muchnik
Elena Shiryaeva
Arkady Volozh