Conference
The international scientific conference Machine Learning and Very Large Data Sets took place at the Yandex School of Data Analysis from September 29 through October 2. Participants discussed current issues concerning machine learning and algorithms for processing large volumes of data.
Leading specialists from Britain, Germany, Russia, the United States and France took part in the conference, delivering more than 20 talks and lectures and participating in five discussion sessions. Speakers included internationally renowned researchers, Yandex staff, and students and graduates of the School of Data Analysis. Many of the issues raised by the speakers aroused great interest among the audience, as shown by discussions that continued past their scheduled finishing time. Topics included not only methods of machine learning but also fundamental problems in the theory of knowledge, which was previously confined to the discipline of philosophy.
As conference organisers, we pursued several goals:
- to focus the attention of the scientific community on the deep interrelationship between three areas of high technology: machine learning, the analysis of large volumes of information, and the development of an intelligent internet;
- to discuss key ideas of these three areas and the questions that arise in their development: the fundamental capabilities of machines, intellectual collaboration between humans and machines, and the division of tasks between them;
- to think together about whether it is possible to create a machine that can communicate with us in a human language, and highlight the components of this problem that must and can be solved quickly;
- to answer the question of why the analysis of large volumes of data requires the ability to detect hidden regularities that manifest themselves only a very small number of times.
Conference participants shared their thoughts and ideas on which direction to move in and what to look for and develop. We were able to attract world-class specialists and acquaint them more closely with what Yandex is doing. This gives impetus to the development of educational programs for the school, and widens our developers’ collaboration with their foreign counterparts.
Conference participants were also taken on a fascinating excursion to the dacha-museum of the great mathematician Andrey Nikolaevich Kolmogorov, organised by one of his former students, Albert Nikolaevich Shiryaev.
Conference program
Conference opening ceremony | Arkady Volozh, Yandex CEO | Russia
Measures of Complexity in the Theory of Machine Learning | Alexey Chervonenkis | Russia, UK
Combinatorial Theory of Overfitting | Konstantin Vorontsov | Russia
Validity and Efficiency of Set Predictors | Vladimir Vovk | UK
Conformal Prediction and Its Applications | Alexander Gammerman | UK
Learning Hierarchies of Invariant Features | Yann LeCun | USA

Algorithmic sufficient statistics | Nikolay Vereshchagin
Model assessment and selection for human activity recognition | Mikhail Pikhletsky, Ilia Safonov, Oleg Tishutin, Marc James Ashton Bailey
Active learning to rank | Vladimir Gulin

Counterfactual Reasoning and Computational Advertisement | Leon Bottou | USA
Inference of Causal Direction and Its Application to Machine Learning | Bernhard Schölkopf | Germany
YT – Distributed Data Processing Revisited | Maxim Babenko | Russia
Mergeable Summaries (tutorial) | Graham Cormode | USA
Sketch Data Structures and Concentration Bounds | Graham Cormode | USA

Inverted multiindex for large-scale nearest neighbor search | Artem Babenko
YT infrastructure application for oil and gas exploration industry | Maxim Ryabinsky
Yandex Data Factory: friendly machine learning pipeline | Oleg Yuhno

Explaining AdaBoost | Robert Schapire | USA
MatrixNet is Yandex's implementation of Gradient Boosted Decision Tree algorithm (GBRT) | Andrey Gulin | Russia
Optimal Stopping Rules and Their Applications | Albert Shiryaev | Russia
Sequential Detection/Isolation of Abrupt Changes with Some Applications | Igor Nikiforov | France
Filling Algorithm for Weighted Graph | Evgeny Shchepin | Russia

Review of trend models in time-series analysis | Evgeny Burnaev
Detection of change-points in polynomial trend of time-series with seasonal components | Evgeny Burnaev, Andrey Lokot
Quality-biased Ranking for Queries with Commercial Intent | Alexander Shishkin, Polina Zhinalieva, Kirill Nikolaev
Graph-based Malware Distributors Detection | Andrei Venzhega
Ad relevance prediction | Sergey Kacher

Big Data Analysis with Topic Models: Human Interaction, Streaming Computation, and Social Science Applications | Mikhail Roytberg | Russia
Skeptical, Scalable Topic Models: Models for Interactive Exploration of Diverse Data | Jordan Boyd-Graber | USA
Combining Machine Learning with Formal Linguistic Semantics | Kathleen Dahlgren | USA
Following the Thread of Dialogue with Linguistic Semantics | Kathleen Dahlgren | USA
Using Hierarchical Ontologies for Interpretation of Activities and Texts | Boris Mirkin | Russia, UK

Studying structure and evolution of search engine query flow | Alexander Kukushkin, Alexey Tikhonov
Yandex computer vision technologies | Anton Slesarev
Developing user-behavior based algorithm for detecting adult videos | Boris Okun
Short text classification | Alexei Dral, Emeli Mbaykodzhi
Three-dimensional reconstruction using cluster analysis | Sergey Arkhangelsky

Web-Graph Models and Applications (tutorial) | Andrey Raigorodsky | Russia
Critical Sample Size for Bayesian Inference in Gaussian Process Regression | Evgeny Burnaev, Alexey Zaytsev, Vladimir Spokoiny | Russia
Regularization, Sparsity, and Logistic Regression | Alexander Genkin | USA
Convex Optimization and Structural Clustering | Evgeny Bauman | USA

Text-based Online Advertising: L1 Regularization for Feature Selection | Ilya Trofimov
Use of browser toolbar logs in web search | Gleb Gusev
Nested processes on a hyper-link graph | Maxim Zhukovski
Timely crawling of high-quality ephemeral new content | Ludmila Ostroumova