Wednesday, October 28

13:00 GMT

Hands-On Real Time Stream Processing for Machine Learning - Alejandro Saucedo, The Institute for Ethical AI & Machine Learning
This talk will provide practical insight into how to build scalable streaming machine learning pipelines that process large datasets in real time using Python asyncio, Kafka, Faust, SpaCy and Seldon. We will cover a case study performing automated content moderation on Reddit comments in real time. Our dataset consists of 200,000 Reddit comments from /r/science, 50,000 of which have been removed by moderators. The stream data will be handled in a Kafka cluster, and the stream processing will be done with the Faust stream processing library. We will run the end-to-end pipeline in Kubernetes, with various components leveraging SKLearn, SpaCy and Seldon. We will then dive into fundamental stream processing concepts such as windows, watermarking and checkpointing, and show how to use each of these frameworks to build complex data streaming pipelines that perform real-time processing at scale. Finally, we will show best practices for using these frameworks, as well as a high-level overview of monitoring tools, including Grafana and Kafka Manager.
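Windows and watermarks are two of the concepts this abstract lists, and both can be shown without Kafka or Faust. The code below is a minimal standard-library sketch. It is not Faust's API, and it rests on a few assumptions:

- Events arrive as hypothetical `(event_time, label)` tuples.
- Events are grouped into fixed-size, non-overlapping (tumbling) windows.
- The watermark is taken to be the largest event time seen so far minus an assumed `allowed_lateness`.

Checkpointing is not covered.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (event_time_seconds, label), e.g. "kept" / "removed".
Event = Tuple[float, str]


def tumbling_window_counts(
    events: Iterable[Event],
    window_size: float,
    allowed_lateness: float,
) -> Tuple[List[Tuple[float, str, int]], List[Event]]:
    """Count events per label in tumbling windows, closed by a watermark.

    A window [start, start + window_size) is emitted once the watermark
    (max event time seen so far minus allowed_lateness) reaches its end.
    Events whose window has already ended by the watermark are returned
    as late instead of being counted.
    """
    open_windows: Dict[float, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    emitted: List[Tuple[float, str, int]] = []
    late: List[Event] = []
    max_event_time = float("-inf")
    watermark = float("-inf")

    for ts, label in events:
        start = (ts // window_size) * window_size
        if start + window_size <= watermark:
            # This event's window is already closed: treat it as late data.
            late.append((ts, label))
            continue
        open_windows[start][label] += 1
        max_event_time = max(max_event_time, ts)
        watermark = max_event_time - allowed_lateness
        # Emit (and forget) every window whose end the watermark has passed.
        for w_start in sorted(open_windows):
            if w_start + window_size <= watermark:
                for lbl, count in sorted(open_windows.pop(w_start).items()):
                    emitted.append((w_start, lbl, count))

    # End of stream: flush whatever windows are still open.
    for w_start in sorted(open_windows):
        for lbl, count in sorted(open_windows[w_start].items()):
            emitted.append((w_start, lbl, count))
    return emitted, late
```

For example, take 10-second windows with 5 seconds of allowed lateness. A comment stamped at t=7 that arrives after one stamped at t=25 is reported as late rather than added to the already-emitted [0, 10) window.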

Alejandro Saucedo

Chief Scientist, The Institute for Ethical AI & Machine Learning
Alejandro is the Chief Scientist at the Institute for Ethical AI & Machine Learning, where he leads the development of industry standards on machine learning bias, adversarial attacks and differential privacy. Alejandro is also the Director of Machine Learning Engineering at Seldon...

Wednesday, October 28, 2020 13:00 - 13:50 GMT
AI/ML/DL Theater

16:15 GMT

Milvus, How to Accelerate Approximate Nearest Neighbor Search (ANNS) for Large Scale Dataset - Jun Gu, Zilliz
Deep learning models have proven to be an effective way to extract content from unstructured data such as images, video, audio and text. When using pre-trained DL models in production, users need to handle huge numbers of feature vectors. Milvus is an open source vector similarity search engine that helps users perform efficient similarity search over billions of vectors. Jun introduced the big picture of the Milvus project at a previous OSS North America event. This time, Jun will introduce the technology used in the Milvus project and explain how Milvus accelerates ANNS for large-scale datasets. Milvus is an incubation project in the LF AI Foundation.
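The code below is a toy inverted-file (IVF) index in plain Python. It is meant to make the difference between exact and approximate nearest neighbor search concrete. It is an illustrative sketch of one common ANNS technique, not Milvus's implementation or API. The names `IVFIndex`, `nlist`, `nprobe` and `brute_force` are used here only for illustration.

The index works in three steps:

- Vectors are clustered with k-means.
- Each vector is stored in the list of its nearest centroid.
- A query scans only the `nprobe` closest lists instead of every stored vector.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class IVFIndex:
    """Toy inverted-file index: k-means centroids plus one vector list per centroid."""

    def __init__(self, nlist: int, iterations: int = 10, seed: int = 0):
        self.nlist = nlist
        self.iterations = iterations
        self.rng = random.Random(seed)
        self.centroids: List[List[float]] = []
        self.lists: List[List[Tuple[int, Vector]]] = []

    def _nearest_centroids(self, v: Vector, n: int) -> List[int]:
        order = sorted(range(len(self.centroids)), key=lambda c: l2(v, self.centroids[c]))
        return order[:n]

    def train(self, vectors: List[Vector]) -> None:
        # Plain Lloyd's k-means, initialised from a random sample of the data.
        self.centroids = [list(v) for v in self.rng.sample(vectors, self.nlist)]
        for _ in range(self.iterations):
            buckets: List[List[Vector]] = [[] for _ in range(self.nlist)]
            for v in vectors:
                buckets[self._nearest_centroids(v, 1)[0]].append(v)
            for c, bucket in enumerate(buckets):
                if bucket:  # keep the old centroid if its cluster is empty
                    self.centroids[c] = [sum(dim) / len(bucket) for dim in zip(*bucket)]

    def add(self, vectors: List[Vector]) -> None:
        # Store each vector (with its position as id) under its nearest centroid.
        self.lists = [[] for _ in range(self.nlist)]
        for i, v in enumerate(vectors):
            self.lists[self._nearest_centroids(v, 1)[0]].append((i, v))

    def search(self, query: Vector, k: int, nprobe: int) -> List[int]:
        # Scan only the nprobe closest lists instead of the whole dataset.
        candidates: List[Tuple[int, Vector]] = []
        for c in self._nearest_centroids(query, nprobe):
            candidates.extend(self.lists[c])
        candidates.sort(key=lambda item: l2(query, item[1]))
        return [i for i, _ in candidates[:k]]


def brute_force(vectors: List[Vector], query: Vector, k: int) -> List[int]:
    """Exact k-nearest-neighbor search, for comparison with the index."""
    order = sorted(range(len(vectors)), key=lambda i: l2(query, vectors[i]))
    return order[:k]
```

Raising `nprobe` trades speed for recall. When `nprobe` equals `nlist`, every list is scanned and the result matches the brute-force search. Smaller values scan fewer vectors but may miss true neighbors that fall in lists that were not probed.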

Jun Gu

Technology evangelist, Zilliz
Jun Gu is a partner at Zilliz, where he serves as Senior Architect. Before joining Zilliz, Jun received his undergraduate degree in Computer Science from Peking University and worked as a database technician for 14 years at companies such as ICBC, IBM, Morgan Stanley and Huawei. Jun...

Wednesday, October 28, 2020 16:15 - 17:05 GMT
AI/ML/DL Theater

17:15 GMT

Become a Data Driven Organization through Unified Metadata Using ODPi Egeria - Mandy Chessell, IBM
Learn how to become a data-driven organization by exploring the latest developments and trends in managing compliance, GDPR, data catalogs and governance. The ODPi Egeria project at the Linux Foundation will share how IBM, ING and others are collaborating to build an open ecosystem (interfaces, repositories, tools, and experts who collaborate and exchange content) while adhering to governance guidelines and imperatives. Join this session to learn about open metadata and governance and how you can benefit from it.

Mandy Chessell

ODPi TSC Chairperson and ODPi Egeria project chairperson. IBM Distinguished Engineer, IBM
Mandy Chessell CBE FREng CEng FBCS is an IBM Distinguished Engineer, Master Inventor and Fellow of the Royal Academy of Engineering. Mandy is a trusted advisor to executives from large organisations, working with them to develop their strategy and architecture relating to the governance...

Wednesday, October 28, 2020 17:15 - 18:05 GMT
AI/ML/DL Theater
