Scope and Topics

Machine learning and data science have transformed many aspects of our everyday lives, yet there is a digital divide that widens every day. The big data revolution has reached the Western world (i.e., North America and Europe) far more than developing countries in Africa, Asia and South America. As a result, most technologies developed using ML and data science solve first-world problems faced by ordinary people in the Western world. While products like Siri and Alexa bring a lot of value to people in the Western world, they bring little value to people in Sub-Saharan Africa, who struggle daily with much graver challenges, e.g., poor sanitation, poverty, hunger and infectious diseases. It is therefore urgent to refocus the attention of the SIGKDD community on problems faced by these underserved populations in developing countries. Accordingly, there is growing interest in ensuring that current and future data science research is used responsibly for the benefit of humanity in the developing world and among marginalized communities (i.e., for social good). Achieving this goal requires a wide range of perspectives and contributions, spanning the full spectrum from fundamental research to sustained deployments in the real world. Problems in these domains are characterized by small data, uncertainties, etc., so researchers in the SIGKDD community need to conduct new fundamental research to solve them.

To that end, this workshop will explore how data science research can contribute to solving challenging problems faced by present-day marginalized communities around the world, especially in developing countries. For example, what role can data science research play in promoting health, sustainable development and infrastructure security? How can data science initiatives be used to achieve consensus among a set of negotiating self-interested entities (e.g., finding resolutions to trade talks between countries)? To address such questions, this workshop will bring together researchers and practitioners across different strands of data science research and a wide range of important real-world application domains. The objective is to share the current state of research and practice, explore directions for future work, and create opportunities for collaboration. In addition, the workshop will place special emphasis on data science approaches for tackling the COVID-19 pandemic (see preliminary agenda below). The organizers believe that data science research has an important role to play in providing unique insights about the pandemic and developing targeted responses; we encourage submissions from data science researchers as well as epidemiologists, health policy researchers, and other domain experts who are interested in engaging with the SIGKDD community.

A unique feature of our workshop is that we aim to engage and invite non-profit organizations that already do significant work on the upliftment of marginalized communities, such as homeless youth in North America and poor smallholder farmers in Sub-Saharan Africa. We aim to create a dialogue between data science researchers (who possess the tools required to develop data-driven solutions that can benefit marginalized communities) and non-profit organizations, which can tell researchers which real problems need urgent attention and which real-world constraints data-driven solutions must respect in order to have real impact on the ground.

Our workshop’s target audience consists of: (i) data science and machine learning researchers who have used (or are currently using) their ML research to solve important real-world problems for society’s benefit in a measurable manner; (ii) non-profit organizations that wish to explore how data-driven solutions could help them improve their day-to-day operations and thereby amplify their real-world impact; (iii) interdisciplinary researchers combining data science research with various disciplines (e.g., social science, psychology and criminology); and (iv) engineers and scientists from organizations that aim for social good and look to build real-world systems using data science techniques.

Topics

The workshop organizers invite paper submissions on the following (and related) topics:
  • Applications of Learning and Optimization in Societally Beneficial Domains
  • ML Approaches for COVID-19 and Epidemics
  • Real-World Applications of Game Theory for Security
  • Data Science for Environmental Crime
  • Data Science for Environmental Sustainability
  • Data Science for Urban Planning
  • Computational Sustainability
  • Data Science for Education
  • Data Science for Public Health
  • Data Science for International Relations
  • Data Science for Democracy in the Developing World
  • Explainable Artificial Intelligence and Machine Learning for Social Good

Format

The workshop will be a one-day meeting. It will include a number of (possibly parallel) technical sessions; a virtual poster session where presenters can discuss their work, with the aim of further fostering collaborations; and multiple invited speakers covering crucial challenges for the field of Data Science for Social Good and learning. It will conclude with a panel discussion.

Attendance

Attendance is open to all. At least one author of each accepted submission must be present at the workshop.

Important Dates

  • May 20, 2021 – Submission Deadline
  • June 10, 2021 – Acceptance notification
  • July 2, 2021 – Final Workshop Program Posted on Website
  • August 15, 2021, 8 AM SGT - 12:30 PM SGT – Workshop Date

Submission Information

Submission URL: Link

Submission Types

  • Technical Papers: Full-length research papers of up to 8 pages (excluding references and appendices) in KDD format, detailing high-quality work in progress or work that could potentially be published at a major conference.
  • Short Papers: Position or short papers of up to 4 pages (excluding references and appendices) in KDD format that describe initial work or the release of privacy-preserving benchmarks and datasets on the topics of interest.

All papers must be submitted in PDF format, using the KDD-21 author kit. Submissions should include the names, affiliations, and email addresses of all authors, i.e., submissions are not double-blind.
Submissions will be refereed on the basis of technical quality, novelty, significance, and clarity. Each submission will be thoroughly reviewed by at least two program committee members.
Papers rejected from the KDD 2021 technical program are welcome.

For questions about the submission process, contact the workshop chairs.

Program

  1. 8 AM - 8:45 AM SGT: Invited Talk by Prof. Arunesh Sinha (Singapore Management University)
  2. 9 AM - 9:20 AM SGT: Anomaly Detection and Automated Labeling for Voter Registration File Changes. Sam Royston, Ben Greenberg, Omeed Tavasoli and Courtenay Cotton.
  3. 9:20 AM - 9:40 AM SGT: Data-Driven Optimization for Police Districting in South Fulton, Georgia. Shixiang Zhu, Alexander Bukharin, Le Lu, He Wang and Yao Xie.
  4. 9:40 AM - 10:00 AM SGT: Attention-augmented Spatio-Temporal Segmentation for Land Cover Mapping. Rahul Ghosh, Praveen Ravirathinam, Xiaowei Jia, Chenxi Lin, Zhenong Lin and Vipin Kumar.
  5. 10:00 AM - 10:20 AM SGT: Effects of personality traits in predicting grade retention in Brazilian students. Carmen Toledo, Guilherme Bassedon, Jonathan Batista, Lucka Gianvechio, Felipe Polo, Carlos Guatimosim and Renato Vicente.
  6. 10:20 AM - 10:40 AM SGT: Predicting Locust Movement using Spatiotemporal Deep Models. Maryam Tabar, Jared Gluck, Anchit Goyal, Fei Jiang, Derek Morr, Annalyse Kehs, Dongwon Lee, David Hughes and Amulya Yadav.
  7. 10:40 AM - 11:00 AM SGT: Building Knowledge Base for the Domain of Economic Mobility of Older Workers. Ying Li, Vitalii Zakhozhyi, Yu Fu, Joy He-Yueya, Vishwa Pardeshi and Luis Salazar.
  8. 11:00 AM - 11:20 AM SGT: Exploring the Scope of Using News Articles to Understand Development Patterns of Districts in India. Mehak Gupta, Shayan Aslam Saifi, Konark Verma, Kumari Rekha and Aaditeshwar Seth.
  9. 11:20 AM - 11:40 AM SGT: Experiences with the Introduction of AI-based Tools for Moderation Automation of Voice-based Participatory Media Forums. Aman Khullar, Paramita Panjal, Rachit Pandey, Abhishek Burnwal, Prashit Raj, Ankit Akash Jha, Himanshu Himanshu, Priyadarshi Hitesh, R Jayanth Reddy and Aaditeshwar Seth.
  10. 11:40 AM - 12:00 PM SGT: Have you tried Neural Topic Models? Comparative Analysis of Neural and Non-Neural Topic Models with Application to COVID-19 Twitter Data. Andrew Bennett, Dipendra Misra and Nga Than.
  11. 12:00 PM - 12:20 PM SGT: Using Algorithms in Resource-Constrained Public Sector: Notes from Rural Road Planning in India. Harsh Nisar, Deepak Gupta, Pankaj Kumar, M. Srinivasa Rao, A. V. Rajesh and Alka Upadh.
  12. 12:20 PM - 12:30 PM SGT: Closing Remarks

Workshop Chairs

Amulya Yadav

Penn State University

amulya@psu.edu

Rayid Ghani

Carnegie Mellon University

rayid@cmu.edu

Vipin Kumar

University of Minnesota

kumar001@umn.edu

Eric Horvitz

Microsoft

horvitz@microsoft.com

Bistra Dilkina

University of Southern California

dilkina@usc.edu

Thanh Nguyen

University of Oregon

thanhhng@cs.uoregon.edu