Undergraduate Course Offerings
21:198:105 Everyday Data (3)
Every day we share data about ourselves online through social media platforms and in real life through shopping interactions and surveys. The data that we share creates a narrative of our past, present, and future. In this course, students learn how our data is collected, analyzed, and visualized. Students learn the basic principles of data visualization in Python and are immersed in standard data science practices to learn exploratory data analysis and to communicate findings and solutions effectively. Students will demonstrate their computing and analytical abilities in the following ways:
- Python fundamentals for Data Visualization: Libraries, data types, functions, and plots;
- Descriptive and inferential statistics;
- Data Science tools and data analysis;
- Communication and presentation skills with a focus on compelling data narratives and visualizations.
Co-requisite: students enrolled in this course must take Introduction to Data Science Professions (21:090:170) concurrently.
21:090:170 Introduction to Data Science Professions (1)
Introduction to Data Science Professions provides students with an opportunity to gain deeper insight into the interdisciplinary field of data science. Careers in data science offer highly competitive salaries with growth opportunities and span the public and private sectors. With such opportunities come increasing demands for well-prepared and well-trained students who are equipped for industry and research careers upon completing their undergraduate degrees. This course introduces students to the career paths that benefit from an education in the collection, analysis, and summarization of data and provides them with the information, resources, and professional networks to assist with planning and preparing for careers in data science. Students will demonstrate their skills in the following ways:
- Understand and describe career pathways in data science;
- Establish a network of data science professionals and apply to internships and data science research opportunities;
- Receive coaching and mentoring from data science professionals;
- Develop professional pitches, resumes, and professional development plans.
Co-requisite: students enrolled in this course must take Everyday Data (21:198:105) concurrently.
21:198:220 Fundamentals of Data Visualization (3)
A picture is worth a thousand words. This course introduces undergraduate students to data visualization. It is intended to teach students how to create meaningful charts and figures that convey useful information while being pleasing to the eye. Students will learn to use the programming language R to develop graphics. The course is divided into three general themes: (1) Research Methods and Statistics; (2) Programming in R; and (3) Generating Meaningful and Insightful Graphics. The course aims to offer an interactive environment where students feel comfortable generating and sharing ideas. Students will be encouraged to discuss topics reviewed in class and to critically assess how others have used data visualization to convey the results of their analyses.
This course approaches data visualization through theory and practical programming. A picture may be worth a thousand words, but these words may mean little without proper thought, effort, and clearly set goals. This course follows an analytical approach to the creation of meaningful graphics, which aims to motivate students to think critically about the figures, graphics, and charts they encounter in daily life. By the end of this course, students should be able to:
- Write code in R
- Manage the input and output of data
- Use control structures and procedural programming
- Understand data structures
- Understand the mathematical foundations of statistics and implement them accurately in R
- Develop code that transforms data sets into meaningful figures and charts
As a whole, this course aims to offer a well-rounded view of data visualization and to motivate students to generate critical ideas on the meaningful representation of data in everyday scenarios.
21:198:329 Statistics and Machine Learning (3)
Machine learning is a technology that is transforming the way we solve problems in science, engineering, and business. The goal of this course is to introduce students to the basic concepts in statistical learning that can be used to design models and automate data analysis. Programming in Python and R will be introduced. Students enrolling in this course are assumed to be familiar with single-variable calculus. Prior knowledge of probability and statistics would be helpful, but is not necessary. This course will cover the following key concepts in statistical learning: supervised learning, the bias-variance tradeoff, regression, classification, validation, regularization, ensemble methods, support vector machines and kernel methods, and unsupervised learning.
21:198:330 Ethical Issues in Data Science (3)
In performing their roles in the workplace, practicing computer or data scientists confront a number of important ethical questions: questions about the possibility of bias in automated decision-making systems, about what constitutes appropriate collection, aggregation, and use of personal information about the users of technology services, and about collective and individual responsibility for the social impacts of newly developed technologies. The purpose of this course is to consider these questions directly in order to prepare students to think critically about the ethical questions which will arise in connection with their work later in their lives.
In our first unit, we will discuss the problem of bias in machine learning, focusing especially on the controversy surrounding whether COMPAS, an algorithm used in the US judicial system to make bail and sentencing decisions, is biased against Black defendants. In our second unit, we will discuss the conflict between consumers who wish to make use of technological services while maintaining their privacy and corporations that wish to obtain as much information as possible in exchange for providing their services. In particular, we will consider a number of theories of what privacy is and why it is valuable, survey some of the ways privacy can be compromised by new technologies, and then engage with some suggestions for how to manage privacy concerns in light of these technologies. In our third unit, we will consider the question of corporate and individual responsibility for the harms caused by new technologies. We will focus in this unit on the case study of China’s growing surveillance infrastructure, asking to what extent the corporations and individual engineers working on the technologies that make surveillance possible are collectively or individually responsible for the harms those technologies cause. Students will demonstrate their understanding in the following ways:
- By critically engaging with the course material, students will gain a detailed understanding of some of the most important ethical issues relevant to the field of data science.
- The topics covered in the course will also serve as convenient introductions to some major concepts in value theory such as: justice, fairness, bias, privacy, consent, moral responsibility, collective agency, and complicity.
- Through class discussion and structured writing exercises, students will develop crucial philosophical abilities like reconstructing and evaluating arguments, articulating ideas in conversation, and writing clearly and cogently.