- Course title: SDS 192 Introduction to Data Science
- Instructor: Albert Y. Kim - Assistant Professor of Statistical & Data Sciences.
- Office location: McConnell Hall 215 (accessible from stairwell closest to Bass Hall)
- Email:
- Slack team: Click the hashtag icon in the navbar for the browser interface, or use the desktop/mobile app.
- Meeting locations/times:
- Section 01: MWF 10:50 AM-12:05 PM / Sabin-Reed 220
- Section 02: MWF 1:20 PM-2:35 PM / Sabin-Reed 220
- Outside help:
Instructor work-life balance
- I will respond to Slack messages sent during the week within 24h. I will respond to Slack messages sent during the weekend at my own discretion.
- If possible, please only Slack me with brief, administrative questions; I prefer having more substantive conversations in person, as it takes me less energy to understand where you are at.
- I will do my best to return all grading as promptly as possible.
- I will rarely be on campus on Thursdays as this is my self-care day.
How can I succeed in this class?
- When I have questions or don’t understand something:
- “Am I asking questions in class?”
- “Am I asking questions on Slack in the #questions channel?” Even better: “Am I answering my peers’ questions on Slack?”
- “Have I been going to the Spinelli tutoring center for help on R and the tidyverse?”
- “Have I been coming to office hours?”
- Lectures, labs, and readings:
- “Am I staying on top of Slack notifications sent between lectures?” If you need help developing a notification strategy that best suits your lifestyle, please speak to me.
- “Am I attending lectures consistently?”
- “During in-class activities, am I actually running code line-by-line and studying the outputs, or am I just going through the motions?”
- “During in-class exercises, am I taking full advantage of being in the same place at the same time as the instructor, the lab assistants, and most importantly my peers, or am I browsing the web/texting the whole time?”
- “Have I been doing the associated readings for each lecture?”
Course Description & Objectives
From Smith College Course Search: An introduction to data science using Python, R and SQL. Students learn how to scrape, process and clean data from the web; manipulate data in a variety of formats; contextualize variation in data; construct point and interval estimates using resampling techniques; visualize multidimensional data; design accurate, clear and appropriate data graphics; create data maps and perform basic spatial analysis; and query large relational databases. No prerequisites, but a willingness to write code is necessary.
On top of the goals of the above description, this semester you will:
- Equip yourselves with the data science tools necessary to perform effective exploratory data analysis.
- Learn about project workflows such as
- Learn how to use web search engines like Google effectively in a data science setting.
The lecture schedule and associated readings can be found on the main page of this course webpage.
- Bring your laptop, a set of headphones, colored pens/pencils, and your paper notebook to every lecture.
- You are expected to stay until the end of lecture. If you need to leave early, please confirm with me at the beginning of lecture and sit somewhere where your departure will be minimally disruptive.
- Attendance will not be explicitly taken and occasional absences are excused. However, extended absences should be mentioned to me.
- You are responsible for asking your peers about what you missed. For example, makeup lectures will not be held during office hours.
- Lecture will be held as usual on Monday 11/25 (before Thanksgiving).
All due dates can be found on the main page of this course webpage.
There will be two midterms during the semester. You’ll take these midterms in the Seelye Self-Scheduled Exam Center between Friday at 5pm and Sunday at 11:55pm. Instructions on taking exams for Smithies and Five College Students.
- Midterm I: Data visualization
- Midterm II: Data wrangling
The higher score of the two midterm scores will be weighted 20% and the lower will be weighted 15%.
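As a rough illustration of the weighting above, here is a minimal Python sketch. The function name `midterm_contribution` is hypothetical, and the 0-100 scoring scale is an assumption; only the 20%/15% split comes from this syllabus.

```python
def midterm_contribution(midterm_1, midterm_2):
    """Percentage points (out of a possible 35) earned from the two midterms.

    Assumes each midterm is scored on a 0-100 scale (an assumption, not
    stated in the syllabus).
    """
    higher = max(midterm_1, midterm_2)
    lower = min(midterm_1, midterm_2)
    # The higher score is weighted 20%, the lower score 15%.
    return (20 * higher + 15 * lower) / 100


# The order in which the scores were earned does not matter.
print(midterm_contribution(90, 80))  # 30.0
print(midterm_contribution(80, 90))  # 30.0
```

In other words, a weaker first midterm can be partly offset by a stronger second one, and vice versa.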
Three mini-projects 30%
There will be three mini-projects with particular themes. You’ll do these mini-projects in groups of 2 assigned by me.
- Mini-project 1 5%: Data visualization
- Mini-project 2 10%: Data wrangling
- Mini-project 3 15%: Maps
Final project 20%
There will be a final project due the last day of exams. You’ll do this final project in groups of 2-3 where you can choose your groupmates.
As for whether you should be on Smith campus during exam week, this is for you and your groupmates to decide. You do not need to consult the instructor.
Problem sets 10%
There will be 7 problem sets, with only the top 5 scores going towards your grade. Since the lowest two grades will be dropped, no extensions to problem sets will be given.
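The "top 5 of 7" rule can be sketched in Python as follows. This is only an illustration: the function name `problem_set_average` is hypothetical, and it assumes each problem set is scored 0-100, a missing submission counts as 0, and the five kept scores are averaged equally. None of these details is specified in this syllabus.

```python
def problem_set_average(scores, keep=5):
    """Average of the `keep` highest problem set scores.

    Assumptions (not from the syllabus): scores are on a 0-100 scale,
    a missing submission is recorded as 0, and kept scores count equally.
    """
    if len(scores) < keep:
        raise ValueError(f"need at least {keep} scores, got {len(scores)}")
    # Sort from highest to lowest and keep only the best `keep` scores,
    # which drops the lowest ones.
    top_scores = sorted(scores, reverse=True)[:keep]
    return sum(top_scores) / keep


# Seven problem sets, two of which were missed entirely.
scores = [100, 90, 0, 80, 70, 60, 0]
print(problem_set_average(scores))  # 80.0
```

Under these assumptions, missing up to two problem sets has no effect on the problem set grade, which is why no extensions are given.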
It is difficult to explicitly codify what constitutes “an engaged student,” so instead I present the following rough principle I will follow: you’ll only get out of this class as much as you put in. That being said, here are multiple pathways for you to stay engaged in this class:
- In particular: Peer evaluations for all projects.
- Asking and answering questions in class.
- Coming to office hours.
- Posting questions on Slack.
- Even better: Responding to questions on Slack.
- Collaboration: While I encourage you to work with your peers for problem sets and labs, you must submit your own answers and not simple rewordings of another’s work. Furthermore, all collaborations must be explicitly acknowledged in your submissions.
- Honor Code: All your work must follow the Smith College Academic Honor Code Statement; in particular all external sources must be cited in your submissions.
- Grading: I reserve the right to not discuss any grading issues in class and instead direct you to office hours.
Smith is committed to providing support services and reasonable accommodations to all students with disabilities. To request an accommodation, please register with the Disability Services Office at the beginning of the semester. To do so, call 413.585.2071 to arrange an appointment with Laura Rauscher, Director of Disability Services.
Code of Conduct
As the instructor and assistants for this course, we are committed to making participation in this course a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion. Examples of unacceptable behavior by participants in this course include the use of sexual language or imagery, derogatory comments or personal attacks, trolling, public or private harassment, insults, or other unprofessional conduct.
As the instructor and assistants we have the right and responsibility to point out and stop behavior that is not aligned to this Code of Conduct. Participants who do not follow the Code of Conduct may be reprimanded for such behavior. Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the instructor.
All students, the instructor, the lab instructor, and all data assistants are expected to adhere to this Code of Conduct in all settings for this course: lectures, labs, office hours, tutoring hours, and over Slack.
This Code of Conduct is adapted from the Contributor Covenant, version 1.0.0, available here.