About the Course

Basic information

  • Instructors:
    • Albert Y. Kim (he/him) - Assistant Professor of Statistical & Data Sciences. Please call me “Professor Kim”
  • Office locations:
    • Professor Kim: McConnell Hall 215 (accessible from stairwell closest to Bass Hall)
  • Email: Slack team:
  • Meeting locations/times: Check course search
  • Office hours:
    • Check the calendar for the most up-to-date information. The location (in-person or Zoom) is indicated in each calendar entry
    • Before coming to office hours, please have your question ready on your computer
  • Spinelli Center SDS drop-in tutoring hours. Note: AJ and Swaha have taken this course before
  • Personal or private discussions:
    • For quick discussions: Slack DM Professor Kim
    • For longer discussions: Book an appointment at bit.ly/meet_with_albert. Location (in-person or Zoom) is indicated in calendar entry
    • Please do not book an appointment for non-personal or non-private discussions.

Description

Advanced programming techniques for data science using R. This course is not about data analysis; rather, students will learn the R programming language at a deep level. Topics may include data structures, control flow, regular expressions, functions, environments, functional programming, object-oriented programming, debugging, testing, version control, documentation, literate programming, code review, and package development. The major goal for the course is to contribute to a viable, collaborative, open-source, publishable R package. This course satisfies the programming depth requirement for the SDS major.

Prerequisites

SDS 192 and CSC 111

Learning goals

  • Contribute to an open-source software project using version control
  • Write a robust, encapsulated software package in the R programming language
  • Write and debug sophisticated functions in R

Textbooks

Required

Advanced R, 2nd edition, Hadley Wickham, CRC Press, 2019. Available free online.
R Packages, Hadley Wickham, O’Reilly, 2015. A preview of the 2nd edition is available free online.


Suggested as supplementary references:

Accommodation

Smith is committed to providing support services and reasonable accommodations to all students with disabilities. To request an accommodation, please register with the Disability Services Office at the beginning of the semester. To do so, call (413) 585-2071 to arrange an appointment with the Director of Disability Services.


Policies

Inclusion

I am committed to fostering a classroom environment where all students thrive. I am committed to affirming the identities, realities and voices of all students, especially those from historically marginalized or underrepresented backgrounds. I am dedicated to creating a space where everyone in the class is respected, is free from discrimination based on race, ethnicity, sexual orientation, religion, gender identity, disability status, and other identities, and feels welcome and ready to learn at their highest potential. If you have any concerns or suggestions for how to make this class more inclusive, please reach out to me. I am here to support your learning and growth as data scientists and people!

Attendance

  1. In keeping with Smith’s core identity and mission as an in-person, residential college, SDS affirms College policy that students will attend class in person. SDS courses will not provide options for remote attendance. Students who have been determined to require a remote attendance accommodation by the Office of Disability Services will be the only exceptions to this policy. As with any other kind of ADA accommodations, please notify your instructor during the first week of classes to discuss how we can meet your accommodations.
  2. Attendance:
    • No attendance taken
    • If you choose not to attend, you accept responsibility for any lost educational value.
    • Please let me know about any extended absences.
  3. You are expected to stay until the end of lecture. If you need to leave early, please confirm with me at the beginning of lecture and sit where your departure will be minimally disruptive.
  4. Lecture will be held as usual on Tue 11/22 (before Thanksgiving).

Masking (college-wide)

College policy applies. Following this policy, I will not wear a mask while lecturing. However, when answering questions up close, I will wear a mask.

Collaboration

Much of this course will operate on a collaborative basis, and you are expected and encouraged to work together with a partner or in small groups to study, complete homework assignments, and prepare for exams. However, all work that you submit for credit must be your own. Copying and pasting sentences, paragraphs, or blocks of code from another student or from online sources is not acceptable and will receive no credit. No interaction with anyone but the instructors is allowed on any exams or quizzes. All students, staff and faculty are bound by the Smith College Honor Code, which Smith has had since 1944.

Academic Honor Code Statement

Smith College expects all students to be honest and committed to the principles of academic and intellectual integrity in their preparation and submission of course work and examinations.

Students and faculty at Smith are part of an academic community defined by its commitment to scholarship, which depends on scrupulous and attentive acknowledgement of all sources of information, and honest and respectful use of college resources.

Cases of dishonesty, plagiarism, etc., will be reported to the Academic Honor Board.

Code of Conduct

As the instructor and assistants for this course, we are committed to making participation in this course a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion. Examples of unacceptable behavior by participants in this course include the use of sexual language or imagery, derogatory comments or personal attacks, deliberate misgendering or use of “dead” names, trolling, public or private harassment, insults, or other unprofessional conduct.

As the instructor and assistants, we have the right and responsibility to point out and stop behavior that is not aligned with this Code of Conduct. Participants who do not follow the Code of Conduct may be reprimanded for such behavior. Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the instructor.

All students, the instructor, the lab instructor, and all assistants are expected to adhere to this Code of Conduct in all settings for this course: lectures, labs, office hours, tutoring hours, and over Slack.

This Code of Conduct is adapted from the Contributor Covenant, version 1.0.0, available here.


Content

Assignments

This is a 4-credit course, meaning that by federal guidelines, it should consume about 12 hours per week of your time. We meet for 2.5 hours per week. That means you should be spending about 9.5 hours per week, or nearly 2 hours per weekday, on this course outside of class.

  1. Quizzes [15%]: Weekly reading quizzes will assess your understanding of the material.
  2. Labs [0%]: Daily programming assignments in R with written explanations. These assignments are ungraded, but they are not optional!
  3. Projects [65%]: You will work on three projects over the course of the semester. These projects will be structured, but fairly open-ended to allow you to be creative. Evaluation will emphasize originality and ingenuity in addition to sophistication and complexity. This process will involve code review. I will assess the code for functionality, clarity, robustness, and readability.
    1. Div I Project [15%]
    2. Div II Project [20%]
    3. Div III Project [30%]
  4. Peer evaluations [10%]: There will be at least one mid-semester peer evaluation as well as an end of semester evaluation.
  5. Engagement [10%]: Active participation in class, engagement with group work, activity on GitHub, helpfulness on Slack, and regular attendance will comprise the remainder of your grade.

Extensions

Extensions of up to 48 hours will typically be granted when requested at least 48 hours in advance. Longer extensions, or those requested within 48 hours of a deadline, will typically not be granted. Please plan accordingly. Please note that because many of the assignments in this class are collaborative, individual extensions for group assignments will be problematic.

Grading

When grading your written work, we are looking for solutions that are technically correct and reasoning that is clearly explained. Numerically correct answers alone are not sufficient. Neatness and organization are valued, with brief, clear answers that explain your thinking. If we cannot read or follow your work, we cannot give you full credit for it.


Resources

Moodle and course website

The course website and Moodle will be updated regularly with lecture handouts, project information, assignments, and other course resources. Homework will be submitted to Moodle, and grades will be posted there. Please check both regularly.

Computing

The use of the R statistical computing environment with the RStudio interface is thoroughly integrated into the course. Both R and RStudio are free and open-source, and are installed in most computer labs on campus. Please see the Resources page for help with R. If you have a Chromebook, you should be able to complete the assignments using the RStudio Server. Please see me if you don’t already have an account.

Unless otherwise noted, you should assume that it will be helpful to bring a laptop to class. If you do not have a laptop, there are loaner laptops available – please contact me if you need one.

Communication

  • Slack is the primary forum for course-related discussions of all kinds. Please do not email me with course-related questions! Instead, post them in #questions on Slack. If discretion is absolutely necessary, send me a private message on Slack.
  • GitHub will host all of the code for projects associated with this course. All repositories are private by default.

Writing

Your ability to communicate results, which may be technical in nature, to an audience that is likely to be non-technical is critical to your success as a data analyst. The assignments in this class will place an emphasis on the clarity of your writing.

The Spinelli Center

Coming soon.