CZ4125 Developing Data Products

Course Summary

The course looks into the different components of developing an end-to-end data product, ranging from how data is managed to how the data product is deployed. If Intro to Data Science is designed to introduce data science to DSAI freshmen, this course, CZ4125, is designed to consolidate what has been learnt over the 4 years of the DSAI curriculum.

The emphasis of the course is not on technical depth, but on exposing students to how the different topics touched on in the DSAI curriculum and different state-of-the-art tools can be brought together to build a useful data product. The topics covered in this course are:

  1. What is a data product?
  2. Data manipulation
  3. Data visualisation
  4. SQL and NoSQL databases
  5. Product design, validation and statistical testing
  6. Machine learning
  7. Natural language processing and topic modelling
  8. Graph and network analytics
  9. Front-ends, dashboards and web-based applications
  10. Big data infrastructure
  11. Data privacy and security

Workload and assessment

The workload is one of the heaviest among SCSE courses. The lectures move quickly through a wide range of topics, so they may be difficult to follow if you have not taken the relevant course (e.g. an NLP course for topic 7).

The coursework is also quite intense. There are 2 individual assignments and 1 group project.

  • Assignment 1 is to collect information on SCSE faculty members by web scraping DR-NTU, Google Scholar and DBLP.
  • Assignment 2 is to create a dashboard that displays the individual profiles of SCSE faculty members, using analytics to understand each member's research interests. You are also required to create a faculty-wide profile of SCSE that shows the research trends within SCSE and how faculty members collaborate with one another.
  • The group project is to create a data product of your own choice.

The coursework is heavy, especially for assignment 2 and the group project, because these are open-ended and you do not have to restrict yourself to what was taught in the course. As such, some self-learning is likely to be involved.

Things to take note of

Some students like this course a lot, while others dislike it. The main reasons for the dislike are the pace, the heavy workload and the amount of self-learning involved. However, you may also like to think of this course as an introduction to what the work of a data scientist entails. Beyond the course content, you will also learn how to dissect task requirements, self-learn effectively, manage time, work with people, go beyond what was taught, deal with ambiguity, manage projects, etc.

The instructor also encourages a lot of critical thinking through the ungraded assignments. Do attempt these, as you will learn a lot from the experience.

Conclusion

This course is recommended for students who wish to develop the skills needed to be a better data scientist (not just the hard skills, but also the soft skills). The workload is heavy, so expect to spend a significant amount of time on this course.