unitinfo
This page provides helpful information about many coursework units offered by Computer Science and Software Engineering in 2023. The information here is not official; for official information, please see the current UWA Handbook. Rather, it is intended to help students prepare for their future units before each semester begins, and before they have access to UWA's Learning Management System (LMS).
About the unit CITS2402 Introduction to Data Science (2nd semester 2023)
Unit description:
Data is ubiquitous in modern society. It is used to monitor the economy, inform business decisions, understand how the environment is changing, and communicate public health messages. Data science is a booming field that harnesses raw data and turns it into actionable knowledge. Data Scientists develop and employ tools to collect, understand and communicate data and its meaning. They are able to identify trends, understand demographics and inform interventions. They are able to work across disciplines, from science to business, health, media and politics. But data can also be misused, and a professional Data Scientist will understand the ethical demands of responsible use of data. This hands-on unit provides practical experience, using the programming language Python, for solving real-world data science problems, from acquiring data from public sources, to understanding the data through analysis and modelling, to visualising and presenting the results.
Unit outcomes:
Students are able to (1) understand and implement the stages in the data science lifecycle, from data acquisition and cleaning through to analysis, modelling and visualisation; (2) independently research, solve and communicate results for real-world data science problems from across a range of disciplines; (3) demonstrate a command of computational structures and operations, and discuss the relevant efficiency and storage implications of alternative solutions; (4) utilise appropriate encoding and visualisation methods for different types of data, including categorical, numerical and time-series data; (5) understand the power of data used well or used poorly, and critically assess the way data is used and presented in business, science, the media and the wider community; and (6) recognise and discuss the ethical responsibilities of a data scientist.
Unit coordinator:
Unit homepage:
Unit is offered in these majors and courses:
Indicative weekly topics:
week 1: Introduction. Importance and value of data science. What data scientists do. Data science jobs. Traits of data scientists. Theory vs practice. The Python ecosystem. Prerequisite knowledge and skills.
week 2: Case study domain: Data in the news, politics and public policy. Key concepts: Trust, ethics and the use of data. Practical focus: Python consolidation, especially strings, lists, dictionaries, file I/O and structured data. Data cleaning and sorting.
week 3: (no topics listed)
week 4: Case study domain: Demographics. Key concepts: Trust, ethics, data sources and data use. Characterising populations vs individuals. Practical focus: Data acquisition from official sources. Data visualisation and plotting with matplotlib.
week 5: (no topics listed)
week 6: Case study domain: Environment and climate. Key concepts: Trends in data. Time-series data. Basic time and space implications of the choice of data type and data structure. Practical focus: Numerical data types, numerical programming and arrays with numpy. Indexing, selection and masking.
week 7: (no topics listed)
week 8: Case study domain: Economics and finance. Key concepts: Models, error metrics, computational model fitting. Practical focus: Numerical programming continued: 2-D arrays, numpy functions, data analysis, trend identification and model fitting.
week 9: (no topics listed)
week 10: Case study domain: Health and epidemiology. Key concepts: Tabular data, joining tables (cf. databases), correlation. Practical focus: Working with tabular data in pandas. Operations on tables. Regular expressions. Examining correlations.
week 11: (no topics listed)
week 12: Health and epidemiology continued (depending on time). Key concepts: Emerging data and trends, analysis in real time, polynomial vs exponential trends, explanation vs prediction.
Indicative assessment:
TBC
Useful prior experience and background knowledge:
Requires Python programming fundamentals.
Useful prior programming and software experience:
Hardware required for this unit:
Students are able to undertake their laboratory exercises and projects in laboratories in the CSSE building, but most students also complete work on their own laptops.
The following hardware is required to successfully complete this unit:
Robust internet connection
Software required for this unit:
The following software is required to successfully complete this unit:
The unit uses University-licensed cloud software (CoCalc). Students require at least one free mainstream browser other than MS Edge, e.g. Chrome.
Operating system(s) used in this unit:
Different units will use different operating systems for their teaching - for in-class examples, laboratory exercises, and programming projects.
If an operating system is REQUIRED, it will be used when marking assessments.
ANY reasonable platform with web access; Linux is used in the cloud setting.
This information was last updated 4:52pm Wed 10th May 2023.