SYS 2202: Data and Information Engineering

Overview

Spring Semester Undergraduate Course at UVA

This course provides an introduction to a fundamental aspect of data science and engineering: working with data. Learn skills to efficiently and effectively obtain, manipulate, store, and analyze data (i.e., convert data to information) to support decision making and future data modeling (e.g., regression, data mining, machine learning) efforts. Emphasis is on obtaining, cleaning, combining, and wrangling the data into a more usable form. Learn how to break up a large data set into manageable pieces and then use a variety of quantitative and visual tools to summarize and extract information from it. The challenges of big data (e.g., size, streaming data, mixed variable types) will be addressed throughout the course. As an introductory course, the focus will be on understanding basic concepts and how to implement them in R, a leading data science language.

Course Outline:

  • Introduction to Data Science and Engineering

  • Data Collection

  • Getting to Know Your Data

    • Data Types

    • Basic Statistical Descriptions of Data

    • Data Visualization

    • Measuring Data Similarity and Dissimilarity

  • Data Preprocessing

    • Data Cleaning

    • Data Integration

    • Data Reduction and Transformation

    • Dimensionality Reduction

Spring 2021 Projects

Vaccination Status

The COVID-19 vaccine rollout has been different in every part of the US. This group explored the effects of income per capita, political party, and population on a state's vaccine rollout.

Success Patterns in the NBA

An analysis of the five most important player statistics in NBA basketball and their effect on win/loss outcomes in games.

Substance Abuse

This project analyzed societal factors such as geography, legalization, age, family structure, education, and employment and how they correspond with substance use, abuse, and recovery.

Sports Betting Analysis

Sports betting has recently become legal in some states and remains illegal in others. This project explored the over/under and the spread in sports betting, and how the weather and game status affect these metrics.

Socioeconomic Status

This study aims to investigate socioeconomic disparities in Virginia, specifically looking at educational attainment, employment, income, poverty, and degree of urbanization.

Social Media Analysis

Influencers, advertisers, and companies that want to promote their products effectively on social media and build a more impactful presence need to understand how users interact with these platforms. This project analyzed how a company can optimize the popularity and engagement of its posts on Facebook.

Racial and Gender Bias Analysis

This project explores the question of how race impacts different aspects of American society, stemming from systemic racial bias.

Exploring Pandemic Trends

This analysis addresses the various factors that have influenced the spread of COVID-19 across the US since early 2020. While there are numerous underlying factors, the focus of this analysis is on vaccines, mask policies, variants, and change of virus “hotspots” over time, as well as a comparison of the spread of COVID-19 in the USA to other countries and regions of the world.

Money and Sports Performance

Is there an optimal way to distribute money in order to get the most successful athletic team? This group took a closer look at whether spending more money can mean more success for sports teams.

Mental Health Analysis

Mental health is a pressing subject that can deeply affect anyone, anywhere. This project explores location, housing, demographics, COVID-19, and technology industries, observing how they affect mental health.

Cybersecurity Analysis

This group's analysis investigates cybersecurity trends to detect possible vulnerabilities that must be attended to when designing and implementing new cybersecurity measures.

Climate Change Analysis

Rising temperatures have raised concerns in the scientific community about rising sea levels and how they will affect communities and infrastructure.

This project analyzed ocean level rise, gross domestic product, disastrous weather, Arctic ice concentration, and crop yields.

Spring 2020 Projects

Coronavirus Live Tracker

As COVID-19 began to spread throughout the world in March 2020, this project created a Coronavirus live tracker that focuses on state testing data to understand how states are impacted differently from one another.

Twitter Word Cloud

This group used the expansive data on Twitter to create a word cloud that visually represents trends in the topics relevant to users in specific locations around the United States.

Stock Market Analysis: Effects of the Coronavirus

As the coronavirus pandemic has drastically affected the US economy, this project attempts to discover underlying relationships between various sectors of the stock market and coronavirus data, both within the United States and globally.

Student Productivity and Wellbeing

The purpose of this project is to characterize behavior patterns of anonymized UVA undergraduate students, including movement, social communication, and activities, from Aware data and to identify their relationships with the students' productivity and wellbeing levels.

Crime Data Analysis

In this study, the team sought to research and analyze crime in their town of Charlottesville, VA by creating an interactive map from a crime dataset provided by Charlottesville Open Data.
