Oral exam

Suggested dates - any conflicts?
- Tuesday June 2 and Thursday June 4, morning and afternoon
- Each half-day is limited to 10 students and lasts approximately 2.5 hours in total; an approximate schedule will be announced
- Sign up in AIS from Thursday April 30 18:00
Upon arrival in the computer room, open your group’s project on the school computer (Windows or Linux), ensure you can run it, and make a copy that you will modify. Turn off AI assistants.
We will discuss your project, and you will be given a task to improve or change something in it.
You can search in lectures and tasks, in online documentation or existing discussions on the internet, you can also bring paper materials.
Communicating with other people, using artificial intelligence tools, and using any hardware other than the school computer are forbidden (we recommend bringing your own laptop as a backup in case of major technical problems).
Used resources should remain open in the browser.
We may also ask you about your homework assignments or group tasks.
The weight of the exam is 15%, but you must achieve at least half of the points; otherwise, you receive Fx. Based on the oral exam, we may also change your points from the project/tasks if we find that your contribution to the group work was not sufficient or that you do not understand the submitted work.
Remedial exams will be scheduled later, after discussion with those who need them. There will be at most two additional dates.

Final test

The only regular test date is Friday, June 5, at 9:30. Any conflicts with other exams?
- Remedial exams will be scheduled later, after discussion with those who need them. There will be at most two additional dates.
Bring your pens/pencils and ISIC.
Not allowed: any papers, electronic devices, or communication with people other than the instructors
Write in the space reserved for each question; request additional paper if needed
Write clearly; illegible answers will be graded with 0 points
Test duration 60-90 minutes (will be announced at the beginning of the test)
Questions will be in Slovak; some terms will also be given in English for clarity
- You can request questions in English before May 22, 10pm using this form: https://forms.office.com/e/SV3A4xth2E
- You can write your answers in Slovak or English
To pass this course, you must score at least 50% of the points from the test
The list below contains the terms you should know (for each term: definition, if given, intuitive meaning, advantages/disadvantages etc.)
There is also a list of commands from the pandas/matplotlib/seaborn libraries and their parameters that you should know
If we use other commands from these libraries, they will be explained in the question text
Examples of question types:
- simple questions on knowledge/comprehension (similar to some quiz questions)
- what would this code output for this input?
- how to complete the code so that it does xyz? (for example, adding a few commands, not writing a longer piece of code)
- discuss the plot with regard to some aspects covered in the course
- propose how you would visualize a certain type of data

Test syllabus

L01b Matplotlib

figure, axes = plt.subplots(nrows, ncols, sharex, sharey)
axes.plot(x, y, fmt, label) (fmt options '.', '-', '.-')
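
A minimal sketch of how the two calls above fit together. The data and labels are made up for illustration, and the Agg backend is only an assumption so that the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

# Illustrative data (made up for this sketch)
x = [0, 1, 2, 3, 4]
y_squares = [v ** 2 for v in x]
y_double = [2 * v for v in x]

# 1 row, 2 columns of axes that share the y-axis range
figure, axes = plt.subplots(nrows=1, ncols=2, sharey=True)

# fmt '.-' draws point markers connected by a solid line
axes[0].plot(x, y_squares, '.-', label='x squared')
# fmt '.' draws point markers only, with no connecting line
axes[1].plot(x, y_double, '.', label='2x')

# label is only displayed once a legend is requested
for ax in axes:
    ax.legend()
```

With more than one column (or row), `axes` is a NumPy array of axes objects, so individual plots are addressed by index.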

L02 Pandas

df = pd.DataFrame({'col1_name':col1_data,...})
df.iloc[1,2]
df.iloc[[0, 2, 3], 0:2]
df.iloc[[True, False, True, True], :]
df.sort_values(column, inplace)
df.copy()
df.set_index()
df.reset_index()
df.loc[]
series1 + series2, series + number, similarly -, /, *, > etc.
df.query(...)
wide vs. long table
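
The commands in this section can be tried out on a small table; the following is a sketch with made-up data and column names. The `melt` call used for the wide-to-long conversion is not in the list above and is shown only as one possible way to do it.

```python
import pandas as pd

# Illustrative wide table (made up for this sketch): one column per year
df = pd.DataFrame({
    'city': ['A', 'B', 'C', 'D'],
    'year2020': [10, 40, 30, 20],
    'year2021': [12, 38, 33, 25],
})

value = df.iloc[1, 2]                           # row 1, column 2 by position
subset = df.iloc[[0, 2, 3], 0:2]                # rows 0, 2, 3; columns 0 and 1
masked = df.iloc[[True, False, True, True], :]  # Boolean mask over rows

sorted_df = df.sort_values('year2020')          # sorted copy (inplace is False by default)
indexed = df.set_index('city')                  # 'city' becomes the row index
by_label = indexed.loc['C', 'year2021']         # label-based selection
restored = indexed.reset_index()                # index becomes a regular column again

growth = df['year2021'] - df['year2020']        # element-wise arithmetic on two Series
grew = df.query('year2021 - year2020 >= 3')     # rows satisfying a condition

# Wide -> long: one row per (city, year) pair (melt is an assumption, see above)
long_df = df.melt(id_vars='city', var_name='year', value_name='value')
```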

L03 Plot types, matplotlib, seaborn

Types of variables: categorical / qualitative vs numerical / quantitative; nominal, ordinal, discrete, continuous, ratio, interval
Different plot types (what types of variables to use them for, advantages, disadvantages)
- scatterplot (additional variable as color, size, shape; use of log axes)
- line graph
- area graph
- small multiples
- bar graph (horizontal/vertical, additional variable as color, stacked)
- dot plot
- heatmap
- pie chart
- strip plot
- histogram
- parallel coordinates
- parallel categories
- radar chart
sns.scatterplot(data, x, y, hue, size, col)
sns.barplot(data, x, y, hue)

L04 Summary statistics

For each statistic: definition, intuitive meaning, properties
Measures of central tendency: mean, median, mode
Quantiles, percentiles and quartiles
Measures of variability: minimum, maximum, interquartile range, variance and standard deviation
Tukey’s definition of outliers using IQR
Boxplot
Summary statistics under linear transformation of the variable
Pearson correlation coefficient
Spearman’s rank correlation coefficient
Correlation does not imply causation
Computation in Pandas:
- series.mean(), series.median(), series.mode()
- series.quantile([p0,...,pn]), linear interpolation
- series.min(), series.max(), series.std()
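
As an illustration of the statistics in this section, here is a sketch on a made-up sample. The variable names and the constants `a` and `b` are hypothetical.

```python
import pandas as pd

# Illustrative data (made up for this sketch); 40 is a suspiciously large value
series = pd.Series([1, 2, 2, 3, 4, 5, 40])

mean = series.mean()
median = series.median()
modes = series.mode()   # a Series, because there may be several modes

# quantile() interpolates linearly between neighboring sorted values by default
q1, q3 = series.quantile([0.25, 0.75])
iqr = q3 - q1

# Tukey's rule: outliers lie more than 1.5 * IQR below Q1 or above Q3
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = series[(series < lower) | (series > upper)]

# Linear transformation y = a*x + b: mean and median are transformed the same way,
# while the standard deviation is multiplied by |a| and does not depend on b
a, b = 2, 10
transformed = a * series + b
```

Here the median is much less affected by the large value than the mean, which is one of the properties worth knowing for the test.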

L05 Pandas 2

pd.merge(df1, df2, on) (with default inner join)
split-apply-combine strategy, apply as aggregation, transformation, filtering
df.groupby(column)[column_selection].aggregation_function() with aggregation functions size(), count(), sum(), mean(), median(), min(), max(), describe()
df.groupby(column)[column_selection].transform(function)
df.groupby(column)[column_selection].filter(function)
Missing values as np.nan
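
The following sketch walks through the commands of this section on two made-up tables; all table and column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Illustrative tables (made up for this sketch)
sales = pd.DataFrame({
    'shop': ['a', 'a', 'b', 'b', 'c'],
    'amount': [10.0, 30.0, 5.0, np.nan, 7.0],   # np.nan marks a missing value
})
shops = pd.DataFrame({'shop': ['a', 'b'], 'city': ['X', 'Y']})

# Default inner join: shop 'c' has no match in shops, so its row is dropped
merged = pd.merge(sales, shops, on='shop')

# Split: rows are grouped by shop, then one column is selected
grouped = sales.groupby('shop')['amount']

# Apply as aggregation: one value per group
sizes = grouped.size()     # counts all rows in each group
counts = grouped.count()   # counts only non-missing values
means = grouped.mean()     # missing values are skipped

# Apply as transformation: the result has as many rows as the input
centered = grouped.transform(lambda s: s - s.mean())

# Apply as filtering: keep only values from groups that satisfy a condition
big = grouped.filter(lambda s: s.sum() > 20)
```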

L06 Maps, graphs, time series

Map projections: conformal / equal-area
Thematic maps
Data as points or lines in a map
Isarithmic maps / isoline maps / heatmaps
Choropleth maps, spatially extensive/intensive variables
Graph terminology (vertices, edges, tree), basics of graph drawing
Time series: smoothing with aggregation and sliding window, overlapping timescales, uncertainty and missing values
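
The two smoothing approaches mentioned for time series can be sketched in pandas as follows. The data is made up, and `rolling` and `resample` are not in the list of required commands; they are used here only as one possible illustration.

```python
import pandas as pd

# Illustrative daily values (made up for this sketch); day 5 contains a spike
days = pd.date_range('2024-01-01', periods=14, freq='D')
values = pd.Series([3, 5, 4, 6, 20, 5, 4, 6, 5, 7, 6, 5, 8, 7],
                   index=days, dtype=float)

# Sliding window: each point is replaced by the mean of a centered 3-day window;
# the first and last points have incomplete windows and become NaN
smoothed = values.rolling(window=3, center=True).mean()

# Aggregation: one mean per non-overlapping 7-day bin
weekly = values.resample('7D').mean()
```

The sliding window keeps one value per day, while aggregation reduces the series to one value per bin.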

L07 More statistics

Histograms (which properties of data they show, choice of bins, comparing several distributions)
Probability density function (and its relation to histograms)
Kernel density estimation (how it is constructed, how it is used, bandwidth)
Violin plot
Two-dimensional histograms / KDE
Cumulative distribution function (definition, properties)
Empirical cumulative distribution function (definition, properties, use for visualization)
Clustering (intuitive meaning, use for improving heatmaps)
Dimensionality reduction (intuitive meaning, use for visualizing high-dimensional data)
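
To make the definitions of the empirical cumulative distribution function and the density-normalized histogram more concrete, here is a small NumPy sketch. The sample and the helper `ecdf_at` are made up for illustration.

```python
import numpy as np

# Illustrative sample (made up for this sketch)
sample = np.array([2.0, 5.0, 3.0, 5.0, 8.0])

# ECDF: F(x) = fraction of observations less than or equal to x.
# Plotting ecdf against xs as a step function gives the usual ECDF plot.
xs = np.sort(sample)
ecdf = np.arange(1, len(xs) + 1) / len(xs)

def ecdf_at(x):
    """Evaluate the ECDF of `sample` at a single point x (hypothetical helper)."""
    return np.mean(sample <= x)

# Histogram normalized as a density: the bar areas sum to 1, as for a PDF
density, edges = np.histogram(sample, bins=3, density=True)
area = np.sum(density * np.diff(edges))
```

The ECDF is non-decreasing, starts at 0 below the smallest observation, and reaches 1 at the largest one.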

L08 Visual perception and colors

Light as a mixture of wavelengths
Human eye: retina, photoreceptors, cones and rods, cones with three wavelengths
Foveal vs peripheral vision
Metamers, LMS color space
Additive color models, RGB, HSL, HSV
Subtractive color models, CMY(K)
Color wheel, RYB model, primary and secondary colors, complementary color scheme
Issues to consider in visualization (color blindness, technical limitations, use for highlighting)
Palettes in visualization: qualitative, quantitative sequential, quantitative diverging
Raster vs vector image formats
Data analysis project phases, exploratory vs explanatory visualization
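
The relation between the RGB, HSV, and HSL color models from this section can be explored with Python's standard `colorsys` module. This is only a sketch for experimenting; the module is not part of the required commands.

```python
import colorsys

# Pure red in RGB, with components scaled to the range 0..1
r, g, b = 1.0, 0.0, 0.0

h, s, v = colorsys.rgb_to_hsv(r, g, b)     # hue, saturation, value
h2, l, s2 = colorsys.rgb_to_hls(r, g, b)   # note the order: hue, lightness, saturation

# Opposite hue on the RGB-based hue circle: for red this is cyan.
# In the RYB color wheel, the complement of red is green instead.
comp = colorsys.hsv_to_rgb((h + 0.5) % 1.0, s, v)
```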

L09 Text, visual perception (2)

Text visualization: word clouds vs other techniques for showing word frequencies
Pre-attentive attributes and their use in visualization
Hierarchy of graph elements for quantitative reasoning
Gestalt principles of proximity, similarity, connection, enclosure, closure, continuity and their use in visualization
Illusions
Working memory
Chart junk

L10 Presentation of results

Context of a presentation
Storytelling in a presentation
Cognitive biases, patternicity bias, storytelling bias, confirmation bias
Aspects of visualization: basic setup, data transformations and other settings, focus and explanation
Table vs plot
Interactivity, which aspects of a plot can be interactive
Dashboard (what it is)

L11 Interactivity, other types of plots

Infographics vs data visualization
Other types of graphs: waterfall chart, funnel chart, Gantt chart, candlestick chart