Refresher

Overview

Teaching: 30 min
Exercises: 0 min
Questions
Objectives

Takeaways

Spreadsheets

Python

One can assign a value to a variable in Python. Those variables can be of several types, such as string, integer, floating point and complex numbers.
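As a minimal sketch of the types mentioned above (the variable names and values are made up for illustration):

```python
# Assign values of different types to variables (illustrative values only)
name = "Ada"       # string
count = 42         # integer
ratio = 3.14       # floating point number
z = 2 + 3j         # complex number

# type() reports the type of the value a variable currently refers to
for value in (name, count, ratio, z):
    print(type(value).__name__, value)
```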

Python Data structures:

Python uses 0-based indexing, in which the first element of a list, tuple or any other sequence has an index of 0.
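A short illustration of 0-based indexing, using a hypothetical list and tuple:

```python
# Hypothetical example data
species = ["cat", "dog", "rabbit"]   # list
point = (10.5, 20.0)                 # tuple

first = species[0]    # index 0 is the first element
last = species[-1]    # negative indices count from the end
x = point[0]          # tuples are indexed the same way

print(first, last, x)
```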

Libraries enable us to extend the functionality of Python.
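For example, the standard library module math adds functions that are not built into the core language. This small sketch only shows the import mechanism; the numbers are arbitrary:

```python
# Import a whole library and access its functions with dot notation
import math

# Import a single function from a library
from statistics import mean

root = math.sqrt(16)          # square root from the math library
average = mean([1, 2, 3, 6])  # arithmetic mean from the statistics library

print(root, average)
```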

Pandas

One of the best options for working with tabular data in Python is the library Pandas (Python Data Analysis Library).

Pandas provides an object called DataFrame, which represents tabular data. A DataFrame is a 2-dimensional data structure and can store data of different types (including strings, integers, floating point values, categorical values and more) in columns.
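A minimal sketch of a DataFrame with columns of different types. The column names and values are invented for illustration, and the categorical column stands in for what other languages call factors:

```python
import pandas as pd

# Hypothetical data: each dictionary key becomes a column
df = pd.DataFrame({
    "species": ["cat", "dog", "rabbit"],
    "count": [3, 5, 2],
    "weight_kg": [4.2, 12.5, 1.8],
})

# Convert the text column to a categorical column
df["species"] = df["species"].astype("category")

print(df.dtypes)   # one data type per column
print(df.shape)    # (number of rows, number of columns)
```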

Aggregating data using the groupby() method enables you to generate useful summaries of data quickly.
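As a sketch, assuming a small made-up table of observations, grouping by one column and summarising another might look like this:

```python
import pandas as pd

# Hypothetical observations
surveys = pd.DataFrame({
    "species": ["cat", "dog", "cat", "dog", "rabbit"],
    "weight_kg": [4.0, 12.0, 5.0, 14.0, 2.0],
})

# Mean weight per species
mean_weight = surveys.groupby("species")["weight_kg"].mean()

# Number of rows per species
counts = surveys.groupby("species").size()

print(mean_weight)
print(counts)
```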

DataFrames can be subset in different ways, including by labels (column headings), numeric ranges, or specific row and column index locations.
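The different ways of selecting data can be sketched as follows, using a hypothetical three-row table:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "species": ["cat", "dog", "rabbit"],
    "count": [3, 5, 2],
    "weight_kg": [4.2, 12.5, 1.8],
})

counts = df["count"]                    # one column, selected by label
subset = df[["species", "weight_kg"]]   # several columns, selected by label
first_two = df[0:2]                     # numeric range of rows (0 and 1)
cell = df.iloc[1, 2]                    # row position 1, column position 2
by_label = df.loc[0, "species"]         # row label 0, column label "species"

print(cell, by_label)
```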

Data from multiple files can be combined into a single DataFrame using merge and concat.
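A minimal sketch of both operations. The small tables below stand in for data that would normally be read from separate files, and all names are hypothetical:

```python
import pandas as pd

# Two tables with the same columns, e.g. from two survey files
part1 = pd.DataFrame({"species_id": ["A", "B"], "count": [3, 5]})
part2 = pd.DataFrame({"species_id": ["C"], "count": [2]})

# concat stacks the rows of both tables into one DataFrame
combined = pd.concat([part1, part2], ignore_index=True)

# A lookup table that shares the species_id column
names = pd.DataFrame({
    "species_id": ["A", "B", "C"],
    "name": ["cat", "dog", "rabbit"],
})

# merge joins the tables on the shared column
merged = pd.merge(combined, names, on="species_id", how="left")

print(merged)
```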

Plotting

Matplotlib is a Python package that is widely used throughout the scientific Python community to create high-quality and publication-ready graphics.
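A small sketch of a Matplotlib figure with labelled axes. The data are arbitrary, and the Agg backend is chosen here only so the script also runs without a display; in a notebook that line can be left out:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt

# Arbitrary example data
x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o")
ax.set_xlabel("x")
ax.set_ylabel("x squared")
ax.set_title("A simple line plot")

# fig.savefig("squares.png", dpi=300) would write the figure to a file
```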

Useful resources:

The plotnine package is built on top of Matplotlib and works well with Pandas; it supports the creation of complex plots from data in a DataFrame. Plotnine graphics are built step by step by adding new elements on top of each other using the + operator. Wrapping the individual steps in parentheses () makes the multi-line expression valid Python syntax.

Setting up your Anaconda environment

Python is a popular language for research computing, and great for general-purpose programming as well. Installing all of its research packages individually can be a bit difficult, so we recommend Anaconda, an all-in-one installer. There are different versions of the Python language; in this workshop we use the latest major version, Python 3.x (e.g., 3.6 is fine).
Installing Anaconda on macOS “Catalina” has proved difficult. Users found these links useful for troubleshooting an Anaconda installation on macOS Catalina:

Anaconda Navigator vs Conda

Once Anaconda is installed, you can interact with it either through the graphical user interface “Anaconda Navigator” or through the command-line program “conda”:

Anaconda Environment

A conda environment is a directory that contains a specific collection of conda packages that you have installed. Many scientific packages depend on specific versions of other packages in order to run. Data scientists often work with multiple versions of many packages and use separate environments to keep these versions apart. If you have not created or activated an environment and you install a package, it is installed in the base environment.
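From inside Python you can check which environment a script or notebook is running in. This is only a sketch: it relies on the CONDA_DEFAULT_ENV environment variable, which conda sets when an environment is activated and which may be absent otherwise:

```python
import os
import sys

# Directory of the environment that this Python interpreter belongs to
env_dir = sys.prefix

# Name of the active conda environment, or None if the variable is not set
active_env = os.environ.get("CONDA_DEFAULT_ENV")

print("Environment directory:", env_dir)
print("Active conda environment:", active_env)
```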

Explore your conda set-up (command line instructions):

Installing Packages

Note: Packages are installed in the environment that is active at that moment. Activate the required environment before installing a package.

Some packages that you use in Python are installed together with Anaconda and can be imported directly in a script or notebook. Other packages cannot be imported until they have been installed.

A plain conda install command looks for the package in a default location where Python packages are uploaded. However, packages are sometimes stored in a different location. The location where conda searches for a package is called a channel. The package “plotnine”, for example, is not available in the default location, so you need to specify the channel where conda should look for it: conda install -c conda-forge plotnine

Jupyter Notebook

To open Jupyter Notebook correctly:

Check installs

Check that your installs work by importing the libraries in a Jupyter notebook.
List all packages and versions installed in the active environment: conda list
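The import check can be sketched as a short notebook cell. The list of package names is an assumption based on the libraries mentioned in this refresher; adjust it to your own set-up:

```python
import importlib

# Packages to check (assumed list, taken from this refresher)
packages = ["pandas", "matplotlib", "plotnine"]

status = {}
for name in packages:
    try:
        module = importlib.import_module(name)
        # Most packages expose their version as __version__
        status[name] = getattr(module, "__version__", "unknown version")
    except ImportError:
        status[name] = None  # package is not installed in this environment

for name, version in status.items():
    print(name, version if version else "NOT INSTALLED")
```

A package reported as not installed needs to be installed in the active environment before it can be imported.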

Where should your data be?

Desktop

Key Points