Data Carpentry for Oceanographers

Using Python to accelerate research and increase reproducibility

Syllabus

This workshop is based on lessons developed by the Carpentries (see https://carpentries.org for more information about the Carpentries organisation). The episode order in the schedule below has been rearranged to improve flow and content for an oceanographer audience.

The original source lessons used for this Carpentry are:

Schedule

Setup Download files required for the lesson
00:00 1. Introduction What are basic principles for using spreadsheets for good data organization?
00:18 2. Formatting data tables in Spreadsheets How do we format data in spreadsheets for effective data use?
00:53 3. Formatting problems What are some common challenges with formatting data in spreadsheets and how can we avoid them?
01:13 4. Dates as data What are good approaches for handling dates in spreadsheets?
01:26 5. Quality control How can we carry out basic quality control and quality assurance in spreadsheets?
01:46 6. Exporting data How can we export data from spreadsheets in a way that is useful for downstream applications?
01:56 7. Before we start What is Python and why should I learn it?
02:26 8. Short Introduction to Programming in Python What is Python?
Why should I learn Python?
02:56 9. Starting With Data How can I import data in Python?
What is Pandas?
Why should I use Pandas to work with data?
03:56 10. Data Types and Formats What types of data can be contained in a DataFrame?
Why is the data type important?
04:41 11. Indexing, Slicing and Subsetting DataFrames in Python How can I access specific data within my data set?
How can Python and Pandas help me to analyse my data?
05:41 12. Combining DataFrames with Pandas Can I work with data from multiple sources?
How can I combine data from different data sets?
06:26 13. Software installation using conda How do I install and manage all the Python libraries that I want to use?
How do I interact with Python?
06:56 14. Data Ingest and Visualization - Matplotlib and Pandas What other tools can I use to create plots apart from ggplot?
Why should I use Python to create plots?
08:41 15. Making Plots With plotnine How can I visualize data in Python?
What is the ‘grammar of graphics’?
10:11 16. Refresher
10:41 17. Data Workflows and Automation Can I automate operations in Python?
What are functions and why should I use them?
12:11 18. Introduction to netCDF What is the NetCDF format?
Why use Xarray for NetCDF files in Python?
12:26 19. Visualising CMIP data How can I create a quick plot of my CMIP data?
13:26 20. Refresher
13:36 21. Functions How can I define my own functions?
14:16 22. Vectorisation How can I avoid looping over each element of large data arrays?
14:46 23. Command line programs How can I write my own command line programs?
15:36 24. Version control How can I record the revision history of my code?
16:11 25. GitHub How can I make my code available on GitHub?
16:36 26. Defensive programming How can I make my programs more reliable?
17:06 27. Data provenance How can I keep track of my data processing steps?
17:36 28. Accessing SQLite Databases Using Python and Pandas What if my data are stored in an SQL database? Can I manage them with Python?
How can I write data from Python to be used with SQL?
18:21 Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.