The UBC Scientific Software Seminar is inspired by Software Carpentry and its goal is to help students, graduates, fellows and faculty at UBC develop software skills for science.
- What are the learning goals?
  - To practice data wrangling using pandas
  - To construct data visualizations using matplotlib and bokeh
  - To build models and make predictions using scikit-learn
  - To submit models and solutions to Kaggle competitions
  - To meet and collaborate with other students and faculty interested in scientific computing
- What software tools are we going to use?
  - Kaggle: datasets, competitions, and community
  - bokeh: interactive data visualizations
  - scikit-learn: machine learning in Python
  - SciPy Stack: scientific computing with NumPy, SciPy, matplotlib and pandas
  - Python and Jupyter Notebooks
  - ubc.syzygy.ca: Jupyter notebooks hosted by PIMS
- What scientific topics will we study?
  - Data wrangling
  - Data visualization
  - Machine learning
- Where do we start? What are the prerequisites?
  - Calculus, linear algebra, probability and statistics
  - Basic Python (see UBCS3 Summer 2016)
- Who is the target audience?
  - Everyone is invited!
  - If the outline above is at your level, perfect! Get ready to write a lot of code!
  - If the outline above seems too intimidating, come anyway! You'll learn things just by being exposed to new tools and ideas, and meeting new people!
  - If you have experience with all the topics outlined above, come anyway! You'll become more of an expert by participating as a helper/instructor!
Please join the mailing list to receive weekly updates about the seminar.
- Week 1 - Friday September 29 - 1-2pm - DLAM Learning Lab - [Notes]
  - Introduction to Kaggle
    - Competitions, datasets, kernels, and community
    - Getting Started
    - Titanic: Machine Learning from Disaster
    - Make a submission using a Decision Tree Classifier
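
A minimal sketch of the Week 1 workflow, assuming the Kaggle Titanic column names (`PassengerId`, `Pclass`, `Sex`, `Age`, `Fare`, `Survived`). A few hand-made rows stand in for `train.csv` and `test.csv` so the snippet runs on its own; the feature choice, the helper `prepare` and `max_depth=3` are illustrative assumptions, not the seminar notebook.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Tiny stand-ins for Kaggle's train.csv and test.csv (illustrative rows, not real data)
train = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Pclass": [3, 1, 3, 1, 3, 2],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, None, 54.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 26.0],
    "Survived": [0, 1, 1, 1, 0, 0],
})
test = pd.DataFrame({
    "PassengerId": [892, 893, 894],
    "Pclass": [3, 3, 2],
    "Sex": ["male", "female", "male"],
    "Age": [34.5, 47.0, None],
    "Fare": [7.83, 7.00, 9.69],
})

def prepare(df, age_fill):
    """Hypothetical helper: encode Sex as 0/1 and fill missing ages with age_fill."""
    X = df[["Pclass", "Sex", "Age", "Fare"]].copy()
    X["Sex"] = (X["Sex"] == "female").astype(int)
    X["Age"] = X["Age"].fillna(age_fill)
    return X

age_median = train["Age"].median()  # fill value computed on the training data only
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(prepare(train, age_median), train["Survived"])

# Submission file: one row per test passenger with PassengerId and Survived
submission = pd.DataFrame({
    "PassengerId": test["PassengerId"],
    "Survived": model.predict(prepare(test, age_median)),
})
submission.to_csv("submission.csv", index=False)
print(submission)
```

With the real files you would replace the two DataFrames with `pd.read_csv("train.csv")` and `pd.read_csv("test.csv")` and upload `submission.csv` on the competition page.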
- Week 2 - Friday October 6 - 1-2pm - DLAM Learning Lab - [Notes]
  - Feature engineering on the Titanic dataset
    - Titles and decks
    - Filling missing data
    - Random forest classifiers
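
One way the Week 2 features could look in pandas, as a sketch: the title is taken from the `Name` column (the word between the comma and the first period), the deck is the first letter of `Cabin` (with `"U"`, an arbitrary choice, for unknown cabins), and missing ages are filled with the median age of passengers sharing the same title. The rows are invented examples in the Kaggle Titanic format; the exact features used in the seminar may differ.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Illustrative rows in the Kaggle Titanic format (not the real dataset)
df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley",
             "Heikkinen, Miss. Laina", "Allen, Mr. William Henry",
             "Palsson, Master. Gosta Leonard", "Johnson, Mrs. Oscar W",
             "Nasser, Mrs. Nicholas", "Sandstrom, Miss. Marguerite Rut"],
    "Cabin": [None, "C85", None, None, None, None, None, "G6"],
    "Age": [22.0, 38.0, 26.0, 35.0, 2.0, None, 14.0, None],
    "Pclass": [3, 1, 3, 3, 3, 3, 2, 3],
    "Survived": [0, 1, 1, 0, 0, 1, 1, 1],
})

# Title: the text between the comma and the first period, e.g. "Mr", "Miss"
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False)

# Deck: first letter of the cabin code; "U" marks an unknown cabin
df["Deck"] = df["Cabin"].str[0].fillna("U")

# Missing ages: median age of passengers with the same title
df["Age"] = df["Age"].fillna(df.groupby("Title")["Age"].transform("median"))

# One-hot encode the categorical columns before fitting the random forest
X = pd.get_dummies(df[["Pclass", "Age", "Title", "Deck"]], columns=["Title", "Deck"])
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, df["Survived"])
print(df[["Title", "Deck", "Age"]])
```

Grouping by title before filling ages uses the fact that titles like "Master" and "Miss" carry age information, which a single global median would throw away.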
- Week 3 - Friday October 13 - 1-2pm - DLAM Learning Lab - [Notes]
  - Tuning parameters for a random forest classifier
    - Our best attempt
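
A sketch of parameter tuning with scikit-learn's `GridSearchCV`, which scores every combination in a grid by cross-validation. Synthetic data from `make_classification` stands in for the engineered Titanic features (891 rows, the size of Kaggle's `train.csv`); the grid values are assumptions chosen for illustration, not the values used in the seminar.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the engineered Titanic features
X, y = make_classification(n_samples=891, n_features=8, n_informative=4, random_state=0)

# Candidate values for a few commonly tuned random forest parameters (illustrative)
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 5],
}

# 5-fold cross-validated accuracy for each of the 2 * 3 * 2 = 12 combinations
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Cross-validated scores are a less noisy guide than repeatedly checking the public leaderboard, which only reflects part of the test set.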
- Week 4 - Friday October 20 - 1-2pm - DLAM Learning Lab - [Notes]
  - NYC Taxi Trip Duration
    - Outliers and clusters
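
A sketch of the two Week 4 ideas on synthetic data, assuming the Kaggle column names `pickup_latitude`, `pickup_longitude` and `trip_duration` (in seconds). Outliers are dropped with a percentile rule (1st to 99th percentile of duration, an arbitrary cut-off) and pickup locations are grouped with k-means; `n_clusters=8` is likewise an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n = 1000
# Synthetic trips near midtown Manhattan (not real data)
trips = pd.DataFrame({
    "pickup_latitude": rng.normal(40.75, 0.03, n),
    "pickup_longitude": rng.normal(-73.98, 0.03, n),
    "trip_duration": rng.lognormal(mean=6.5, sigma=0.7, size=n),  # seconds
})
trips.loc[:4, "trip_duration"] = 3_000_000  # five absurd, weeks-long trips

# Outliers: keep only trips between the 1st and 99th percentile of duration
low, high = trips["trip_duration"].quantile([0.01, 0.99])
clean = trips[trips["trip_duration"].between(low, high)].copy()

# Clusters: group pickup locations into k spatial clusters
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0)
clean["pickup_cluster"] = kmeans.fit_predict(clean[["pickup_latitude", "pickup_longitude"]])

print(len(trips), "->", len(clean))
print(clean["pickup_cluster"].value_counts().sort_index())
```

The cluster labels can then be used as a categorical feature, or to define routes between pickup and dropoff clusters.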
- Week 5 - Friday October 27 - 1-2pm - DLAM Learning Lab - [Notes]
  - NYC Taxi Trip Duration
    - Distance and spatial features for a random forest regressor
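
A sketch of distance features for a random forest regressor. The haversine formula gives the great-circle distance between pickup and dropoff; signed coordinate differences add a rough sense of direction. Because the competition is scored with RMSLE (root mean squared logarithmic error), the model below is fit on `log1p(trip_duration)` and predictions are mapped back with `expm1`. Column names follow the Kaggle data as we understand it, and the trips and durations are synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))  # 6371 km: mean Earth radius

rng = np.random.default_rng(0)
n = 2000
# Synthetic Manhattan-area trips (not real data)
df = pd.DataFrame({
    "pickup_latitude": rng.normal(40.75, 0.03, n),
    "pickup_longitude": rng.normal(-73.98, 0.03, n),
    "dropoff_latitude": rng.normal(40.75, 0.03, n),
    "dropoff_longitude": rng.normal(-73.98, 0.03, n),
})
df["distance_km"] = haversine_km(df["pickup_latitude"], df["pickup_longitude"],
                                 df["dropoff_latitude"], df["dropoff_longitude"])
# Signed coordinate differences hint at the direction of travel
df["delta_lat"] = df["dropoff_latitude"] - df["pickup_latitude"]
df["delta_lon"] = df["dropoff_longitude"] - df["pickup_longitude"]
# Synthetic target: about 2 minutes per km plus a random delay (seconds)
df["trip_duration"] = 120 * df["distance_km"] + rng.exponential(200, n)

features = ["pickup_latitude", "pickup_longitude", "dropoff_latitude",
            "dropoff_longitude", "distance_km", "delta_lat", "delta_lon"]
model = RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=0)
# Squared error on the log1p scale corresponds to the RMSLE metric
model.fit(df[features], np.log1p(df["trip_duration"]))
# In-sample predictions, only to show the back-transform to seconds
pred_seconds = np.expm1(model.predict(df[features]))
```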
- Week 6 - Friday November 3 - 1-2pm - DLAM Learning Lab - [Notes]
  - NYC Taxi Trip Duration
    - A random forest regressor for every route
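
One possible reading of "a random forest regressor for every route", sketched under explicit assumptions: a route is a (pickup cluster, dropoff cluster) pair from a shared k-means model, each route with enough trips gets its own regressor, and the rest fall back to a global model. The `hour` feature, the minimum of 50 trips per route, `n_clusters=4` and the helper `predict_duration` are all hypothetical, and the data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 3000
# Synthetic trips (not real data); "hour" stands in for a feature derived from pickup_datetime
df = pd.DataFrame({
    "pickup_latitude": rng.normal(40.75, 0.03, n),
    "pickup_longitude": rng.normal(-73.98, 0.03, n),
    "dropoff_latitude": rng.normal(40.75, 0.03, n),
    "dropoff_longitude": rng.normal(-73.98, 0.03, n),
    "hour": rng.integers(0, 24, n),
})
df["trip_duration"] = 300 + 20_000 * (
    abs(df["pickup_latitude"] - df["dropoff_latitude"])
    + abs(df["pickup_longitude"] - df["dropoff_longitude"])
) + rng.exponential(100, n)

# One k-means model over all endpoints, so pickup and dropoff clusters share labels
pickups = df[["pickup_latitude", "pickup_longitude"]].to_numpy()
dropoffs = df[["dropoff_latitude", "dropoff_longitude"]].to_numpy()
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(np.vstack([pickups, dropoffs]))
df["pickup_cluster"] = kmeans.predict(pickups)
df["dropoff_cluster"] = kmeans.predict(dropoffs)

features = ["pickup_latitude", "pickup_longitude", "dropoff_latitude", "dropoff_longitude", "hour"]
target = np.log1p(df["trip_duration"])

# Fallback model for routes with too few trips
global_model = RandomForestRegressor(n_estimators=50, random_state=0).fit(df[features], target)

# One random forest per (pickup_cluster, dropoff_cluster) route
route_models = {}
for route, group in df.groupby(["pickup_cluster", "dropoff_cluster"]):
    if len(group) >= 50:  # hypothetical minimum number of trips per route
        route_models[route] = RandomForestRegressor(n_estimators=50, random_state=0).fit(
            group[features], target.loc[group.index])

def predict_duration(rows):
    """Predict durations in seconds with each row's route model, or the global fallback."""
    preds = pd.Series(index=rows.index, dtype=float)
    for route, group in rows.groupby(["pickup_cluster", "dropoff_cluster"]):
        model = route_models.get(route, global_model)
        preds.loc[group.index] = np.expm1(model.predict(group[features]))
    return preds

print(predict_duration(df.head(5)))
```

Per-route models can capture effects that differ between routes (for example, how the hour of day matters), at the cost of less training data per model, hence the fallback.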
- Week 7 - Friday November 10 - No meeting
- Week 8 - Friday November 17 - 1-2pm - DLAM Learning Lab - [Notes] (presented by @sempwn)
  - West Nile Virus Prediction
    - Data exploration
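
A small exploration sketch with pandas, assuming the Kaggle West Nile Virus training columns `Date`, `Species`, `Trap`, `NumMosquitos` and `WnvPresent`. The rows are invented; the point is the pattern of grouping the binary `WnvPresent` column to get positive rates by species and by month.

```python
import pandas as pd

# A few illustrative rows in the Kaggle format (not the real observations)
train = pd.DataFrame({
    "Date": ["2007-05-29", "2007-05-29", "2007-08-01", "2007-08-01", "2007-08-15", "2007-08-15"],
    "Species": ["CULEX PIPIENS/RESTUANS", "CULEX RESTUANS", "CULEX PIPIENS",
                "CULEX PIPIENS/RESTUANS", "CULEX PIPIENS", "CULEX RESTUANS"],
    "Trap": ["T002", "T007", "T002", "T015", "T015", "T007"],
    "NumMosquitos": [1, 1, 50, 12, 38, 3],
    "WnvPresent": [0, 0, 1, 0, 1, 0],
})
train["Date"] = pd.to_datetime(train["Date"])

# Fraction of samples that tested positive, by species
species_rate = train.groupby("Species")["WnvPresent"].mean()
print(species_rate)

# Seasonality: fraction of positive samples by month
monthly_rate = train.groupby(train["Date"].dt.month)["WnvPresent"].mean()
print(monthly_rate)
```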
- Week 9 - Friday November 24 - 1-2pm - DLAM Learning Lab - [Notes] (presented by @sempwn)
  - West Nile Virus Prediction
    - Models and predictions
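
A sketch of a model for a rare binary outcome, evaluated the way this competition is scored (area under the ROC curve). Since AUC depends only on how predictions are ranked, the model outputs probabilities from `predict_proba` rather than hard 0/1 labels. The features (week of year, trap latitude and longitude) and the synthetic late-summer label are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Hypothetical features: week of year and trap coordinates (synthetic)
week = rng.integers(22, 41, n)
lat = rng.normal(41.85, 0.1, n)
lon = rng.normal(-87.7, 0.1, n)
# Synthetic label: positives are rare, and more likely in weeks 31-36
p = np.where((week >= 31) & (week <= 36), 0.15, 0.02)
y = rng.random(n) < p
X = np.column_stack([week, lat, lon])

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
model = RandomForestClassifier(n_estimators=200, min_samples_leaf=10, random_state=0)
model.fit(X_train, y_train)

# Probability of the positive class for each validation sample
proba = model.predict_proba(X_val)[:, 1]
auc = roc_auc_score(y_val, proba)
print("validation ROC AUC:", round(auc, 3))
```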
- Week 10 - Friday December 1 - 1-2pm - DLAM Learning Lab
  - Quora Question Pairs
    - Natural language processing with NLTK
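
A sketch of a simple NLTK feature for deciding whether two questions are duplicates: the share of normalized words they have in common. It uses NLTK's `RegexpTokenizer` and `PorterStemmer`, which need no downloaded data; the small stop-word set is a hand-picked assumption standing in for `nltk.corpus.stopwords` (which requires a one-time download), and `word_share` is a hypothetical helper name.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

tokenizer = RegexpTokenizer(r"\w+")  # tokens are runs of letters, digits or underscores
stemmer = PorterStemmer()
# Small hand-picked stop-word set (an assumption, for illustration)
STOP_WORDS = {"a", "an", "the", "is", "are", "what", "how", "i", "do",
              "to", "in", "of", "for", "can"}

def normalized_words(text):
    """Lowercase, tokenize, drop stop words, and stem."""
    return {stemmer.stem(tok) for tok in tokenizer.tokenize(text.lower())
            if tok not in STOP_WORDS}

def word_share(q1, q2):
    """Jaccard similarity of the two questions' normalized word sets."""
    w1, w2 = normalized_words(q1), normalized_words(q2)
    if not w1 or not w2:
        return 0.0
    return len(w1 & w2) / len(w1 | w2)

print(word_share("How do I learn Python programming?",
                 "What is the best way to learn programming in Python?"))
print(word_share("How do I learn Python programming?", "Why is the sky blue?"))
```

Stemming lets "programming" and "program" count as the same word; a feature like this would typically be one input among several to a classifier.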