These instructions will walk you through installing the required Data Science software stack for the UBC Master of Data Science program. Before starting, ensure that your laptop meets our program requirements:
Students whose laptops do not meet the requirements specified above will not be able to receive technical assistance from the MDS team in troubleshooting installation issues.
If you have already installed Git, LaTeX, or any of the R or Python related packages, please uninstall these and follow the instructions below to reinstall them (make sure to also remove any user configuration files, backing them up first if desired). In order to be able to support you effectively and minimize setup issues and software conflicts, we require all students to install the software stack the same way.
In all the sections below, if you are presented with the choice to download either a 64-bit (also called x64) or a 32-bit (also called x86) version of the application, always choose the 64-bit version.
Once you have completed these installation instructions, make sure to follow the post-installation notes at the end to check that all software is set up correctly.
Please sign up for a UBC Student Email. This account will also grant you access to a range of UBC services, including Microsoft Teams and OneDrive. To do so navigate to https://it.ubc.ca/services/email-voice-internet/ubc-student-email-service and follow the instructions under “Get Started”.
In MDS we will be using many tools that work most reliably on Google Chrome and Firefox (including our online quiz software), so we recommend that you use one of these browsers.
Some MDS courses (e.g. the capstone project) use the LastPass password manager to share credentials. Although we will not cover privacy and security topics until the second semester of the program, we recommend that you use a password manager such as LastPass to help you create strong passwords and store them securely, and to facilitate online authentication. You can sign up for a free LastPass account here: https://lastpass.com/create-account.php. We also recommend installing the LastPass Chrome or Firefox browser extension available here: https://lastpass.com/misc_download2.php.
For our MDS courses and program announcements, correspondence and course forums we use the communication tool Slack. Slack can be accessed via the web browser, however we strongly recommend installing the Slack App. The Slack app can be installed from the Mac App Store, or from the Slack website. Installation instructions from the Slack website install method are here: https://slack.com/intl/en-ca/help/articles/207677868-Download-Slack-for-Mac
Apple recently changed the default shell in the Mac Terminal to Zsh. However, we aim to teach with the same shell across all three operating systems we support, which is the Bash shell. Thus, we ask that you change the default shell in your Terminal to Bash by opening the Terminal (how to video) and typing:
You will have to quit all instances of open Terminals and then restart the Terminal for this to take effect.
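Once the Terminal has been restarted, the change can be confirmed by looking at the login shell path. The following is a minimal Python sketch of that check, not part of the official instructions; it assumes the login shell is reported through the SHELL environment variable, and the function name is made up for illustration.

```python
import os


def default_shell_is_bash(shell_path=None):
    """Return True when the login shell path points at bash.

    Assumption: when no path is given, the SHELL environment
    variable holds the login shell (e.g. /bin/bash or /bin/zsh).
    """
    if shell_path is None:
        shell_path = os.environ.get("SHELL", "")
    # Only the program name matters, not the directory it lives in.
    return os.path.basename(shell_path) == "bash"
```

Calling default_shell_is_bash() with no arguments inspects the current session, while passing a path such as "/bin/zsh" lets you see how the check behaves for other shells.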
The open-source text editor Visual Studio Code (VS Code) is both a powerful text editor and a full-blown Python IDE, which we will use for more complex analysis. You can download and install the macOS version of VS Code from the VS Code website https://code.visualstudio.com/download. Once the download is finished, click “Open with Archive utility”, and move the extracted VS Code application from “Downloads” to “Applications”. In addition to reading the getting started instructions, be sure to follow the “Launching from the command line” steps as well.
You can test that VS Code is installed and can be opened from Terminal by restarting Terminal and typing the following command:
You should see something like this if you were successful:
Note: If you get an error message such as -bash: code: command not found, but you can see the VS Code application has been installed, then something went wrong with setting up the launch from the command line. Try following these instructions again; in particular, you might want to try the described manual method of adding VS Code to your path.
In MDS we will use the publicly available GitHub.com as well as an Enterprise version of GitHub hosted here at UBC, GitHub.ubc.ca. Please follow the set-up instructions for both below.
Sign up for a free account at GitHub.com if you don’t have one already.
To add you to the MDS organization on Github.ubc.ca we need you to login to Github.ubc.ca using your CWL credentials.
This step is required for
We will be using the command line version of Git as well as Git through RStudio and JupyterLab. Some of the Git commands we will use are only available since Git 2.23, so if your Git is older than this version, we ask you to update it using the Xcode command line tools (not all of Xcode), which include Git.
Open Terminal and type the following command to install Xcode command line tools:
After installation, in terminal type the following to ask for the version:
You should see something like this (does not have to be the exact same version) if you were successful:
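If you would rather check the 2.23 minimum programmatically than by eye, the comparison can be sketched in Python as below. This is an illustrative sketch only: the function names are made up, and it assumes the version text contains a "major.minor" number, as in "git version 2.28.0".

```python
import re
import subprocess

MIN_GIT = (2, 23)  # minimum version named in the text above


def parse_git_version(output):
    """Extract (major, minor) from text such as 'git version 2.28.0'."""
    match = re.search(r"(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("no version number found in {!r}".format(output))
    return int(match.group(1)), int(match.group(2))


def git_is_recent_enough(output, minimum=MIN_GIT):
    """Return True when the reported version is at least `minimum`."""
    # Tuples compare element by element, so (2, 9) < (2, 23) as intended.
    return parse_git_version(output) >= minimum


def installed_git_version_text():
    """Ask the installed git for its version string (requires git on PATH)."""
    result = subprocess.run(
        ["git", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

For example, git_is_recent_enough(installed_git_version_text()) reports whether the Git found on your PATH meets the requirement. Comparing tuples of integers rather than strings avoids treating "2.9" as newer than "2.23".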
Note: If you run into trouble, please see the Install Git > Mac OS section from Happy Git and GitHub for the useR for additional help or strategies for Git installation.
Next, we need to configure Git by telling it your name and email. To do this, type the following into the terminal (replacing Jane Doe and janedoe@example.com with your name and the email you used to sign up for GitHub, respectively):
Note: to ensure that you haven’t made a typo in any of the above, you can view your global Git configurations by either opening the configuration file in a text editor (e.g. via the command code ~/.gitconfig) or by typing git config --list --global.
To make programs run from the terminal (such as git) use VS Code by default, we will modify ~/.bash_profile. First, open it using VS Code:
Note: If you see any existing lines in your ~/.bash_profile related to a previous Python or R installation, please remove these.
Append the following lines:
Then save the file and exit VS Code.
Most terminal programs will read the EDITOR environment variable when determining which editor to use, but some read VISUAL, so we’re setting both to the same value.
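To illustrate why both variables are set, here is a minimal Python sketch of how a terminal program might choose its editor. The lookup order (VISUAL first, then EDITOR, then a fallback) is a common convention rather than a rule every program follows, and the function name, the "vi" fallback, and the example value "code --wait" are assumptions made for illustration.

```python
import os


def pick_editor(env=None, fallback="vi"):
    """Return the editor command a typical terminal program might use.

    Assumption: VISUAL is consulted before EDITOR, and `fallback`
    is used when neither is set. Real programs may differ.
    """
    env = os.environ if env is None else env
    for name in ("VISUAL", "EDITOR"):
        value = env.get(name, "").strip()
        if value:
            return value
    return fallback


# Illustrative values only: with both variables set to the same
# command, the result is the same whichever one a program reads.
example_env = {"VISUAL": "code --wait", "EDITOR": "code --wait"}
chosen = pick_editor(example_env)
```

With only one of the two variables set, a program that reads the other one would fall back to its own default, which is the situation the instructions above avoid.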
We will be using Python for a large part of the program, and conda as our Python package manager. To install Python and the conda package manager, we will use the Miniconda platform (read more here). The Miniconda MacOSX 64-bit pkg installer for Python 3.8 can be downloaded here.
After installation, restart the terminal. If the installation was successful, you will see (base) prepended to your prompt string. To confirm that conda is working, you can ask it which version was installed:
which should return something like this:
Note: If you see zsh: command not found: conda, see the section on Bash above to set your default Terminal shell to Bash as opposed to Zsh.
Next, type the following to ask for the version of Python:
which should return something like this:
Note: If instead you see Python 2.7.X you installed the wrong version. Uninstall the Miniconda you just installed (which usually lives in the /opt directory), and try the installation again, selecting Python 3.8.
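The same check can be made from inside the interpreter itself. The sketch below is a hypothetical helper, not part of the official instructions; it only looks at the major version number and is written without f-strings so that it also runs on a Python 2.7 interpreter, which is exactly the case it is meant to detect.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True for any Python 3 interpreter, False for Python 2.7.X."""
    return version_info[0] == 3


def describe(version_info=sys.version_info):
    """Build a one-line summary of the interpreter version check."""
    status = "OK" if is_python3(version_info) else "wrong version"
    return "Python {}.{}: {}".format(version_info[0], version_info[1], status)
```

Running describe() in the interpreter that conda installed summarises the result for that installation.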
conda installs Python packages from different online repositories, which are called “channels”. A package needs to go through thorough testing before it is included in the default channel, which is good for stability, but also means that new versions will be delayed and fewer packages are available overall. There is a community-driven effort called conda-forge (read more here), which provides more up-to-date packages. To enable us to access the most up-to-date versions of the Python packages we are going to use, we will add the conda-forge channel by typing the following in the terminal:
To install packages individually, we can now use the following command: conda install <package-name>. Let’s install the key packages needed for the start of our program:
conda will show you the packages that will be downloaded, and you can press enter to proceed with the installation. If you want to answer yes by default and skip this confirmation step, you can replace conda install with conda install -y.
Note: we will use many more packages than those listed above across the MDS program, however we will manage these using virtual environments (which you will learn about in DSCI 521: Platforms for Data Science).
We will be using the Jupytext Python package and the JupyterLab git extension to facilitate using Jupyter notebooks with Git & GitHub. Install them via the following commands:
To test that your JupyterLab installation is functional, you can type jupyter lab into a terminal, which should open a new tab in your default browser with the JupyterLab interface. To exit out of JupyterLab you can click File -> Shutdown, or go to the terminal from which you launched JupyterLab and hold Ctrl while pressing c twice.
R is another programming language that we will be using a lot in the MDS program. We will use R both in Jupyter notebooks and in RStudio.
Go to https://cran.r-project.org/bin/macosx/ and download the latest version of R for Mac (Should look something like this: R-3.6.1.pkg). Open the file and follow the installer instructions.
After installation, in Terminal type the following to ask for the version:
You should see something like this if you were successful:
Note: Although it is possible to install R through conda, we highly recommend not doing so. In case you have already installed R using conda you can remove it by executing conda uninstall r-base.
Some R packages rely on the dependency XQuartz which no longer ships with the Mac OS, thus we need to install it separately. Download it from here: https://www.xquartz.org/ and follow the installation instructions.
Download the macOS Desktop version of RStudio Preview from https://rstudio.com/products/rstudio/download/preview/. Open the file and follow the installer instructions.
To see if you were successful, try opening RStudio by clicking on its icon (from Finder, Applications or Launchpad). It should open and look something like this picture below:
Next, install the key R packages needed for the start of the MDS program by opening up RStudio and typing the following into the R console inside RStudio:
Note: we will use many more packages than those listed above across the MDS program, however we will manage these using the renv package manager (which you will learn about in DSCI 521: Platforms for Data Science).
The IRkernel package is needed to make R work in Jupyter notebooks. To enable this kernel in the notebooks, install it by pasting the following command into the RStudio Console:
Next, open a terminal and type the following (you can’t use RStudio for this step since it doesn’t honor $PATH changes in ~/.bash_profile):
To see if you were successful, try running JupyterLab and check if you have a working R kernel. To launch the JupyterLab type the following in Terminal:
A browser should have launched and you should see a page that looks like the screenshot below. Now click on “R” notebook (circled in red on the screenshot below) to launch a JupyterLab notebook with an R kernel.
Sometimes a kernel loads, but doesn’t work as expected. To test whether your installation was done correctly, now type library(tidyverse) in the code cell and click on the run button to run the cell. If your R kernel works you should see something like the image below:
To improve the experience of using R in JupyterLab, we will add an extension that allows us to set up keyboard shortcuts for inserting text (thanks to former MDS student Ryan Homer for developing this extension!). By default, it creates shortcuts for inserting two of the most common R operators: <- and %>%. Run the following from terminal to install the extension:
To check that the extension is working, open JupyterLab, launch an R notebook, and try inserting the operators by pressing Alt + - or Shift + Command + m, respectively.
We will install the lightest possible version of LaTeX and its necessary packages so that we can render Jupyter notebooks and R Markdown documents to HTML and PDF. If you have previously installed LaTeX, please uninstall it before proceeding with these instructions.
First, open RStudio and run the following commands to install the tinytex package and set up tinytex:
Note: You might be asked to enter your password during installation. If you see an error message towards the end of the installation telling you that /usr/local/bin is not writeable, you will need to open a terminal and run the following two commands before proceeding:
You can check that the installation is working by opening a terminal and asking for the version of latex:
You should see something like this if you were successful:
The above is all we need to have LaTeX work with R Markdown documents, however for Jupyter we need to add several more packages. Do this by opening a terminal, pasting the following there, and pressing enter:
To test that your LaTeX installation is working with Jupyter notebooks, launch jupyter lab from a terminal and open either a new notebook or the same one you used to test IRkernel above. Go to File -> Export notebook as... -> Export Notebook to PDF. If the PDF file is created, your LaTeX environment is set up correctly.
We will be using PostgreSQL as our database management system. You can download PostgreSQL 12.4 from here (do not select version 13). Follow the instructions for the installation. In the password page, type whatever password you want, but make sure you’ll remember it later. For all the other options, use the default. You do not need to run “StackBuilder” at the end of the installation (if you accidentally launch the StackBuilder, click “cancel”, you don’t need to check any boxes).
To test if the installation was successful, open the SQL Shell app from the LaunchPad or applications directory. You will be asked to set up your configuration: accept the default value (the one within square brackets) for the first four values by pressing enter four times, then type in your password and press enter one last time. It should look like this if it is working correctly:
You will use Docker to create reproducible, sharable and shippable computing environments for your analyses. For this you will need a Docker account. You can sign up for a free one here.
After signing-up and signing into the Docker Store, go here: https://store.docker.com/editions/community/docker-ce-desktop-mac and click on the “Get Docker” button on the right hand side of the screen. Then follow the installation instructions on that screen to install the stable version.
To test if Docker is working, after installation open the Docker app by clicking on its icon (from Finder, Applications or Launchpad). Next open Terminal and type the following:
You should see something like this if you were successful:
The real magic of VS Code is in the extensions that let you add languages, debuggers, and tools to your installation to support your specific workflow. Now that we have installed all our other Data Science tools, we can install the VS Code extensions that work really well with them. From within VS Code you can open up the Extension Marketplace (read more here) to browse and install extensions by clicking on the Extensions icon in the Activity Bar indicated in the figure below.
To install an extension, you simply search for it in the search bar, click the extension you want, and then click “Install”. There are extensions available to make almost any workflow or task you are interested in more efficient! Here we are interested in setting up VS Code as a Python IDE. To do this, search for and install the following extensions:
This video tutorial is an excellent introduction to using VS Code in Python.
To improve your experience using bash, we recommend appending a few lines to the end of your bash configuration file. This is optional, but makes it easier to use the TAB key for autocompletion and improves how bash handles the command history (we will talk more about these topics during class). It also adds colors to the terminal’s text, which can make it easier to navigate visually. First, open the configuration file:
Then paste the following at the end of the file (make sure not to overwrite any existing lines) and save it afterwards:
You have completed the installation instructions, well done 🙌! We have created a script to help you check that your installation was successful, and to provide instructions for how you can troubleshoot any potential issues. To run this script, please execute the following command from your terminal.
The output from running the script will look something like this:
As you can see at the end of the output, a log file is saved in your current directory. We might ask you to upload this file if we need to troubleshoot your installation, so that we can help you more effectively. If any of your packages are marked as “MISSING” you will need to figure out what is wrong and possibly reinstall them. Once all packages are marked as “OK” we will ask you to submit this log file, so that we can confirm that your installation was successful. Details on where to submit will be provided later.
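When the log is long, it can help to pull out just the entries that need attention. The Python sketch below does this under a stated assumption: that every problematic entry sits on a line containing the standalone word MISSING. The exact layout of the real log file is not reproduced here, so the sample lines in the example are invented for illustration, as is the function name.

```python
def lines_marked_missing(log_text):
    """Return the log lines that contain the standalone word MISSING.

    Assumption: each package status is reported on its own line and
    problem entries include the word MISSING separated by whitespace.
    """
    return [
        line.strip()
        for line in log_text.splitlines()
        if "MISSING" in line.split()
    ]


# Invented sample text, only to show the behaviour of the function.
sample_log = "OK git\nMISSING some-package\n\nOK conda\n"
problems = lines_marked_missing(sample_log)
```

To use it on the real file, read the log with open(path).read(), where path is wherever the script saved it, and pass the text to the function. An empty result means no line was flagged under the assumption above.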
Note that in general you should be careful running scripts unless they come from a trusted source as in this case (just like how you should be careful when downloading and installing programs on your computer).
After a lot of research on the internet, I found no practical tutorial explaining how to embed Jupyter Notebooks in Static Websites using only free technologies. I found a way to do it using Github Gists, MyBinder and NBInteract along with IPython Widgets and I want to share it so no one has to reinvent the wheel.
This is a step by step guide to embed Jupyter Notebooks in a static website. There is also an official tutorial available in the docs.
This guide will cover some points that aren't fully explained in the official tutorial, such as:
I really like using Jupyter Notebooks for my experiments and I also like to show them in this blog, which is a pure static website built with Pelican. The problem emerged when I wanted to embed some interactive content from the notebook in the website. I thought this would have been solved long ago, but it wasn't, or at least the solution wasn't as public as it should be.
Having the possibility to embed Jupyter notebooks in a static website provides lots of advantages, just to mention a few:
In order to do this, I used a set of tools. This is my personal choice, but I believe it is appropriate. Of course, if you have any suggestions, feel free to write a comment below.
The key component here is NBInteract but the other tools were necessary too.
First, I want to clarify my workflow:
I want to write an isolated notebook that the user could run via Binder (1), I want to write a blog post about what I've done in that notebook (2) and in the post, I want to insert specific cells when appropriate (3). Additionally, I want my experiment pieces (the specific cells inserted) to be sharable, meaning anyone could embed them easily in their website (4), and I also want to track how many times they were used and from which source (5).
These 5 objectives are accomplished in the following way:
The full process could be separated into the following steps:
First things first, you need a fully working notebook. It should run all cells without errors on your local machine.
In order to take full advantage of this methodology make sure you use widgets (such as IPython's) to give control to the user. Don't rely on manually changing variables inside cells but rather provide a user-friendly interface with buttons and sliders, this will result in a much more pleasant and useful UX/UI for the reader of your blog. Besides, this way, the reader doesn't have to know programming at all!
When you have that done, you have to identify the notebook dependencies. They could come from Conda, pip, apt or anywhere else.
There are basically four ways you can specify the dependencies for Binder:
requirements.txt: only suitable for PyPI-only dependencies
environment.yml: better for Conda dependencies and PyPI dependencies
apt.txt: the only option to specify apt dependencies
Dockerfile: specifies not only the dependencies but the whole OS
The most common approach to work with Binder is a combination of environment.yml and apt.txt. A requirements.txt file usually provides very little support for customization, and a Dockerfile is discouraged by the Binder developers themselves and should only be used when no other option works.
A more detailed explanation, including many other ways, can be found in the official docs.
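To make the environment.yml plus apt.txt combination more concrete, here is a minimal Python sketch that produces the text of both files. The helper names, the environment name "binder-env", the conda-forge channel, and the package names in the example are all assumptions chosen for illustration; your notebook determines the real list.

```python
def environment_yml(conda_packages, pip_packages=(), name="binder-env"):
    """Build the text of an environment.yml file.

    Conda packages go directly under `dependencies`; PyPI-only
    packages go in a nested `pip` list, which requires pip itself.
    """
    lines = ["name: " + name, "channels:", "  - conda-forge", "dependencies:"]
    lines += ["  - " + pkg for pkg in conda_packages]
    if pip_packages:
        lines += ["  - pip", "  - pip:"]
        lines += ["    - " + pkg for pkg in pip_packages]
    return "\n".join(lines) + "\n"


def apt_txt(apt_packages):
    """Build the text of an apt.txt file: one package name per line."""
    return "\n".join(apt_packages) + "\n"


# Hypothetical dependency lists for a notebook that renders videos.
env_text = environment_yml(["numpy", "matplotlib"], pip_packages=["nbinteract"])
apt_text = apt_txt(["ffmpeg"])
```

Saving env_text as environment.yml and apt_text as apt.txt gives the two files described above. Keeping the lists in one place like this is only a convenience; writing the two files by hand works just as well.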
Gists are like small repositories, they have a sort of version control called 'Revisions' but there are no branches, they provide simple updates one after the other. They are especially useful for cases where a repository is too much and just a few files are needed.
Here I will explain how to create a Gist in GitHub. If you already know how, you can skip to the next section, where I explain how to set the dependencies for Binder.
In order to create a Gist, you have to first have a Github Account.
Once logged in, you have to click on your profile image in the top bar
Now select 'Your Gists'
The next page will change if you already have gists created but the top bar will remain the same, in the top bar select the '+' icon
Now you are in the Gist creation template. Here you have to write the Gist name, the filename of the first file and its contents, and you can add more files if necessary with the button below. When everything is set up, click on 'Create public gist'.
Note: You can also create private gists for free but for NBInteract and thus this guide to work, the gist should be public.
You have to create a Github Gist and upload all the files needed, namely:
Once the gist is created, it should be tested with MyBinder
In order to test it first select the Combo Box in the main page
Then select the Gist Option
And finally, write the Username/GISTID in the textbox and then click on 'Launch'.
Note: The first launch can take several minutes because Binder is building a Docker image. This process is repeated each time you change the Gist, so it is a good idea to always run your gists on Binder after you make a change.
If the build was successful, open the notebook and run all the cells, check for errors, fix them until none appear and then continue to the next step.
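The form fields used above end up in a launch URL that you can also build directly. The Python sketch below shows the pattern as I understand it, with the username, the Gist ID and a ref joined after the gist provider prefix. The default ref of "master" and the example values are assumptions, so compare the result with the URL that the MyBinder page itself displays.

```python
from urllib.parse import quote


def binder_gist_url(username, gist_id, ref="master"):
    """Build a MyBinder launch URL for a public Gist.

    Assumption: the URL follows the pattern
    https://mybinder.org/v2/gist/<username>/<gist id>/<ref>.
    """
    parts = [quote(username, safe=""), quote(gist_id, safe=""), quote(ref, safe="")]
    return "https://mybinder.org/v2/gist/" + "/".join(parts)


# Hypothetical username and Gist ID, for illustration only.
example_url = binder_gist_url("someuser", "abc123")
```

Opening the resulting URL in a browser starts the same build as clicking 'Launch'.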
This step might seem to be pretty easy but there are cases (especially when doing interesting things) that may require additional settings, for example, setting the FFMPEG dependency correctly for creating animations and videos.
You can check some of my personal examples of Gists:
With this step completed, you can add a Binder Badge in your post directly to the notebook. Although the main objective is to embed specific parts, it doesn't imply the user wouldn't want access to the whole notebook in a more familiar environment such as the one Binder Provides.
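A badge of this kind is an image wrapped in a link to the launch URL. As a minimal sketch, assuming your post is written in Markdown and using the badge image served at mybinder.org/badge_logo.svg, the markup could be generated like this (the function name and example URL are made up for illustration):

```python
BADGE_IMAGE = "https://mybinder.org/badge_logo.svg"


def binder_badge_markdown(launch_url, alt_text="Binder"):
    """Return Markdown for a badge image that links to `launch_url`."""
    return "[![{}]({})]({})".format(alt_text, BADGE_IMAGE, launch_url)


# Hypothetical launch URL, for illustration only.
badge = binder_badge_markdown("https://mybinder.org/v2/gist/someuser/abc123/master")
```

Pasting the resulting string into a Markdown post renders the badge; other markup languages need the equivalent image-inside-link construct.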
Example of a Binder badge (it leads to a notebook about Ordinary Differential Equations):
This is the most important part and where NBInteract comes in. For all the previous steps we used online tools such as GitHub Gists and MyBinder. But for this step, nbinteract must be installed.
These are the installation instructions:
The steps for the installation can be found in its official repo; make sure to check it in case there is an update.
nbinteract is a command-line application (CLI), so first navigate to the directory where the notebook file is and then run:
This will create an HTML file with the same filename as the notebook but, as mentioned earlier, the current version of NBInteract (0.2.4 at the time of writing) only supports GitHub repositories and not Gists.
To fix this problem, just open the HTML file and look for this snippet (it should be at the end of the file):
Only a small change is needed: change the provider from gh to gist. It should be something like this:
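If you regenerate the HTML often, the manual edit can be scripted. The Python sketch below is one possible way to do it, not something nbinteract provides. It assumes the generated file contains a setting written as provider: 'gh' (single or double quotes, optional spaces), and the sample string in the example is invented to show the behaviour rather than copied from real output.

```python
import re

# Matches provider: 'gh' or provider: "gh", keeping the quote style.
PROVIDER_PATTERN = re.compile(r"""(provider\s*:\s*)(['"])gh\2""")


def switch_provider_to_gist(html):
    """Replace the 'gh' provider setting with 'gist' in the HTML text."""
    new_html, count = PROVIDER_PATTERN.subn(r"\g<1>\g<2>gist\g<2>", html)
    if count == 0:
        # Fail loudly so an unexpected file layout is not silently ignored.
        raise ValueError("no provider setting with value 'gh' was found")
    return new_html


# Invented fragment, only to illustrate the replacement.
sample = "spec: 'someuser/abc123/master', provider: 'gh',"
fixed = switch_provider_to_gist(sample)
```

Reading the generated file, passing its text through switch_provider_to_gist and writing it back applies the same change as the manual edit described above.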
Now the file is already prepared to be embedded as an HTML IFrame, but although that's technically true, we might want to make other adjustments:
The first adjustment is for improving SEO and compatibility. It will depend on every particular case, as each notebook may need its own little changes. The W3C has an online validator to check for markup errors and warnings.
The second adjustment is needed since nbinteract inserts Bootstrap and Font Awesome by default in the HTML; you can check this because the style tag covers approximately 95% of the file. With custom styles and a careful selection of CSS classes, one can reduce the size significantly. In my own experience, I managed to reduce a file from 14000 lines to 400 (318KB to 25KB), a reduction of more than 90%!
Of course it may vary from user to user, but since file size is crucial for a good user experience, I recommend tweaking and adapting the file to be as small as possible. Remember to put the CSS in style tags instead of relying on the website's general CSS; this is the only way to provide the same look and feel when the file is used as an IFrame. This way each file will be completely independent of the others.
The third adjustment is more of a hack actually. NBInteract is designed so that the user would use the notebook as the entire page, but if the notebook contains just some cells (each with the proper imports), one can create minimal notebooks and insert them individually in a larger pure HTML page.
Just use nbinteract with a notebook file containing only the cells you are interested in. This will naturally lead to having as many .ipynb files as groups of cells you want to embed, each with an associated HTML file. It isn't a good idea to have everything in one big notebook, but neither is having dozens of little notebooks. I believe a reasonable number would be between 5 and 10.
The fourth and last adjustment will depend on the tracking system of your choice but generally, it implies adding some script tag at the end of the file with a specific tracking ID. But DO NOT add the system yet! Let's test it and then add it to avoid meaningless analytics (unless you've added your IP as an exception in the Tracking System).
Lastly, you should embed the IFrame in your website and test it. Here is where you should modify the styles according to the general aesthetics of the page and check that everything works as desired. Make sure the IFrame size is correct for its content (avoid scrollbars).
You can also test whether you want the input cells to be visible or only the output to be rendered.
After everything is tested, you can now add the tracking ID for the Analytics service (Google Analytics or the one of your choice)
To make your IFrame easier to share, add an HTML snippet to specify how to embed your IFrame in other pages, see the examples below.
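One way to produce such a snippet consistently is to generate it from the page address. The Python sketch below is an illustration under my own assumptions: the attribute choices, the default size and title, the function names and the example.com address are all placeholders to adapt to your site.

```python
from html import escape


def iframe_snippet(src, width="100%", height=400, title="Embedded notebook widget"):
    """Return the HTML for an IFrame pointing at `src`.

    Attribute values are escaped so characters such as & or quotes
    in the URL or title cannot break the markup.
    """
    return '<iframe src="{}" width="{}" height="{}" title="{}"></iframe>'.format(
        escape(src, quote=True),
        escape(str(width), quote=True),
        escape(str(height), quote=True),
        escape(title, quote=True),
    )


def shareable_code_block(snippet):
    """Wrap the snippet so a page shows it as copyable text, not a frame."""
    return "<pre><code>" + escape(snippet) + "</code></pre>"


# Placeholder address, for illustration only.
snippet = iframe_snippet("https://example.com/widget.html?a=1&b=2", height=300)
display_html = shareable_code_block(snippet)
```

The first function gives the markup other sites paste into their pages; the second escapes it again so that your own post displays the code instead of rendering another IFrame.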
You can find some examples in the official docs but I've also built some that you may find interesting. You can check the following posts to see them:
Use NBInteract to numerically integrate several types of differential equations (including systems of ODEs) and change the parameters with sliders and see the results live.
If you want to embed this widget in your website, just add the following HTML:
Read the Full Article
Use NBInteract to produce a video with matplotlib animation and FFmpeg. The user can create an image and a video with the set of parameters of their choice and then save them.
If you want to embed this widget in your website, just add the following HTML:
Read the Full Article
In case you want to go deeper, here are some useful resources:
Other resources to achieve similar things are: