Strip Jupyter Notebook Output
Jupyter notebooks without multimedia outputs are more friendly to source control since git is not good at comparing binary data (e.g., plots, pictures, videos) in jupyter notebooks. And they tend to bloat the size of git repositories.
nbconvert¶
You can use nbconvert to remove the output cells of Jupyter notebooks.
Git automation¶
YOu can use Git automation to strip the output automatically on git commit. The following git filter settings keep full notebooks as-is but commit the "clean" version.
In your project folder's .git/config:
.git/config
[filter "strip-notebook-output"]
    clean = "jupyter nbconvert --clear-output --inplace --stdin --stdout --log-level=ERROR"
And in your project folder's .gitattributes:
How this works:1
- The attributetells git to run the filter's clean action on each notebook file before adding it to the index (staging).
- The filter is our friend nbconvert, set up to read from stdin, write to stdout, strip the output, and only speak when it has something important to say.
- When a file is extracted from the index, the filter's smudge action is run, but this is a no-op as we did not specify it. You could run your notebook here to re-create the output (nbconvert --execute --inplace).
- Note that if the filter somehow fails, the file will be staged unconverted.
nbstripout¶
kynan/nbstripout can automatically setting up nbconvert and git filter.