Geospatial Python tutorial - PyQGIS in Jupyter Notebook

I usually do most of my PyQGIS scripting in the QGIS Python console, and I use PyCharm for plugin development. This time I wanted to try something different, so I used PyQGIS in Jupyter Notebook. I had run into some issues getting PyQGIS to run inside Jupyter Notebook before. Now that I have resolved them, I am happy to share the setup, and I hope it is useful for your own projects.

It was fun to experiment with this, and I hope you can build something wonderful in Jupyter using the QGIS libraries and modules. Well, at least that is my intention.

Before we start writing our script in Jupyter Notebook, we need to set up our environment by creating a *.cmd or *.bat file.

Setup

Setting the Environment

Before we begin scripting in Jupyter, copy the batch script below and save it as a *.cmd or *.bat file. Check every directory in it and edit the paths to match your installation. The last line, call jupyter notebook, starts Jupyter. You can change it to cmd.exe if you would rather get a command prompt with the same environment.

Note: Please check that all the directories exist. I am using Windows, so the setup may differ on other operating systems. I also installed QGIS through OSGeo4W, so replace the paths with your own QGIS installation directory.

@echo off
rem Load the OSGeo4W environment from the folder this script is saved in (sets OSGEO4W_ROOT)
call "%~dp0\o4w_env.bat"
rem GRASS environment: change grass76 to match the GRASS version in your installation
call "%OSGEO4W_ROOT%\apps\grass\grass76\etc\env.bat"
call qt5_env.bat
call py3_env.bat

rem Prepend the QGIS, GRASS, Qt5 and Python 3.7 folders to PATH
path %OSGEO4W_ROOT%\apps\qgis\bin;%OSGEO4W_ROOT%\apps\grass\grass76\lib;%OSGEO4W_ROOT%\apps\grass\grass76\bin;%PATH%
PATH %OSGEO4W_ROOT%\apps\qgis\python;%PATH%
PATH %OSGEO4W_ROOT%\apps\qgis\bin;%PATH%
PATH %OSGEO4W_ROOT%\apps\Qt5\bin;%PATH%
PATH %OSGEO4W_ROOT%\apps\Python37;%PATH%
PATH %OSGEO4W_ROOT%\apps\Python37\Scripts;%PATH%
PATH %OSGEO4W_ROOT%\apps\qgis\python\plugins;%PATH%
PATH %OSGEO4W_ROOT%\apps\Python37\DLLs;%PATH%
PATH %OSGEO4W_ROOT%\apps\Python37\lib;%PATH%
PATH %OSGEO4W_ROOT%\apps\Python37\lib\site-packages;%PATH%

rem QGIS, GDAL, Qt and Python settings
rem QGIS_PREFIX_PATH uses forward slashes: the substitution replaces every backslash with /
set QGIS_PREFIX_PATH=%OSGEO4W_ROOT:\=/%/apps/qgis
set GDAL_FILENAME_IS_UTF8=YES
set VSI_CACHE=TRUE
set VSI_CACHE_SIZE=1000000
set QT_PLUGIN_PATH=%OSGEO4W_ROOT%\apps\qgis\qtplugins;%OSGEO4W_ROOT%\apps\qt5\plugins
set PYTHONPATH=%OSGEO4W_ROOT%\apps\qgis\python

rem Start Jupyter Notebook (replace with cmd.exe to get a prompt instead)
call jupyter notebook

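Save this file as run-jupyter.bat in E:\OSGeo4W64\bin. The first call line loads o4w_env.bat from the folder the script is saved in (%~dp0). If you keep the file somewhere else, replace %~dp0 with the full path of your OSGeo4W bin folder. When the file is in the bin folder, you can open the OSGeo4W Shell and type run-jupyter.bat whenever you want to start Jupyter.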
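If Jupyter does not start or PyQGIS will not import, a missing or misnamed folder is a likely cause. The sketch below shows one way to check the folders that run-jupyter.bat references. It assumes the E:\OSGeo4W64 layout and the grass76/Python37 versions used in this post. The helper name missing_dirs is hypothetical. Any Python 3 interpreter can run it.

```python
from pathlib import Path

# Folders referenced by run-jupyter.bat, relative to the OSGeo4W root.
# Assumption: grass76 and Python37 match this post's install; adjust for yours.
REQUIRED_DIRS = [
    "bin",
    "apps/qgis/bin",
    "apps/qgis/python",
    "apps/qgis/python/plugins",
    "apps/qgis/qtplugins",
    "apps/Qt5/bin",
    "apps/Qt5/plugins",
    "apps/grass/grass76/bin",
    "apps/grass/grass76/lib",
    "apps/Python37",
    "apps/Python37/Scripts",
    "apps/Python37/DLLs",
    "apps/Python37/lib/site-packages",
]


def missing_dirs(osgeo4w_root, relative_dirs=REQUIRED_DIRS):
    """Return the folders under osgeo4w_root that do not exist (hypothetical helper)."""
    root = Path(osgeo4w_root)
    return [d for d in relative_dirs if not (root / d).is_dir()]
```

For example, print(missing_dirs(r"E:\OSGeo4W64")) should print an empty list when every folder is in place.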

Testing our script in Jupyter Notebook!

Try running the line below in a notebook cell. If there are no error messages, PyQGIS is working.

from qgis.core import *
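You can also confirm that the notebook inherited the environment from run-jupyter.bat. One way is to check the variables the batch file sets and to read the QGIS version. This is a minimal sketch. The helpers missing_env_vars and pyqgis_version are hypothetical, and the code assumes qgis.core exposes Qgis.QGIS_VERSION, as QGIS 3 does.

```python
import importlib
import os

# Variables set by run-jupyter.bat or the OSGeo4W scripts it calls
EXPECTED_VARS = ("OSGEO4W_ROOT", "QGIS_PREFIX_PATH", "PYTHONPATH", "QT_PLUGIN_PATH")


def missing_env_vars(env=None, names=EXPECTED_VARS):
    """Return the expected variables that are unset or empty (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


def pyqgis_version():
    """Return the QGIS version string if qgis.core imports, otherwise None."""
    try:
        core = importlib.import_module("qgis.core")
    except ImportError:
        return None
    # Assumption: QGIS 3 exposes the version as Qgis.QGIS_VERSION
    return core.Qgis.QGIS_VERSION
```

In a correctly launched notebook, missing_env_vars() should return an empty list and pyqgis_version() should return a version string rather than None.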

Now let's try something simple: extracting vertices from vector data. First, import the modules we need.

import sys
import os
from osgeo import ogr
from qgis.core import (
    QgsApplication,
    QgsProcessingFeedback,
    QgsVectorLayer
)
# Provider for the native processing algorithms, e.g. native:buffer
from qgis.analysis import QgsNativeAlgorithms

Next, Python needs to know where QGIS is installed, so we set the QGIS prefix path to E:\OSGeo4W64\apps\qgis. After initializing QGIS, we append the plugins folder, E:\OSGeo4W64\apps\qgis\python\plugins, to sys.path so that the processing plugin can be imported.
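Hard-coding E:\OSGeo4W64 works. However, the batch file already exports OSGEO4W_ROOT, so both paths can also be derived from that variable. The sketch below assumes the standard OSGeo4W layout (apps\qgis and apps\qgis\python\plugins). The helpers qgis_paths and append_once are hypothetical. Appending a path only when it is missing keeps sys.path from growing each time you re-run the cell.

```python
import sys
from pathlib import PureWindowsPath


def qgis_paths(osgeo4w_root):
    """Build the QGIS prefix path and the plugins folder from an OSGeo4W root."""
    prefix = PureWindowsPath(osgeo4w_root) / "apps" / "qgis"
    plugins = prefix / "python" / "plugins"
    return str(prefix), str(plugins)


def append_once(path, search_path=None):
    """Append path to search_path (default: sys.path) only if it is not there yet."""
    search_path = sys.path if search_path is None else search_path
    if path not in search_path:
        search_path.append(path)
    return search_path
```

In the notebook, you could run prefix, plugins = qgis_paths(os.environ.get("OSGEO4W_ROOT", r"E:\OSGeo4W64")). Then pass prefix to QgsApplication.setPrefixPath(prefix, True) and call append_once(plugins) after qgs.initQgis().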

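# Point PyQGIS at the QGIS installation (True = use the default plugin and data paths under it)
QgsApplication.setPrefixPath(r'E:\OSGeo4W64\apps\qgis', True)
# Create a QGIS application without a GUI and initialize it
qgs = QgsApplication([], False)
qgs.initQgis()
# Make the processing plugin importable
sys.path.append(r'E:\OSGeo4W64\apps\qgis\python\plugins')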

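Next, we initialize the Processing framework and list all the processing algorithms available in QGIS, as below. This cell also imports geopandas and matplotlib, which we use to plot the input data.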

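import geopandas as gpd
import matplotlib.pyplot as plt
plt.ion()

import processing
from processing.core.Processing import Processing

# Initialize the Processing framework and register the native algorithms
Processing.initialize()
QgsApplication.processingRegistry().addProvider(QgsNativeAlgorithms())

# List every available algorithm as "id ---> display name"
for alg in QgsApplication.processingRegistry().algorithms():
    print(alg.id(), "--->", alg.displayName())

# Load the test data as a QGIS vector layer and plot it with geopandas
source = r"E:\Program_dev\data\test.geojson"
inputvector = QgsVectorLayer(source, "test", "ogr")
if not inputvector.isValid():
    raise RuntimeError(f"Could not load {source}")
fig, ax = plt.subplots(figsize=(10, 10))
gpd.read_file(source).plot(ax=ax);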
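The loop above prints hundreds of algorithms. If you only want the ones that match a keyword, a small filter can help. In the sketch below, find_algorithms is a hypothetical helper that works on plain (id, name) pairs, so you build those pairs from the registry first.

```python
def find_algorithms(algorithms, keyword):
    """Return (id, name) pairs whose id or display name contains keyword, ignoring case."""
    keyword = keyword.lower()
    return [
        (alg_id, name)
        for alg_id, name in algorithms
        if keyword in alg_id.lower() or keyword in name.lower()
    ]
```

In the notebook, first build pairs = [(alg.id(), alg.displayName()) for alg in QgsApplication.processingRegistry().algorithms()]. Then find_algorithms(pairs, "buffer") lists only the buffer-related algorithms.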

Extract Vertices

Before running the algorithm, let's check which parameters it needs:

processing.algorithmHelp("native:extractvertices")

Extract vertices (native:extractvertices)

This algorithm takes a line or polygon layer and generates a point layer with points representing the vertices in the input lines or polygons. The attributes associated to each point are the same ones associated to the line or polygon that the point belongs to.

Additional fields are added to the point indicating the vertex index (beginning at 0), the vertex’s part and its index within the part (as well as its ring for polygons), distance along original geometry and bisector angle of vertex for original geometry.


----------------
Input parameters
----------------

INPUT: Input layer

    Parameter type:    QgsProcessingParameterFeatureSource

    Accepted data types:
        - str: layer ID
        - str: layer name
        - str: layer source
        - QgsProcessingFeatureSourceDefinition
        - QgsProperty
        - QgsVectorLayer

OUTPUT: Vertices

    Parameter type:    QgsProcessingParameterFeatureSink

    Accepted data types:
        - str: destination vector file, e.g. 'd:/test.shp'
        - str: 'memory:' to store result in temporary memory layer
        - str: using vector provider ID prefix and destination URI, e.g. 'postgres:…' to store result in PostGIS table
        - QgsProcessingOutputLayerDefinition
        - QgsProperty

----------------
Outputs
----------------

OUTPUT:  <QgsProcessingOutputVectorLayer>
    Vertices
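The help text above describes two of the added fields: the vertex index (starting at 0) and the distance along the original geometry. The plain-Python illustration below computes both for a single-part line. It is not QGIS's implementation. The function and field names are hypothetical, and the part, ring and angle fields are left out.

```python
import math


def extract_vertices(coords):
    """Return one record per vertex of a single-part line given as [(x, y), ...].

    Illustrative only: records the vertex index (starting at 0) and the
    cumulative distance along the line; part, ring and angle fields are omitted.
    """
    records = []
    distance = 0.0
    for index, (x, y) in enumerate(coords):
        if index > 0:
            prev_x, prev_y = coords[index - 1]
            distance += math.hypot(x - prev_x, y - prev_y)
        records.append({"vertex_index": index, "x": x, "y": y, "distance": distance})
    return records
```

For example, extract_vertices([(0, 0), (3, 4), (3, 10)]) gives distances of 0, 5 and 11.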

vertice_out = r"E:\Program_dev\data\out_vertices.geojson"
# Delete the output file if it already exists
if os.path.exists(vertice_out):
    os.remove(vertice_out)
params = {
    'INPUT': inputvector,
    'OUTPUT': vertice_out
}
feedback = QgsProcessingFeedback()
processing.run("native:extractvertices", params, feedback=feedback)
# Plot the extracted vertices
fig, ax = plt.subplots(figsize=(10, 10))
gpd.read_file(vertice_out).plot(ax=ax);
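Each section in this post deletes an existing output file before calling processing.run(). Copying that block by hand is easy to get wrong, for example by checking one variable and deleting another. A hypothetical helper such as remove_if_exists keeps the pattern in one place:

```python
import os


def remove_if_exists(path):
    """Delete path if it exists so processing.run() writes a fresh output.

    Returns True if a file was deleted, False if there was nothing to delete.
    """
    if os.path.exists(path):
        os.remove(path)
        return True
    return False
```

You can call remove_if_exists(vertice_out), remove_if_exists(buf_out) or remove_if_exists(random_out) in place of the if block in each section.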

Buffer

processing.algorithmHelp("native:buffer")

Buffer (native:buffer)

This algorithm computes a buffer area for all the features in an input layer, using a fixed or dynamic distance.

The segments parameter controls the number of line segments to use to approximate a quarter circle when creating rounded offsets.

The end cap style parameter controls how line endings are handled in the buffer.

The join style parameter specifies whether round, miter or beveled joins should be used when offsetting corners in a line.

The miter limit parameter is only applicable for miter join styles, and controls the maximum distance from the offset curve to use when creating a mitered join.


----------------
Input parameters
----------------

INPUT: Input layer

    Parameter type:    QgsProcessingParameterFeatureSource

    Accepted data types:
        - str: layer ID
        - str: layer name
        - str: layer source
        - QgsProcessingFeatureSourceDefinition
        - QgsProperty
        - QgsVectorLayer

DISTANCE: Distance

    Parameter type:    QgsProcessingParameterDistance

    Accepted data types:
        - int
        - float
        - QgsProperty

SEGMENTS: Segments

    Parameter type:    QgsProcessingParameterNumber

    Accepted data types:
        - int
        - float
        - QgsProperty

END_CAP_STYLE: End cap style

    Parameter type:    QgsProcessingParameterEnum

    Available values:
        - 0: Round
        - 1: Flat
        - 2: Square

    Accepted data types:
        - int
        - str: as string representation of int, e.g. '1'
        - QgsProperty

JOIN_STYLE: Join style

    Parameter type:    QgsProcessingParameterEnum

    Available values:
        - 0: Round
        - 1: Miter
        - 2: Bevel

    Accepted data types:
        - int
        - str: as string representation of int, e.g. '1'
        - QgsProperty

MITER_LIMIT: Miter limit

    Parameter type:    QgsProcessingParameterNumber

    Accepted data types:
        - int
        - float
        - QgsProperty

DISSOLVE: Dissolve result

    Parameter type:    QgsProcessingParameterBoolean

    Accepted data types:
        - bool
        - int
        - str
        - QgsProperty

OUTPUT: Buffered

    Parameter type:    QgsProcessingParameterFeatureSink

    Accepted data types:
        - str: destination vector file, e.g. 'd:/test.shp'
        - str: 'memory:' to store result in temporary memory layer
        - str: using vector provider ID prefix and destination URI, e.g. 'postgres:…' to store result in PostGIS table
        - QgsProcessingOutputLayerDefinition
        - QgsProperty

----------------
Outputs
----------------

OUTPUT:  <QgsProcessingOutputVectorLayer>
    Buffered
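The segments parameter is easiest to picture for a single point buffered with round caps. With SEGMENTS segments per quarter circle, the circle becomes a polygon with four times SEGMENTS vertices. The sketch below builds such a polygon and measures its area with the shoelace formula, so you can see how the approximation improves as SEGMENTS grows. It is an illustration under that assumption, not the geometry code QGIS uses, and the helper names are hypothetical.

```python
import math


def approximate_circle(cx, cy, radius, segments):
    """Vertices of a circle approximated with `segments` line segments per quarter circle."""
    n = 4 * segments
    return [
        (cx + radius * math.cos(2 * math.pi * i / n),
         cy + radius * math.sin(2 * math.pi * i / n))
        for i in range(n)
    ]


def polygon_area(points):
    """Shoelace formula for a simple polygon given as an open ring of (x, y) tuples."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0
```

Running for s in (1, 5, 20): print(s, polygon_area(approximate_circle(0, 0, 50, s)) / (math.pi * 50 ** 2)) prints the ratio of the approximated area to the true circle area. With one segment per quarter circle, the buffer is a square that covers 2/π (about 64%) of the circle.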

buf_out = r"E:\Program_dev\data\buffer_output.geojson"
# Delete the output file if it already exists
if os.path.exists(buf_out):
    os.remove(buf_out)
params = {
    'INPUT': inputvector,
    'DISTANCE': 50,        # in the units of the input layer's CRS
    'SEGMENTS': 5,         # segments per quarter circle
    'END_CAP_STYLE': 0,    # 0 = Round
    'JOIN_STYLE': 0,       # 0 = Round
    'MITER_LIMIT': 2,      # only used with the Miter join style
    'DISSOLVE': False,
    'OUTPUT': buf_out
}
feedback = QgsProcessingFeedback()
processing.run("native:buffer", params, feedback=feedback)
# plot the data using geopandas .plot() method
fig, ax = plt.subplots(figsize=(10, 10))
gpd.read_file(buf_out).plot(ax=ax);
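The numeric codes for END_CAP_STYLE and JOIN_STYLE are easy to mix up. One option is a small builder that accepts the style names listed in the help output and translates them into codes. In the sketch below, buffer_params is a hypothetical helper whose defaults mirror the values used above.

```python
# Enum codes taken from the algorithmHelp("native:buffer") output
END_CAP_STYLES = {"round": 0, "flat": 1, "square": 2}
JOIN_STYLES = {"round": 0, "miter": 1, "bevel": 2}


def buffer_params(input_layer, output, distance, segments=5, end_cap="round",
                  join="round", miter_limit=2, dissolve=False):
    """Build a native:buffer parameter dict from readable style names."""
    if end_cap not in END_CAP_STYLES:
        raise ValueError(f"unknown end cap style: {end_cap!r}")
    if join not in JOIN_STYLES:
        raise ValueError(f"unknown join style: {join!r}")
    return {
        "INPUT": input_layer,
        "DISTANCE": distance,  # in the units of the input layer's CRS
        "SEGMENTS": segments,
        "END_CAP_STYLE": END_CAP_STYLES[end_cap],
        "JOIN_STYLE": JOIN_STYLES[join],
        "MITER_LIMIT": miter_limit,
        "DISSOLVE": dissolve,
        "OUTPUT": output,
    }
```

Calling processing.run("native:buffer", buffer_params(inputvector, buf_out, 50), feedback=feedback) should reproduce the run above.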

Random points

processing.algorithmHelp("qgis:randompointsinsidepolygons")

Random points inside polygons (qgis:randompointsinsidepolygons)


----------------
Input parameters
----------------

INPUT: Input layer

    Parameter type:    QgsProcessingParameterFeatureSource

    Accepted data types:
        - str: layer ID
        - str: layer name
        - str: layer source
        - QgsProcessingFeatureSourceDefinition
        - QgsProperty
        - QgsVectorLayer

STRATEGY: Sampling strategy

    Parameter type:    QgsProcessingParameterEnum

    Available values:
        - 0: Points count
        - 1: Points density

    Accepted data types:
        - int
        - str: as string representation of int, e.g. '1'
        - QgsProperty

EXPRESSION: Expression

    Parameter type:    QgsProcessingParameterExpression

    Accepted data types:
        - str
        - QgsProperty

MIN_DISTANCE: Minimum distance between points

    Parameter type:    QgsProcessingParameterDistance

    Accepted data types:
        - int
        - float
        - QgsProperty

OUTPUT: Random points

    Parameter type:    QgsProcessingParameterFeatureSink

    Accepted data types:
        - str: destination vector file, e.g. 'd:/test.shp'
        - str: 'memory:' to store result in temporary memory layer
        - str: using vector provider ID prefix and destination URI, e.g. 'postgres:…' to store result in PostGIS table
        - QgsProcessingOutputLayerDefinition
        - QgsProperty

----------------
Outputs
----------------

OUTPUT:  <QgsProcessingOutputVectorLayer>
    Random points
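Conceptually, STRATEGY 0 with a minimum distance asks for a fixed number of points per polygon, with no two points closer than MIN_DISTANCE. The sketch below shows one generic way to do this, called rejection sampling. It draws candidate points in the bounding box and keeps those that fall inside the polygon and respect the spacing. It is an illustration only, not QGIS's implementation. The function names are hypothetical, and the polygon is assumed to be a single ring without holes. The sketch gives up after max_tries and may then return fewer points than requested.

```python
import math
import random


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring [(x, y), ...]."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_polygon(ring, count, min_distance, seed=None, max_tries=10000):
    """Rejection sampling with a minimum spacing between accepted points."""
    rng = random.Random(seed)
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    points = []
    tries = 0
    while len(points) < count and tries < max_tries:
        tries += 1
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if not point_in_polygon(x, y, ring):
            continue
        if all(math.hypot(x - px, y - py) >= min_distance for px, py in points):
            points.append((x, y))
    return points
```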

random_out = r"E:\Program_dev\data\random_output.geojson"
# Delete the output file if it already exists
if os.path.exists(random_out):
    os.remove(random_out)
params = {
    'INPUT': inputvector,
    'STRATEGY': 0,        # 0 = Points count
    'EXPRESSION': '20',   # with STRATEGY 0: number of points per polygon
    'MIN_DISTANCE': 1,    # in the units of the input layer's CRS
    'OUTPUT': random_out
}
feedback = QgsProcessingFeedback()
processing.run("qgis:randompointsinsidepolygons", params, feedback=feedback)
fig, ax = plt.subplots(figsize=(10, 10))
gpd.read_file(random_out).plot(ax=ax);
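To check the result without plotting, you can count the features in an output GeoJSON file using only the standard library. In the sketch below, count_features is a hypothetical helper. The optional group_field must be an attribute that actually exists in your data. With STRATEGY 0 and an expression of 20, you would expect up to 20 points per input polygon.

```python
import json
from collections import Counter


def count_features(geojson_path, group_field=None):
    """Count features in a GeoJSON file, optionally grouped by a property value."""
    with open(geojson_path, encoding="utf-8") as f:
        features = json.load(f).get("features", [])
    if group_field is None:
        return len(features)
    # "properties" may be null in GeoJSON, so fall back to an empty dict
    return Counter((feat.get("properties") or {}).get(group_field) for feat in features)
```

For example, count_features(random_out) returns the total number of generated points.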