Plugins¶

Indigo uses plugins to allow specific functionality to be customised for different countries, localities and languages. For example, extracting a Table of Contents and automatically linking references both use the plugin system. This means they can be adjusted to suit different languages and reference styles.

Locales¶

Most plugins are locale-aware. That is, Indigo looks for a plugin implementation that best matches the locale of a work or a document.

The locale is a tuple of strings (country, language, locality), such as ('za', 'eng', None), where None is a wildcard that will match anything. The tuple describes the locales to which the plugin applies. In this example, the plugin applies to any work in South Africa (za) with an English (eng) expression, and will match on any locality within South Africa (the last None item).

If there are multiple plugins with locales that match a document or work, Indigo will use the one that matches it most specifically (i.e. the one with the fewest wildcards).
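The matching rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not Indigo's own implementation: the helper names matches, wildcard_count and best_match are hypothetical, and the locality code 'cpt' is only an example value.

```python
# Sketch only: these helpers are hypothetical and not part of Indigo.

def matches(plugin_locale, target):
    """True if every non-wildcard entry equals the target's entry."""
    return all(p is None or p == t for p, t in zip(plugin_locale, target))


def wildcard_count(plugin_locale):
    """Number of None (wildcard) entries in a locale tuple."""
    return sum(1 for p in plugin_locale if p is None)


def best_match(candidates, target):
    """Return the matching locale with the fewest wildcards, or None."""
    matching = [c for c in candidates if matches(c, target)]
    if not matching:
        return None
    return min(matching, key=wildcard_count)


candidates = [
    (None, None, None),
    ('za', None, None),
    ('za', 'eng', None),
    ('za', 'afr', None),
]
print(best_match(candidates, ('za', 'eng', 'cpt')))  # ('za', 'eng', None)
```

Here the Afrikaans plugin is rejected outright, and of the three remaining candidates the one with a single wildcard wins.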

Plugin registry¶

Plugins register themselves with the plugin registry for a certain topic. The following plugin topics are understood by Indigo:

importer plugins import text from documents and mark them up with Akoma Ntoso. Usually extend indigo_api.importers.base.Importer.
publications plugins provide publication documents for works. Usually extend indigo.analysis.publications.base.BasePublicationFinder.
refs plugins automatically identify and mark up references between works in the text of a document. Usually extend indigo.analysis.refs.base.BaseRefsFinder.
terms plugins automatically identify and mark up defined terms in document markup. Usually extend indigo.analysis.terms.base.BaseTermsFinder.
toc plugins return a Table of Contents from document markup. Usually extend indigo.analysis.toc.base.TOCBuilderBase.
work-detail plugins return tradition-specific information for a work, such as numbered titles. Usually extend indigo.analysis.work_detail.base.BaseWorkDetail.

Register a plugin using plugins.register(topic) and include a locale that describes which locales your plugin is specific to:

from indigo.analysis.work_detail.base import BaseWorkDetail
from indigo.plugins import plugins


@plugins.register('work-detail')
class CustomisedWorkDetail(BaseWorkDetail):
    locale = ('za', 'afr', None)
    ...

Fetching a plugin¶

You can fetch a plugin for a work or a document using for_work(), for_document(), or for_locale() on the plugin registry, giving it a plugin topic and a work, document or locale:

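from indigo.plugins import plugins

# `document` is an existing Indigo Document instance
toc_builder = plugins.for_document('toc', document)
# guard against no matching plugin being registered
if toc_builder:
    toc = toc_builder.table_of_contents_for_document(document)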

Custom tasks¶

You can also create custom tasks using the plugin system. Custom tasks can provide specific URLs for performing the task, control who can close a task, etc.

Indigo recognises a custom task using the Task.code attribute on the task. This is an arbitrary string value which you provide when you register your custom task with the registry.

Like plugins, tasks are also locale-specific so you can provide locale-dependent implementations. More than one custom task can be registered for the same task code. Indigo will use the implementation with the closest locale match.

Register your task with the task system like this:

from indigo.custom_tasks import CustomTask, tasks

@tasks.register('my-custom-code')
class MyCustomTask(CustomTask):
    locale = (None, None, None)

    def setup(self, task):
        self.task = task

When Indigo sees a task whose code attribute is set, it looks up the custom task in the registry, creates an instance, and calls setup(task) with the task instance.
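The lookup, instantiate and setup sequence can be illustrated with a toy registry. This is a simplified sketch that ignores locale matching and does not use Indigo's real classes: registry, register and custom_task_for are hypothetical names, and the task is a stand-in object with only a code attribute.

```python
from types import SimpleNamespace

# Sketch only: a toy registry keyed by task code, not Indigo's registry.
registry = {}


def register(code):
    """Class decorator that records a class under the given task code."""
    def decorator(cls):
        registry[code] = cls
        return cls
    return decorator


@register('my-custom-code')
class MyCustomTask:
    def setup(self, task):
        self.task = task


def custom_task_for(task):
    """Look up the class for task.code, instantiate it and call setup()."""
    cls = registry.get(task.code)
    if cls is None:
        return None
    instance = cls()
    instance.setup(task)
    return instance


task = SimpleNamespace(code='my-custom-code')  # stand-in for a Task
custom = custom_task_for(task)
print(custom.task is task)  # True
```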

Loading plugins and custom tasks¶

It’s common to place your plugins in plugins.py and custom tasks in custom_tasks.py in your project directory. Then load those files in your Django apps.py when Django calls your app’s ready() method:

from django.apps import AppConfig


class MyAppConfig(AppConfig):
    name = 'my_app'

    def ready(self):
        # ensure our plugins are pulled in
        import my_app.plugins
        import my_app.custom_tasks

Plugin API reference¶

class indigo_api.importers.base.Importer¶

Imports documents and parses text into Akoma Ntoso using pipelines.

analyse_after_import(doc)¶: Run analysis after first import.

cropbox = None¶: Crop box to import within, as [left, top, width, height]

doctype_pipeline_class¶

Which doctype pipeline class should be used?

alias of DoctypePipeline

fragment = None¶: The name of the AKN element that we’re importing, or None for a full act.

fragment_id_prefix = None¶: The prefix for all ids generated for this fragment

import_from_upload(upload, doc, request)¶

Import an uploaded document into an Akoma Ntoso XML document.

The upload is a django.core.files.uploadedfile.UploadedFile instance.

import_upload_with_context(upload, doc, context)¶: Apply a pipeline with context to import the uploaded file.

locale = (None, None, None)¶: Locale for this analyzer, as a tuple: (country, language, locality). None matches anything.

page_nums = None¶

Pages to import for document types that support it, or None to import them all.

This can either be a string, such as “1,5,7-11” or it can be a list of integers and (first, last) tuples.
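To show how the two forms of page_nums relate, here is a minimal sketch that converts the string form into the list form. The function parse_page_nums is a hypothetical helper written for illustration; it is not taken from Indigo, which may parse the value differently.

```python
def parse_page_nums(spec):
    """Turn a string such as "1,5,7-11" into [1, 5, (7, 11)].

    Hypothetical helper: single pages become integers and ranges
    become (first, last) tuples.
    """
    pages = []
    for part in spec.split(','):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if '-' in part:
            first, last = part.split('-', 1)
            pages.append((int(first), int(last)))
        else:
            pages.append(int(part))
    return pages


print(parse_page_nums("1,5,7-11"))  # [1, 5, (7, 11)]
```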

parse_from_text(text, frbr_uri)¶: Parse text into Akoma Ntoso.

section_number_position = 'before-title'¶: By default, where do section numbers usually lie in relation to their title? One of: before-title, after-title or guess.

stash_attachment(upload, doc)¶: Add an UploadedFile instance as an attachment.

stash_imported_attachments(context, doc)¶: Save attachments on the context as real document attachments.

tempfile_for_upload(upload)¶: Uploaded files might not be on disk; this ensures the upload is on disk by writing it to a temporary file.

class indigo.analysis.publications.base.BasePublicationFinder¶

This finds publication details for a published document. For example, a country-specific implementation can look up a Government Gazette given a date, gazette name, and number.

find_publications(params)¶: Return a list of publications matching the given params, a dict of arbitrary key-value pairs.

locale = (None, None, None)¶: The locale this finder is suited for, as (country, language, locality).

class indigo.analysis.refs.base.BaseRefsFinder¶

Finds references to Acts in documents.

find_references_in_document(document)¶: Find references in +document+, which is an Indigo Document object.

make_href(match)¶: Turn this match into a full FRBR URI href. Check for an existing Act with that FRBR URI in the locality first; default to national (may or may not exist).

marker_tag = 'ref'¶: Tag that will be used to mark up matches.

markup_match(node, match)¶: Mark up the match with a ref tag. The first group in the match is substituted with the ref.

class indigo.analysis.terms.base.BaseTermsFinder¶

Finds references to defined terms in documents.

Subclasses must implement find_terms_in_document.

add_terms_to_references(doc, terms)¶: Add defined terms to the references section of the XML.

build_tlc_term(parent, id, term)¶: Build an element such as <TLCTerm eId="term-applicant" href="/ontology/term/this.eng.applicant" showAs="Applicant"/>

definition_sections(doc)¶: Yield sections (or other basic units) that potentially contain definitions of terms.

find_definitions(doc)¶: Find def elements in the document and return a dict from term ids to the text of the term.

find_term_references(doc, terms)¶: Find and decorate references to terms in the document. The +terms+ param is a dict from term_id to actual term.

find_terms_in_document(document)¶: Find defined terms in +document+, which is an Indigo Document object.

guess_at_definitions(doc)¶

Find defined terms in the document, such as:

“this word” means something…

It identifies “this word” as a defined term and wraps it in a def tag with a refersTo attribute referencing the term being defined. The surrounding block structure also has its refersTo attribute set to the term. This way, the term is both marked as defined, and the container element with the full definition of the term is identified.
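The detection step can be sketched with a regular expression. This is an illustration under stated assumptions, not the pattern Indigo uses: it assumes a definition opens with a quoted term followed by the word "means", and DEFINITION_RE and guess_term are hypothetical names. The start and end offsets it returns are the kind of values that mark_definition accepts as start_pos and end_pos.

```python
import re

# Assumption: a definition opens with a quoted term followed by "means".
# Real implementations are locale-specific and will differ.
DEFINITION_RE = re.compile(r'^\s*["“](?P<term>.+?)["”]\s+means\b')


def guess_term(text):
    """Return (term, start, end) for the defined term, or None."""
    match = DEFINITION_RE.match(text)
    if not match:
        return None
    return match.group('term'), match.start('term'), match.end('term')


print(guess_term('“this word” means something'))  # ('this word', 1, 10)
```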

mark_definition(container, term, start_pos, end_pos)¶: Update the container node to wrap the given term in a definition tag.

renumber_terms(doc)¶: Recalculate eIds for <term> elements

class indigo.analysis.toc.base.TOCBuilderBase¶

This builds a Table of Contents for an Act.

A Table of Contents is a tree of TOCElement instances, each element representing an item of interest in the Table of Contents. Each item has attributes useful for presenting a Table of Contents, such as a type (chapter, part, etc.), a number, a heading and further child elements.

The TOC is assembled from certain tags in the document; see toc_elements.

The Table of Contents can also be used to look up the XML element corresponding to an item in the Table of Contents, identified by its component and id.

commenceable_items(toc)¶

Return a list of those items in +toc+ that are considered commenceable.

By default, these are all the child items in the main component, except the preface, preamble and conclusion. Only the top-level toc elements are assessed.

component_elements = ['component', 'attachment']¶: Elements that are considered components.

default_title(item)¶: Generates a default title for a given item, including the type, number, and/or heading as appropriate.

determine_component(element)¶: Determine the component element which contains +element+.

friendly_title(item)¶: Build a friendly title for this, based on heading names etc.

get_component_id(name, element)¶: Get an ID for this component element.

insert_provisions(provisions, id_set, items)¶

Insert provisions from the current ToC at their correct indexes in provisions.

provisions is a list of provisions for a work, usually built up by adding provisions to it from each point in time (using this method).

id_set is the current set of ids that have already been added to provisions; it helps ensure that our list contains only unique provisions.

items is a list of commenceable provisions from the current document’s ToC.

locale = (None, None, None)¶: The locale this TOC builder is suited for, as (country, language, locality).

process_elements(component, component_id, elements, parent=None)¶: Process the list of elements and their children, and return a (potentially empty) set of TOC items.

table_of_contents(act, language)¶: Get the table of contents of act as a list of TOCElement instances.

table_of_contents_entry_for_element(document, element)¶: Build the table of contents entry for an element from a document.

table_of_contents_for_document(document)¶: Build the table of contents for a document.

title_with_optional_type(item)¶: Generates a title for a given item, including the type, number, and/or heading as appropriate. Only includes the type if there’s no heading. Examples:

“Section”: no num or heading
“Section 1.”: num but no heading
“The Heading”: heading but no num
“A. – The Heading”: num and heading

title_with_type(item)¶: Generates a title for a given item, including the number and/or heading as appropriate. Always includes the type. Examples:

“Article”: no num or heading
“Chapter 1.”: num but no heading
“Part – The Heading”: heading but no num
“Article A. – The Heading”: num and heading

title_without_type(item)¶: Generates a title for a given item, including the number and/or heading as appropriate. Never includes the type. Examples:

(empty): no num or heading
“1.”: num but no heading
“The Heading”: heading but no num
“A. – The Heading”: num and heading
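The three title styles can be compared side by side with a small sketch that reproduces the documented examples. It is not the library's code: Item is a hypothetical stand-in for a TOC item with only the type, num and heading attributes, and it assumes that num already carries its trailing full stop, as in the examples.

```python
from collections import namedtuple

# Hypothetical stand-in for a TOC item; only the attributes used here.
Item = namedtuple('Item', ['type', 'num', 'heading'])


def title_without_type(item):
    """Number and/or heading, never the type."""
    parts = [p for p in (item.num, item.heading) if p]
    return ' – '.join(parts)


def title_with_type(item):
    """Always starts with the capitalised type."""
    title = item.type.capitalize()
    if item.num:
        title += ' ' + item.num
    if item.heading:
        title += ' – ' + item.heading
    return title


def title_with_optional_type(item):
    """Include the type only when there is no heading."""
    if item.heading:
        return title_without_type(item)
    return title_with_type(item)


item = Item('article', 'A.', 'The Heading')
print(title_with_type(item))           # Article A. – The Heading
print(title_with_optional_type(item))  # A. – The Heading
```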

titles = {}¶

Dict from toc elements (tag names without namespaces) to functions that take a TOCElement instance and return a string title for that element.

Include the special item default to handle elements not in the list. This will override the behaviour in default_title.

titles_with_optional_type = ['section']¶: List of elements that should only include the type in the title if there is no heading.

titles_without_type = ['subpart', 'attachment']¶: List of elements that should never include the type in the title.

toc_basic_units = ['section']¶: The basic units for the tradition.

toc_deadends = ['meta', 'attachments', 'components', 'embeddedStructure', 'quotedStructure', 'subFlow']¶: Elements we don’t check or recurse into because they contain sub-documents or subflows.

toc_elements = ['coverpage', 'preface', 'preamble', 'conclusions', 'attachment', 'component', 'alinea', 'article', 'book', 'chapter', 'clause', 'division', 'indent', 'level', 'list', 'paragraph', 'part', 'point', 'proviso', 'rule', 'section', 'subchapter', 'subclause', 'subdivision', 'sublist', 'subparagraph', 'subpart', 'subrule', 'subsection', 'subtitle', 'title', 'tome', 'transitional']¶

Elements we include in the table of contents, without their XML namespace. Base includes the following from the AKN schema:

- all hierarchicalStructure elements, except:
  - meta and body are excluded
  - attachment and component are included individually rather than their plural containers
- all ANhier (hierarchical) elements
- no block elements.

class indigo.analysis.work_detail.base.BaseWorkDetail¶

Provides some locale-specific work details.

Subclasses should implement work_numbered_title.

no_numbered_title_numbers = ['constitution']¶: These numbers don’t have numbered titles.

no_numbered_title_subtypes = []¶: These subtypes don’t have numbered titles.

number_must_be_digit_doctypes = ['act']¶: These doctypes only have numbered titles if the number starts with a digit.

work_friendly_type(work)¶: Return a friendly document type for this work, such as “Act” or “By-law”.

work_numbered_title(work)¶: Return a formatted title using the number for this work, such as “Act 5 of 2009”. This usually differs from the short title. May return None.
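One way to read the three class attributes together is as a set of guards applied before a numbered title is built. The sketch below is an interpretation of the attribute descriptions above, not the library's code: numbered_title is a hypothetical function, and the work is a stand-in object whose attribute names (doctype, subtype, number, year, friendly_type) are assumptions.

```python
from types import SimpleNamespace

# Values copied from the documented defaults.
NO_NUMBERED_TITLE_NUMBERS = ['constitution']
NO_NUMBERED_TITLE_SUBTYPES = []
NUMBER_MUST_BE_DIGIT_DOCTYPES = ['act']


def numbered_title(work):
    """Return a title such as "Act 5 of 2009", or None if not applicable."""
    if work.number in NO_NUMBERED_TITLE_NUMBERS:
        return None
    if work.subtype in NO_NUMBERED_TITLE_SUBTYPES:
        return None
    if (work.doctype in NUMBER_MUST_BE_DIGIT_DOCTYPES
            and not work.number[:1].isdigit()):
        return None
    return f'{work.friendly_type} {work.number} of {work.year}'


# Stand-in for a work; the attribute names are assumptions.
work = SimpleNamespace(doctype='act', subtype=None, number='5',
                       year='2009', friendly_type='Act')
print(numbered_title(work))  # Act 5 of 2009
```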

class indigo.plugins.LocaleBasedRegistry¶

Base class for locale-based registries. Helps register and lookup locale-based classes.

for_document(topic, document, many=False)¶: Find an appropriate helper for this document.

for_locale(topic, country=None, language=None, locality=None, many=False)¶: Find an appropriate importer for this locale description. Tightest match wins.

for_work(topic, work, many=False)¶: Find an appropriate helper for this work.

register(topic, name=None)¶: Class decorator that registers a new class with the registry.

register_instance(topic, name, inst)¶: Registers an object with the registry.

registry = None¶: Registry of class names to classes. Subclasses MUST define this to avoid sharing registry classes.
