XML and Content Management

Lecture 14: Case study: Legal documents in Sejm
Władysław Baksza, Maciej Ogrodniczuk
Sejm, 17 January 2011

A little history
Before 2005:
- a system for storing metadata of legal acts (current versions, proposals, amendments),
- texts stored in the filesystem as MS Word files,
- manual versioning,
- manual unification of texts.

Splitting the process
Two (distinct) areas of document management:
1. a server module for managing the legislative processes:
   - registering new texts,
   - controlling status of work,
   - managing variants of amendments,
   - triggering edit/view/merge, ...
2. editing environment (XMetaL):
   - information stored to a large extent in texts:
     - publication addresses,
     - dates: when the bill has been passed, when it goes into effect etc.,
     - amendment links,
     - definitions,
   - XMetaL extensions (CSS/macros/keyboard shortcuts),
   - repository integration: WebDAV.

What does Polish law look like?
[Figure omitted]

What does the amendment look like?
[Figure omitted]

XMetaL interface
[Figure omitted]

Document schema
General assumptions:
- no formatting constructs,
- representation of the legal structure: hierarchy of units (sections, chapters, articles, paragraphs, points, ...),
- text content of all parts,
- footnotes,
- additional elements: comments, definitions, statements of the Constitutional Tribunal,
- external elements: editorial islands, amendment tags.

What does it look like?
    <ustawa data-uchwalenia="20 grudnia 1990">
      <tytul>o jednostkach innowacyjno-wdrożeniowych</tytul>
      <adres-publikacji>
        <adres-dzu rok="1991" numer="2" pozycja="7"/>
      </adres-publikacji>
      <artykul nr="1">urząd Postępu Naukowo-Technicznego i Wdrożeń na dotychczasowych zasadach:
        <punkt nr="1">dokonuje skreśleń w rejestrze jednostek innowacyjno-wdrożeniowych w okresie trzech lat od dnia wejścia w życie niniejszej ustawy,</punkt>
        <punkt nr="2">wpisuje jednostki do rejestru.</punkt>
      </artykul>
      <artykul nr="2">ustawa wchodzi w życie po upływie 14 dni od dnia ogłoszenia.</artykul>
    </ustawa>

Results
In 2009:
- documents created in XML (according to the so-called Sejm schema),
- multiformat presentation (PDF, HTML, DOCX) via XSLT,
- fine-grained versioning (at every save),
- texts stored in a relational database (Oracle, CLOB),
- a mechanism for automated merging of amendments over the current text (with amendment links, tags, paths),
- Word2XML converter (semi-automated),
- software architecture: Solaris (Windows)/Oracle/JBoss/XMetaL.

CSS example
    ustawa:before {
        content: "USTAWA\Az dnia " attr(data-uchwalenia) " r.";
        font-weight: bold;
        display: block;
        text-align: center;
    }

XSLT example
    <xsl:template match="artykul">
      <xsl:choose>
        <xsl:when test="count(ancestor::*[name() = 'dodaj' or name() = 'zastap']) > 0">
          <div class="artykul-cytowany">
            <xsl:if test="position() = 1">
            </xsl:if>
            Art. <xsl:value-of select="@nr"/>
            <xsl:apply-templates/>
          </div>
        </xsl:when>
        <xsl:otherwise>
          <div class="artykul">
            <div class="naglowek-artykulu">art. <xsl:value-of select="@nr"/></div>
            <xsl:apply-templates/>
          </div>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:template>

Merging amendments
Bills can be merged with the current texts by means of:
- storing amendment links between pairs of texts in the amended contents:
  - OO: links between the new and current text,
  - CC: amendment tags representing individual changes in the text with their legally binding dates,
- link targets stored as XPath expressions,
- possibility of merging many bills over a single current text,
- signalling potential conflicts.
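The conflict signalling mentioned above could be approximated as follows. This is a minimal Python sketch, not the Sejm implementation: the function name find_conflicts, the bill identifiers and the rule itself (two different bills conflict when they target the same path, or when one target is nested inside the other) are assumptions made for illustration.

```python
def find_conflicts(changes):
    """Return (bill_a, bill_b, path_a, path_b) for every pair of changes
    from different bills whose target paths are equal or nested."""
    conflicts = []
    for i, (bill_a, path_a) in enumerate(changes):
        for bill_b, path_b in changes[i + 1:]:
            if bill_a == bill_b:
                continue  # changes within one bill are not compared here
            same = path_a == path_b
            # Plain string prefix test: "a/b" is treated as lying inside "a".
            nested = (path_b.startswith(path_a + "/")
                      or path_a.startswith(path_b + "/"))
            if same or nested:
                conflicts.append((bill_a, bill_b, path_a, path_b))
    return conflicts


# Hypothetical changes: (bill identifier, XPath of the amended unit).
changes = [
    ("bill-A", "//artykul[@nr='7']/ustep[@nr='1']"),
    ("bill-B", "//artykul[@nr='7']/ustep[@nr='1']/punkt[@nr='6']"),
    ("bill-B", "//artykul[@nr='2']/ustep[@nr='3']"),
    ("bill-C", "//artykul[@nr='9']"),
]
conflicts = find_conflicts(changes)
for conflict in conflicts:
    print(conflict)
```

Comparing paths as strings is deliberately naive: two differently written XPath expressions that select the same node would not be detected.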

Amendment links
Amendment links store information about the relation between a fragment of the bill and a fragment of the base text. They are inserted directly where the title of the base text is referred to (replacing it). The current information about the text is retrieved from the database every time the text is opened.
An example:
    <przywolanie ustawa-id="8ab282971830bb8a01183221c82c0010" typ="nowelizacyjne">
      <ustawa-info data-uchwalenia="6 grudnia 1996">
        <tytul>o zastawie rejestrowym i rejestrze zastawów</tytul>
        <adres-publikacji>
          <adres-dzu rok="1996" numer="149" pozycja="703"/>
        </adres-publikacji>
      </ustawa-info>
      <przypis nr="2"/>
    </przywolanie>
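The refresh-on-open behaviour could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the REGISTRY dictionary stands in for the database, and the function name refresh_links, the wrapping <zmiana> element and the stale values in the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Stand-in for the database of legal acts, keyed by the ustawa-id attribute.
REGISTRY = {
    "8ab282971830bb8a01183221c82c0010": {
        "data-uchwalenia": "6 grudnia 1996",
        "tytul": "o zastawie rejestrowym i rejestrze zastawów",
    },
}

# Hypothetical bill fragment whose link carries outdated information.
DOCUMENT = """<zmiana>
  <przywolanie ustawa-id="8ab282971830bb8a01183221c82c0010" typ="nowelizacyjne">
    <ustawa-info data-uchwalenia="?">
      <tytul>nieaktualny tytuł</tytul>
    </ustawa-info>
  </przywolanie>
</zmiana>"""


def refresh_links(root, registry):
    """Overwrite the date and title of every known link; return how many were refreshed."""
    refreshed = 0
    for link in root.iter("przywolanie"):
        record = registry.get(link.get("ustawa-id"))
        info = link.find("ustawa-info")
        if record is None or info is None:
            continue  # unknown act or malformed link: leave it untouched
        info.set("data-uchwalenia", record["data-uchwalenia"])
        title = info.find("tytul")
        if title is None:
            title = ET.SubElement(info, "tytul")
        title.text = record["tytul"]
        refreshed += 1
    return refreshed


root = ET.fromstring(DOCUMENT)
count = refresh_links(root, REGISTRY)
print(count, root.find(".//tytul").text)
```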

Amendment tags
Two dimensions:
1. the type of a change: <dodaj> (add), <zmien> (change), <usun> (delete),
2. change target: structure element, text fragment.

Examples of amendment types
Text:
    W art. 7 w ust. 1 dodaje się pkt 6a w brzmieniu:
    "6a) analizę wykorzystania wynikających z kontroli wniosków dotyczących stanowienia lub stosowania prawa.".
(In English: In Art. 7, para. 1, point 6a is added, reading: "6a) an analysis of the use of conclusions arising from audits concerning the making or application of law.")
XML representation:
    W art. 7 w ust. 1 dodaje się pkt 6a w brzmieniu:
    <dodaj>
      <po-elemencie id="//artykul[@nr='7']/ustep[@nr='1']/punkt[@nr='6']"/>
      <element>
        <punkt nr="6a">analizę wykorzystania wynikających z kontroli wniosków dotyczących stanowienia lub stosowania prawa.</punkt>
      </element>
    </dodaj>
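A merge step for this <dodaj> tag might be sketched in Python as below. It assumes the attribute values inside the path are quoted and relies only on the XPath subset of the standard xml.etree.ElementTree module. The function name apply_dodaj, the shortened point text and the base text are hypothetical; the lecture does not show the actual merge code.

```python
import xml.etree.ElementTree as ET

# Hypothetical base text containing the unit after which the point is added.
BASE = """<ustawa>
  <artykul nr="7">
    <ustep nr="1">
      <punkt nr="6">dotychczasowy punkt 6</punkt>
      <punkt nr="7">dotychczasowy punkt 7</punkt>
    </ustep>
  </artykul>
</ustawa>"""

AMENDMENT = """<dodaj>
  <po-elemencie id="//artykul[@nr='7']/ustep[@nr='1']/punkt[@nr='6']"/>
  <element>
    <punkt nr="6a">analizę wykorzystania wniosków</punkt>
  </element>
</dodaj>"""


def apply_dodaj(base_root, dodaj):
    """Insert the children of <element> right after the node named by <po-elemencie>."""
    path = dodaj.find("po-elemencie").get("id")
    # ElementTree only accepts relative paths, so "//x" is searched as ".//x".
    target = base_root.find("." + path)
    if target is None:
        raise LookupError(f"amendment target not found: {path}")
    # ElementTree keeps no parent pointers; build a child -> parent map.
    parents = {child: parent for parent in base_root.iter() for child in parent}
    parent = parents[target]
    index = list(parent).index(target)
    for offset, new_node in enumerate(dodaj.find("element"), start=1):
        parent.insert(index + offset, new_node)
    return base_root


root = apply_dodaj(ET.fromstring(BASE), ET.fromstring(AMENDMENT))
numbers = [p.get("nr") for p in root.iter("punkt")]
print(numbers)  # ['6', '6a', '7']
```

A target that is the document root has no parent and is not handled by this sketch.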

Examples of amendment types
Text:
    W art. 2 ust. 3 wyrazy "armie" zastępuje się wyrazami "siły zbrojne".
(In English: In Art. 2, para. 3, the word "armie" (armies) is replaced with the words "siły zbrojne" (armed forces).)
XML representation:
    W art. 2 ust. 3 wyrazy
    <zastap>
      <w-elemencie id="//artykul[@nr='2']/ustep[@nr='3']"/>
      <tekst>armie</tekst>
      <akcja>zastępuje się wyrazami</akcja>
      <tekstem>siły zbrojne</tekstem>
    </zastap>.
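The text replacement expressed by this <zastap> tag could be applied along the following lines. This is a Python sketch under assumptions: quoted attribute values in the path, a made-up base sentence, and a hypothetical function name apply_zastap_text. Only the leading text of the target element is handled; text that follows child elements is ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical base text; the sentence is invented for the example.
BASE = """<ustawa>
  <artykul nr="2">
    <ustep nr="3">Zadania te wykonują armie.</ustep>
  </artykul>
</ustawa>"""

AMENDMENT = """<zastap>
  <w-elemencie id="//artykul[@nr='2']/ustep[@nr='3']"/>
  <tekst>armie</tekst>
  <akcja>zastępuje się wyrazami</akcja>
  <tekstem>siły zbrojne</tekstem>
</zastap>"""


def apply_zastap_text(base_root, zastap):
    """Replace <tekst> with <tekstem> in every element named by <w-elemencie>."""
    old = zastap.findtext("tekst")
    new = zastap.findtext("tekstem")
    changed = 0
    for ref in zastap.findall("w-elemencie"):
        # "//x" is searched as ".//x" because ElementTree paths are relative.
        for target in base_root.findall("." + ref.get("id")):
            if target.text and old in target.text:
                target.text = target.text.replace(old, new)
                changed += 1
    if changed == 0:
        raise LookupError(f"text to replace not found: {old!r}")
    return base_root


root = apply_zastap_text(ET.fromstring(BASE), ET.fromstring(AMENDMENT))
result = root.find(".//ustep").text
print(result)  # Zadania te wykonują siły zbrojne.
```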

The model of a change tag (<zastap>)
    <xsd:element name="zastap">
      <xsd:complexType>
        <xsd:choice>
          <xsd:sequence>
            <xsd:choice maxOccurs="unbounded">
              <xsd:element ref="element"/>
              <xsd:element ref="elementy"/>
            </xsd:choice>
            <xsd:element ref="elementem" minOccurs="0"/>
          </xsd:sequence>
          <xsd:sequence>
            <xsd:element ref="w-elemencie" minOccurs="0" maxOccurs="unbounded"/>
            <xsd:element ref="tekst"/>
            <xsd:element ref="akcja" minOccurs="0"/>
            <xsd:element ref="tekstem"/>
          </xsd:sequence>
        </xsd:choice>
        <xsd:attributeGroup ref="atr-nowelizacyjne"/>
      </xsd:complexType>
    </xsd:element>

Content model variants
5 content models for representing:
1. a change of a single structural element:
   - <element id="path">: an empty element with a path pointing at the element which must be changed,
   - <elementem>: new element content (stored inside the tag),
2. a change of a continuous sequence of structural elements:
   - <elementy od="path1" do="path2">: with paths pointing at the beginning and end of the set of elements being changed,
   - <elementem>: new content of the set,
3. a change of a text fragment:
   - <w-elemencie id="path">: with the path to the element containing the text to be changed,
   - <tekst>: old text,
   - <akcja>: binding text (e.g. "zastępuje się wyrazy", "the words are replaced"),
   - <tekstem>: new text,

Content model variants
4. encasing the text in a structural element:
   - <element id="path">: pointing at the element whose text is to be encased,
   - <elementem poziom="element">: naming the element representing the required level of structure (e.g. ustep),
5. converting a structural element to the base text:
   - <element id="path">: pointing at the element whose content is to be converted.
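The five variants can be told apart by looking at which child elements a change tag contains. The Python sketch below shows one way to do it. The function name classify_change and the decision rules are assumptions, in particular the use of the poziom attribute to separate variant 4 from variant 1; the sample tags are invented.

```python
import xml.etree.ElementTree as ET


def classify_change(tag):
    """Return the number (1-5) of the content model variant used by a change tag."""
    if tag.find("tekst") is not None and tag.find("tekstem") is not None:
        return 3  # change of a text fragment
    if tag.find("elementy") is not None:
        return 2  # change of a continuous sequence of elements
    if tag.find("element") is not None:
        new = tag.find("elementem")
        if new is None:
            return 5  # structural element converted to the base text
        if new.get("poziom"):
            return 4  # text encased in a structural element
        return 1      # change of a single structural element
    raise ValueError("unrecognised content model")


# Invented sample tags, one per variant.
SAMPLES = {
    1: """<zastap><element id="//artykul[@nr='1']"/>
          <elementem><artykul nr="1">nowa treść</artykul></elementem></zastap>""",
    2: """<zastap><elementy od="//artykul[@nr='1']" do="//artykul[@nr='3']"/>
          <elementem><artykul nr="1">nowa treść</artykul></elementem></zastap>""",
    3: """<zastap><w-elemencie id="//artykul[@nr='2']"/>
          <tekst>armie</tekst><tekstem>siły zbrojne</tekstem></zastap>""",
    4: """<zastap><element id="//artykul[@nr='1']"/>
          <elementem poziom="ustep"/></zastap>""",
    5: """<zastap><element id="//artykul[@nr='1']/ustep[@nr='1']"/></zastap>""",
}

detected = {n: classify_change(ET.fromstring(xml)) for n, xml in SAMPLES.items()}
print(detected)
```

Every lookup is compared against None explicitly because an ElementTree element without children evaluates as false.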

Amendment result
[Figure omitted]

Word2XML conversion
The majority of texts are still created in MS Word, so they must be converted into XML:
- Word files saved as DOCX,
- conversion implemented in Java,
- additional text properties (such as punctuation apart from the structure) verified by the conversion process,
- conversion errors saved in the result document as Word comments,
- result document created on the fly,
- regular expressions modified outside the converter.
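The idea of recognising structure with externally maintained regular expressions, while checking punctuation and collecting errors, can be illustrated with a short sketch. The real converter is written in Java and its rules are not given in the lecture, so the Python code below, the two patterns, the function name convert and the error messages are all assumptions. Errors are returned as a list here rather than written back as Word comments.

```python
import re
import xml.etree.ElementTree as ET

# The expressions are kept in a plain dict so they can be edited
# without touching the conversion code itself.
PATTERNS = {
    "artykul": re.compile(r"^Art\.\s*(\d+[a-z]*)\.\s*(.*)$"),
    "punkt": re.compile(r"^(\d+[a-z]*)\)\s*(.*)$"),
}


def convert(paragraphs):
    """Build an <ustawa> tree from paragraph texts; return (root, errors)."""
    root = ET.Element("ustawa")
    errors = []
    current = None  # the article that following points belong to
    for number, text in enumerate(paragraphs, start=1):
        art = PATTERNS["artykul"].match(text)
        pkt = PATTERNS["punkt"].match(text)
        if art:
            current = ET.SubElement(root, "artykul", nr=art.group(1))
            current.text = art.group(2)
        elif pkt and current is not None:
            point = ET.SubElement(current, "punkt", nr=pkt.group(1))
            point.text = pkt.group(2)
            # Property check beyond structure: a point should end with punctuation.
            if not point.text.endswith((",", ";", ".")):
                errors.append((number, "point does not end with punctuation"))
        else:
            errors.append((number, "unrecognised paragraph"))
    return root, errors


paragraphs = [
    "Art. 1. Urząd na dotychczasowych zasadach:",
    "1) dokonuje skreśleń w rejestrze,",
    "2) wpisuje jednostki do rejestru",
    "Rozdział bez numeru",
]
root, errors = convert(paragraphs)
print(ET.tostring(root, encoding="unicode"))
print(errors)
```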

Converter interface
[Figure omitted]

Future plans
What is not there, but could be (and is easy to implement with the current model):
- presentation of a version of each text valid for a given date,
- full-text search (and all other types of search),
- representation of committee work,
- ???
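Because amendment tags carry their legally binding dates, a version valid for a given date could in principle be derived by replaying only the changes already in force. The Python sketch below illustrates this on a deliberately simplified model: the text is a dictionary from unit name to content, and the function name text_as_of, the tuple layout of a change and the sample data are hypothetical.

```python
from datetime import date


def text_as_of(base, changes, when):
    """Apply, in date order, every change in force on `when`.

    base:    dict mapping a unit name to its text
    changes: list of (effective_date, unit, new_text); new_text None deletes the unit
    """
    result = dict(base)  # leave the base text untouched
    for effective, unit, new_text in sorted(changes, key=lambda c: c[0]):
        if effective > when:
            continue  # not yet legally binding on the requested date
        if new_text is None:
            result.pop(unit, None)
        else:
            result[unit] = new_text
    return result


# Invented sample data.
base = {"art. 1": "A", "art. 2": "B"}
changes = [
    (date(2011, 6, 1), "art. 2", None),   # repeal, in force later
    (date(2010, 1, 1), "art. 1", "A2"),   # replacement, already in force
]
version = text_as_of(base, changes, date(2011, 1, 17))
print(version)  # {'art. 1': 'A2', 'art. 2': 'B'}
```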