Wst p do informatyki. Elementy teorii jezyków formalnych. Piotr Fulma«ski. January 3, 2019

Podobne dokumenty
Weronika Mysliwiec, klasa 8W, rok szkolny 2018/2019

Helena Boguta, klasa 8W, rok szkolny 2018/2019

Tychy, plan miasta: Skala 1: (Polish Edition)

deep learning for NLP (5 lectures)

Machine Learning for Data Science (CS4786) Lecture11. Random Projections & Canonical Correlation Analysis

SSW1.1, HFW Fry #20, Zeno #25 Benchmark: Qtr.1. Fry #65, Zeno #67. like

Zakopane, plan miasta: Skala ok. 1: = City map (Polish Edition)

EXAMPLES OF CABRI GEOMETRE II APPLICATION IN GEOMETRIC SCIENTIFIC RESEARCH

Katowice, plan miasta: Skala 1: = City map = Stadtplan (Polish Edition)

Hard-Margin Support Vector Machines


Steeple #3: Gödel s Silver Blaze Theorem. Selmer Bringsjord Are Humans Rational? Dec RPI Troy NY USA

Machine Learning for Data Science (CS4786) Lecture 11. Spectral Embedding + Clustering

TTIC 31210: Advanced Natural Language Processing. Kevin Gimpel Spring Lecture 9: Inference in Structured Prediction

ARNOLD. EDUKACJA KULTURYSTY (POLSKA WERSJA JEZYKOWA) BY DOUGLAS KENT HALL


Karpacz, plan miasta 1:10 000: Panorama Karkonoszy, mapa szlakow turystycznych (Polish Edition)

Zarządzanie sieciami telekomunikacyjnymi

Stargard Szczecinski i okolice (Polish Edition)

Egzamin maturalny z języka angielskiego na poziomie dwujęzycznym Rozmowa wstępna (wyłącznie dla egzaminującego)

Revenue Maximization. Sept. 25, 2018

Convolution semigroups with linear Jacobi parameters

OpenPoland.net API Documentation

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition)

Rachunek lambda, zima

Installation of EuroCert software for qualified electronic signature

DOI: / /32/37

Introduction to Computer Science

Marzec: food, advertising, shopping and services, verb patterns, adjectives and prepositions, complaints - writing

SubVersion. Piotr Mikulski. SubVersion. P. Mikulski. Co to jest subversion? Zalety SubVersion. Wady SubVersion. Inne różnice SubVersion i CVS

First-order logic. Usage. Tautologies, using rst-order logic, relations to natural language

MaPlan Sp. z O.O. Click here if your download doesn"t start automatically

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition)

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition)

XML. 6.6 XPath. XPath is a syntax used for selecting parts of an XML document

DO MONTAŻU POTRZEBNE SĄ DWIE OSOBY! INSTALLATION REQUIRES TWO PEOPLE!

Emilka szuka swojej gwiazdy / Emily Climbs (Emily, #2)

ERASMUS + : Trail of extinct and active volcanoes, earthquakes through Europe. SURVEY TO STUDENTS.

Dolny Slask 1: , mapa turystycznosamochodowa: Plan Wroclawia (Polish Edition)

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition)

OBWIESZCZENIE MINISTRA INFRASTRUKTURY. z dnia 18 kwietnia 2005 r.

Jak zasada Pareto może pomóc Ci w nauce języków obcych?

Previously on CSCI 4622

Linear Classification and Logistic Regression. Pascal Fua IC-CVLab

SNP SNP Business Partner Data Checker. Prezentacja produktu

Rozpoznawanie twarzy metodą PCA Michał Bereta 1. Testowanie statystycznej istotności różnic między jakością klasyfikatorów

Miedzy legenda a historia: Szlakiem piastowskim z Poznania do Gniezna (Biblioteka Kroniki Wielkopolski) (Polish Edition)

Miedzy legenda a historia: Szlakiem piastowskim z Poznania do Gniezna (Biblioteka Kroniki Wielkopolski) (Polish Edition)

HAPPY ANIMALS L01 HAPPY ANIMALS L03 HAPPY ANIMALS L05 HAPPY ANIMALS L07

HAPPY ANIMALS L02 HAPPY ANIMALS L04 HAPPY ANIMALS L06 HAPPY ANIMALS L08



DO MONTAŻU POTRZEBNE SĄ DWIE OSOBY! INSTALLATION REQUIRES TWO PEOPLE!

SNP Business Partner Data Checker. Prezentacja produktu

TTIC 31210: Advanced Natural Language Processing. Kevin Gimpel Spring Lecture 8: Structured PredicCon 2

EGZAMIN MATURALNY Z JĘZYKA ANGIELSKIEGO POZIOM ROZSZERZONY MAJ 2010 CZĘŚĆ I. Czas pracy: 120 minut. Liczba punktów do uzyskania: 23 WPISUJE ZDAJĄCY

POLITYKA PRYWATNOŚCI / PRIVACY POLICY

Analysis of Movie Profitability STAT 469 IN CLASS ANALYSIS #2

General Certificate of Education Ordinary Level ADDITIONAL MATHEMATICS 4037/12

DO MONTAŻU POTRZEBNE SĄ DWIE OSOBY! INSTALLATION REQUIRES TWO PEOPLE!

HAPPY K04 INSTRUKCJA MONTAŻU ASSEMBLY INSTRUCTIONS DO MONTAŻU POTRZEBNE SĄ DWIE OSOBY! INSTALLATION REQUIRES TWO PEOPLE! W5 W6 G1 T2 U1 U2 TZ1

Realizacja systemów wbudowanych (embeded systems) w strukturach PSoC (Programmable System on Chip)

UMOWY WYPOŻYCZENIA KOMENTARZ

European Crime Prevention Award (ECPA) Annex I - new version 2014

Fig 5 Spectrograms of the original signal (top) extracted shaft-related GAD components (middle) and

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition)

Karpacz, plan miasta 1:10 000: Panorama Karkonoszy, mapa szlakow turystycznych (Polish Edition)

Wyk lad 8: Leniwe metody klasyfikacji

JĘZYK ANGIELSKI ĆWICZENIA ORAZ REPETYTORIUM GRAMATYCZNE

Extraclass. Football Men. Season 2009/10 - Autumn round

Network Services for Spatial Data in European Geo-Portals and their Compliance with ISO and OGC Standards

Domy inaczej pomyślane A different type of housing CEZARY SANKOWSKI

ABOUT NEW EASTERN EUROPE BESTmQUARTERLYmJOURNAL

Surname. Other Names. For Examiner s Use Centre Number. Candidate Number. Candidate Signature

Bazy danych Ćwiczenia z SQL


Wykaz linii kolejowych, które są wyposażone w urządzenia systemu ETCS

LEARNING AGREEMENT FOR STUDIES

Machine Learning for Data Science (CS4786) Lecture 24. Differential Privacy and Re-useable Holdout

EGZAMIN MATURALNY Z JĘZYKA ANGIELSKIEGO

Blow-Up: Photographs in the Time of Tumult; Black and White Photography Festival Zakopane Warszawa 2002 / Powiekszenie: Fotografie w czasach zgielku

OSI Network Layer. Network Fundamentals Chapter 5. ITE PC v4.0 Chapter Cisco Systems, Inc. All rights reserved.

Wykaz linii kolejowych, które są wyposażone w urzadzenia systemu ETCS


Instrukcja obsługi User s manual


PLSH1 (JUN14PLSH101) General Certificate of Education Advanced Subsidiary Examination June Reading and Writing TOTAL

18. Przydatne zwroty podczas egzaminu ustnego. 19. Mo liwe pytania egzaminatora i przyk³adowe odpowiedzi egzaminowanego

Polska Szkoła Weekendowa, Arklow, Co. Wicklow KWESTIONRIUSZ OSOBOWY DZIECKA CHILD RECORD FORM

TECHNICAL CATALOGUE WHITEHEART MALLEABLE CAST IRON FITTINGS EE

Wroclaw, plan nowy: Nowe ulice, 1:22500, sygnalizacja swietlna, wysokosc wiaduktow : Debica = City plan (Polish Edition)

kdpw_stream Struktura komunikatu: Status komunikatu z danymi uzupełniającymi na potrzeby ARM (auth.ste ) Data utworzenia: r.

DUAL SIMILARITY OF VOLTAGE TO CURRENT AND CURRENT TO VOLTAGE TRANSFER FUNCTION OF HYBRID ACTIVE TWO- PORTS WITH CONVERSION

y = The Chain Rule Show all work. No calculator unless otherwise stated. If asked to Explain your answer, write in complete sentences.

EGZAMIN MATURALNY Z JĘZYKA ANGIELSKIEGO POZIOM ROZSZERZONY MAJ 2010 CZĘŚĆ I. Czas pracy: 120 minut. Liczba punktów do uzyskania: 23 WPISUJE ZDAJĄCY

Presented by. Dr. Morten Middelfart, CTO

Znak forma podstawowa 4 budowa i proporcje 5 pole ochronne 6 kolorystyka 7 warianty monochromatyczne 8 tła znaku 9 niedozwolone stosowanie znaku 11

SPIS TREŚCI / INDEX OGRÓD GARDEN WYPOSAŻENIE DOMU HOUSEHOLD PRZECHOWYWANIE WINA WINE STORAGE SKRZYNKI BOXES

Transkrypt:

Wst p do informatyki Elementy teorii jezyków formalnych Piotr Fulma«ski Wydziaª Matematyki i Informatyki, Uniwersytet Šódzki, Polska January 3, 2019

Table of contents

Formal language Basic concepts In mathematics, computer science, and linguistics, Denition a formal language is a set of strings of symbols that may be constrained by rules that are specic to it. In short The alphabet of a formal language is the set of symbols, letters, or tokens from which the strings of the language may be formed. The strings formed from this alphabet are called words. The words that belong to a particular formal language are sometimes called well-formed words or well-formed formulas. A formal language is often dened by means of a formal grammar such as a regular grammar or context-free grammar, also called its formation rule.

Formal language Basic concepts In mathematics, computer science, and linguistics, Denition a formal language is a set of strings of symbols that may be constrained by rules that are specic to it. In short The alphabet of a formal language is the set of symbols, letters, or tokens from which the strings of the language may be formed. The strings formed from this alphabet are called words. The words that belong to a particular formal language are sometimes called well-formed words or well-formed formulas. A formal language is often dened by means of a formal grammar such as a regular grammar or context-free grammar, also called its formation rule.

Formal language Basic concepts In mathematics, computer science, and linguistics, Denition a formal language is a set of strings of symbols that may be constrained by rules that are specic to it. In short The alphabet of a formal language is the set of symbols, letters, or tokens from which the strings of the language may be formed. The strings formed from this alphabet are called words. The words that belong to a particular formal language are sometimes called well-formed words or well-formed formulas. A formal language is often dened by means of a formal grammar such as a regular grammar or context-free grammar, also called its formation rule.

Formal language Basic concepts In mathematics, computer science, and linguistics, Denition a formal language is a set of strings of symbols that may be constrained by rules that are specic to it. In short The alphabet of a formal language is the set of symbols, letters, or tokens from which the strings of the language may be formed. The strings formed from this alphabet are called words. The words that belong to a particular formal language are sometimes called well-formed words or well-formed formulas. A formal language is often dened by means of a formal grammar such as a regular grammar or context-free grammar, also called its formation rule.

Formal language Why we use them? The eld of formal language theory studies primarily the purely syntactical aspects of such languagesthat is, their internal structural patterns. Formal language theory sprang out of linguistics, as a way of understanding the syntactic regularities of natural languages. In computer science, formal languages are used among others as the basis for dening the grammar of programming languages and formalized versions of subsets of natural languages in which the words of the language represent concepts that are associated with particular meanings or semantics. In computational complexity theory, decision problems are typically dened as formal languages, and complexity classes are dened as the sets of the formal languages that can be parsed by machines with limited computational power. In logic and the foundations of mathematics, formal languages are used to represent the syntax of axiomatic systems, and mathematical formalism is the philosophy that all of mathematics can be reduced to the syntactic manipulation of formal languages in this way.

Formal language Why we use them? An alphabet, in the context of formal languages, can be any set, although it often makes sense to use an alphabet in the usual sense of the word, or more generally a character set such as ASCII or Unicode. The elements of an alphabet are called its letters.

Formal language Why we use them? A word over an alphabet can be any (nite) sequence, or string, of characters or letters, which sometimes may include spaces, and are separated by specied word separation characters. The set of all words over an alphabet Σ is usually denoted by Σ (using the Kleene star). The length of a word is the number of characters or letters it is composed of. For any alphabet there is only one word of length 0, the empty word, which is often denoted by e, ε or λ. By concatenation one can combine two words to form a new word, whose length is the sum of the lengths of the original words. The result of concatenating a word with the empty word is the original word.

Formal language More details about alphabet, words and language A formal language L over an alphabet Σ is a subset of Σ, that is, a set of words over that alphabet. Sometimes the sets of words are grouped into expressions, whereas rules and constraints may be formulated for the creation of 'well-formed expressions'. Formal language theory rarely concerns itself with particular languages, but is mainly concerned with the study of various types of formalisms to describe languages. For instance, a language can be given as those strings generated by some formal grammar; those strings described or matched by a particular regular expression; those strings accepted by some automaton, such as a Turing machine or nite state automaton.

Formal language Example 1: alphabet and the set of all words over it Je±li Σ = {a, b}, to Σ = {e, a, b, aa, ab, bb, aaa, aab,... }.

Formal language Example 2: simple language with unformal rules The following rules describe a formal language L over the alphabet Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, +, =} Every nonempty string that does not contain + or = and does not start with 0 is in L. The string 0 is in L. A string containing = is in L if and only if there is exactly one =, and it separates two valid strings of L. A string containing + but not = is in L if and only if every + in the string separates two valid strings of L. No string is in L other than those implied by the previous rules.

Formal language Example 2: simple language with unformal rules Under these rules, the string "23+4=555" is in L, but the string "=234=+" is not.

Formal language Example 2: simple language with unformal rules This formal language expresses how some strings look like (their syntax), not what they mean (semantics). For instance, nowhere in these rules is there any indication that "0" means the number zero, or that "+" means addition.

Formal grammar Basic concepts Denition In formal language theory, a grammar (formal grammar) is a set of production rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them. A formal grammar is a set of rules for rewriting strings, along with a "start symbol" from which rewriting starts. Therefore, a grammar is usually thought of as a language generator. However, it can also sometimes be used as the basis for a "recognizer" a function in computing that determines whether a given string belongs to the language or is grammatically incorrect.

Formal grammar More details In the classic formalization of generative grammars rst proposed by Noam Chomsky in the 1950s, a grammar G formally dened as the tuple where (N, Σ, P, S) N is a nite set of nonterminal symbols, that is disjoint with the strings formed from G. Σ is a nite set of terminal symbols that is disjoint from N. P is a nite set of production rules, each rule of the form (Σ N) N(Σ N) (Σ N). That is, each production rule maps from one string of symbols to another, where the rst string (the "head") contains an arbitrary number of symbols provided at least one of them is a nonterminal. S N is a distinguished symbol that is the start symbol.

Formal grammar Example 1 A language L in which well-formed expressions are of the form {1, 11, 111,... } could be dened as N = S Σ = {1} P consists of the following production rules: rule 1: S --> 1 rule 2: S --> S1 S = S Now it can be prooved thet 1 and 111 belongs to the language L Proof 1: S --(rule 1)--> 1 Proof 2: S --(rule 2)--> S1 --(rule 2)--> S11 --(rule 1)--> 111

Formal grammar Example 2 Consider the grammar G where N = {S, B} Σ = {a, b, c} P consists of the following production rules: rule 1: S --> absc rule 2: S --> abc rule 3: Ba --> ab rule 4: Bb --> bb S = S This grammar denes the language L(G) = {a n b n c n n 1} where a n denotes a string of n consecutive a's. Thus, the language is the set of strings that consist of 1 or more a's, followed by the same number of b's, followed by the same number of c's. Some examples of the derivation of strings in L(G) are: S --(2)--> abc S --(1)--> absc --(2)--> ababcc --(3)--> aabbcc --(4)--> aabbcc

BackusNaur Form Notation Basic concepts In computer science, BNF (Backus Normal Form or BackusNaur Form) is a notation techniques for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. A BNF specication is a set of derivation rules, written as <symbol> ::= expression where <symbol> is a nonterminal symbol, and the expression expression consists of one or more sequences of symbols; symbols that never appear on a left side are terminals; symbols that appear on a left side are non-terminals and are always enclosed between the pair <>; symboll ::= means that the symbol on the left must be replaced with the expression on the right.

BackusNaur Form Notation Example 1 Consider BNF rules for some context-free grammar <binary number> ::=<binary number><digit> <binary number> ::=<digit> <digit> ::= 0 <digit> ::= 1 This is equivalent to the grammar G where N = {<binary number>, <digit>} Σ = {0, 1} P consists of the following production rules: rule 1: <binary number> --> <binary number><digit> rule 2: <binary number> --> <digit> rule 3: <digit> --> 0 rule 4: <digit> --> 1 S = <binary number>

BackusNaur Form Notation Example 2 Consider BNF rules for some context-free grammar (natural numbers grammar) <zero> ::= 0 <nonzero digit> ::= 1 2 3 4 5 6 7 8 9 <digit> ::= <zero> <nonzero digit> <sequence of digits> ::= <digit> <digit><sequence of digits> <natural number> ::= <digit> <nonzero digit><sequence of digits>

BackusNaur Form Notation Why extend it? The main problem with BNF (beeing far from human readable) is that repetitions and optional parts can not be expressed directly. Instead, we have indirect rule and recursive way of dening repetitions and options. Repetition BNF <number> ::= <digit> <number> ::= <number> <digit> Extended BNF <number> ::= { digit }+ Option BNF <signed number> ::= <sign> <number> <signed number> ::= <number> EBNF <signed number> ::= [ <sign> ] <number>

BackusNaur Form Notation Variants and extensions of BNF Many BNF specications found online today are intended to be human readable and are non-formal. These often include many of the following syntax rules and extensions: Optional items enclosed in square brackets: [<item>]. Items repeating 0 or more times are enclosed in curly brackets or suxed with an asterisk ('*'), such as <word> ::= <letter> {<letter>} <word> ::= <letter> <letter>* Items repeating 1 or more times are suxed with an addition (plus) symbol (+). Terminals may appear in bold rather than italics, and nonterminals in plain text rather than angle brackets. Alternative choices in a production are separated by the vertical bar,, indicating a choice. Where items are grouped, they are enclosed in simple parentheses.

BackusNaur Form Notation Which Extended BNF is the right one? The earliest EBNF was originally developed by Niklaus Wirth incorporating some of the concepts (with a dierent syntax and notation) from Wirth syntax notation. However, as we have seen, many variants of EBNF are in use. The International Organization for Standardization has adopted an EBNF standard (ISO/IEC 14977). Other EBNF variants use somewhat dierent syntactic conventions. The most important thing about (E)BNF is not to follow precise and strictly formal rules about it but to keep in mind why we use it and how to make it human readable and easy to use.

Regular expressions Basic concepts A regular expression, often called a pattern, is an expression used to specify a set of strings required for a particular purpose. A simple way to specify a set of strings is simply to list its elements or members. However, there are often more concise ways to specify the desired set of strings. For example, the set containing the three strings Handel, Händel, and Haendel can be specied by the pattern H(ä ae?)ndel; we say that this pattern matches each of the three strings.

Regular expressions Basic concepts Most formalisms provide the following operations to construct regular expressions. Boolean "or". A vertical bar separates alternatives. For example, gray grey can match gray or grey. Grouping. Parentheses are used to dene the scope and precedence of the operators (among other uses). For example, gray grey and gr(a e)y are equivalent patterns which both describe the set of gray or grey.

Regular expressions Basic concepts Quantication. A quantier after a token (such as a character) or group species how often that preceding element is allowed to occur. The most common quantiers are the question mark?, the asterisk * (derived from the Kleene star), and the plus sign + (Kleene cross). The question mark indicates there is zero or one of the preceding element. For example, colou?r matches both color and colour. The asterisk indicates there is zero or more of the preceding element. For example, ab*c matches ac, abc, abbc, abbbc, and so on. The plus sign indicates there is one or more of the preceding element. For example, ab+c matches abc, abbc, abbbc, and so on, but not ac. These constructions can be combined to form arbitrarily complex expressions, much like one can construct arithmetical expressions from numbers and the operations +,, etc. For example, H(ae? ä)ndel and H(a ae ä)ndel are both valid patterns which match the same strings as the earlier example, H(ä ae?)ndel. Note!!! The precise syntax for regular expressions varies among tools and with context.

Regular expressions Regular exppressions may be useful

Regular expression Examples Visit http://regexpal.com/ to verify.at matches any three-character string ending with "at", including "hat", "cat", and "bat". [hc]at matches "hat" and "cat". [^b]at matches all strings matched by.at except "bat". [^hc]at matches all strings matched by.at other than "hat" and "cat". ^[hc]at matches "hat" and "cat", but only at the beginning of the string or line. [hc]at$ matches "hat" and "cat", but only at the end of the string or line. \[.\] matches any single character surrounded by "[" and "]" since the brackets are escaped, for example: "[a]" and "[b]".