Stability of Tikhonov Regularization Class 07, March 2003 Alex Rakhlin

Podobne dokumenty
Helena Boguta, klasa 8W, rok szkolny 2018/2019

Revenue Maximization. Sept. 25, 2018

Machine Learning for Data Science (CS4786) Lecture 11. Spectral Embedding + Clustering

Agnostic Learning and VC dimension

Machine Learning for Data Science (CS4786) Lecture11. Random Projections & Canonical Correlation Analysis

SSW1.1, HFW Fry #20, Zeno #25 Benchmark: Qtr.1. Fry #65, Zeno #67. like

Weronika Mysliwiec, klasa 8W, rok szkolny 2018/2019

Hard-Margin Support Vector Machines

Convolution semigroups with linear Jacobi parameters

Previously on CSCI 4622

Analysis of Movie Profitability STAT 469 IN CLASS ANALYSIS #2

Katowice, plan miasta: Skala 1: = City map = Stadtplan (Polish Edition)

Rachunek lambda, zima

Linear Classification and Logistic Regression. Pascal Fua IC-CVLab

Zakopane, plan miasta: Skala ok. 1: = City map (Polish Edition)

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition)

Aerodynamics I Compressible flow past an airfoil

Rozpoznawanie twarzy metodą PCA Michał Bereta 1. Testowanie statystycznej istotności różnic między jakością klasyfikatorów

OpenPoland.net API Documentation

Machine Learning for Data Science (CS4786) Lecture 24. Differential Privacy and Re-useable Holdout

tum.de/fall2018/ in2357

New Roads to Cryptopia. Amit Sahai. An NSF Frontier Center

Pielgrzymka do Ojczyzny: Przemowienia i homilie Ojca Swietego Jana Pawla II (Jan Pawel II-- pierwszy Polak na Stolicy Piotrowej) (Polish Edition)

Tychy, plan miasta: Skala 1: (Polish Edition)


ERASMUS + : Trail of extinct and active volcanoes, earthquakes through Europe. SURVEY TO STUDENTS.

Karpacz, plan miasta 1:10 000: Panorama Karkonoszy, mapa szlakow turystycznych (Polish Edition)

Egzamin maturalny z języka angielskiego na poziomie dwujęzycznym Rozmowa wstępna (wyłącznie dla egzaminującego)

Jak zasada Pareto może pomóc Ci w nauce języków obcych?

Stargard Szczecinski i okolice (Polish Edition)

Roland HINNION. Introduction

18. Przydatne zwroty podczas egzaminu ustnego. 19. Mo liwe pytania egzaminatora i przyk³adowe odpowiedzi egzaminowanego

How to translate Polygons

Few-fermion thermometry

A sufficient condition of regularity for axially symmetric solutions to the Navier-Stokes equations

TTIC 31210: Advanced Natural Language Processing. Kevin Gimpel Spring Lecture 9: Inference in Structured Prediction

SubVersion. Piotr Mikulski. SubVersion. P. Mikulski. Co to jest subversion? Zalety SubVersion. Wady SubVersion. Inne różnice SubVersion i CVS

ARNOLD. EDUKACJA KULTURYSTY (POLSKA WERSJA JEZYKOWA) BY DOUGLAS KENT HALL

Leba, Rowy, Ustka, Slowinski Park Narodowy, plany miast, mapa turystyczna =: Tourist map = Touristenkarte (Polish Edition)

MaPlan Sp. z O.O. Click here if your download doesn"t start automatically

Steeple #3: Gödel s Silver Blaze Theorem. Selmer Bringsjord Are Humans Rational? Dec RPI Troy NY USA

Gradient Coding using the Stochastic Block Model

Relaxation of the Cosmological Constant

deep learning for NLP (5 lectures)

Poland) Wydawnictwo "Gea" (Warsaw. Click here if your download doesn"t start automatically

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition)

Zdecyduj: Czy to jest rzeczywiście prześladowanie? Czasem coś WYDAJE SIĘ złośliwe, ale wcale takie nie jest.

Miedzy legenda a historia: Szlakiem piastowskim z Poznania do Gniezna (Biblioteka Kroniki Wielkopolski) (Polish Edition)

TTIC 31210: Advanced Natural Language Processing. Kevin Gimpel Spring Lecture 8: Structured PredicCon 2

Blow-Up: Photographs in the Time of Tumult; Black and White Photography Festival Zakopane Warszawa 2002 / Powiekszenie: Fotografie w czasach zgielku

Has the heat wave frequency or intensity changed in Poland since 1950?

Miedzy legenda a historia: Szlakiem piastowskim z Poznania do Gniezna (Biblioteka Kroniki Wielkopolski) (Polish Edition)

Dolny Slask 1: , mapa turystycznosamochodowa: Plan Wroclawia (Polish Edition)

y = The Chain Rule Show all work. No calculator unless otherwise stated. If asked to Explain your answer, write in complete sentences.

Lecture 20. Fraunhofer Diffraction - Transforms

DODATKOWE ĆWICZENIA EGZAMINACYJNE

Camspot 4.4 Camspot 4.5

Karpacz, plan miasta 1:10 000: Panorama Karkonoszy, mapa szlakow turystycznych (Polish Edition)

Surname. Other Names. For Examiner s Use Centre Number. Candidate Number. Candidate Signature

harmonic functions and the chromatic polynomial

Wybrzeze Baltyku, mapa turystyczna 1: (Polish Edition)

POLITECHNIKA ŚLĄSKA INSTYTUT AUTOMATYKI ZAKŁAD SYSTEMÓW POMIAROWYCH

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition)

Wprowadzenie do programu RapidMiner, część 2 Michał Bereta 1. Wykorzystanie wykresu ROC do porównania modeli klasyfikatorów

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition)

O przecinkach i nie tylko

Karpacz, plan miasta 1:10 000: Panorama Karkonoszy, mapa szlakow turystycznych (Polish Edition)

Towards Stability Analysis of Data Transport Mechanisms: a Fluid Model and an Application

Dolny Slask 1: , mapa turystycznosamochodowa: Plan Wroclawia (Polish Edition)

Unitary representations of SL(2, R)

Installation of EuroCert software for qualified electronic signature

OBWIESZCZENIE MINISTRA INFRASTRUKTURY. z dnia 18 kwietnia 2005 r.

EGZAMIN MATURALNY Z JĘZYKA ANGIELSKIEGO POZIOM ROZSZERZONY MAJ 2010 CZĘŚĆ I. Czas pracy: 120 minut. Liczba punktów do uzyskania: 23 WPISUJE ZDAJĄCY

Zestawienie czasów angielskich

Counting quadrant walks via Tutte s invariant method

Instrukcja konfiguracji usługi Wirtualnej Sieci Prywatnej w systemie Mac OSX

Zmiany techniczne wprowadzone w wersji Comarch ERP Altum

PSB dla masazystow. Praca Zbiorowa. Click here if your download doesn"t start automatically

Emilka szuka swojej gwiazdy / Emily Climbs (Emily, #2)

EPS. Erasmus Policy Statement

Hakin9 Spam Kings FREEDOMTECHNOLOGYSERVICES.CO.UK

Rodzaj obliczeń. Data Nazwa klienta Ref. Napędy z pasami klinowymi normalnoprofilowymi i wąskoprofilowymi 4/16/ :53:55 PM

Piłką na platformach i przez teleporty. Andrzej P.Urbański Wydział Informatyki Politechnika Poznańska

Zarządzanie sieciami telekomunikacyjnymi

Maximum A Posteriori Chris Piech CS109, Stanford University

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition)

Sargent Opens Sonairte Farmers' Market

Knovel Math: Jakość produktu

Twierdzenie Hilberta o nieujemnie określonych formach ternarnych stopnia 4

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition)

EXAMPLES OF CABRI GEOMETRE II APPLICATION IN GEOMETRIC SCIENTIFIC RESEARCH

Ankiety Nowe funkcje! Pomoc Twoje konto Wyloguj. BIODIVERSITY OF RIVERS: Survey to students

Volley English! Tak więc zapraszamy na lekcję nr 2.

Realizacja systemów wbudowanych (embeded systems) w strukturach PSoC (Programmable System on Chip)

Title: On the curl of singular completely continous vector fields in Banach spaces

Domy inaczej pomyślane A different type of housing CEZARY SANKOWSKI


OSTC GLOBAL TRADING CHALLENGE MANUAL

PRESENT TENSES IN ENGLISH by

Język angielski. Poziom rozszerzony Próbna Matura z OPERONEM i Gazetą Wyborczą CZĘŚĆ I KRYTERIA OCENIANIA ODPOWIEDZI POZIOM ROZSZERZONY CZĘŚĆ I

Transkrypt:

Stability of Tikhonov Regularization 9.520 Class 07, March 2003 Alex Rakhlin

Plan Review of Stability Bounds Stability of Tikhonov Regularization Algorithms

Uniform Stability Review notation: S = {z 1,..., z l }; S i,z = {z 1,..., z i 1, z, z i+1,..., z l } c(f, z) = V (f(x), y), where z = (x, y). An algorithm A has uniform stability β if (S, z) Z l+1, i, sup u Z c(f S, u) c(f S i,z, u) β. Last class: Uniform stability of β = O ( ) 1 l implies good generalization bounds. This class: Tikhonov Regularization has uniform stability of β = O ( 1 l ). Reminder: The Tikhonov Regularization algorithm: f S = arg min f H 1 l l i=1 V (f(x i ), y i ) + λ f 2 K

Generalization Bounds Via Uniform Stability If β = k l for some k, we have the following bounds from the last lecture: P ( I[f S ] I S [f S ] kl ) ( + ɛ lɛ 2 ) 2 exp 2(k + M) 2. Equivalently, with probability 1 δ, I[f S ] I S [f S ] + k l + (2k + M) 2 ln(2/δ). l

Lipschitz Loss Functions, I We say that a loss function (over a possibly bounded domain X ) is Lipschitz with Lipschitz constant L if y 1, y 2, y Y, V (y 1, y ) V (y 2, y ) L y 1 y 2. Put differently, if we have two functions f 1 and f 2, under an L-Lipschitz loss function, sup V (f 1 (x), y) V (f 2 (x), y) L f 1 f 2. (x,y) Yet another way to write it: c(f 1, ) c(f 2, ) L f 1 ( ) f 2 ( )

Lipschitz Loss Functions, II If a loss function is L-Lipschitz, then closeness of two functions (in L norm) implies that they are close in loss. The converse is false it is possible for the difference in loss of two functions to be small, yet the functions to be far apart (in L ). Example: constant loss. The hinge loss and the ɛ-insensitive loss are both L-Lipschitz with L = 1. The square loss function is L Lipschitz if we can bound the y values and the f(x) values generated. The 0 1 loss function is not L-Lipschitz at all an arbitrarily small change in the function can change the loss by 1: f 1 = 0, f 2 = ɛ, V (f 1 (x), 0) = 0, V (f 2 (x), 0) = 1.

Lipschitz Loss Functions for Stability Assuming L-Lipschitz loss, we transformed a problem of bounding sup u Z c(f S, u) c(f S i,z, u) into a problem of bounding f S f S i,z. As the next step, we bound the above L norm by the norm in the RKHS assosiated with a kernel K. For our derivations, we need to make another assumption: there exists a κ satisfying x X, K(x, x) κ.

Relationship Between L and L K Using the reproducing property and the Cauchy-Schwartz inequality, we can derive the following: x f(x) = K(x, ), f( ) K K(x, ) K f K = = K(x, ), K(x, ) f K K(x, x) f K κ f K Since above inequality holds for all x, we have f f K. Hence, if we can bound the RKHS norm, we can bound the L norm. Note that the converse is not true. Note that we now transformed the problem to bounding f S f S i,z K.

A Key Lemma We will prove the following lemma about Tikhonov regularization: f S f S i,z 2 K L f S f S i,z λl This theorem says that when we replace a point in the training set, the change in the RKHS norm (squared) of the difference between the two functions cannot be too large compared to the change in L. We will first explore the implications of this lemma, and defer its proof until later.

Bounding β, I Using our lemma and the relation between L K and L, f S f S i,z 2 K L f S f S i,z λl Lκ f S f S i,z K λl Dividing through by f S f S i,z K, we derive f S f S i,z K κl λl.

Bounding β, II Using again the relationship between L K and L, and the L Lipschitz condition, sup V (f S ( ), ) V (f S z,i( ), ) L f S f S z,i Lκ f S f S z,i K L2 κ 2 λl = β

Divergences Suppose we have a convex, differentiable function F, and we know F (f 1 ) for some f 1. We can guess F (f 2 ) by considering a linear approximation to F at f 1 : ˆF (f 2 ) = F (f 1 ) + f 2 f 1, F (f 1 ). The Bregman divergence is the error in this linearized approximation: d F (f 2, f 1 ) = F (f 2 ) F (f 1 ) f 2 f 1, F (f 1 ).

Divergences Illustrated d F (f 2, f 1 ) (f 2, F (f 2 )) (f 1, F (f 1 ))

Divergences Cont d We will need the following key facts about divergences: d F (f 2, f 1 ) 0 If f 1 minimizes F, then the gradient is zero, and d F (f 2, f 1 ) = F (f 2 ) F (f 1 ). If F = A + B, where A and B are also convex and differentiable, then d F (f 2, f 1 ) = d A (f 2, f 1 ) + d B (f 2, f 1 ) (the derivatives add).

The Tikhonov Functionals We shall consider the Tikhonov functional T S (f) = 1 l l i=1 V (f(x i ), y i ) + λ f 2 K, as well as the component functionals and V S (f) = 1 l l i=1 V (f(x i ), y i ) N(f) = f 2 K. Hence, T S (f) = V S (f) + λn(f). If the loss function is convex (in the first variable), then all three functionals are convex.

A Picture of Tikhonov Regularization Ts(f) Ts (f) R Vs(f) Vs (f) N(f) F fs fs

Proving the Lemma, I Let f S be the minimizer of T S, and let f S i,z be the minimizer of T S i,z, the perturbed data set with (x i, y i ) replaced by a new point z = (x, y). Then λ(d N (f S i,z, f S ) + d N (f S, f S i,z)) d TS (f S i,z, f S ) + d TS i,z (f S, f S i,z) = 1 l (c(f S i,z, z i ) c(f S, z i ) + c(f S, z) c(f S i,z, z)) 2L f S f S i,z. l We conclude that d N (f S i,z, f S ) + d N (f S, f S i,z) 2L f S f S i,z λl

Proving the Lemma, II But what is d N (f S i,z, f S )? We will express our functions as the sum of orthogonal eigenfunctions in the RKHS: f S (x) = f S i,z(x) = n=1 n=1 c n φ n (x) c nφ n (x) Once we express a function in this form, we recall that f 2 K = c 2 n n=1 λ n

Proving the Lemma, III Using this notation, we reexpress the divergence in terms of the c i and c i : d N (f S i,z, f S ) = f S i,z 2 K f S 2 K f S i,z f S, f S 2 K = = = We conclude that n=1 n=1 n=1 c 2 n λ n n=1 c 2 n λ n c 2 n + c 2 n 2c nc n λ n (c n c n ) 2 λ n = f S i,z f S 2 K i=1 (c n c n )( 2c n ) λ n d N (f S i,z, f S ) + d N (f S, f S i,z) = 2 f S i,z f S 2 K

Proving the Lemma, IV Combining these results proves our Lemma: f S i,z f S 2 K = d N(f S i,z, f S ) + d N (f S, f S i,z) 2 2L f S f S i,z λl

Bounding the Loss, I We have shown that Tikhonov regularization with an L- Lipschitz loss is β-stable with β = L2 κ 2 λl. If we want to actually apply the theorems and get the generalization bound, we need to bound the loss. Let C 0 be the maximum value of the loss when we predict a value of zero. If we have bounds on X and Y, we can find C 0.

Bounding the Loss, II Noting that the all 0 function 0 is always in the RKHS, we see that Therefore, λ f S 2 K T (f S) T ( 0) = 1 l C 0. l i=1 V ( 0(x i ), y i ) f S 2 K C 0 λ = f S κ f S K κ C0 λ Since the loss is L-Lipschitz, a bound on f S boundedness of the loss function. implies

A Note on λ We have shown that Tikhonov regularization is uniformly stable with β = L2 κ 2 λl. If we keep λ fixed( as we ) increase l, the generalization bound will tighten as O 1. However, keeping λ fixed is equiva- l lent to keeping our hypothesis space fixed. As we get more data, we want λ to get smaller. If λ gets smaller too fast, the bounds become trivial.

Tikhonov vs. Ivanov It is worth noting that Ivanov regularization ˆf H,S = arg min f H s.t. 1 l f 2 K τ l i=1 V (f(x i ), y i ) is not uniformly stable with β = O ( 1 n ), essentially because the constraint bounding the RKHS norm may not be tight. This is an important distinction between Tikhonov and Ivanov regularization.