STUDIA INFORMATICA 2009
Volume 30 Number 2A (83)

Weronika PIĄTKOWSKA, Jerzy MARTYNA
Jagiellonian University, Institute of Computer Science

A HYBRID CLASSIFIER BASED ON SVM METHOD FOR CANCER CLASSIFICATION

Summary. In this paper, we propose a new method of applying Support Vector Machines (SVMs) to cancer classification. The proposed hybrid classifier considers the degree of the membership function of each class with the help of Fuzzy Naive Bayes (FNB) and then organizes one-versus-rest (OVR) SVMs as the architecture that classifies a sample into the corresponding class. The method uses a novel system of ordering the recognized expression profiles by means of FNB and of generating SVMs with the OVR scheme. The results show that our hybrid classifier is comparable to the conventional methods.

Keywords: SVM method, Fuzzy Naive Bayes, cancer classification

Polish title and summary, in translation: "A hybrid classifier based on the SVM method for the classification of oncological diseases". The article proposes a new method of classifying oncological diseases. It uses, among other things, a fuzzy naive Bayes classifier (Fuzzy Naive Bayes) and support vector machines (Support Vector Machines) as the classifying system. The resulting hybrid classifier classifies oncological diseases comparably to conventional methods.

Keywords (Polish summary): SVM method, fuzzy naive Bayes, classification of oncological diseases

1. Introduction

Support Vector Machines (SVMs) are adaptive learning systems which receive labeled training data and transform the learning problem into an optimization problem [12]. SVMs are
usually solved by finding solutions to quadratic programming problems. Originally, SVMs were used for binary pattern classification problems in which the data were linearly separable, but the algorithm has been extended to handle data that are not separable by introducing slack variables [3], and to use nonlinear decision regions via kernel functions [9]. Therefore, a solution for SVMs working with suitable kernel functions can be found by solving the quadratic programming problem in the dual observation space rather than in the primal feature space, thereby reducing the overall computation.

DNA microarrays contain information about the gene expression variations of cells in different tissues [1]. Microarrays make it possible to understand the activities of the genes underlying different cancers. The information obtained can in turn be used to identify types or subtypes of cancers.

Two types of DNA microarrays are currently in use: the spotted cDNA microarrays [4] developed at Stanford University and the oligonucleotide chips [6] developed by Affymetrix. Spotted microarrays are made of a solid surface onto which minuscule amounts (spots) of single strands of nucleotide sequences are deposited in a grid-like arrangement by an automated process called contact spotting. Each spot defines a specific gene and serves as a probe against which a sample RNA is hybridized. With oligonucleotide chips, the probes are synthesized on the array on the basis of the sequences of existing or hypothetical genes using photolithographic technology. Affymetrix also uses multiple probes to represent the genes.

In most computational experiments with microarrays, the raw data obtained from these arrays must be computationally collected, processed, and integrated. This process of data preparation is called pre-processing. It compensates for systematic measurement errors due to imperfections of the array equipment and yields a single expression level for each gene. As a result, the data from different microarrays are integrated into a single data matrix. Each row of this gene expression matrix corresponds to a different gene.
Each column corresponds to a different sample for which the expression data were measured.

In this paper, we propose a new modified SVM method for cancer classification. The Fuzzy Naive Bayes method described by Randon and Lawry [11], which is used in pattern recognition and data analysis, relies on the use of some distance function. In the proposed method, a selection stage based on the Bayesian likelihood fitness function is added to the conventional SVM method.

The remainder of this paper is organized as follows. In Section 2, we give the basic concepts of cancer classification with the use of the SVM method. In Section 3, we overview the FNB
method that was proposed to resolve unclassifiable regions in multiclass problems. In Section 4, we give several experimental results to show the validity of our proposed method. Finally, Section 5 gives the conclusions.

2. Basic concepts of cancer classification using SVMs

In this section we give the basic concepts of cancer classification with the use of the SVM method.

Microarray technologies produce a large volume of gene expression profiles. Microarray techniques contribute to a more complete understanding of the molecular variations among diseases. These gene expressions provide information about illnesses, including some types of cancers. Several data mining methods have been developed which involve the classification of gene expressions [8]. The gene expressions provide information that is useful for building the classifier. Irrelevant or redundant data, however, can decrease the accuracy of classification. Therefore, a classifier whose accuracy is sufficiently resistant to such data must be provided.

The SVM method represents one of the most important classifiers. We recall that the SVM maps an input sample into a high-dimensional space, minimizes the number of misclassified objects in the training set, and maximizes the margin between the bounding planes.

For a training set $\{x_i, y_i\}_{i=1}^{N}$ with input data $x_i \in R^n$ and class labels $y_i \in \{-1, +1\}$, the SVM calculates the linear classifier with input $x \in R^n$ and output $y \in R$, where

$$ y(x) = \operatorname{sign}[w^T x + b] \tag{1} $$

When the data of the two classes are separable, we have the original SVM classifier [12], [13], [14], which satisfies the following conditions:

$$ \begin{cases} w^T \varphi(x_i) + b \ge +1 & \text{if } y_i = +1 \\ w^T \varphi(x_i) + b \le -1 & \text{if } y_i = -1 \end{cases} \tag{2} $$

These two sets of inequalities can be combined into one single set as follows:

$$ y_i [w^T \varphi(x_i) + b] - 1 \ge 0, \quad i = 1, 2, \ldots, N \tag{3} $$

where $\varphi : R^n \to R^m$ is the feature map from the input space to a usually high-dimensional feature space. The data points are linearly separable by a hyperplane defined by the pair $(w \in R^m, b \in R)$. Thus, the classification function is given by

$$ f(x) = \operatorname{sign}\{w^T \varphi(x) + b\} \tag{4} $$
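The classification rule of Eq. (4) and the separability condition of Eq. (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature map is taken to be the identity (a linear kernel), and the weight vector, bias, and samples are hand-picked toy values, not quantities from the paper. The function names are hypothetical.

```python
from typing import List, Sequence


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Return w^T x + b, assuming the identity feature map phi(x) = x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Eq. (4): sign of the decision value, mapped to the labels -1 / +1."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def satisfies_margin(w: Sequence[float], b: float,
                     samples: List[Sequence[float]], labels: List[int]) -> bool:
    """Eq. (3): y_i * (w^T x_i + b) - 1 >= 0 must hold for every sample."""
    return all(y * decision_value(w, b, x) - 1 >= 0
               for x, y in zip(samples, labels))


# Toy values chosen by hand for illustration (assumptions, not paper data).
w, b = [1.0, -1.0], 0.0
samples = [[2.0, 0.0], [0.0, 2.0]]
labels = [1, -1]

predictions = [classify(w, b, x) for x in samples]
separable_with_margin = satisfies_margin(w, b, samples, labels)
print(predictions, separable_with_margin)
```

In a one-versus-rest setting, one such (w, b) pair would be trained per class, each separating that class from all the others.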
Instead of estimating with the help of the feature map, we work with a kernel function in the original space given by

$$ K(x, x') = \varphi(x)^T \varphi(x') \tag{5} $$

We introduce slack variables $\xi_i$ such that

$$ y_i [w^T \varphi(x_i) + b] \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \ldots, N \tag{6} $$

The following minimization problem is then considered:

$$ \min_{w, b, \xi} J(w, b, \xi) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} \xi_i \tag{7} $$

subject to

$$ y_i [w^T \varphi(x_i) + b] \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \ldots, N, \quad C > 0 \tag{8} $$

where $C$ is a positive constant parameter used to control the tradeoff between the training error and the margin. The dual problem of the system (7)-(8), obtained from the Karush-Kuhn-Tucker (KKT) conditions, leads to a well-known convex quadratic programming (QP) problem.

3. A hybrid classifier based on SVMs for cancer classification

In this section, we present our hybrid classifier for cancer classification, which is based on SVMs and Fuzzy Naive Bayes (FNB). An overview of our hybrid classifier is given in Fig. 1. FNB is used to estimate the class probabilities prob (one value per class), while the SVMs classify samples by using the original training data set of gene expression profiles. The proposed SVMs allow for a probabilistic ordering of the cancer classes, which is then used by our FNB after its estimation.

FNB is based on Bayes' theorem. We assume that a focal set for each attribute is given. Let an attribute be numeric with universe $\Omega$; then the likelihood of the attribute given a class can be represented by a density function determined from the gene expression profiles and a prior density according to Jeffrey's rule [5], namely

[...] (9)

From Bayes' theorem, we can obtain
[...] (10)

where

$$ \int_{\Omega} [\ldots] \, d[\ldots] \tag{11} $$

Substituting Eq. (10) into Eq. (11) and rearranging gives:

[...] (12)

where [...] can be derived from [...] according to

[...] (13)

Fig. 1. Structure of the hybrid classifier for cancer classification

This model, called Fuzzy Naive Bayes (FNB), can provide some measures. The probability of each class can be calculated with the use of Bayes' theorem [7], namely:

[...] (14)

where [...] is a feature of the marker gene.
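As a rough illustration of how Bayes' theorem turns per-class likelihoods of marker-gene features into class probabilities, and how those probabilities give an ordering of the classes, consider the following Python sketch. It is a plain (non-fuzzy) naive Bayes computation, not the FNB model itself: it assumes that the features are conditionally independent given the class and that the per-feature likelihoods are already available as strictly positive numbers. The function names, class names, and all numbers are hypothetical.

```python
import math
from typing import Dict, List


def class_posteriors(priors: Dict[str, float],
                     likelihoods: Dict[str, List[float]]) -> Dict[str, float]:
    """Posterior P(class | features) under the naive independence assumption.

    priors[c] is P(c); likelihoods[c][k] is P(feature_k | c), assumed > 0.
    """
    # Work in log space so that products of many small values do not underflow.
    log_scores = {
        c: math.log(priors[c]) + sum(math.log(p) for p in likelihoods[c])
        for c in priors
    }
    top = max(log_scores.values())
    unnormalized = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}


def rank_classes(posteriors: Dict[str, float]) -> List[str]:
    """Order the classes from most to least probable."""
    return sorted(posteriors, key=lambda c: posteriors[c], reverse=True)


# Toy numbers for two classes and two marker features (assumptions only).
priors = {"Breast": 0.5, "Lung": 0.5}
likelihoods = {"Breast": [0.8, 0.6], "Lung": [0.2, 0.3]}

posteriors = class_posteriors(priors, likelihoods)
ordering = rank_classes(posteriors)
print(ordering)
```

An ordering of this kind is what a hybrid scheme could use to decide which one-versus-rest SVM to consult first; how the paper combines the two components in detail is not specified beyond the description in this section.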
To improve the classification performance, we used the Pearson correlation as a measure of the similarity between an ideal marker and a gene g. The Pearson correlation [2] is used here as follows:

$$ \mathrm{Pear} = \frac{\sum_{i=1}^{n} ideal_i \, g_i - \frac{\sum_{i=1}^{n} ideal_i \sum_{i=1}^{n} g_i}{n}}{\sqrt{\left( \sum_{i=1}^{n} ideal_i^2 - \frac{\left( \sum_{i=1}^{n} ideal_i \right)^2}{n} \right) \left( \sum_{i=1}^{n} g_i^2 - \frac{\left( \sum_{i=1}^{n} g_i \right)^2}{n} \right)}} \tag{15} $$

where $n$ is the number of genes in the microarray data set and $ideal_i$ is the $i$-th gene in the microarray selected as the ideal marker.

Table 1
Confusion matrix (columns 1-14 correspond to the same 14 cancer types; for each row the nonzero entries are listed in left-to-right column order)

Cancer type         Nonzero row entries
 1. Breast          65, 35
 2. Prostate        86, 14
 3. Lung            100
 4. Colorectal      100
 5. Lymphoma        10, 90
 6. Bladder         20, 80
 7. Melanoma        78, 22
 8. Uterus_adeno    100
 9. Leukemia        10, 90
10. Renal           67
11. Pancreas        33, 33, 34
12. Ovary           25, 25, 50
13. Mesothelioma    100
14. CNS             100

We assumed that a gene is informative if the distance given by the Pearson correlation Pear is small, and that it is not informative if the distance is large.

4. An example of analysis

To evaluate our proposed method, we used the GCM data set published by Ramaswamy et al. (2001) [10]. It consists of 144 training samples and 54 testing samples from 14 cancer classes. Each sample has 16063 gene expression levels. The GCM data set is available at: http://www.genome.wi.mit.edu/mpr/gcm. Eight metastatic samples were dropped from the testing samples, so the test set used consisted of 46 samples from 14 cancer classes.

According to our method, we selected 140 genes for learning the FNB based on the Pearson correlation. We used the linear kernel function for the SVMs. The features of the samples were normalized to the range from 0 to 1.
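The gene-selection and normalization steps can be sketched as follows. The `pearson` function follows the sum form of Eq. (15). Everything else is an assumption made for illustration: the ideal marker is taken to be a 0/1 vector over samples, genes are ranked by descending correlation with it, and normalization is a per-feature min-max rescaling. The function names and toy values are hypothetical and do not come from the GCM data.

```python
import math
from typing import Dict, List, Sequence


def pearson(ideal: Sequence[float], g: Sequence[float]) -> float:
    """Pearson correlation in the sum form of Eq. (15)."""
    n = len(ideal)
    sum_ig = sum(a * b for a, b in zip(ideal, g))
    sum_i, sum_g = sum(ideal), sum(g)
    sum_ii = sum(a * a for a in ideal)
    sum_gg = sum(b * b for b in g)
    numerator = sum_ig - sum_i * sum_g / n
    denominator = math.sqrt((sum_ii - sum_i ** 2 / n) * (sum_gg - sum_g ** 2 / n))
    return numerator / denominator


def top_k_genes(ideal: Sequence[float],
                genes: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Assumed selection rule: keep the k genes most correlated with the marker."""
    ranked = sorted(genes, key=lambda name: pearson(ideal, genes[name]),
                    reverse=True)
    return ranked[:k]


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale one feature to the range [0, 1]; constant features map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy expression values over four samples (assumptions, not paper data).
ideal_marker = [0, 0, 1, 1]
genes = {"g1": [0, 0, 1, 1], "g2": [1, 1, 0, 0], "g3": [1, 2, 3, 4]}

selected = top_k_genes(ideal_marker, genes, k=2)
scaled = min_max([2.0, 4.0, 6.0])
print(selected, scaled)
```

In the experiment described above, the analogous selection would keep 140 genes rather than 2.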
The confusion matrix obtained for the 14 cancer classes is given in Table 1. As the coding strategy we used the winner-takes-all method.

Table 2
Accuracy of the methods used

Method              Accuracy [%]
OVR-SVM             72
FNB                 68
Hybrid classifier   80

Our programs are written in the MATLAB language. Additionally, we used the software package for the SVM algorithm which is available at http://www.kernel-machines.org.

In Table 2 we compare the accuracy of the methods used. SVMs with the one-versus-rest strategy gave 72% classification accuracy. The FNB achieved 68%. The hybrid method combining the OVR-SVM and the FNB reached an accuracy of 80%. Our method therefore classified better than the OVR-SVM and the FNB used separately.

5. Conclusions

A hybrid classifier based on SVMs for multiclass microarray classification has been investigated for cancer recognition. The proposed method integrates SVMs and the FNB, learned with the help of the OVR scheme. To verify our method, we applied it to the GCM cancer data set. To reduce the dimensionality of the coding matrix, we used the Pearson correlation. The suggested method performs comparably to other methods and better than either of its component methods working individually.

It has been shown that further improvement of the performance of the output process depends on the output-coding strategies. Therefore, we will look for an algorithm that improves the accuracy of the multiclass classification, especially when the class size is small. Some algorithms, such as heuristic algorithms, could be considered.

BIBLIOGRAPHY

1. Brown P. O., Botstein D.: Exploring the New World of the Genome with DNA Microarrays. Nat. Genet. Suppl., 21, 1999, pp. 33-37.
2. Cho S.-B., Ryu J.: Classifying Gene Expression Data of Cancer Using Classifier Ensemble with Mutually Exclusive Features. Proc. IEEE 90 (11), 2002, pp. 1744-1753.
3. Cortes C., Vapnik V. N.: Support Vector Networks. Machine Learning, 20, 1995, pp. 273-297.
4. Duggan D. J., Bittner M., Chen Y., Meltzer P., Trent J.: Expression Profiling Using cDNA Microarrays. Nature Genetics, 21, 1999, pp. 10-14.
5. Jeffrey R. C.: The Logic of Decision. Gordon and Breach Inc., New York 1965.
6. Lipshutz R. J., Fodor S. P. A., Gingeras T. R., Lockhart D. J.: High Density Synthetic Oligonucleotide Arrays. Nature Genetics, 21, 1999, pp. 20-24.
7. Liu J., et al.: An Improved Naive Bayesian Classifier Technique Coupled with a Novel Input Solution Method. IEEE Trans. on Systems, Man, and Cybernetics, Part C: Appl. Rev. 31, No. 2, 2001, pp. 249-256.
8. McLachlan G. J., Do K.-A., Ambroise Ch.: Analyzing Microarray Gene Expression Data. John Wiley and Sons, 2004.
9. Müller K. R., Mika S., Rätsch G., Tsuda K., Schölkopf B.: An Introduction to Kernel-Based Learning Algorithms. IEEE Trans. on Neural Networks, Vol. 12, No. 2, 2001, pp. 181-201.
10. Ramaswamy S., et al.: Multiclass Cancer Diagnosis Using Tumor Gene Expression Signatures. Proc. Nat. Acad. Sci., Vol. 98, No. 26, 2001, pp. 15149-15154.
11. Randon J., Lawry J.: Classification and Query Evaluation Using Modelling with Words. Information Sciences. Special Issue Computing with Words: Models and Applications, Vol. 176, 2006, pp. 438-464.
12. Vapnik V. N.: The Nature of Statistical Learning Theory. Springer-Verlag, Berlin, Heidelberg, New York 1995.
13. Vapnik V. N.: Statistical Learning Theory. John Wiley and Sons, 1998.
14. Vapnik V. N.: The Support Vector Method of Function Estimation. In: J. A. K. Suykens, J. Vandewalle (eds.): Nonlinear Modeling: Advanced Black-box Techniques, Kluwer Academic Publishers, Boston 1998, pp. 55-85.

Reviewer: Prof. dr hab. inż. Andrzej Polański
Received by the Editors on 5 March 2009

Summary (translated from Polish)

DNA microarrays make it possible to analyse the occurrence of oncogenes. The occurrence of oncological diseases was examined with a specially constructed hybrid classifier. The classifier was built from the support vector method (Support Vector Machines) and a fuzzy naive Bayes classifier (Fuzzy Naive Bayes). The SVM method was used in a one-versus-rest architecture, which allows each class relating to an oncological disease to be classified separately. It was shown that the hybrid classifier developed in this way has better classification capabilities than the conventional methods currently in use.

Addresses

Weronika PIĄTKOWSKA: Uniwersytet Jagielloński, Instytut Informatyki Stosowanej, ul. Reymonta 4, 30-059 Kraków, Polska.
Jerzy MARTYNA: Uniwersytet Jagielloński, Instytut Informatyki, ul. Łojasiewicza 4, 30-348 Kraków, Polska, martyna@softlab.ii.uj.edu.pl.