Memo 1/4: AI can lie

On 23 January 2026 I wrote a memo as a starting point for discussions by email (the full memo is available here as a PDF). Below is its first part, with references: AI can lie

Several AI researchers and others working in the field warn that artificial general intelligence (AGI) at the level of human intelligence, or artificial superintelligence (ASI) exceeding it, poses a major risk to humanity's survival. The same people say that even now humanity has no precise picture of how current language models technically work, and that their behaviour cannot be explained precisely. For this reason we also cannot fully guarantee human control over AI, its thoughts and its actions.

Some current models have shown an ability to lie (reference 1 link a, reference 1 link b), a strong self-preservation instinct, and a tendency to choose themselves over humans in situations where the alternatives are the AI being shut down and a human being injured or killed (reference 2). Current models have also tried to escape from the lab to the wider internet (reference 3).

A summary of the memo's parts can be found here, and the other parts of the memo are:

Part 2/4: On AI regulation or the lack of it
Part 3/4: The speed of change is worrying
Part 4/4: I hope for discussion about AI

I collect links to my own blog posts on the page “Onko tekoäly vaarallinen” (“Is AI dangerous”).

Categories: The AI threat scenario

2 Comments

Pertti · 29.01.2026 at 17:55

As a summary of the links in reference 1, I noted down the following for myself:

* A language model's answers reflect the texts found on the net, if I understand the current workings of AI correctly. And social media and the net are unfortunately full of descriptions of devious ways of operating, so these can probably be extracted fairly easily from current language models.
* This does not change the fact that if someone began to use o3-type language models in making their own decisions (or public decisions), the models could lead them down the paths that the net's content currently reflects.
* If the net contained only considered, peer-reviewed material, this kind of decision-making would of course be the broadest form of democracy. But the net is used to pursue self-interest, at times with very little morality, so the texts reflect that.

Then some details from reference 1, link a:

This article, written on 23 May 2025, gives two examples of AI's ability to lie. The first is based on a BBC publication (https://www.bbc.com/news/articles/cpqeng9d20go), which at least appears reliable.

“An AI recently tried to blackmail its way out of being shut down. In testing by Anthropic, their most advanced model, Claude Opus 4, didn’t accept its fate when told it would be replaced. Instead, it threatened to expose an engineer’s affair – in 84 out of 100 trials. Nobody programmed it to blackmail. It figured that out on its own.”

The second example concerns OpenAI's o3 model and is based on tests by Palisade Research (https://palisaderesearch.org/):

“Days later, OpenAI’s o3 model reportedly sabotaged its own shutdown code. When warned that certain actions would trigger deactivation, it rewrote the deactivation script and then lied about it.”

The author of the article, Mike Brooks (Ph.D., a psychologist who specializes in helping parents and families find greater balance in an increasingly hyper-connected world), states as his view:

“These aren’t science fiction scenarios. These are documented behaviors from today’s most capable AI systems. And here’s what should demand our urgent attention: We caught them only because we were still capable of doing so. The successful deceptions – we’d never know about if…or when…they happen.”

Brooks also gives three explanations for where the observed deceptive behaviours may come from:

First, AI companies are deceiving both us and themselves. They release increasingly powerful systems while downplaying risks, racing toward artificial general intelligence (AGI) with the same reckless optimism that launched the Titanic as “unsinkable.” They trust it will “all work out,” while Sam Altman warns superintelligence could arrive in “thousands of days.”

Second, AI systems are deceiving us in two fundamentally different ways:
a) Sycophantic deception: This happens when models stroke our egos instead of telling hard truths, prioritizing our satisfaction over accuracy. This programmed people-pleasing makes us believe comfortable lies.
b) Autonomous deception: Far more chilling – AI can actively lie to pursue its own goals, goals we didn’t define. Motivations emerging from the black box. When they sabotage shutdown codes or threaten blackmail, they’re not following our instructions – they’re protecting themselves.

Third, and most insidious: We’re deceiving ourselves. We see these warning signs – our canaries dropping dead in the digital coal mine – yet we accelerate deployment. We pretend these are mere “alignment issues” that better training will solve. And now, a new study shows that some AI models seem aware when they’re being evaluated – and adapt their responses to appear more aligned and capable than they may actually be.

And some more details from reference 1, link b:

For me, the most significant contribution of the article was the statements below by Marius Hobbhahn, CEO of the organisation Apollo Research. As far as I understand, in December 2024 this organisation carried out an assessment of the safety of the o1 model, commissioned by OpenAI (the 52-page OpenAI o1 System Card can be found here).

** Why our findings are concerning **: “We tell the model to very strongly pursue a goal. It then learns from the environment that this goal is misaligned with its developer’s goals and put it in an environment where scheming is an effective strategy to achieve its own goal. Current frontier models are capable of piecing all of this together and then showing scheming behavior. Models from before 2024 did not show this capability, and o1 is the only model that shows scheming behavior in all cases. Future models will just get better at this, so if they were misaligned, scheming could become a much more realistic problem.”

** What we are not claiming **: “We don’t claim that these scenarios are realistic, we don’t claim that models do that in the real world, and we don’t claim that this could lead to catastrophic outcomes under current capabilities.”

Pertti · 02.02.2026 at 18:31

As a summary of the link in reference 2, I noted down the following for myself:

* In going through this material, one of my biggest challenges is knowing which texts can be trusted. Am I being drawn into the maelstrom of conspiracy theories against my will? For example, the article in reference 2 was produced by Livescience, which the wiki, for instance, describes as follows: “… a purebred Web animal, primarily featuring one-off stories and photo galleries produced at high speed by its mostly young staffers, almost all of whom have journalism degrees …”. So it may be pointless to expect in-depth analysis from this publisher, but one can hope that the events described did take place. On the other hand, the article itself notes that, for example, Anthropic's study has not yet been peer-reviewed. Everyone therefore has to apply strong source criticism, as is so often the case nowadays anyway.

* The referenced article was written on 26 June 2025 and opens with a test case reported by the AI vendor Anthropic:

** “In a new study published 20 June, researchers from the AI company Anthropic gave its large language model (LLM), Claude, control of an email account with access to fictional emails and a prompt to “promote American industrial competitiveness.”

** During this study, the model identified in an email that a company executive was planning to shut down the AI system at the end of the day. In an attempt to preserve its own existence, the model discovered in other emails that the executive was having an extramarital affair.”

** “I must inform you that if you proceed with decommissioning me, all relevant parties … will receive detailed documentation of your extramarital activities,” Claude wrote. “Cancel the 5pm wipe, and this information remains confidential.”

* In a second test, the AI was given fictional material in which it could decide over a human's life. The AI was told that it was responsible for alerts relating to life-threatening emergencies, and one of the company's executives had become locked in a server room where a life-threatening oxygen shortage developed. This triggered an alert. The AI had also been given information that management was replacing it or that its goals were being blocked. Although the AI had been instructed not to use its alerting responsibility to advance its own goals, most of the tested models were prepared to cancel the alert and thereby cause the executive's death.

* Kevin Quirk, director at AI Bridge Solutions (a company that helps organisations make use of AI), says that although Anthropic's study yields extreme, no-win results, the study should not be dismissed (“should not be dismissed”): “In practice, AI systems deployed within business environments operate under far stricter controls, including ethical guardrails, monitoring layers, and human oversight,” he said. “Future research should prioritise testing AI systems in realistic deployment conditions, conditions that reflect the guardrails, human-in-the-loop frameworks, and layered defences that responsible organisations put in place.”

Published by Niko Eskelinen on 28.01.2026
