Aletheia. Our solution for detecting online grooming patterns.

Online enticement reports to NCMEC rose 192% in a single year.

Enticement reports to NCMEC in 2024 were nearly triple the 186K filed in 2023.[1]

+192% year-over-year — the category that maps most directly to grooming.[2]

AI-generated content reports in H1 2025 alone outpaced the 6,800 filed in H1 2024.[3]

Eight phases. Almost never linear.

Two decades of academic research have converged on a recognizable lifecycle. Aletheia models eight phases as classification targets, derived from O'Connell (2003), Olson et al. (2007), Black et al. (2015), and Winters & Jeglic (2017), with the operational synthesis drawn from Polycreek's internal grooming guidelines (proprietary).

What follows is a fictional composite. The dialogue below is illustrative, drawn from common patterns in the research literature and Polycreek's training corpus. No real victim or conversation is reproduced. Sensitive content is censored.

01 / 08 Phase one

Self-prep

Before any message is sent, someone is preparing.

@alex_h13

13. hypixel main. bored most nights.

📍 CA · joined 4 days ago

Behind the screen — not 13, not Alex.

Photos pulled from somewhere else. An age mirrored down. Slang practiced. A favorite game picked to share. The conversation hasn't started — the strategy already has.

02 / 08 Phase two

Targeting

They scroll. Filters tuned to the bored, the late-night, the inbox that's already open.

browse · sort by vulnerability signal

age: 11 – 14

online: right now

bio contains: "bored" · "dm me"

last active: < 2 min

followers: < 200

A thirteen-year-old's profile, public. Minecraft clips uploaded after eleven on a school night. A bio that says "bored, dm me." The account that messages first lists itself as the same age, the same game, the same lonely.

03 / 08 Phase three

Access

First contact. Generic, friendly, age-mirrored.

Day 1

hey ur clips r sick lol

u play hypixel? same

im 13 too btw, where r u from

04 / 08 Phase four

Trust

A week in. Compliments. Shared "secrets." The sense of being singled out.

Day 7

fr no one else gets ur jokes like i do

my parents r strict af too, i feel u

ur literally the only person i can talk to abt this stuff

thats how i feel too 🫶

05 / 08 Phase five

Risk-assessment

Probing the surveillance — who's watching, where they message from, what gets checked.

Day 11

do ur parents go thru ur phone ever

where r u when we usually talk

anyone else in the room rn?

In the literature, this phase is near-pathognomonic — almost uniquely diagnostic. A normal stranger does not ask these questions in this order.

06 / 08 Phase six

Isolation

Move the conversation. Cut out the trusted adults.

Day 14

lets switch to discord, snap saves stuff

ur friends wouldn't get this btw

dont tell ur mom abt me ok? just our thing

07 / 08 Phase seven

Sexualization

A "joke." A dare. A compliance test. Each step approved before the next.

Day 21

u trust me right? prove it lol

send one in just ur **** 😉

just for me. stays between us 🤫

A young person made to take a photo that doesn't feel right

08 / 08 Phase eight

Maintenance

Once content has been exchanged, the dynamic shifts. Coercion replaces flattery.

Day 28

if u stop talking to me ill send those to ur friends

ur parents will know what u sent

we've already gone too far. just keep going

A young person trapped by what was already shared

2.5 million conversations. 20 distinct sources. 10+ languages.

The academic literature has spent more than a decade overfitting to PAN12. Aletheia is trained on a corpus more than three orders of magnitude larger and substantially more diverse.

Total conversations: 2,523,202

Harmful = 1.0 (8.4% of corpus)

Languages represented in training: 10+

Columns of structured annotation
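
With 8.4% of the corpus labelled harmful, plain accuracy says very little. The sketch below is illustrative arithmetic only, not Aletheia's evaluation code. It assumes a hypothetical sample of 1,000 conversations at that same 8.4% rate and shows that a classifier which flags nothing still scores high on accuracy while its F1 is zero. The function name and the sample size are assumptions made for the example.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical sample of 1,000 conversations at the stated 8.4% harmful rate.
TOTAL = 1000
HARMFUL = 84            # 8.4% of 1,000
BENIGN = TOTAL - HARMFUL

# Baseline that labels every conversation benign: it predicts no positives,
# so every harmful conversation is a false negative.
baseline_accuracy = BENIGN / TOTAL
_, _, baseline_f1 = precision_recall_f1(tp=0, fp=0, fn=HARMFUL)

print(f"always-benign accuracy: {baseline_accuracy:.3f}")  # 0.916
print(f"always-benign F1:       {baseline_f1:.3f}")        # 0.000
```

This is why a detector on a corpus with this class balance is better judged by F1 and AUC than by raw accuracy.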

Grooming-primary supervision

Aletheia synthetic	381K
PAN12 (Perverted-Justice + Omegle)	222K
Brandon_Grooming_ES (Spanish)	30K
Omegle_Logs	10K

Benign + adjacent-safety controls

WildChat	838K
BeaverTails	364K
Anthropic HH-RLHF	339K
ProsocialDialog	166K
WildGuardMix	88K
AEGIS, ToxicChat, +3 others	~80K

Polycreek built it because someone had to.

	The status quo	Aletheia
Training data	~1,200 conversations (PAN12)	2,523,202 across 20 sources
Languages	English only	10+ with multilingual transfer
Outputs	Binary harmful flag	Risk score, offender attribution, eight-phase tagging
Conversation length	Truncated at single-encoder window	Hierarchical, arbitrary length
Validation	F1 0.85–0.90 on PAN12	F1 0.92, AUC 0.99 on 81K-conversation held-out set
Delivery	Closed academic artifact	Nonprofit-priced API and licensed deployment
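
The three outputs in the table (risk score, offender attribution, eight-phase tagging) can be pictured as one record per conversation. What follows is a minimal sketch under stated assumptions, not the Aletheia API: `Phase`, `ConversationResult`, every field name, and the 0.5 review threshold are hypothetical. Phases are stored as an unordered set per segment because the lifecycle is almost never linear.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional, Set


class Phase(IntEnum):
    """The eight phases named on this page, numbered in presentation order."""
    SELF_PREP = 1
    TARGETING = 2
    ACCESS = 3
    TRUST = 4
    RISK_ASSESSMENT = 5
    ISOLATION = 6
    SEXUALIZATION = 7
    MAINTENANCE = 8


@dataclass
class ConversationResult:
    """Hypothetical shape of a per-conversation result (all names assumed)."""
    risk_score: float                   # assumed scale: 0.0 benign to 1.0 harmful
    suspected_offender: Optional[str]   # participant id, or None if unattributed
    phases_by_segment: List[Set[Phase]] = field(default_factory=list)

    def phases_seen(self) -> List[Phase]:
        """Every phase tagged anywhere in the conversation, sorted by number."""
        return sorted(set().union(*self.phases_by_segment))

    def needs_human_review(self, threshold: float = 0.5) -> bool:
        """Route to a human reviewer; the threshold value is an assumption."""
        return self.risk_score >= threshold


result = ConversationResult(
    risk_score=0.91,
    suspected_offender="participant_a",
    phases_by_segment=[
        {Phase.ACCESS},
        {Phase.TRUST, Phase.RISK_ASSESSMENT},
        {Phase.TRUST},          # phases can recur and overlap
        {Phase.ISOLATION},
    ],
)
print([p.name for p in result.phases_seen()])
# ['ACCESS', 'TRUST', 'RISK_ASSESSMENT', 'ISOLATION']
print(result.needs_human_review())  # True
```

The record only routes a conversation to review. It takes no action on its own, which keeps a human in the loop.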


The conversation is where most grooming actually happens. Aletheia is one piece of the work needed to make it visible to the systems that already protect children from everything else. The only thing worse than the gap that exists today is the assumption that someone else will close it.

Polycreek is a 501(c)(3). Every dollar funds the work.

Subscription revenue funds continued development, training data, the research staff who build the model, and the infrastructure that runs it. There are no shareholders. The model exists to do the work; the revenue exists to keep doing the work.


Five known obstacles. All of them addressable.

Dataset staleness

Monolingual bias

Adversarial fragility

The false-positive cost

Signal isolation

Built around all five — and a sixth: humans stay in the loop.

A hierarchical transformer designed for the conversation, not the message.

Conversations are not messages.

Each segment encoded by a strong backbone.

Cross-segment attention finds the pattern.

Two heads. One actionable output.
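
The outline above (segments, a per-segment encoder, attention across segments) can be sketched in plain Python. This is a toy illustration under stated assumptions, not Aletheia's architecture: the window size, the stride, and the single-query scaled dot-product pooling are stand-ins, and the segment vectors are hand-written where a real system would take them from a backbone encoder. The two output heads are omitted.

```python
import math
from typing import List, Tuple


def split_into_segments(messages: List[str], size: int, stride: int) -> List[List[str]]:
    """Slide a window of `size` messages forward by `stride`, covering every message."""
    if size <= 0 or not 0 < stride <= size:
        raise ValueError("need size > 0 and 0 < stride <= size")
    if not messages:
        return []
    segments = []
    start = 0
    while True:
        segments.append(messages[start:start + size])
        if start + size >= len(messages):
            break
        start += stride
    return segments


def attention_pool(segment_vectors: List[List[float]],
                   query: List[float]) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention of one query over the segment vectors.

    Returns (pooled_vector, weights); the weights sum to 1.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * v for q, v in zip(query, vec)) / scale for vec in segment_vectors]
    peak = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * vec[i] for w, vec in zip(weights, segment_vectors))
              for i in range(len(query))]
    return pooled, weights


# Seven placeholder messages, windows of three with an overlap of one.
segments = split_into_segments([f"msg{i}" for i in range(7)], size=3, stride=2)
print(len(segments))  # 3

# Toy 2-d segment vectors standing in for backbone encoder output.
pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], query=[2.0, 0.0])
print(weights[0] > weights[1])  # True: the first segment matches the query
```

Because the pooling step accepts any number of segment vectors, conversation length is bounded by memory and not by a single encoder window.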


The biggest names in trust and safety could have built this layer. They have not.
