Personalization at Gnomi

Strategy memo


Gnomi already knows its users. The product just doesn't read what it knows.

April 2026

The gap, in one chat

Five parts of his life all point at one chat question. Gnomi answered as if it knew none of them.

The user

Karolis B.

23, Lithuanian. Junior Customer Service Specialist at Swedbank.
Pro / Annual. Lithuania 2-year promo, paid through 2027-03-26. Auto-renewal off.
33 days in. 73-hour onboarding burst on days 1 to 4. Silent since.
7-stock watchlist. APLD, BBAI, NVDA, NFLX, SOFI, SURG, UNH.
Reddit and X connected with consent. 235 Reddit rows (63 subreddits), 35 X posts already in production.

"do i get it right that if i invest into spdr spyl acc etf, instead of paying me dividends, it will automatically re-invest that money into more shares of spyl etf?"

2026-03-27 chat. A precise question about accumulating ETFs from a 23-year-old building a long-term position.

Reddit

7 posts in r/dividends, r/dividendgrowth, r/dividendinvesting, r/investing, r/trading212, r/TSLA

Occupation

Works at Swedbank, the largest bank in the Baltics

Declared

"Long-term investing" set to 100%

Watchlist

310 SOFI shares at $15 avg, a position the answer could have been grounded in

Likes on X

Trading 212 marketing about in-app investment Q&A

Today's answer: a beginner-level explainer of "what is an ETF." No reference to his positions, his work, his prior research, or any of the other signals above that already point at the answer.

The data is already there

The signal is in production. The build is mostly wiring it together.

What exists today	What's missing
125 DynamoDB tables of user signals (identity, chat, reactions, social)	A single unified user profile that combines them
Postgres recommender, reactions store, share-virality store	A nightly job that turns raw events into a per-user "story"
Azure AI Search with 51M+ articles indexed and queryable in <300ms	Passing user filters (country, language) into the existing search calls. A parameter, not a build.
Gnomi's existing AI chat surface	A ~500-token "who this user is" injected into every chat call

No new vector store. No model retraining. No platform rebuild. The data is collected but not yet read at the places users see: chat, push, newsletter, feed.
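The third row of the table calls user filters "a parameter, not a build." Below is a minimal sketch of that parameter. It assumes the article index exposes filterable `country` and `language` fields, which are hypothetical names because the real index schema may differ. It also assumes the filter is written in OData, the `$filter` syntax Azure AI Search uses. The helper names are illustrative:

```python
from typing import Optional


def odata_literal(value: str) -> str:
    # OData string literals are single-quoted; an embedded quote is escaped by doubling it.
    return "'" + value.replace("'", "''") + "'"


def build_search_filter(profile: dict) -> Optional[str]:
    """Turn profile fields into an OData filter string, or None when there is nothing to filter on.

    The index field names `country` and `language` are hypothetical; use whatever
    the existing article index actually exposes as filterable.
    """
    clauses = []
    if profile.get("country"):
        clauses.append(f"country eq {odata_literal(profile['country'])}")
    if profile.get("article_language"):
        clauses.append(f"language eq {odata_literal(profile['article_language'])}")
    return " and ".join(clauses) or None


print(build_search_filter({"country": "LT", "article_language": "en"}))
# country eq 'LT' and language eq 'en'
```

The resulting string would go into the filter argument of the search call the app already makes, for example `SearchClient.search(search_text=question, filter=...)` in the `azure-search-documents` Python SDK. When the profile is empty, the function returns None, so thin-signal users keep today's unfiltered behavior.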

One row per user

Who he is (changes over years), plus what he's doing now (changes over weeks). Read by every surface.

A single Postgres row, refreshed nightly, read by chat, push, newsletter, and feed. Live context (the chat he's typing right now, the article he just clicked) layers on top at request time.

Field	Karolis's value
Stable identity (half-life: 12 to 24 months)
cognitive_archetype	Active-positions retail investor (0.95). Banking professional (0.90). Self-anchoring decision-maker (0.85).
watchlist_tickers	APLD, BBAI, NVDA, NFLX, SOFI, SURG, UNH (resolved through Quartr lookup, so cross-listings don't mis-resolve)
occupation, country	Junior Customer Service Specialist at Swedbank. Lithuania.
Active intent (half-life: 14 to 30 days)
top_topics	Long-term investing (0.97), Dead by Daylight (0.95), Iran geopolitics (0.85), Lithuanian politics (0.80), side-hustle (0.70). Confidence weighted by how many independent signals fire on the same topic, not by raw volume.
per_topic_language_affinity	Finance: English. Lithuanian politics: Lithuanian. Iran: bilingual with Lithuanian framing. Gaming: English.
lifecycle_stage	cooling. 73-hour burst on days 1 to 4, silent 30 days since. First push opportunity tied to SOFI's next earnings event.
upcoming_events	SOFI Q1, UNH Q1, NVDA Q1. Joinable today via quartr_event (200K rows) and finance_quartr_id_tracking.
chat_topic_vector	3072-dimension fingerprint of his last 50 chat questions. Fed into existing Azure AI Search calls.
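The half-lives in the table mean that a signal's weight halves every N days. Below is a minimal sketch of that decay. It assumes a plain exponential curve and picks one value from each stated range: 540 days for stable identity and 21 days for active intent. Both values are assumptions, and so are the helper names:

```python
from datetime import date

# Assumed single values picked from the memo's ranges; tune per field.
HALF_LIFE_DAYS = {
    "stable_identity": 540,  # ~18 months, inside the 12 to 24 month range
    "active_intent": 21,     # inside the 14 to 30 day range
}


def decay_weight(event_day: date, today: date, half_life_days: float) -> float:
    """Exponential decay: an event's weight halves every `half_life_days`."""
    age_days = max((today - event_day).days, 0)  # future-dated events count as fresh
    return 0.5 ** (age_days / half_life_days)


def layer_score(event_days: list, today: date, layer: str) -> float:
    """Sum of decayed weights for one topic's events in one profile layer."""
    half_life = HALF_LIFE_DAYS[layer]
    return sum(decay_weight(d, today, half_life) for d in event_days)


today = date(2026, 4, 17)  # hypothetical "today"
print(decay_weight(date(2026, 3, 27), today, HALF_LIFE_DAYS["active_intent"]))  # 0.5
```

At those assumed half-lives, Karolis's day 1 to 4 burst is now about 30 days old. It keeps roughly 0.37 of its weight in Layer B (0.5^(30/21)) but about 0.96 in Layer A (0.5^(30/540)). That gap is what lets lifecycle_stage read "cooling" while cognitive_archetype stays intact.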

For the ~80% of users with thin signal, signal_density_score gates the read: cohort defaults seed the active-intent fields, and the user's own data takes over as it accumulates. Cold-start mechanic in Appendix A3.

The Reddit and X advantage

What we learn about Karolis from Reddit and X that his app behavior alone never told us.

What we know	Why we know it
Active retail dividend-investor with current positions	7 Reddit posts in dividend-investor subreddits. 7-stock watchlist. Voice-chat with explicit "310 SOFI shares at $15 avg." Likes on X for Trading 212 in-app AI chatbot, SoFi USD settlement, mobile-investing privacy.
Aspiring Dead by Daylight content creator	105 of his 235 Reddit rows are in r/deadbydaylight (32 of them original posts as BoyDilly). 5 original X posts (4 tagged #dbd) with embedded video links. Bio names "content creator" explicitly.
Side-hustle and monetization research	r/forhire, r/NewTubers, r/Fiverr, r/FiverrGigs, r/ShadowBan. Actively researching how to monetize the Dead by Daylight content.
Self-anchoring decision-maker	Same SoFi sell question in 2 sessions across 2 days (Mar 26 voice in English, Mar 29 finance-mode in Lithuanian). Plus the day-1 ask: "if i give you information about myself, will you remember that?"

The advantage. Without the social signal, Karolis is a 7-stock watchlist plus a Lithuanian email. Few consumer news apps build personalization from a user-authorized external profile at signup. Gnomi has the consented connection, the ingest pipeline (already running), and 19,800 X rows plus 4,400 Reddit rows in the database today.

Where this lands first: the chat prompt

About 500 tokens prepended to every chat call. Refreshed nightly. Every line traceable to a specific field.

The user is a 23-year-old Lithuanian working as a Junior Customer Service Specialist at Swedbank, the largest bank in the Baltics. He is a Pro / Annual subscriber on the Lithuania 2-year promo, paid through 2027-03-26. Auto-renewal is already off: he has decided not to extend even though he has 11 months remaining. He joined 33 days ago, used Gnomi heavily for 73 hours across days 1 to 4, then went silent. Lifecycle: active but cooling.

COGNITIVE HABITS (use to disambiguate, prioritize, ground; DON'T reference explicitly):
- Active retail investor with real positions: 7 stocks (APLD, BBAI, NVDA, NFLX, SOFI, SURG, UNH) tilted toward AI infrastructure, fintech, and healthcare. Knows specific position sizes (310 SoFi shares at $15 avg). Treat finance questions as research from someone who already holds the names.
- Banking professional learning his trade: pitch price-to-earnings ratios, ETF mechanics, and Fed policy at junior-bank-employee level. Precise mechanism, not glossary.
- Bilingual context-switcher: chats in English for finance and global geopolitics; in Lithuanian for Seimas politics and regional news. Match his current language.
- Self-anchoring decision-maker: asks the same question across multiple sessions when working through a decision (the SoFi sell question recurs in 2 sessions, 2 days, 2 languages). Treat as iteration, not ignorance.

LANGUAGE PER TOPIC (override user-global Lithuanian when relevant):
- Finance and portfolio: English.
- Iran geopolitics: bilingual with Lithuanian framing of consequences.
- Lithuanian local politics: Lithuanian.
- Gaming (Dead by Daylight): English.

WATCHLIST (real positions in some):
- SOFI: 310 shares at $15 avg = $4,650 cost basis, has asked twice whether to sell.
- APLD, BBAI, NVDA, NFLX, SURG, UNH (asked for 10-year analyst predictions on UNH: base, bear, bull).
Prioritize his holdings for examples. Surface earnings briefings the day before any held name reports.

DON'T:
- Reference any of the above facts explicitly. Let it show in disambiguation, not acknowledgment.
- Re-explain ETF basics, dividend mechanics, or price-to-earnings ratios as if for a beginner.
- Refuse public-record finance questions with "I'm not allowed to give financial advice." He is asking for research input, not a fiduciary recommendation.
- Treat finance and gaming as the same person's "interests list." They are separate domains he context-switches between.

No magic. Just plumbing.

Every line of that prompt is computable from the profile. The schema holds the truth; the prompt is just a rendered view of it.
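Below is a minimal sketch of "a rendered view". The function `render_prompt` is hypothetical. It builds a reduced version of the prompt from a profile dict whose keys mirror the schema fields. It assumes roughly 4 characters per token in place of a real tokenizer. If the text runs over the ~500-token budget, it drops the lowest-confidence topics first. Only a subset of the example prompt's sections is rendered:

```python
import math


def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text. Swap in the model's real tokenizer.
    return math.ceil(len(text) / 4)


def _render(profile: dict, topics: list) -> str:
    lines = [
        f"The user is a {profile['age']}-year-old in {profile['country']}, "
        f"working as {profile['occupation']}. Lifecycle: {profile['lifecycle_stage']}.",
        "",
        "LANGUAGE PER TOPIC:",
    ]
    lines += [f"- {topic}: {lang}." for topic, lang in profile["per_topic_language_affinity"].items()]
    lines += ["", "TOP TOPICS:"]
    lines += [f"- {t['topic']} ({t['confidence']:.2f})" for t in topics]
    lines += ["", "WATCHLIST: " + ", ".join(profile["watchlist_tickers"]) + "."]
    return "\n".join(lines)


def render_prompt(profile: dict, max_tokens: int = 500) -> str:
    """Render prompt_text_summary from profile fields, highest-confidence topics first."""
    topics = sorted(profile["top_topics"], key=lambda t: t["confidence"], reverse=True)
    text = _render(profile, topics)
    # Over budget: drop the lowest-confidence topic until the text fits or no topics remain.
    while estimate_tokens(text) > max_tokens and topics:
        topics = topics[:-1]
        text = _render(profile, topics)
    return text


karolis = {
    "age": 23,
    "country": "Lithuania",
    "occupation": "Junior Customer Service Specialist at Swedbank",
    "lifecycle_stage": "cooling",
    "per_topic_language_affinity": {"Finance": "English", "Lithuanian politics": "Lithuanian"},
    "top_topics": [
        {"topic": "Dead by Daylight", "confidence": 0.95},
        {"topic": "Long-term investing", "confidence": 0.97},
        {"topic": "Side-hustle", "confidence": 0.70},
    ],
    "watchlist_tickers": ["APLD", "BBAI", "NVDA", "NFLX", "SOFI", "SURG", "UNH"],
}
print(render_prompt(karolis))
```

The renderer reads only profile fields, so every rendered line traces back to one of them. After the nightly job runs, regenerating prompt_text_summary is a single function call.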

What the prompt says	Schema field	Underlying signal
"Active retail investor with real positions"	cognitive_archetype.primary	Watchlist + voice-chat "310 SoFi shares at $15 avg" + 7 Reddit dividend posts + likes on X for Trading 212
"Banking professional learning his trade"	cognitive_archetype.secondary	Declared occupation + chat questions pitched the way a banking junior would phrase them
"Bilingual context-switcher"	per_topic_language_affinity	Chat language detection per topic: English on finance, Lithuanian on Seimas, mixed on Iran
"Long-term retail investing: 0.97"	top_topics	5 independent signals firing on the same topic: declared + watchlist + Reddit + chats + likes on X
"He holds 310 SOFI shares at $15 avg"	watchlist_tickers.notes	Verbatim voice-chat extraction
"Surface earnings briefings the day before"	upcoming_events	Watchlist joined to quartr_event (200K rows) via finance_quartr_id_tracking

Each line is a claim, grounded in real signal, with a confidence score and a known source. Computable nightly. Auditable per field. Editable by Karolis if he ever opens a profile-control screen.
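The table cites "5 independent signals firing on the same topic", and the top_topics note says confidence tracks independent signals, not raw volume. One way to sketch that is a noisy-OR. Each distinct source is treated as an independent chance of being right, and repeats from one source collapse to a single vote. This is one possible scoring rule, not necessarily the one behind the memo's numbers. The source names and the flat 0.5 reliability are assumptions:

```python
from math import prod

# Assumed per-source reliabilities, equal here for illustration; calibrate per source in practice.
SOURCE_RELIABILITY = {"declared": 0.5, "watchlist": 0.5, "reddit": 0.5, "chat": 0.5, "x_likes": 0.5}


def topic_confidence(events: list, topic: str) -> float:
    """Noisy-OR over the distinct sources that mention `topic`.

    `events` is a list of (source, topic) pairs. Repeats from the same source
    collapse to one vote, so volume inside a source does not raise the score.
    """
    sources = {src for src, t in events if t == topic}
    # Probability that every independent source is wrong; unknown sources contribute nothing.
    p_all_wrong = prod(1 - SOURCE_RELIABILITY.get(src, 0.0) for src in sources)
    return 1 - p_all_wrong


five_sources = [(s, "long-term investing") for s in SOURCE_RELIABILITY]
print(round(topic_confidence(five_sources, "long-term investing"), 2))  # 0.97

reddit_only = [("reddit", "dead by daylight")] * 100
print(topic_confidence(reddit_only, "dead by daylight"))  # 0.5
```

With the assumed 0.5 per source, five independent sources land on 0.97, while a hundred posts from Reddit alone stay at 0.5. The match with the table's 0.97 comes from the chosen constant. Calibrating reliability per source is the real work.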

What Karolis would feel

Three moments. Same user. Different product.

"is it worth selling my 310 SoFi shares?"

Today

Generic SoFi-the-company intro. No acknowledgment of his position size or prior conversation history.

After this ships

Knows he holds 310 shares at $15 avg, asked the same question 3 days ago in a different language, and has liked Trading 212 posts on X. Leads with current price, recent analyst targets, the case for and against trimming a flat position, and what to watch ahead of next earnings. No refusal.

"How does SPDR SPYL accumulating-ETF mechanics work?"

Today

Beginner-level explainer of "what is an ETF."

After this ships

Junior-bank-employee level: precise accumulating vs. distributing tax mechanics, what SPYL specifically does, and how it differs from VUAA (the Vanguard equivalent he might be comparing) and other EU-listed accumulators available to a Lithuanian investor.

He returns after 30 days of silence.

Today

Same generic Lithuania-tier push as thousands of other Lithuanian users.

After this ships

Push the day before SOFI Q1 earnings, anchored on his watchlist. Reactivation flag fires. Chat prompt regenerates with his stable identity intact.
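The third moment relies on two small computations. The first joins the watchlist to upcoming earnings and picks a send date one day before each report. The second flags a return after a long silence. Below is a minimal sketch. Several pieces are hypothetical: the dict shapes standing in for finance_quartr_id_tracking and quartr_event, the event ids, and the 30-day threshold:

```python
from datetime import date, timedelta


def earnings_push_dates(watchlist, ticker_to_event, event_dates, today):
    """Return sorted (send_date, ticker) pairs: one push the day before each held name reports.

    `ticker_to_event` stands in for the finance_quartr_id_tracking lookup and
    `event_dates` for quartr_event; both shapes are hypothetical.
    """
    pushes = []
    for ticker in watchlist:
        event_id = ticker_to_event.get(ticker)
        if event_id is None or event_id not in event_dates:
            continue  # no resolved upcoming event for this ticker
        send_day = event_dates[event_id] - timedelta(days=1)
        if send_day >= today:  # skip reports whose day-before has already passed
            pushes.append((send_day, ticker))
    return sorted(pushes)


def is_reactivation(last_active: date, now: date, silence_days: int = 30) -> bool:
    """Flag a return visit after at least `silence_days` without activity (threshold assumed)."""
    return (now - last_active).days >= silence_days


# Placeholder dates for illustration only, not real earnings dates.
print(earnings_push_dates(
    ["SOFI", "NVDA", "UNH"],
    {"SOFI": "evt-sofi-q1", "UNH": "evt-unh-q1"},
    {"evt-sofi-q1": date(2026, 5, 4), "evt-unh-q1": date(2026, 4, 15)},
    today=date(2026, 4, 20),
))
```

Past send dates are skipped, so a report that has already happened never triggers a stale push. Tickers with no resolved event, like NVDA in this example, are skipped as well.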

The bottom line

He asked us to remember him on day 1. We didn't. He turned off auto-renewal. The data to answer "yes" was already in production.

What it takes to read Karolis: a single profile table, a nightly job, and one place where the profile gets injected into the chat prompt. Every surface that consumes the profile (chat, push, newsletter, feed) gets sharper for free.

The profile table is the foundation. The chat prompt is the first place users feel it.

What this approach cannot do: discover topics Karolis would care about that he has never engaged with, or tell Gnomi what new product to build. It can only help Gnomi be sharper at the products it already ships, for the users it already has.

Appendix.

Appendix A1: Schema, the shape

npa.user_personalization_profile. One row per user. Read by every surface.

Layer A: Stable identity

user_id
country
app_language
article_language
age
gender
occupation
bio
pro_tier
pro_industry
pro_job_title
pro_categories
pro_publishers
pro_schedule
acquisition_cohort
utm_subcohort
profile_photo_uploaded
watchlist_tickers
cognitive_archetype
reddit_connected
reddit_subreddits_subscribed
reddit_topical_interests
x_connected
x_following_handles
x_following_topical_clusters
cluster_id

Layer B: Active intent

top_topics
top_publishers_by_topic
top_categories
per_topic_language_affinity
reaction_summary
cross_source_compare_events
chat_topic_vector
chat_volume_30d
chat_question_examples
chat_use_mode
chat_languages
chat_grounding_rate
chat_features_used
chat_peak_hours_local
cross_modal_topics
aspirational_topics
revealed_only_topics
social_only_topics
upcoming_events
reading_dwell_summary
topic_dimension_preferences
un_likes
slow_toggle_signals
abandoned_topics
push_categories_engaged
push_categories_ignored
push_optimal_send_times
lifecycle_stage
feature_first_use
feature_last_use

Compressed view + privacy

prompt_text_summary
signal_density_score
cluster_blend_weight
data_freshness
data_provenance
user_overridden_fields
personalization_opt_out
personalization_opt_out_surfaces
social_consent_at
social_consent_scope
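These fields decide what each surface is allowed to read. Below is a minimal sketch of one read-time precedence. Opt-out wins first, whether global or per surface. Next comes a user edit stored in user_overridden_fields, assumed here to be a {field: value} map. Last comes the computed value with its data_provenance entry. The function name, the map shapes, and the edited occupation value are hypothetical:

```python
from typing import Any, Tuple


def resolve_field(profile: dict, field: str, surface: str) -> Tuple[Any, str]:
    """Return (value, provenance) for one profile field as a given surface should see it.

    Precedence: opt-out (global or per surface), then the user's own edit,
    then the computed value with its recorded provenance.
    """
    opted_out_surfaces = profile.get("personalization_opt_out_surfaces") or []
    if profile.get("personalization_opt_out") or surface in opted_out_surfaces:
        return None, "opt_out"
    overrides = profile.get("user_overridden_fields") or {}
    if field in overrides:
        return overrides[field], "user_override"
    provenance = (profile.get("data_provenance") or {}).get(field, "unknown")
    return profile.get(field), provenance


profile = {
    "occupation": "Junior Customer Service Specialist at Swedbank",
    "data_provenance": {"occupation": "declared"},
    # Hypothetical edit made from a profile-control screen.
    "user_overridden_fields": {"occupation": "Customer Service Specialist at Swedbank"},
    "personalization_opt_out_surfaces": ["push"],
}
print(resolve_field(profile, "occupation", "chat"))  # ('Customer Service Specialist at Swedbank', 'user_override')
print(resolve_field(profile, "occupation", "push"))  # (None, 'opt_out')
```

Returning provenance alongside the value keeps each field auditable. A surface can log whether it used a computed, declared, or user-edited value.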

Three consumption paths

A. Inline injection. prompt_text_summary into chat system prompts (~500 tokens).
B. Vector retrieval. chat_topic_vector queried in Azure AI Search alongside the user's question.
C. Cohort fallback. When signal_density_score < 0.4, profile reads cluster-inherited values.
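Path B can be read as tilting the vector query toward the user's chat_topic_vector. Below is a minimal sketch of one way to do that: blend the unit-normalized question embedding with the profile vector, then rank by cosine similarity. Several pieces are assumptions: the 0.3 profile weight, the toy 3-dimensional vectors standing in for 3072-dimensional embeddings, and the document ids. The memo does not specify how the two vectors are combined:

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def blend_query(question_vec, profile_vec, profile_weight=0.3):
    """Tilt the question embedding toward the user's chat_topic_vector.

    Both inputs are unit-normalized first so neither dominates by magnitude;
    profile_weight = 0.3 is an assumed starting value, not a tuned one.
    """
    q, p = normalize(question_vec), normalize(profile_vec)
    return normalize([(1 - profile_weight) * a + profile_weight * b for a, b in zip(q, p)])


def rank(query_vec, docs):
    """Return doc ids ordered by cosine similarity to the query, best first."""
    return sorted(docs, key=lambda doc_id: cosine(query_vec, docs[doc_id]), reverse=True)


# Toy 3-d vectors standing in for 3072-d embeddings.
question = [1.0, 0.0, 0.0]
profile_vec = [0.0, 1.0, 0.0]
docs = {"generic_explainer": [1.0, 0.0, 0.0], "holder_angle": [0.9, 0.44, 0.0]}
print(rank(question, docs))                            # ['generic_explainer', 'holder_angle']
print(rank(blend_query(question, profile_vec), docs))  # ['holder_angle', 'generic_explainer']
```

With the blend, the document that also points along his profile direction outranks the generic one, even though the generic one matches the bare question exactly. Keeping profile_weight well below 0.5 keeps the question in charge.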

Appendix A2: Schema, the SQL

The CREATE TABLE statement the engineering team would build.

CREATE EXTENSION IF NOT EXISTS vector;  -- pgvector; halfvec indexing needs 0.7.0+
CREATE SCHEMA IF NOT EXISTS npa;

CREATE TABLE npa.user_personalization_profile (
  -- IDENTITY
  user_id uuid PRIMARY KEY,
  profile_version text NOT NULL,
  refreshed_at timestamptz NOT NULL,
  -- LAYER A: STABLE IDENTITY
  country text,
  app_language text,
  article_language text,
  pro_tier text,
  pro_payment_cadence text,
  purchase_platform text,
  device_platform text,
  pro_industry text,
  pro_job_title text,
  pro_categories text[],
  pro_sub_topics jsonb,
  pro_publishers jsonb,
  pro_schedule jsonb,
  gender text,
  age int,
  occupation text,
  bio text,
  profile_photo_uploaded bool,
  acquisition_cohort text,
  utm_subcohort text,
  watchlist_tickers jsonb,  -- resolved through Quartr
  cognitive_archetype jsonb,  -- {primary, secondary, tertiary, axis_scores, evidence_count, confidence}
  reddit_connected bool DEFAULT false,
  reddit_subreddits_subscribed jsonb,
  reddit_topical_interests jsonb,
  reddit_engagement_archetype text,
  x_connected bool DEFAULT false,
  x_following_handles text[],
  x_following_topical_clusters jsonb,
  cluster_id text,  -- ~21 cohorts cover 97% of paying base
  -- LAYER B: ACTIVE INTENT
  top_topics jsonb,  -- [{topic, confidence, source, languages, evidence_count, last_event}]
  top_publishers_by_topic jsonb,
  top_categories jsonb,
  per_topic_language_affinity jsonb,
  reaction_summary jsonb,
  cross_source_compare_events jsonb,
  chat_topic_vector vector(3072),  -- pgvector, text-embedding-3-large
  chat_volume_30d int,
  chat_question_examples jsonb,
  chat_use_mode text,
  chat_languages text[],
  chat_grounding_rate real,
  chat_features_used text[],
  chat_peak_hours_local int[],
  cross_modal_topics jsonb,
  aspirational_topics jsonb,  -- declared but not revealed
  revealed_only_topics jsonb,  -- behavioral but not declared
  social_only_topics jsonb,  -- in social mirror, not in chat / reactions
  upcoming_events jsonb,  -- watchlist joined to quartr_event + chat-derived events
  reading_dwell_summary jsonb,  -- per-topic dwell, completion, repeat visits
  topic_dimension_preferences jsonb,  -- per-topic 3-level category mix, format mix, sentiment tilt
  un_likes jsonb,
  slow_toggle_signals jsonb,
  abandoned_topics jsonb,
  push_categories_engaged jsonb,
  push_categories_ignored jsonb,
  push_optimal_send_times int[],
  lifecycle_stage text,
  lifecycle_stage_changed_at timestamptz,
  lifecycle_stage_history jsonb,
  feature_first_use jsonb,
  feature_last_use jsonb,
  -- COMPRESSED REPRESENTATION
  prompt_text_summary text,  -- pre-rendered ~500-token natural-language summary
  prompt_text_summary_built_at timestamptz,
  -- METADATA + GATING
  signal_density_score real,
  cluster_blend_weight real,  -- 0..1 mix of cluster vs self at read time
  cluster_profile_version text,
  data_freshness jsonb,
  data_provenance jsonb,
  user_overridden_fields jsonb,
  -- PRIVACY / OPT-OUT (EU "right to object")
  personalization_opt_out bool DEFAULT false,
  personalization_opt_out_surfaces text[],
  social_consent_at timestamptz,
  social_consent_scope text[],
  social_signal_provenance jsonb
);

-- Vector + lifecycle + cohort + density indexes
-- pgvector indexes plain vector columns only up to 2,000 dimensions, so the 3072-d
-- column is indexed through a half-precision cast (halfvec indexes allow up to 4,000).
-- Queries must use the same cast to hit the index, e.g.
-- ORDER BY chat_topic_vector::halfvec(3072) <=> $1::halfvec(3072).
-- Build after the nightly job has populated rows: IVFFlat derives its lists from existing data.
CREATE INDEX ON npa.user_personalization_profile USING ivfflat ((chat_topic_vector::halfvec(3072)) halfvec_cosine_ops);
CREATE INDEX ON npa.user_personalization_profile (lifecycle_stage, lifecycle_stage_changed_at);
CREATE INDEX ON npa.user_personalization_profile (acquisition_cohort);
CREATE INDEX ON npa.user_personalization_profile (signal_density_score);
CREATE INDEX ON npa.user_personalization_profile (cluster_id);

Appendix A3: The 80% with thin signal

For users without Karolis's signal density, the same prompt template runs. The values come from a different place.

The mechanic

Empirical clustering of the paying base produces ~21 cohorts covering 97%. Most are operational segments (lifecycle, language, engagement intensity). A few double as topical archetypes (Investors, Russophone-Lithuanians) whose Layer-B defaults seed cold-start.

A new Lithuanian Pro user lands in a cohort on day 1. The prompt inherits topical defaults from data-rich members of the same cohort. As they chat, react, and read, their own signal gradually overrides the cohort default.

The blend, by signal density

Density 0.0 to 0.4: cluster-inherited values dominate. Better than today's broadcast default.
Density 0.4 to 0.8: 30 / 70 cluster-to-self blend. Self-data starts to override.
Density 0.8+: cluster is just a tiebreaker. Karolis is here.
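The three bands map naturally onto the schema's cluster_blend_weight field. Below is a minimal sketch. It takes 0.3 from the 30 / 70 band and assumes 0.9 for "dominate" and 0.05 for "tiebreaker". The function names and the cohort and user topic scores are hypothetical:

```python
def cluster_blend_weight(density: float) -> float:
    """Share of each active-intent value taken from the cohort rather than from the user."""
    if density < 0.4:
        return 0.9   # assumption: "cluster-inherited values dominate"
    if density < 0.8:
        return 0.3   # the 30 / 70 cluster-to-self blend
    return 0.05      # assumption: cluster acts only as a tiebreaker


def blend_topics(self_topics: dict, cluster_topics: dict, density: float) -> dict:
    """Mix per-topic scores; a topic missing from one side counts as 0 there."""
    w = cluster_blend_weight(density)
    topics = set(self_topics) | set(cluster_topics)
    return {
        t: w * cluster_topics.get(t, 0.0) + (1 - w) * self_topics.get(t, 0.0)
        for t in sorted(topics)
    }


cohort = {"long-term investing": 0.6, "lithuanian politics": 0.5}  # hypothetical cohort defaults
new_user = {"dead by daylight": 0.4}                                 # hypothetical thin own signal
print(blend_topics(new_user, cohort, density=0.1))  # cohort topics lead on day 1
print(blend_topics(new_user, cohort, density=0.9))  # the user's own topic leads once dense
```

The step function mirrors the memo's three bands. A linear ramp between them would avoid a visible jump in answers when a user's density crosses 0.4 or 0.8.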

A user in the Russophone-Lithuanian cohort (n=13, 100% Russian-language) gets a fundamentally different prompt: Russian-language responses, Russian-language source preferences, and the regional context that cohort engages with. Same schema, same surfaces, different defaults.

Honest framing: clustering is predominantly operational segmentation, not a partition by topical taste. It earns its place in the schema for cold-start gating, not as a recommendation engine.