Strategy memo
Personalization at Gnomi.
Gnomi already knows its users. The product just doesn't read what it knows.
April 2026
The gap, in one chat
Five parts of his life all point at one chat question. Gnomi answered as if it knew none of them.
The user
Karolis B.
- 23, Lithuanian. Junior Customer Service Specialist at Swedbank.
- Pro / Annual. Lithuania 2-year promo, paid through 2027-03-26. Auto-renewal off.
- 33 days in. 73-hour onboarding burst on days 1 to 4. Silent since.
- 7-stock watchlist. APLD, BBAI, NVDA, NFLX, SOFI, SURG, UNH.
- Reddit and X connected with consent. 235 Reddit rows (63 subreddits), 35 X posts already in production.
"do i get it right that if i invest into spdr spyl acc etf, instead of paying me dividends, it will automatically re-invest that money into more shares of spyl etf?"
2026-03-27 chat. A precise question about accumulating ETFs from a 23-year-old building a long-term position.
Reddit
7 posts in r/dividends, r/dividendgrowth, r/dividendinvesting, r/investing, r/trading212, r/TSLA
Occupation
Works at Swedbank, the largest bank in the Baltics
Declared
"Long-term investing" set to 100%
Watchlist
310 SOFI shares at $15 avg, a position the answer could be grounded in
Likes on X
Trading 212 marketing about in-app investment Q&A
Today's answer: a beginner-level explainer of "what is an ETF." No reference to his positions, his work, his prior research, or anything else in the five parts of his life that already point at the answer.
The data is already there
The signal is in production. The build is mostly wiring it together.
| What exists today | What's missing |
| 125 DynamoDB tables of user signals (identity, chat, reactions, social) | A single unified user profile that combines them |
| Postgres recommender, reactions store, share-virality store | A nightly job that turns raw events into a per-user "story" |
| Azure AI Search with 51M+ articles indexed and queryable in <300ms | Passing user filters (country, language) into the existing search calls. A parameter, not a build. |
| Gnomi's existing AI chat surface | A ~500-token "who this user is" injected into every chat call |
No new vector store. No model retraining. No platform rebuild. The data is collected but not yet read at the places users see: chat, push, newsletter, feed.
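A minimal sketch of the "parameter, not a build" point: turning profile fields into the OData filter string that an Azure AI Search request accepts in its `$filter` parameter. The index field names `country` and `language` are assumptions; the real index schema may name them differently.

```python
from typing import List, Optional, Sequence


def odata_quote(value: str) -> str:
    """Quote a string literal for an OData filter (single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def build_user_filter(country: Optional[str], languages: Sequence[str]) -> Optional[str]:
    """Build a filter from profile fields; returns None when there is nothing to filter on."""
    clauses: List[str] = []
    if country:
        # `country` is an assumed index field name.
        clauses.append(f"country eq {odata_quote(country)}")
    if languages:
        # `language` is an assumed index field name.
        langs = " or ".join(f"language eq {odata_quote(lang)}" for lang in languages)
        clauses.append(f"({langs})" if len(languages) > 1 else langs)
    return " and ".join(clauses) or None
```

For Karolis, `build_user_filter("LT", ["en", "lt"])` yields `country eq 'LT' and (language eq 'en' or language eq 'lt')`, which would ride along on the search calls that already exist.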
One row per user
Who he is (changes over years), plus what he's doing now (changes over weeks). Read by every surface.
A single Postgres row, refreshed nightly, read by chat, push, newsletter, and feed. Live context (the chat he's typing right now, the article he just clicked) layers on top at request time.
| Field | Karolis's value |
| Stable identity (half-life: 12 to 24 months) |
| cognitive_archetype | Active-positions retail investor (0.95). Banking professional (0.90). Self-anchoring decision-maker (0.85). |
| watchlist_tickers | APLD, BBAI, NVDA, NFLX, SOFI, SURG, UNH (resolved through Quartr lookup, so cross-listings don't mis-resolve) |
| occupation, country | Junior Customer Service Specialist at Swedbank. Lithuania. |
| Active intent (half-life: 14 to 30 days) |
| top_topics | Long-term investing (0.97), Dead by Daylight (0.95), Iran geopolitics (0.85), Lithuanian politics (0.80), side-hustle (0.70). Confidence weighted by how many independent signals fire on the same topic, not by raw volume. |
| per_topic_language_affinity | Finance: English. Lithuanian politics: Lithuanian. Iran: bilingual with Lithuanian framing. Gaming: English. |
| lifecycle_stage | cooling. 73-hour burst on days 1 to 4, silent 30 days since. First push opportunity tied to SOFI's next earnings event. |
| upcoming_events | SOFI Q1, UNH Q1, NVDA Q1. Joinable today via quartr_event (200K rows) and finance_quartr_id_tracking. |
| chat_topic_vector | 3072-dimension fingerprint of his last 50 chat questions. Fed into existing Azure AI Search calls. |
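The top_topics row says confidence follows the number of independent signals, not raw volume. One way to get that behavior is a noisy-OR over distinct source types. This is a sketch under assumptions: the memo does not give the actual formula, and the source names and 0.5 weights below are hypothetical.

```python
from typing import Iterable, Mapping, Tuple

# Hypothetical weights: how strongly one source type, on its own, supports a topic.
SOURCE_WEIGHTS = {
    "declared": 0.5,
    "watchlist": 0.5,
    "reddit": 0.5,
    "chat": 0.5,
    "x_likes": 0.5,
}


def topic_confidence(
    events: Iterable[Tuple[str, str]],
    topic: str,
    weights: Mapping[str, float] = SOURCE_WEIGHTS,
) -> float:
    """Noisy-OR over the distinct sources that mention `topic`.

    `events` holds (source, topic) pairs. Repeated events from the same
    source count once, so volume alone cannot raise the score.
    """
    sources = {source for source, event_topic in events if event_topic == topic}
    miss_probability = 1.0
    for source in sources:
        miss_probability *= 1.0 - weights.get(source, 0.0)
    return 1.0 - miss_probability
```

Under these weights, one hundred Reddit posts on a topic score 0.5, while one event from each of five sources scores 0.96875. Breadth of evidence beats volume.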
For the ~80% of users with thin signal, signal_density_score gates the read: cohort defaults seed the active-intent fields, and the user's own data takes over as it accumulates. Cold-start mechanic in Appendix A3.
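The two half-lives in the table above imply that older signals should count for less, and that Layer B forgets much faster than Layer A. A minimal sketch of that decay, assuming plain exponential half-life weighting. The specific day counts are picks from inside the memo's ranges, not tuned values.

```python
# Half-lives in days, chosen from within the memo's ranges
# (12 to 24 months for stable identity, 14 to 30 days for active intent).
HALF_LIFE_DAYS = {"stable_identity": 365.0, "active_intent": 30.0}


def decayed_weight(age_days: float, layer: str) -> float:
    """Weight of a signal observed `age_days` ago; it halves once per half-life."""
    half_life = HALF_LIFE_DAYS[layer]
    # Negative ages (clock skew) are treated as "just observed".
    return 0.5 ** (max(age_days, 0.0) / half_life)
```

With these values, a 30-day-old chat question keeps half its weight in Layer B, while a 30-day-old occupation signal keeps about 94% of its weight in Layer A.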
The Reddit and X advantage
What we learn about Karolis from his Reddit and X, that his app behavior alone never told us.
| What we know | Why we know it |
| Active retail dividend-investor with current positions | 7 Reddit posts in dividend-investor subreddits. 7-stock watchlist. Voice-chat with explicit "310 SOFI shares at $15 avg." Likes on X for Trading 212 in-app AI chatbot, SoFi USD settlement, mobile-investing privacy. |
| Aspiring Dead by Daylight content creator | 105 of 235 Reddit posts in r/deadbydaylight (32 of them his original posts as BoyDilly). 5 original X posts (4 tagged #dbd) with embedded video links. Bio names "content creator" explicitly. |
| Side-hustle and monetization research | r/forhire, r/NewTubers, r/Fiverr, r/FiverrGigs, r/ShadowBan. Actively researching how to monetize the Dead by Daylight content. |
| Self-anchoring decision-maker | Same SoFi sell question in 2 sessions across 2 days (Mar 26 voice in English, Mar 29 finance-mode in Lithuanian). Plus the day-1 ask: "if i give you information about myself, will you remember that?" |
The advantage. Without the social signal, Karolis is a 7-stock watchlist plus a Lithuanian email. Few consumer news apps build personalization from a user-authorized external profile at signup. Gnomi has the consented connection, the ingest pipeline (already running), and 19,800 X rows plus 4,400 Reddit rows in the database today.
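The self-anchoring row in the table above depends on recognizing one question asked twice, in two sessions and two languages. A sketch of how that could be detected, assuming each chat question already has a multilingual embedding. The per-question embeddings and the 0.85 threshold are assumptions, not values from the memo.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def repeated_questions(
    questions: List[Tuple[str, Sequence[float]]],
    threshold: float = 0.85,  # hypothetical cut-off
) -> List[Tuple[int, int]]:
    """Index pairs of questions from different sessions that are near-duplicates.

    `questions` holds (session_id, embedding) pairs in chronological order.
    """
    pairs = []
    for (i, (session_i, vec_i)), (j, (session_j, vec_j)) in combinations(enumerate(questions), 2):
        if session_i != session_j and cosine(vec_i, vec_j) >= threshold:
            pairs.append((i, j))
    return pairs
```

A non-empty result is the cue to treat the second ask as iteration on a decision rather than as a fresh question.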
Where this lands first: the chat prompt
About 500 tokens prepended to every chat call. Refreshed nightly. Every line traceable to a specific field.
The user is a 23-year-old Lithuanian working as a Junior Customer Service Specialist at Swedbank, the largest bank in the Baltics. He is a Pro / Annual subscriber on the Lithuania 2-year promo, paid through 2027-03-26. Auto-renewal is already off: he has decided not to extend even though he has 11 months remaining. He joined 33 days ago, used Gnomi heavily for 73 hours across days 1 to 4, then went silent. Lifecycle: active but cooling.
COGNITIVE HABITS (use to disambiguate, prioritize, ground; DON'T reference explicitly):
- Active retail investor with real positions: 7 stocks (APLD, BBAI, NVDA, NFLX, SOFI, SURG, UNH) tilted toward AI infrastructure, fintech, healthcare. Knows specific position sizes (310 SoFi shares at $15 avg). Treat finance questions as research from someone who already holds the names.
- Banking professional learning his trade: pitch price-to-earnings ratios, ETF mechanics, Fed policy at junior-bank-employee level. Precise mechanism, not glossary.
- Bilingual context-switcher: chats in English for finance and global geopolitics; in Lithuanian for Seimas politics, regional news. Match his current language.
- Self-anchoring decision-maker: asks the same question across multiple sessions when working through a decision (the SoFi sell question recurs in 2 sessions, 2 days, 2 languages). Treat as iteration, not ignorance.
LANGUAGE PER TOPIC (override user-global Lithuanian when relevant):
- Finance and portfolio: English.
- Iran geopolitics: bilingual with Lithuanian framing of consequences.
- Lithuanian local politics: Lithuanian.
- Gaming (Dead by Daylight): English.
WATCHLIST (real positions in some):
- SOFI: 310 shares at $15 avg = $4,650 cost basis, has asked twice whether to sell.
- APLD, BBAI, NVDA, NFLX, SURG, UNH (asked for 10-year analyst predictions on UNH: base, bear, bull).
Prioritize his holdings for examples. Surface earnings briefings the day before any held name.
DON'T:
- Reference any of the above facts explicitly. Let it show in disambiguation, not acknowledgment.
- Re-explain ETF basics, dividend mechanics, or price-to-earnings ratios as if for a beginner.
- Refuse public-record finance questions with "I'm not allowed to give financial advice." He is asking for research input, not a fiduciary recommendation.
- Treat finance and gaming as the same person's "interests list." They are separate domains he context-switches between.
No magic. Just plumbing.
Every line of that prompt is computable from the profile. The schema holds the truth; the prompt is just a rendered view of it.
| What the prompt says | Schema field | Underlying signal |
| "Active retail investor with real positions" | cognitive_archetype.primary | Watchlist + voice-chat "310 SoFi shares at $15 avg" + 7 Reddit dividend posts + likes on X for Trading 212 |
| "Banking professional learning his trade" | cognitive_archetype.secondary | Declared occupation + chat questions pitched the way a banking junior would phrase them |
| "Bilingual context-switcher" | per_topic_language_affinity | Chat language detection per topic: English on finance, Lithuanian on Seimas, mixed on Iran |
| "Long-term retail investing: 0.97" | top_topics | 5 independent signals firing on the same topic: declared + watchlist + Reddit + chats + likes on X |
| "He holds 310 SOFI shares at $15 avg" | watchlist_tickers.notes | Verbatim voice-chat extraction |
| "Surface earnings briefings the day before" | upcoming_events | Watchlist joined to quartr_event (200K rows) via finance_quartr_id_tracking |
Each line is a claim, grounded in real signal, with a confidence score and a known source. Computable nightly. Auditable per field. Editable by Karolis if he ever opens a profile-control screen.
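A sketch of what "rendered view" could mean in practice: each prompt line is a claim carrying its schema field, confidence, and source, and the renderer emits the prompt text together with a per-line audit trail. The `Claim` type, the 0.7 confidence floor, and the `suppressed_fields` argument (standing in for fields the user has switched off) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass
class Claim:
    text: str          # sentence rendered into the prompt
    schema_field: str  # profile field the sentence is derived from
    confidence: float  # 0..1
    source: str        # underlying signal, kept for audit


def render_prompt(
    claims: List[Claim],
    min_confidence: float = 0.7,  # hypothetical floor
    suppressed_fields: FrozenSet[str] = frozenset(),
) -> Tuple[str, List[Dict[str, object]]]:
    """Return (prompt_text, audit), where audit maps each prompt line to its origin."""
    kept = [
        c for c in claims
        if c.confidence >= min_confidence and c.schema_field not in suppressed_fields
    ]
    # Highest-confidence claims first; the sort is stable for ties.
    kept.sort(key=lambda c: c.confidence, reverse=True)
    text = "\n".join(f"- {c.text}" for c in kept)
    audit = [
        {"line": n, "schema_field": c.schema_field, "source": c.source, "confidence": c.confidence}
        for n, c in enumerate(kept, start=1)
    ]
    return text, audit
```

Because the audit list is built from the same objects as the text, a line cannot reach the prompt without a field and a source behind it.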
What Karolis would feel
Three moments. Same user. Different product.
"is it worth selling my 310 SoFi shares?"
Today
Generic SoFi-the-company intro. No acknowledgment of his position size or prior conversation history.
After this ships
Knows he holds 310 shares at $15 avg, asked the same question 3 days ago in a different language, has liked Trading 212 posts on X. Leads with current price, recent analyst targets, the case for vs. against trimming a flat position, what to watch ahead of next earnings. No refusal.
"How does SPDR SPYL accumulating-ETF mechanics work?"
Today
Beginner-level explainer of "what is an ETF."
After this ships
Junior-bank-employee level: precise accumulating vs. distributing tax mechanics, what SPYL specifically does, how it differs from VUAA (the Vanguard equivalent he might be comparing) and other Lithuania- and EU-listed accumulators.
He returns after 30 days of silence.
Today
Same generic Lithuania-tier push as thousands of other Lithuanian users.
After this ships
Push the day before SOFI Q1 earnings, anchored on his watchlist. Reactivation flag fires. Chat prompt regenerates with his stable identity intact.
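The "day before earnings" push reduces to date arithmetic over upcoming_events. A minimal sketch, assuming each event is a dict with `ticker` and `event_date` keys; that shape is an assumption, and real rows would come from the quartr_event join.

```python
from datetime import date, timedelta
from typing import Dict, List, Set, Tuple


def earnings_push_dates(
    upcoming_events: List[Dict[str, object]],
    tickers: Set[str],
    today: date,
) -> List[Tuple[str, date]]:
    """(ticker, push_date) pairs: one day before each event for a followed ticker.

    Events whose push date has already passed are skipped.
    """
    pushes = []
    for event in upcoming_events:
        push_on = event["event_date"] - timedelta(days=1)
        if event["ticker"] in tickers and push_on >= today:
            pushes.append((event["ticker"], push_on))
    # Earliest push first, so the scheduler can take the head of the list.
    return sorted(pushes, key=lambda pair: pair[1])
```

The first entry in the returned list is the next reactivation opportunity for a cooling user.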
The bottom line
He asked us to remember him on day 1. We didn't. He turned off auto-renewal. The data to answer "yes" was already in production.
What it takes to read Karolis: a single profile table, a nightly job, and one place where the profile gets injected into the chat prompt. Every surface that consumes the profile (chat, push, newsletter, feed) gets sharper for free.
The profile table is the foundation. The chat prompt is the first place users feel it.
What this approach cannot do: discover topics Karolis would care about that he has never engaged with, or tell Gnomi what new product to build. It can only help Gnomi be sharper at the products it already ships, for the users it already has.
Appendix A1 · Schema, the shape
npa.user_personalization_profile. One row per user. Read by every surface.
Layer A · Stable identity
- user_id
- country
- app_language
- article_language
- age
- gender
- occupation
- bio
- pro_tier
- pro_industry
- pro_job_title
- pro_categories
- pro_publishers
- pro_schedule
- acquisition_cohort
- utm_subcohort
- profile_photo_uploaded
- watchlist_tickers
- cognitive_archetype
- reddit_connected
- reddit_subreddits_subscribed
- reddit_topical_interests
- x_connected
- x_following_handles
- x_following_topical_clusters
- cluster_id
Layer B · Active intent
- top_topics
- top_publishers_by_topic
- top_categories
- per_topic_language_affinity
- reaction_summary
- cross_source_compare_events
- chat_topic_vector
- chat_volume_30d
- chat_question_examples
- chat_use_mode
- chat_languages
- chat_grounding_rate
- chat_features_used
- chat_peak_hours_local
- cross_modal_topics
- aspirational_topics
- revealed_only_topics
- social_only_topics
- upcoming_events
- reading_dwell_summary
- topic_dimension_preferences
- un_likes
- slow_toggle_signals
- abandoned_topics
- push_categories_engaged
- push_categories_ignored
- push_optimal_send_times
- lifecycle_stage
- feature_first_use
- feature_last_use
Compressed view + privacy
- prompt_text_summary
- signal_density_score
- cluster_blend_weight
- data_freshness
- data_provenance
- user_overridden_fields
- personalization_opt_out
- personalization_opt_out_surfaces
- social_consent_at
- social_consent_scope
Three consumption paths
- A. Inline injection. prompt_text_summary into chat system prompts (~500 tokens).
- B. Vector retrieval. chat_topic_vector queried in Azure AI Search alongside the user's question.
- C. Cohort fallback. When signal_density_score < 0.4, profile reads cluster-inherited values.
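A sketch of how a surface might decide which read applies before it touches the profile, combining the opt-out fields with the 0.4 density gate from path C. The opt-out semantics are an assumption: here an empty personalization_opt_out_surfaces list means the opt-out covers every surface.

```python
from typing import Any, Mapping

DENSITY_THRESHOLD = 0.4  # from path C above


def resolve_read(profile: Mapping[str, Any], surface: str) -> str:
    """Return 'none' (no personalization), 'cohort', or 'self' for this surface."""
    if profile.get("personalization_opt_out"):
        opted_out = profile.get("personalization_opt_out_surfaces") or []
        # Assumption: no listed surfaces means the opt-out applies everywhere.
        if not opted_out or surface in opted_out:
            return "none"
    score = profile.get("signal_density_score") or 0.0
    return "cohort" if score < DENSITY_THRESHOLD else "self"
```

Checking the opt-out first keeps the privacy decision independent of how much signal a user happens to have.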
Appendix A2 · Schema, the SQL
The actual CREATE TABLE the engineering team would build.
CREATE EXTENSION IF NOT EXISTS vector; -- pgvector, required for the vector(3072) column below
CREATE TABLE npa.user_personalization_profile (
-- IDENTITY
user_id uuid PRIMARY KEY,
profile_version text NOT NULL,
refreshed_at timestamptz NOT NULL,
-- LAYER A: STABLE IDENTITY
country text,
app_language text,
article_language text,
pro_tier text,
pro_payment_cadence text,
purchase_platform text,
device_platform text,
pro_industry text,
pro_job_title text,
pro_categories text[],
pro_sub_topics jsonb,
pro_publishers jsonb,
pro_schedule jsonb,
gender text,
age int,
occupation text,
bio text,
profile_photo_uploaded bool,
acquisition_cohort text,
utm_subcohort text,
watchlist_tickers jsonb, -- resolved through Quartr
cognitive_archetype jsonb, -- {primary, secondary, tertiary, axis_scores, evidence_count, confidence}
reddit_connected bool DEFAULT false,
reddit_subreddits_subscribed jsonb,
reddit_topical_interests jsonb,
reddit_engagement_archetype text,
x_connected bool DEFAULT false,
x_following_handles text[],
x_following_topical_clusters jsonb,
cluster_id text, -- ~21 cohorts cover 97% of paying base
-- LAYER B: ACTIVE INTENT
top_topics jsonb, -- [{topic, confidence, source, languages, evidence_count, last_event}]
top_publishers_by_topic jsonb,
top_categories jsonb,
per_topic_language_affinity jsonb,
reaction_summary jsonb,
cross_source_compare_events jsonb,
chat_topic_vector vector(3072), -- pgvector, text-embedding-3-large
chat_volume_30d int,
chat_question_examples jsonb,
chat_use_mode text,
chat_languages text[],
chat_grounding_rate real,
chat_features_used text[],
chat_peak_hours_local int[],
cross_modal_topics jsonb,
aspirational_topics jsonb, -- declared but not revealed
revealed_only_topics jsonb, -- behavioral but not declared
social_only_topics jsonb, -- in social mirror, not in chat / reactions
upcoming_events jsonb, -- watchlist joined to quartr_event + chat-derived events
reading_dwell_summary jsonb, -- per-topic dwell, completion, repeat visits
topic_dimension_preferences jsonb, -- per-topic 3-level category mix, format mix, sentiment tilt
un_likes jsonb,
slow_toggle_signals jsonb,
abandoned_topics jsonb,
push_categories_engaged jsonb,
push_categories_ignored jsonb,
push_optimal_send_times int[],
lifecycle_stage text,
lifecycle_stage_changed_at timestamptz,
lifecycle_stage_history jsonb,
feature_first_use jsonb,
feature_last_use jsonb,
-- COMPRESSED REPRESENTATION
prompt_text_summary text, -- pre-rendered ~500-token natural-language summary
prompt_text_summary_built_at timestamptz,
-- METADATA + GATING
signal_density_score real,
cluster_blend_weight real, -- 0..1 mix of cluster vs self at read time
cluster_profile_version text,
data_freshness jsonb,
data_provenance jsonb,
user_overridden_fields jsonb,
-- PRIVACY / OPT-OUT (EU "right to object")
personalization_opt_out bool DEFAULT false,
personalization_opt_out_surfaces text[],
social_consent_at timestamptz,
social_consent_scope text[],
social_signal_provenance jsonb
);
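-- Vector + lifecycle + cohort + density indexes
-- pgvector's ivfflat index supports at most 2,000 dimensions on the vector type, so the
-- 3072-dimension column is indexed through a halfvec cast (pgvector 0.7.0+, up to 4,000 dimensions).
-- Queries must use the same cast expression to hit this index.
CREATE INDEX ON npa.user_personalization_profile USING ivfflat ((chat_topic_vector::halfvec(3072)) halfvec_cosine_ops);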
CREATE INDEX ON npa.user_personalization_profile (lifecycle_stage, lifecycle_stage_changed_at);
CREATE INDEX ON npa.user_personalization_profile (acquisition_cohort);
CREATE INDEX ON npa.user_personalization_profile (signal_density_score);
CREATE INDEX ON npa.user_personalization_profile (cluster_id);
Appendix A3 · The 80% with thin signal
For users without Karolis's signal density, the same prompt template runs. The values come from a different place.
The mechanic
Empirical clustering of the paying base produces ~21 cohorts covering 97%. Most are operational segments (lifecycle, language, engagement intensity). A few double as topical archetypes (Investors, Russophone-Lithuanians) whose Layer-B defaults seed cold-start.
A new Lithuanian Pro user lands in a cohort on day 1. The prompt inherits topical defaults from data-rich members of the same cohort. As they chat, react, and read, their own signal gradually overrides the cohort default.
The blend, by signal density
- Density below 0.4: cluster-inherited values dominate. Better than today's broadcast default.
- Density 0.4 up to 0.8: 30 / 70 cluster blend. Self-data starts to override.
- Density 0.8 and above: cluster is just a tiebreaker. Karolis is here.
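The three bands above can be sketched as a step function that feeds a weighted average of topic scores. Two assumptions are worth checking against the real design: "30 / 70" is read here as 30% cluster and 70% self, and the tiebreaker role at high density is approximated by a cluster weight of zero.

```python
from typing import Dict


def cluster_weight(density: float) -> float:
    """Share of the blended value taken from the cluster (0..1)."""
    if density < 0.4:
        return 1.0  # cluster-inherited values dominate
    if density < 0.8:
        return 0.3  # assumed reading of "30 / 70": 30% cluster, 70% self
    return 0.0      # tiebreaker role is not modeled here


def blend_topics(
    self_topics: Dict[str, float],
    cluster_topics: Dict[str, float],
    density: float,
) -> Dict[str, float]:
    """Per-topic weighted average of cluster defaults and the user's own scores."""
    w = cluster_weight(density)
    topics = set(self_topics) | set(cluster_topics)
    return {
        topic: w * cluster_topics.get(topic, 0.0) + (1.0 - w) * self_topics.get(topic, 0.0)
        for topic in topics
    }
```

A day-1 user with density 0.1 reads pure cohort defaults. A user at Karolis's density reads only his own scores.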
A Russophone-Lithuanian (n=13, 100% Russian language) gets a fundamentally different prompt: Russian-language responses, Russian-language source preferences, regional context that cohort engages with. Same schema, same surfaces, different defaults.
Honest framing: clustering is dominantly operational segmentation, not topical taste partition. It earns its place in the schema for cold-start gating, not as a recommendation engine.