When AI Meets Astrology: Tackling the Challenges of Hyper-Personalization

Alexander Abramovich
10 min read · Nov 11, 2024


Co-authored with Fima Rotter and Yana Pshevoznitskaya.

The Seeds of Personalization

The idea of a personalized digital assistant has been in the public imagination for decades. Science fiction gave us visions like J.A.R.V.I.S. (Just A Rather Very Intelligent System) in Iron Man, while the tech industry introduced PDAs (Personal Digital Assistants) like the Palm Pilot and Apple Newton in the 1990s. These early attempts at putting computerized assistance in our hands set the stage for the personalization revolution we’re experiencing today.

My product journey into the world of personalization began about six years ago when I co-founded Itini with Fima Rotter. Itini was a travel planner focused on creating tailored experiences in Thailand. We built it on the premise that each traveler has unique interests and needs. Our goal was to solve the paradox of choice in travel planning by offering a personalized travel designer that generates the trip according to the traveler’s preferences.

We developed algorithms that craft multi-city personalized itineraries in under 10 seconds, considering factors like travel style, transportation, budget, and desired activities. This was a significant achievement at the time, allowing us to provide customized travel plans much faster than traditional methods.

However, we couldn’t easily accommodate a request like “I want to visit a cigar shop in Bangkok that is close to an Italian restaurant.” Like every other product at the time, we were largely limited to a UI of pre-defined categories: the button-pressing, list-selecting interfaces of the past.

From Personalization to Hyper-Personalization

The rise of GenAI (Generative AI) has ushered in an era of “hyper-personalization”. Several key features characterize this new paradigm:

  • Natural Language Interaction and Real-time Contextual Understanding: Users can now express their desires in everyday language, and AI systems can instantly interpret and act on these complex, multi-faceted requests.
  • Unprecedented Granularity: We can now accommodate highly particular user preferences in ways that were considered impractical earlier. The AI can understand and respond to nuanced requests that don’t need to fit into predefined categories.
  • Predictive Personalization: Advanced AI can anticipate user needs based on patterns and subtle cues, often before the user explicitly expresses them. This proactive approach takes personalization to a new level.
  • Adaptive Learning: These systems continuously learn from interactions, refining their understanding and responses to provide increasingly accurate and relevant personalization.

A Stepping Stone in Wellness Coaching

Excited by these new possibilities, Fima Rotter, Yana Pshevoznitskaya and I embarked on a new adventure. We wanted to create a digital companion that could offer deeply personalized guidance and support. This vision led us to create Stellium, an AI-powered astrology bot designed for wellness coaching.

Stellium is an important stepping stone toward utilizing the potential of Generative AI. It’s designed to provide real-time, hyper-personalized guidance through AI-powered astrology, focusing on well-being and day-to-day decision-making.

Our product approach is to offer a light touch of well-being and wellness, intertwined with astrological insights. We position astrology as a barometer for life’s ebbs and flows — much like a weather forecast might suggest carrying an umbrella, Stellium offers actionable insights with an implied degree of probability. The stars may offer a hint but don’t insist, leaving room for free will and personal interpretation.

Hyper-Personalization Solution Framework

We decided to create a bot rather than a standalone app, and chose Telegram as our platform for several strategic reasons:

  • Large existing user base with easy one-click bot access
  • Built-in payment infrastructure for efficient unit economics validation
  • Natural conversational format enabling seamless feedback collection
  • Rich social features that let users invite friends and share experiences
  • Complete access to chat history for sentiment analysis and user behavior insights
  • Ability to create user communities for direct engagement and feature testing
  • Cross-platform accessibility without requiring a separate app installation

With the platform selected, we faced several key challenges:

  • Data Collection and Privacy: We need to gather a good deal of personal information to provide personalized insights, while still honoring users’ privacy.
  • Content Generation: The choice between creating proprietary astrological content and relying on a general LLM’s knowledge (e.g., ChatGPT) has far-reaching monetization and development implications.
  • User Onboarding: We need to make the initial data collection (e.g., birth date and place) welcoming and engaging, rather than have it feel like a mundane questionnaire.
  • Technical Implementation: We need an AI architecture that supports the level of personalization and cost-efficiency we envisioned.
  • User Trust: We need to present astrological insights in a way that respects our users’ intelligence and agency.

Let’s explore how we shaped our technical and product decisions.

Cost Optimization and Choosing the Right AI Model

We aim to deliver a sophisticated, personalized experience while keeping our unit economics viable. The solution emerged from breaking the product down into distinct functional components, each with its own computational and cost requirements.

First, we developed a multi-agent architecture where different AI models handle specific aspects of the user experience:

  • Onboarding Agent: A simpler, more efficient model for gathering initial user information.
  • Astrology Calculator: A specialized model for astrological calculations.
  • Insight Generator: The most sophisticated model for creating personalized insights and guidance.
  • Retention Agent: Handles daily reminders and announcements.
  • User Preferences Gathering Agent: Keeps each user preference up to date (current location, recent activities, life events, likes and dislikes, etc.).

We use GPT-4o for complex tasks like generating nuanced astrological insights, while GPT-4o-mini and self-hosted custom agents handle simpler interactions such as onboarding.
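As an illustration, here is a minimal sketch of what that routing can look like. It assumes the OpenAI Python SDK; the agent names mirror the list above, while the prompts and wiring are simplified placeholders rather than our production code.

# Minimal sketch of per-agent model routing (assumes the OpenAI Python SDK).
from openai import OpenAI

client = OpenAI()

# Cheaper models handle routine agents; the insight generator gets the strongest one.
AGENT_MODELS = {
    "onboarding": "gpt-4o-mini",
    "retention": "gpt-4o-mini",
    "preferences": "gpt-4o-mini",
    "insights": "gpt-4o",
}

def run_agent(agent: str, system_prompt: str, user_message: str) -> str:
    """Route a request to the model assigned to the given agent."""
    response = client.chat.completions.create(
        model=AGENT_MODELS[agent],
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content

# run_agent("onboarding", "Collect birth date and place, warmly.", "Hi!")
# run_agent("insights", "You are an astrological wellness coach.", "How is my week looking?")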

You might wonder how the agents interoperate; we will elaborate on context sharing later.

A significant challenge lies in providing astrological context to our models. Each interaction requires injecting relevant astrological data — planetary aspects, natal chart positions, current transits — into the prompt. This can quickly consume a large portion of our context window and increase costs.
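A rough sketch of what that injection can look like is below. The chart data and the character-based token estimate are illustrative assumptions; the point is simply that the astrological context gets compressed into a compact block with a hard cap on how much of the context window it may consume.

# Sketch: compress astrological context into a compact, budget-capped block.
# The data and the 4-characters-per-token heuristic are illustrative assumptions.
def estimate_tokens(text: str) -> int:
    return len(text) // 4  # crude estimate; a real tokenizer would be more accurate

def build_astro_context(natal_positions: dict[str, str], transits: list[str],
                        aspects: list[str], max_tokens: int = 400) -> str:
    lines = [f"natal {planet}: {sign}" for planet, sign in natal_positions.items()]
    lines += [f"transit: {t}" for t in transits]
    lines += [f"aspect: {a}" for a in aspects]

    # Add lines until the token budget is exhausted, most important data first.
    kept, used = [], 0
    for line in lines:
        cost = estimate_tokens(line)
        if used + cost > max_tokens:
            break
        kept.append(line)
        used += cost
    return "\n".join(kept)

# The resulting block is prepended to the insight generator's prompt.
astro_block = build_astro_context(
    natal_positions={"Sun": "Libra", "Moon": "Pisces"},
    transits=["Mercury retrograde in Scorpio"],
    aspects=["Venus trine Jupiter"],
)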

Beyond the multi-agent setup, we have identified several cost-optimization strategies that could be implemented as our user base grows:

  • Fine-tuning: Training models on our specific use cases can reduce costs for specialized tasks.
  • Self-hosting: Particularly effective for simpler models handling routine tasks.
  • RAG optimization: Efficient retrieval strategies to minimize token usage.
  • Modular prompting: Splitting complex tasks between specialized agents reduces prompt size and improves response quality.

Memory management and context sharing are key to agents’ seamless cooperation, which brings us to our next challenge.

Memory Management and Content Selection with RAG

Let’s start with a simple example. Many users who began exploring GenAI through conversational interfaces like ChatGPT or Claude believe these models should ‘remember’ everything typed in a chat. This expectation may lead to frustration when the AI acts forgetfully.

In reality, the models themselves operate on a simple request-response basis. A model’s memory is solely the conversation history included in each interaction. To generate a response, the model uses a context window — think of it as the model’s working memory, containing both the current request and the relevant previous conversation. The size of the context window imposes two limitations, illustrated with a short sketch after the list below:

  • Technical Limitation: All current AI models have a fixed context window, meaning they can only consider a certain amount of previous text when generating responses.
  • Cost Consideration: Sending the entire conversation history with every interaction would quickly become prohibitively expensive as conversations grow longer.
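To make these two limitations concrete, here is a naive sliding-window sketch: each call keeps only the most recent messages that fit a fixed token budget. The character-based token estimate is again just a stand-in for a real tokenizer.

# Sketch: keep only the most recent messages that fit a fixed token budget.
def trim_history(messages: list[dict], max_tokens: int = 2000) -> list[dict]:
    kept, used = [], 0
    for msg in reversed(messages):       # walk from the newest message backwards
        cost = len(msg["content"]) // 4  # rough token estimate
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))          # restore chronological order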

For example, ChatGPT appears to have an ‘unlimited’ chat size because it selectively includes relevant parts of the conversation history in each response. It uses internal mechanisms to decide which information to retain, so some selective ‘forgetfulness’ is expected. Claude, on the other hand, tends to remember more within the same chat session, but caps the chat size, requiring users to start a new chat when the limit is reached.

To address memory limitations, we need a way to intelligently select and store the parts of conversation history that are most relevant to the current context. This is where semantic search comes in — instead of keeping everything or using simple keyword matching, we can find information based on meaning. Vector databases enable this by storing text and its mathematical representation (vector), where similar content has similar vectors. This allows us to retrieve the most contextually relevant pieces of past conversations efficiently.

RAG (Retrieval-Augmented Generation) is a methodology that builds upon this concept. It retrieves relevant information from a knowledge base and uses it to augment the AI’s context before generating responses. This can be implemented using vector databases, traditional databases with vector capabilities, or even simpler storage solutions.
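A stripped-down version of that retrieve-then-augment loop might look like the sketch below, where a plain in-memory list stands in for a real vector database and OpenAI embeddings provide the vectors; both choices are assumptions made for illustration.

# Sketch of semantic retrieval: an in-memory list stands in for a vector database,
# and OpenAI embeddings provide the vectors. Illustrative only.
import math
from openai import OpenAI

client = OpenAI()
memory: list[tuple[list[float], str]] = []  # (vector, text) pairs

def embed(text: str) -> list[float]:
    result = client.embeddings.create(model="text-embedding-3-small", input=text)
    return result.data[0].embedding

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def remember(fact: str) -> None:
    memory.append((embed(fact), fact))

def retrieve(query: str, k: int = 3) -> list[str]:
    query_vec = embed(query)
    ranked = sorted(memory, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]

# Retrieved facts are injected into the prompt before generating the response.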

For Stellium, we plan to implement RAG to manage essential user details. Let’s look at an example. Say a user casually mentions they play guitar in a rock band and have a golden retriever named Max. Our system stores these facts as categorized snippets that capture their meaning — like activity:regular:creative: plays guitar in a rock band and connection:pet:trustful: golden retriever named Max.

Later, when the user complains about feeling stressed after work, we search for relevant personal information. When properly instructed, LLMs like GPT can draw on what they already know about stress relief and generate RAG-adapted representations such as activity:stress_relief:high: tai chi, activity:stress_relief:high: playing music, and connection:stress_relief:high: trusted friend. This approach allows us to provide a personalized response: “Why not jam with your band tonight? Or take Max for a long walk — golden retrievers love that, and both music and dog walking are proven stress relievers!”
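Continuing the retrieval sketch above (reusing its remember and retrieve helpers), that exchange might play out roughly like this; the tag format and prompt wording are purely illustrative.

# Illustrative continuation of the retrieval sketch: store the categorized snippets,
# then pull the most relevant ones when the user mentions stress.
remember("activity:regular:creative: plays guitar in a rock band")
remember("connection:pet:trustful: golden retriever named Max")

relevant = retrieve("user feels stressed after work")
prompt = (
    "You are a wellness-focused astrology coach.\n"
    "Known facts about the user:\n- " + "\n- ".join(relevant) + "\n"
    "The user feels stressed after work. Suggest a gentle, personalized way to unwind."
)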

Example: When a user mentions they’re a Libra interested in career advice, our system will store these key pieces of information as vectors. Later, when they ask, “What should I focus on this week?”, we’ll retrieve the most relevant context from their previous conversations, allowing us to provide career-focused advice tailored to a Libra’s traits without repetitive questioning.

What’s particularly exciting is that RAG memory can be shared and propagated between our agents. It’s like talking to a group of specialists who perfectly share their knowledge about you. Whether chatting with our onboarding agent or astrological insights agent, the conversation feels seamless, as if you’re talking to a single, well-informed advisor.

While managing memory and context was one challenge, ensuring consistent responses from these non-deterministic models presented another set of interesting problems.

Iterative Prompt Improvement: Debugging, Testing, and Adapting

We learned the importance of continuous prompt refinement. The non-deterministic nature of large language models means that the same prompt can sometimes yield different results, making debugging and optimization a constant process. Here are typical examples:

  • Location ambiguity: Bot: “Let’s get to know each other! Share your birth date, birthplace, and current city.” User: “Paris” Bot: [fails to clarify which location this refers to]
  • Inconsistent formality: French user: “Bonjour” Bot: “Je vous souhaite la bienvenue!” [Same conversation] Bot: “Tu devrais essayer…”
  • Conversation drift: User: “What do you think about the latest iPhone?” Bot: [engages in tech discussion instead of wellness topics]

This unpredictability makes traditional debugging challenging: both ‘tu’ and ‘vous’ (the informal and formal forms of ‘you’ in French) are valid outputs, but consistency is what we need. We found ourselves regularly tweaking prompts based on the interactions we observed and the feedback we received from users. For instance, we updated the prompts to enforce consistent use of ‘tu’ and ‘vous’ (known as the T-V distinction).
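As a simplified sketch, the register fix amounted to making the rule explicit in the system prompt, along the lines below; the exact production wording differs, and the base prompt here is a placeholder.

# Simplified sketch of a register rule added to the system prompt. The production
# wording differs; BASE_PROMPT is a placeholder, not our actual prompt.
BASE_PROMPT = "You are a warm, wellness-focused astrology coach."

REGISTER_RULES = (
    "When replying in French, choose 'tu' or 'vous' once, at the start of the conversation: "
    "default to 'vous' unless the user addresses you with 'tu' first, and never switch "
    "between the two forms within the same conversation."
)

system_prompt = BASE_PROMPT + "\n\n" + REGISTER_RULES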

To tackle these challenges systematically, let’s look at the key concepts and tools in modern LLM development:

  • Prompt Evaluation: Using one LLM to assess another’s outputs against specific criteria (like maintaining a formal address or staying on topic)
  • Test Cases: Real conversation logs transformed into test scenarios with expected behaviors
  • Response Templates: Defining acceptable response patterns while allowing for natural variation
  • Metrics Collection: Tracking success rates, response times, and token usage
  • A/B Testing: Comparing different prompt versions across user segments
  • Automated Monitoring: Detecting drift in model behavior over time

Tools like Langfuse, LangKit, and Helicone help implement these concepts by providing:

  • Prompt version tracking and performance comparison
  • Real-time telemetry and response pattern analysis
  • Test suite creation from conversation logs
  • Visual interfaces for prompt experimentation
  • Integration with existing development workflows

For example, we can write a test: “Given a conversation in French where the user is over 60, verify that the bot consistently uses the ‘vous’ form and maintains wellness-focused responses.”
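A hedged sketch of how such a check can be automated with an LLM-as-judge follows; it assumes the OpenAI Python SDK, and the judge prompt and helper name are our own illustrations rather than the API of any of the tools above.

# Sketch of an LLM-as-judge check for the test above (assumes the OpenAI Python SDK).
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = (
    "You are evaluating a French chatbot reply. Answer strictly PASS or FAIL.\n"
    "PASS only if the reply consistently uses the formal 'vous' form and stays on "
    "wellness topics.\n\nReply to evaluate:\n{reply}"
)

def check_formal_wellness_reply(reply: str) -> bool:
    verdict = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(reply=reply)}],
    ).choices[0].message.content
    return verdict.strip().upper().startswith("PASS")

# Replay a logged turn and check the expected behavior:
# check_formal_wellness_reply("Je vous conseille une courte méditation ce soir.")  # expect True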

This systematic approach to prompt engineering, supported by emerging tools and frameworks, helps us maintain consistency in an inherently non-deterministic environment.

Looking Forward

Stellium is our first step into AI-driven wellness coaching and a stepping stone toward an even more sophisticated wellness toolset. What we’ve learned about multi-agent architectures, memory management, and prompt engineering has applications far beyond astrology — from mental health support to personal development coaching.

The challenges we tackled — from managing conversation context to ensuring consistent interactions across languages — reflect broader challenges in building personalized AI applications. While we’re still in the early stages, each interaction teaches something new about human-AI communication and trust building.

The future holds exciting possibilities for nuanced personalization and meaningful AI interactions. As we move forward, we’re not just building a product — we’re participating in the evolution of how technology can enhance human well-being. It’s a responsibility we take seriously and an adventure we’re thrilled to be on.
