Anita Moorthy | Dec 11, 2025 3:14:11 PM | 9 min read

Part 1: 7 Reddit Factors We Tested for LLM Relevance, and the Results


Executive summary

Reddit has emerged as the most influential content source for large language models, representing 40% of all LLM citations as of mid-2025. This guide explores which factors, if any, influence whether a Reddit thread gets surfaced by ChatGPT, Google AI Overviews, Perplexity, and other AI tools, and gives marketers some directional guidance for strategic Reddit engagement.

Key Finding:

Of all the factors we examined (including relevance, popularity, and subreddit size), the only firm conclusion we can draw is that LLMs will cite a Reddit thread when it closely matches the intent of the user's query, regardless of the other factors.

 

This is Part 1 of a 3-part Reddit marketing series for marketers interested in Reddit for LLM visibility.

Part 2 outlines a step-by-step plan for getting started on Reddit for LLM visibility.

Part 3 describes exactly how to comment and post, with examples.

 

A note on credibility (and why this post is careful)

This post mixes:

  • Supported claims (factors that have other citations confirming the same thing we observed)

  • Observed patterns from small-scale experiments we ran

  • Hypotheses that are plausible but not proven without larger sampling

 



The Data: Reddit's Unpredictable Influence

According to recent studies, Reddit isn't just a place to market to humans; it's one of the top two sources cited by most large language models (LLMs), such as ChatGPT, Claude, Gemini, and Perplexity. These AI tools now regularly quote Reddit threads in their answers, which means a presence on Reddit can get your brand surfaced in future AI responses.



Important update: a recent shift in citation patterns (September 2025)

  • According to one source, PromptWatch, Reddit citations fell from 9.7% in the previous month to 2% in September.
  • Likely cause: Google removed support for the num=100 search parameter (which returned 100 results per page), making it harder for OpenAI to gather results through live search.
  • OpenAI may now be relying more on Reddit's API (which costs more) than on free Google crawling.
  • Wikipedia citations increased as Reddit citations decreased.


What this means for marketers: Citation patterns are volatile and platform-dependent. Diversify your strategy across multiple LLM sources rather than optimizing for one platform.

 

Why is Reddit important to prioritize for marketing?

Despite the volatility of citation patterns, Reddit will continue to play an important role in training LLMs and in providing answers through real-time search. There are a few reasons for this:

  • Google ↔ Reddit data pipe. Google has licensed Reddit’s Data API to get real-time, structured content and “enhanced signals” it can display, train on, and otherwise use (including AI Overviews). This is why Reddit often appears in Google’s generative answers. Read more: blog.google

  • OpenAI ↔ Reddit data pipe. OpenAI also licensed Reddit’s real-time Data API so ChatGPT can better understand and showcase Reddit content—especially on recent topics when it decides to search. Read more: OpenAI

  • Well-moderated communities. Many of the top subreddits are well moderated, so responses to real user questions are full of useful detail about specific use cases and which tools suit them. That is exactly the kind of information LLM answer tools need to handle the long tail of questions users ask.

  • Other engines. Perplexity searches the web in real time and always shows citations; Claude can use web search too (commonly via Brave), so recent/indexable Reddit content is eligible there as well. Read more: Perplexity

So if a Reddit thread is public, indexable, and a good match to the query, it has a path to show up.





The 7 Reddit factors we tested and our conclusions

 

1. Exact topical match to the user’s query (supported claim)

Several academic papers, along with a test carried out by Semrush, indicate that LLMs using RAG systems match queries against content based on semantic similarity and intent alignment. (See Wang et al. (2024). A Multi-Granularity Matching Attention Network for Query Intent Classification. Research on query intent matching in information retrieval.)

That means threads that clearly address the intent (e.g., "Which Salesforce email-finder alternatives work for direct dials in EMEA?") beat vague or merely descriptive titles. Our experiments on recency, and with a range of different queries, also surfaced both brand-new threads and threads with as few as 4 comments, suggesting that upvotes and age matter less than alignment with the query and its intent.
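To make "semantic similarity" a little more concrete, here is a toy sketch that scores thread titles against a query. It is a deliberately simplified stand-in, not how any specific engine works: real RAG retrievers compare learned dense embeddings, whereas this Python example uses plain word-count vectors and cosine similarity. The function names, query, and titles are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep only alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    # Dot product over shared terms, divided by the product of the vector lengths.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_threads(query: str, titles: list[str]) -> list[tuple[float, str]]:
    # Score every title against the query and return them best match first.
    query_vec = Counter(tokenize(query))
    scored = [(cosine_similarity(query_vec, Counter(tokenize(t))), t) for t in titles]
    return sorted(scored, reverse=True)


# Hypothetical query and thread titles, for illustration only.
query = "salesforce email finder alternatives for direct dials in emea"
titles = [
    "Which Salesforce email-finder alternatives work for direct dials in EMEA?",
    "My thoughts on sales tools this year",
]

for score, title in rank_threads(query, titles):
    print(f"{score:.2f}  {title}")
```

Run as-is, the specific, question-style title scores about 0.90, while the vague title scores 0.00 because it shares no words with the query. An embedding-based retriever would also credit related wording that this toy version misses, but the principle is the same: the closer a thread matches the query's intent, the higher it ranks.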

How to operationalize 

  • Write titles as question + context (verb, object, constraint).
  • Put the answer to the query, with specifics such as setup, steps, shortlist, and trade-offs, at the top of your comment.
  • Frame contributions to match the decision-making queries your ICP (ideal customer profile) asks
  • See Part 3 of the Reddit guide series for more guidance on commenting and posting

2. Entity-rich, experience-based content the model can quote (supported)

LLMs disproportionately favor content they can quote verbatim or near-verbatim: tools, versions, configs, ICPs, constraints, metrics, datasets, regions, budgets, timelines. Research on RAG systems shows that models prioritize content with specific entities, metrics, and verifiable claims over general observations. (Gao et al. (2023). Retrieval-Augmented Generation for Large Language Models: A Survey.)

How to operationalize

  • Name tools and versions explicitly (e.g., "Salesforce Sales Cloud Enterprise Edition")
  • Include numbers and outcomes (e.g., "reduced email bounce rate from 8% to 2.1%")
  • State when it worked and for whom (e.g., "for B2B teams with 20-50 sales reps")
  • Provide specific constraints and contexts (e.g., "with a $50K annual budget")

3. Indexability & safety compliance (supported)

To be surfaced, a thread must be public, crawlable, and safe. NSFW, restricted, or private content, as well as heavy self-promotion, can be down-ranked or omitted in AI answers. Google's AI features show snapshots with links but apply quality and safety filters; surfaced items also tend to overlap with what ranks organically. (Wikipedia (2025). Retrieval-augmented generation. Overview of RAG systems and external knowledge integration.)

How to operationalize

  • Prioritize public subreddits; avoid gated or sensitive topics that trigger safety systems.

 

4. Subreddit health/scale (no correlation)

We found no correlation between subreddit size and citation. Both large, active subreddits and smaller ones showed up in the queries we ran in ChatGPT.

How to operationalize

  • To maximize your impact on Reddit, pick a few subreddits that are active and have a steady volume of discussion on your topic. Focus on growing your visibility and credibility there so you can comment and post without having your contributions taken down.

Want to figure out the right subreddits for your business?

Let Rocksalt help you find the best subreddits that have your buyer queries and are cited by LLMs.


5. Engagement quality (no correlation)

Across all our experiments, thread age, votes, and comment counts were weak predictors of citation on every engine; a low-vote but high-signal thread can still surface if it nails the intent and is indexable.

How to operationalize

  • Prioritize clarity + specificity over chasing karma. One excellent, up-to-date top comment often beats a long, meandering thread.

6. Recency signals (supported for time-sensitive topics)

Recency in general does not seem to matter much; threads older than a year are sometimes cited. However, because Google and OpenAI have real-time pipes to Reddit, recent posts and comments can be discovered and cited when a topic evolves quickly (API changes, pricing, outages, tactics). Read more: blog.google

In our experiments, a brand-new thread appeared in Perplexity almost immediately; other models varied depending on whether they ran a search.

How to operationalize

  • Update top comments when facts change; add time markers (“As of Oct 2025…”).
  • Prompt comment activity (clarifications, follow-ups) to keep the thread fresh.

7. Funnel stage fit (Hypothesis)

In our experiment running top-, middle-, and bottom-of-funnel (TOFU/MOFU/BOFU) queries on specific topics, Reddit showed up primarily for MOFU/BOFU queries (comparisons, trade-offs, troubleshooting, "which tool for X constraint"). Broad TOFU questions were often answered from static training data instead.

We have not seen this pattern confirmed in academic research or in other published tests, so for now treat it as directional guidance.



How to operationalize

  • Focus your Reddit efforts on answering questions and seeding content for buyers who already know they need a solution and are evaluating trade-offs, implementation factors, pricing, and comparisons of different tools.


 


Common Misconceptions Debunked


Myth 1: "More upvotes = guaranteed citation"

Reality: Our research found cited threads with only 0 to 10 upvotes. Topical relevance and specificity matter more than engagement signals.


Myth 2: "Brand accounts should post on Reddit"

Reality: Reddit users are highly skeptical of brand accounts. Individual thought leaders and practitioners get more engagement and trust, so your team members posting authentically is more effective than a branded presence. Once your company has grown and built a sizable user community, a brand account for communicating with those users makes sense.

Myth 3: "New threads won't get cited quickly"

Reality: Search-enabled LLMs like Perplexity can cite content within 24 hours if it matches query intent and has some engagement. Speed to citation varies by platform.

Myth 4: "Only viral threads get cited"

Reality: Niche, specific threads with 10-20 comments often get cited over massive viral threads if they better match the query intent. Quality and relevance beat popularity.

Myth 5: "Citations only come from training data"

Reality: Many modern LLM tools use RAG (Retrieval-Augmented Generation) to search the live web. Your Reddit content from last week can be cited today if it ranks well in search results.






Key Takeaways for Marketers

  • Reddit citations in LLMs are not random: they follow specific patterns around query intent, content structure, and recency.

  • Focus your efforts on MOFU and BOFU content: this is where LLMs actively search for and cite Reddit discussions. Skip TOFU, where training data dominates.

  • Be specific and detailed: tool names, metrics, constraints, and real-world context are what LLMs need to quote your contributions.

  • Authenticity beats promotion: balanced, experience-driven content dramatically outperforms sales-focused posts.

  • Platform behaviors vary significantly: Perplexity favors Reddit, ChatGPT prefers Wikipedia, and Google AI Overviews balances multiple UGC sources. Diversify your strategy.

  • Engagement metrics are unreliable predictors: topical relevance and subreddit quality matter more than upvotes or comment counts.

  • Citation patterns are evolving rapidly: what worked in June 2025 may not work in October 2025. Stay adaptive and monitor changes.

  • This is a long-term strategy: building authority and citation presence takes consistent, authentic participation over months, not days.



 

What's Next

This guide covered what factors influence whether Reddit content gets cited by LLMs. Future parts of this series will address:

  • Part 4: Reporting and measuring Reddit activity for LLM visibility





Appendix

 

Experiment: Reddit Appearance Summary by Funnel Stage

Setup: 

  • Simulated real buyer research by asking ChatGPT and Google AI Overviews (in incognito mode) a range of LinkedIn thought-leadership queries across top-, middle-, and bottom-of-funnel stages.

  • Logged when Reddit appeared in responses, noting the subreddit, engagement (upvotes/comments), and thread characteristics to identify patterns in what LLMs cite.

 

Funnel stage | Total queries | Queries with Reddit results ("Y") | Dominant subreddits | Median upvotes | Median comments | Reddit appearance rate
Top of funnel (awareness / problem discovery) | 11 | 0 | n/a | n/a | n/a | 0%
Middle of funnel (operator questions, workflows, stack design) | 11 | 4 | r/marketing, r/startup, r/sideproject | 16.5 | 7 | 36%
Bottom of funnel (features, compliance, pricing, troubleshooting) | 11 | 3 | r/marketing, r/growthhacking | 17 | 9 | 27%
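The appearance rates in this table are simple ratios of queries with Reddit results to total queries per stage. A minimal Python sketch that recomputes them from the counts above; the class and field names are hypothetical, and only the three stage counts come from the experiment.

```python
from dataclasses import dataclass


@dataclass
class StageResult:
    stage: str
    total_queries: int
    reddit_hits: int  # queries whose AI answer included at least one Reddit thread ("Y")

    @property
    def appearance_rate(self) -> float:
        # Share of queries in this stage that surfaced Reddit; guard against empty stages.
        return self.reddit_hits / self.total_queries if self.total_queries else 0.0


# Counts taken from the funnel-stage table.
results = [
    StageResult("Top of funnel", 11, 0),
    StageResult("Middle of funnel", 11, 4),
    StageResult("Bottom of funnel", 11, 3),
]

for r in results:
    print(f"{r.stage}: {r.reddit_hits}/{r.total_queries} = {r.appearance_rate:.0%}")

# Sensitivity check: how much one extra Reddit hit would move a stage's rate.
print(f"One query shifts a stage's rate by {1 / 11:.0%}")
```

This prints 0%, 36%, and 27%, matching the table. With only 11 queries per stage, a single additional hit moves a stage's rate by about 9 percentage points, which is one more reason to read the funnel-stage pattern as directional rather than conclusive.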



