Battle of the Titans: ChatGPT, Perplexity, and Grok — Which AI Deep Research Assistant Reigns Supreme in 2025?
What is Deep Research?
ChatGPT recently introduced a Deep Research function, but it was not the first in the industry: GenAI tools such as Gemini, Grok, and Perplexity offer similar features. Deep Research is an agent that uses reasoning to process huge amounts of online data and perform complex, multi-step research tasks. You may be wondering how much these services cost, so let me show you that first.
| AI Tool | Subscription Tier | Monthly Price | Annual Price | Deep Research Queries |
|---|---|---|---|---|
| Gemini | Google One AI Premium | $19.99 | $199.90 | Unlimited |
| Grok | Super Grok | $30 | $300 | Unlimited |
| Perplexity | Pro | $20 | $200 | 5 per day |
| ChatGPT | Plus | $20 | $240 | 10 per month |
| ChatGPT | Team | $25 per user | $300 per user | 10 per month |
| ChatGPT | Pro | $200 | $2,400 | 120 per month |
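As a quick back-of-the-envelope check on the capped tiers (the unlimited tiers are excluded, and I assume a 30-day month to convert Perplexity's daily cap), the effective monthly cost per deep research query works out as follows. The prices and quotas are taken from the table above:

```python
# Effective monthly cost per deep research query for the capped tiers
# in the table above (30-day month assumed for Perplexity's daily cap).
plans = {
    "Perplexity Pro": (20.0, 5 * 30),  # ($/month, queries/month)
    "ChatGPT Plus":   (20.0, 10),
    "ChatGPT Team":   (25.0, 10),
    "ChatGPT Pro":    (200.0, 120),
}

for name, (price, queries) in plans.items():
    print(f"{name}: ${price / queries:.2f} per query")
```

On a per-query basis, Perplexity Pro is by far the cheapest capped plan at roughly $0.13 per query, while ChatGPT Plus and Team cost $2.00 to $2.50 per query.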
Of course, my aim in this post is not just to list the fees. I tested and compared the deep research functions of ChatGPT, Grok, and Perplexity, and I used another tool, Claude (3.7 Sonnet), to help with the comparison. I will also share my own comments on these three tools at the end of this post.
How Did I Do the Comparison among ChatGPT, Grok and Perplexity?
I used ChatGPT, Grok and Perplexity to perform deep research on three topics:
- Intermittent fasting
- The use of AI in architectural design
- Thermal comfort of street canyons
These three topics were chosen on purpose. The first is a general topic related to our everyday lives (and my own life; I need to lose weight =p). The second concerns a professional field (the construction industry). The third is a problem being investigated in academia. For the sake of testing, I did not add any details to the questions; I simply asked ChatGPT, Grok and Perplexity to give me a comprehensive summary of each topic.
Comparison Framework
It is also important to have a framework for the comparison, so I asked Claude to adopt the following criteria:
- Depth of analysis – How thoroughly does the model explore the topic beyond surface-level information?
- Factual accuracy – Is the information provided correct and up-to-date based on the model’s knowledge cutoff?
- Source diversity – Does the model draw from varied perspectives and types of sources?
- Logical reasoning – How well does the model connect ideas and develop coherent arguments?
- Knowledge gaps acknowledgment – Does the model appropriately recognize when it lacks information?
- Contextualization – How effectively does the model place information within broader relevant contexts?
- Bias identification – Does the model recognize potential biases in its analysis?
- Synthesis capability – How well does the model integrate multiple concepts into a cohesive analysis?
- Nuance recognition – Does the model acknowledge complexities and avoid oversimplification?
- Practical utility – Is the response from the model practical?
I also asked Claude to analyse the strengths and limitations of the responses, and to give each response from the deep research GenAI tools a rating from 1 to 10.
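For readers who want to reproduce this setup, the evaluation request handed to Claude can be assembled programmatically. The criteria list mirrors the one above; the wording of the prompt template is my own and purely illustrative, not the exact prompt I used:

```python
# A minimal sketch of building the evaluation prompt given to Claude.
# The criteria mirror the comparison framework above; the template
# wording itself is hypothetical.
CRITERIA = [
    "Depth of analysis", "Factual accuracy", "Source diversity",
    "Logical reasoning", "Knowledge gaps acknowledgment",
    "Contextualization", "Bias identification", "Synthesis capability",
    "Nuance recognition", "Practical utility",
]

PROMPT_TEMPLATE = (
    "Compare the three deep research responses below on '{topic}'.\n"
    "For each response, list its strengths and limitations, rate it "
    "from 1 to 10, and rank the tools on each of these criteria:\n- "
)

def build_prompt(topic: str) -> str:
    # Append the criteria as a bulleted list after the instructions.
    return PROMPT_TEMPLATE.format(topic=topic) + "\n- ".join(CRITERIA)

print(build_prompt("intermittent fasting"))
```

Keeping the criteria in one list makes it easy to reuse the identical rubric across all three topics, so the ratings stay comparable.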
So, what are the results?
Here are the results of the comparison, as produced by Claude. I also include the deep research responses themselves so that you can compare them further on your own.
Analysis of Three Deep Research Functions on Intermittent Fasting
Perplexity (PERP) - Rating: 8.5/10
Strengths:
- Exceptional depth of analysis with a structured academic approach
- Strong scientific rigor with detailed explanations of physiological mechanisms
- Excellent citation of specific research findings with numeric data (e.g., “reduced by 13% (95% CI: 5-21%)”)
- Clear acknowledgment of controversies (e.g., 2024 cardiovascular mortality study)
- Sophisticated synthesis of multiple research streams
- Academic writing style with subject headings and organized sections
Limitations:
- Some sections are highly technical, potentially limiting accessibility
- Occasionally dense with medical terminology that might overwhelm non-specialist readers
- Citation style makes it harder to distinguish which statements come from which source
Grok - Rating: 6.5/10
Strengths:
- Very accessible presentation with clear “Key Points” summary upfront
- Straightforward language appropriate for general audiences
- Good balance of benefits and risks
- Practical considerations for different populations
- Useful table summarizing key aspects
- Transparent about limitations of research evidence
Limitations:
- Less detailed on physiological mechanisms and scientific explanations
- Fewer specific research findings and statistical data
- Less comprehensive coverage of biological processes like autophagy
- Limited exploration of nuanced considerations for special populations
- Less rigorous in differentiating levels of evidence quality
ChatGPT - Rating: 7.5/10
Strengths:
- Comprehensive coverage with excellent practical implementation advice
- Clear organization with logical flow between topics
- Strong emphasis on practical considerations and how to implement IF safely
- Good balance between scientific explanations and actionable information
- Thorough exploration of different IF methods with pros and cons
- Thoughtful discussion of how IF compares to other dietary approaches
Limitations:
- Citation format makes source verification difficult
- Sometimes repetitive across sections
- Less precision in presenting specific research measurements and statistics
- Some scientific assertions lack detailed supporting evidence
- More anecdotal elements compared to Perplexity’s research focus
Overall Comparative Assessment:
- Depth of analysis: Perplexity > ChatGPT > Grok. Perplexity provided the most thorough scientific examination, while Grok offered a more surface-level overview.
- Factual accuracy: All three appeared generally accurate, though Perplexity offered the most precise data points.
- Source diversity: Perplexity > ChatGPT > Grok. Perplexity demonstrated the broadest range of research sources.
- Logical reasoning: Perplexity > ChatGPT > Grok. Perplexity’s analysis showed sophisticated connections between concepts.
- Knowledge gaps acknowledgment: ChatGPT > Perplexity > Grok. ChatGPT was particularly transparent about what remains unknown.
- Contextualization: ChatGPT > Perplexity > Grok. ChatGPT effectively placed IF within broader lifestyle considerations.
- Bias identification: Perplexity > ChatGPT > Grok. Perplexity most clearly identified potential biases in studies.
- Synthesis capability: Perplexity > ChatGPT > Grok. Perplexity demonstrated superior integration of multiple scientific concepts.
Each research function has different strengths that might suit different use cases: Perplexity for academic-level analysis, Grok for quick accessible overviews, and ChatGPT for balanced information with practical guidance.
Analysis of Three Deep Research Functions on Urban Street Canyon Thermal Comfort
Perplexity (PERP) - Rating: 9/10
Strengths:
- Exceptional academic rigor with precise technical language
- Comprehensive organization with logical section progression
- Excellent quantitative data (e.g., “increasing H/W from 0.9 to 1.5 reduced PET by up to 8°C in summer”)
- Sophisticated integration of multiple factors affecting thermal comfort
- Detailed regional case studies across different climate types
- Strong evidence-based design recommendations with specific thresholds (e.g., “H/W ≥ 1.5 in hot climates”)
- Clear citations linked to specific claims
Limitations:
- Technical density may reduce accessibility for non-specialists
- Some specialized terminology (e.g., “3DGI”) introduced without full explanation
- Could benefit from more discussion of implementation challenges
Grok - Rating: 7/10
Strengths:
- Accessible presentation with clear “Key Points” summary upfront
- Balanced overview of major factors influencing street canyon comfort
- Effective use of tables to summarize findings on aspect ratios and materials
- Inclusion of interesting “Unexpected Detail” on air quality connections
- Good organization with clear section headings
- Plain language explanations of technical concepts
Limitations:
- Less quantitative data compared to Perplexity’s analysis
- Limited discussion of thermal comfort indices (PET, UTCI)
- Fewer specific design recommendations with measurable thresholds
- Weaker explanation of underlying physical mechanisms
- Less comprehensive discussion of seasonal variations
- Citations presented without clear connection to specific claims
ChatGPT - Rating: 8.5/10
Strengths:
- Extremely comprehensive coverage of all relevant aspects
- Excellent explanations of fundamental principles with appropriate context
- Thorough discussion of thermal comfort indices (PMV, PET, UTCI)
- Strong pedagogical approach that builds understanding progressively
- Balanced coverage of research methodologies and knowledge gaps
- Detailed analysis of vegetation benefits with multiple cooling mechanisms
- Well-articulated limitations and nuances (e.g., albedo paradox)
Limitations:
- Considerably longer than necessary, with some redundancy
- Citation format makes source attribution difficult to track
- Some paragraphs could be more concise without losing information
- Occasionally less precise with quantitative data compared to Perplexity
- Less structured design recommendations than Perplexity
Overall Comparative Assessment:
- Depth of analysis: Perplexity = ChatGPT > Grok. Both Perplexity and ChatGPT provided exceptionally thorough analyses, while Grok offered a more general overview.
- Factual accuracy: All three appeared accurate, with Perplexity providing the most precise quantitative data.
- Source diversity: Perplexity > ChatGPT > Grok. Perplexity showed the most diverse range of research sources.
- Logical reasoning: Perplexity > ChatGPT > Grok. Perplexity’s reasoning flow most effectively connected different aspects of the topic.
- Knowledge gaps acknowledgment: ChatGPT > Perplexity > Grok. ChatGPT included a dedicated section on research gaps and limitations.
- Contextualization: ChatGPT > Perplexity > Grok. ChatGPT excelled at placing concepts in broader context and explaining their significance.
- Bias identification: ChatGPT > Perplexity > Grok. ChatGPT most effectively addressed conflicting findings and limitations of research.
- Synthesis capability: Perplexity > ChatGPT > Grok. Perplexity most elegantly synthesized multiple complex factors.
- Nuance recognition: ChatGPT > Perplexity > Grok. ChatGPT provided the most nuanced discussion of trade-offs and context-specific factors.
- Practical utility: Perplexity > ChatGPT > Grok. Perplexity provided the most specific, actionable design recommendations.
Each model demonstrated different strengths: Perplexity excelled in academic rigor and precise recommendations, ChatGPT provided outstanding explanations and contextual understanding, while Grok offered accessibility and clarity for non-experts. The ideal approach might combine Perplexity’s precision, ChatGPT’s comprehensive explanations, and Grok’s accessibility.
Analysis of Three Deep Research Functions on AI in Architectural Design
Perplexity (PERP) - Rating: 8.5/10
Strengths:
- Exceptional organization with clear academic structure and logical flow
- Sophisticated coverage of AI applications in architecture with distinct categorization
- Strong integration of case studies with specific metrics (e.g., “reducing HVAC loads by up to 30%”)
- Excellent discussion of ethical considerations and professional implications
- Well-articulated future directions section with thoughtful insights
- Precise language that balances technical accuracy with accessibility
- Strong citation linking throughout the document
Limitations:
- Some sections (like urban planning) could benefit from more detailed examples
- Limited exploration of challenges or potential drawbacks of AI implementation
- Could have provided more comparison between different AI tools’ capabilities
Grok - Rating: 7/10
Strengths:
- Very accessible presentation with clear “Key Points” summary upfront
- Effective organization with distinct application categories
- Good use of specific examples and case studies (e.g., Shanghai Tower, Wembley Park)
- Inclusion of a useful tools table with descriptions and URLs
- Clear statistics on tool adoption (e.g., “ARCHITEChTURES used in 140+ countries by 15,000+ users”)
- Balanced coverage that acknowledges AI complements rather than replaces architects
Limitations:
- Less analytical depth compared to Perplexity and ChatGPT
- Limited explanation of underlying AI technologies and mechanisms
- Fewer specific metrics on performance improvements
- Less exploration of ethical considerations and future implications
- Citation format makes it difficult to trace specific claims to sources
ChatGPT - Rating: 9/10
Strengths:
- Comprehensive coverage with exceptional detail across all architectural applications
- Excellent explanations of how AI technologies function in each context
- Outstanding integration of real-world examples and case studies
- Strong section on emerging trends with thoughtful analysis of future directions
- Clear articulation of both benefits and limitations of AI in architecture
- Well-structured content that builds logically from design tools to implementation
- Balanced perspective on AI as augmentation rather than replacement for architects
Limitations:
- Significantly longer than necessary with some repetitive information
- Citation format makes it difficult to trace specific claims to sources
- Some sections (like building performance analysis) could be more concise
- Occasional overreliance on examples from the same sources
Overall Comparative Assessment:
- Depth of analysis: ChatGPT > Perplexity > Grok. ChatGPT provided the most thorough explanation of AI applications and their implications.
- Factual accuracy: All three appeared accurate, with varying levels of detail.
- Source diversity: ChatGPT > Perplexity > Grok. ChatGPT demonstrated the broadest range of examples and case studies.
- Logical reasoning: Perplexity > ChatGPT > Grok. Perplexity’s analysis showed particularly clear logical progression.
- Knowledge gaps acknowledgment: ChatGPT > Perplexity > Grok. ChatGPT was most transparent about limitations of current AI applications.
- Contextualization: ChatGPT > Perplexity > Grok. ChatGPT excelled at placing AI tools within the broader architectural workflow.
- Bias identification: Perplexity > ChatGPT > Grok. Perplexity most clearly addressed algorithmic bias concerns.
- Synthesis capability: ChatGPT > Perplexity > Grok. ChatGPT demonstrated superior integration of multiple AI applications into a cohesive narrative.
- Nuance recognition: ChatGPT > Perplexity > Grok. ChatGPT provided the most nuanced discussion of AI’s role alongside human architects.
- Practical utility: Grok > ChatGPT > Perplexity. Grok’s tools table and practical examples offered the most immediately actionable information.
Each model demonstrated distinct strengths: ChatGPT provided exceptional comprehensive coverage and detailed explanations, Perplexity offered strong academic structure with precise analysis, and Grok delivered accessible practical information. The ideal approach would combine ChatGPT’s comprehensive detail, Perplexity’s logical organization, and Grok’s accessible presentation with practical examples.
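Out of curiosity, we can also tally Claude's per-criterion rankings from the three assessments above and count how often each tool came first. A minimal sketch (the lists are my own transcription of the orderings; the one tie is counted for both tools, and criteria Claude did not explicitly rank, such as factual accuracy, are omitted):

```python
from collections import Counter

# First-place tool per ranked criterion, transcribed from Claude's three
# comparative assessments above. The Perplexity = ChatGPT tie on depth
# (street canyons) is counted once for each; unranked criteria omitted.
first_places = {
    "intermittent fasting": [
        "Perplexity", "Perplexity", "Perplexity",  # depth, sources, reasoning
        "ChatGPT", "ChatGPT",                      # knowledge gaps, context
        "Perplexity", "Perplexity",                # bias, synthesis
    ],
    "street canyons": [
        "Perplexity", "ChatGPT",                   # depth (tie)
        "Perplexity", "Perplexity",                # sources, reasoning
        "ChatGPT", "ChatGPT", "ChatGPT",           # gaps, context, bias
        "Perplexity", "ChatGPT", "Perplexity",     # synthesis, nuance, utility
    ],
    "AI in architecture": [
        "ChatGPT", "ChatGPT", "Perplexity",        # depth, sources, reasoning
        "ChatGPT", "ChatGPT", "Perplexity",        # gaps, context, bias
        "ChatGPT", "ChatGPT", "Grok",              # synthesis, nuance, utility
    ],
}

tally = Counter(w for winners in first_places.values() for w in winners)
print(tally.most_common())
```

Interestingly, ChatGPT and Perplexity come out almost even on first-place finishes (13 vs 12), while Grok tops only a single criterion, which is consistent with Claude's lower overall ratings for it.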
What do I think about the results?
So far, I have only used AI to review AI, so we should also look at the results ourselves. Overall, I think all three tools produce good results, but there are still differences among them, and Claude spotted these differences too.
- Perplexity – It tends to extract more numbers when performing deep research, which is especially useful for people working in academia: with more figures, I get a more concrete picture of what different studies found.
- ChatGPT – Its responses are usually longer than those of the other two, and Claude noted that there can be redundancies. However, ChatGPT usually gives me more information than I originally expect, including more background. It is perfect when I want a high-level overview of a topic, which is essential the first time I try to understand it.
- Grok – I would say Grok’s deep research responses sit between ChatGPT’s and Perplexity’s. Besides, Grok usually responds with more practical examples. I love its presentation, especially the tables it provides, which make the results more organized.
Final Thoughts
It’s clear there’s no single champion after this comparison. Each tool demonstrated distinctive strengths:
| Tool | Strength |
|---|---|
| Perplexity | excels with quantitative data and academic precision |
| ChatGPT | provides comprehensive context and background information |
| Grok | balances depth with practical examples and superior presentation |
While Claude consistently rated Grok lowest in the comparison, I found its organization and practicality particularly valuable. The ideal approach may be using these tools in combination—Perplexity for specific data points, ChatGPT for broad understanding, and Grok for practical applications.
For researchers and knowledge workers in 2025, having access to all three provides the most complete research toolkit.
Please share this article if you like it!
