Battle of the Titans: ChatGPT, Perplexity, and Grok — Which AI Deep Research Assistant Reigns Supreme in 2025?
What is Deep Research?
ChatGPT recently introduced a Deep Research function, but it was not the first in the industry: GenAI tools such as Gemini, Grok, and Perplexity offer similar features. Deep Research is an agent that uses reasoning to process huge amounts of online data and perform complex, multi-step research tasks. You may be wondering how much these services cost, so let me show you that first.
| AI Tool | Subscription Tier | Monthly Price | Annual Price | Deep Research Queries |
|---|---|---|---|---|
| Gemini | Google One AI Premium | $19.99 | $199.90 | Unlimited |
| Grok | Super Grok | $30 | $300 | Unlimited |
| Perplexity | Pro | $20 | $200 | 5 per day |
| ChatGPT | Plus | $20 | $240 | 10 per month |
| ChatGPT | Team | $25 per user | $300 per user | 10 per month |
| ChatGPT | Pro | $200 | $2,400 | 120 per month |
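As a quick back-of-the-envelope check on the capped tiers (the unlimited tiers are excluded, and I assume a 30-day month to convert Perplexity's daily cap), the effective monthly cost per deep research query works out as follows. The prices and quotas are taken from the table above:

```python
# Effective monthly cost per deep research query for the capped tiers
# in the table above (30-day month assumed for Perplexity's daily cap).
plans = {
    "Perplexity Pro": (20.0, 5 * 30),  # ($/month, queries/month)
    "ChatGPT Plus":   (20.0, 10),
    "ChatGPT Team":   (25.0, 10),
    "ChatGPT Pro":    (200.0, 120),
}

for name, (price, queries) in plans.items():
    print(f"{name}: ${price / queries:.2f} per query")
```

On a per-query basis, Perplexity Pro is by far the cheapest capped plan at roughly $0.13 per query, while ChatGPT Plus and Team cost $2.00 to $2.50 per query.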
Of course, my aim in this post is not just to list the fees. I tested and compared the deep research functions of ChatGPT, Grok, and Perplexity, and I used another tool, Claude (3.7 Sonnet), to help with the comparison. I will also share my own comments on these three tools at the end of this post.
How Did I Do the Comparison among ChatGPT, Grok and Perplexity?
I used ChatGPT, Grok and Perplexity to perform deep research on three topics:
- Intermittent fasting
- The use of AI in architectural design
- Thermal comfort of street canyons
These three topics were chosen on purpose. The first is a general topic related to our everyday lives (and my own life; I need to lose weight =p). The second concerns a professional field (the construction industry). The third is a problem being investigated in academia. For the sake of testing, I did not add any details to the questions; I simply asked ChatGPT, Grok and Perplexity to give me a comprehensive summary of each topic.
Comparison Framework
It is also important to have a framework for the comparison, so I asked Claude to adopt the following criteria:
- Depth of analysis – How thoroughly does the model explore the topic beyond surface-level information?
- Factual accuracy – Is the information provided correct and up-to-date based on the model’s knowledge cutoff?
- Source diversity – Does the model draw from varied perspectives and types of sources?
- Logical reasoning – How well does the model connect ideas and develop coherent arguments?
- Knowledge gaps acknowledgment – Does the model appropriately recognize when it lacks information?
- Contextualization – How effectively does the model place information within broader relevant contexts?
- Bias identification – Does the model recognize potential biases in its analysis?
- Synthesis capability – How well does the model integrate multiple concepts into a cohesive analysis?
- Nuance recognition – Does the model acknowledge complexities and avoid oversimplification?
- Practical utility – Is the response from the model practical?
I also asked Claude to analyse the strengths and limitations of the responses, and to give each response from the deep research GenAI tools a rating from 1 to 10.
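For readers who want to reproduce this setup, the evaluation request handed to Claude can be assembled programmatically. The criteria list mirrors the one above; the wording of the prompt template is my own and purely illustrative, not the exact prompt I used:

```python
# A minimal sketch of building the evaluation prompt given to Claude.
# The criteria mirror the comparison framework above; the template
# wording itself is hypothetical.
CRITERIA = [
    "Depth of analysis", "Factual accuracy", "Source diversity",
    "Logical reasoning", "Knowledge gaps acknowledgment",
    "Contextualization", "Bias identification", "Synthesis capability",
    "Nuance recognition", "Practical utility",
]

PROMPT_TEMPLATE = (
    "Compare the three deep research responses below on '{topic}'.\n"
    "For each response, list its strengths and limitations, rate it "
    "from 1 to 10, and rank the tools on each of these criteria:\n- "
)

def build_prompt(topic: str) -> str:
    # Append the criteria as a bulleted list after the instructions.
    return PROMPT_TEMPLATE.format(topic=topic) + "\n- ".join(CRITERIA)

print(build_prompt("intermittent fasting"))
```

Keeping the criteria in one list makes it easy to reuse the identical rubric across all three topics, so the ratings stay comparable.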
So, what are the results?
Here are the results of the comparison, as produced by Claude. I also include the deep research responses themselves so that you can compare them further on your own.
Analysis of Three Deep Research Functions on Intermittent Fasting
Perplexity (PERP) - Rating: 8.5/10
Strengths:
- Exceptional depth of analysis with a structured academic approach
- Strong scientific rigor with detailed explanations of physiological mechanisms
- Excellent citation of specific research findings with numeric data (e.g., “reduced by 13% (95% CI: 5-21%)”)
- Clear acknowledgment of controversies (e.g., 2024 cardiovascular mortality study)
- Sophisticated synthesis of multiple research streams
- Academic writing style with subject headings and organized sections
Limitations:
- Some sections are highly technical, potentially limiting accessibility
- Occasionally dense with medical terminology that might overwhelm non-specialist readers
- Citation style makes it harder to distinguish which statements come from which source
Grok - Rating: 6.5/10
Strengths:
- Very accessible presentation with clear “Key Points” summary upfront
- Straightforward language appropriate for general audiences
- Good balance of benefits and risks
- Practical considerations for different populations
- Useful table summarizing key aspects
- Transparent about limitations of research evidence
Limitations:
- Less detailed on physiological mechanisms and scientific explanations
- Fewer specific research findings and statistical data
- Less comprehensive coverage of biological processes like autophagy
- Limited exploration of nuanced considerations for special populations
- Less rigorous in differentiating levels of evidence quality
ChatGPT - Rating: 7.5/10
Strengths:
- Comprehensive coverage with excellent practical implementation advice
- Clear organization with logical flow between topics
- Strong emphasis on practical considerations and how to implement IF safely
- Good balance between scientific explanations and actionable information
- Thorough exploration of different IF methods with pros and cons
- Thoughtful discussion of how IF compares to other dietary approaches
Limitations:
- Citation format makes source verification difficult
- Sometimes repetitive across sections
- Less precision in presenting specific research measurements and statistics
- Some scientific assertions lack detailed supporting evidence
- More anecdotal elements compared to Perplexity’s research focus
Overall Comparative Assessment:
- Depth of analysis: Perplexity > ChatGPT > Grok. Perplexity provided the most thorough scientific examination, while Grok offered a more surface-level overview.
- Factual accuracy: All three appeared generally accurate, though Perplexity offered the most precise data points.
- Source diversity: Perplexity > ChatGPT > Grok. Perplexity demonstrated the broadest range of research sources.
- Logical reasoning: Perplexity > ChatGPT > Grok. Perplexity’s analysis showed sophisticated connections between concepts.
- Knowledge gaps acknowledgment: ChatGPT > Perplexity > Grok. ChatGPT was particularly transparent about what remains unknown.
- Contextualization: ChatGPT > Perplexity > Grok. ChatGPT effectively placed IF within broader lifestyle considerations.
- Bias identification: Perplexity > ChatGPT > Grok. Perplexity most clearly identified potential biases in studies.
- Synthesis capability: Perplexity > ChatGPT > Grok. Perplexity demonstrated superior integration of multiple scientific concepts.
Each research function has different strengths that might suit different use cases: Perplexity for academic-level analysis, Grok for quick accessible overviews, and ChatGPT for balanced information with practical guidance.
Analysis of Three Deep Research Functions on Urban Street Canyon Thermal Comfort
Perplexity (PERP) - Rating: 9/10
Strengths:
- Exceptional academic rigor with precise technical language
- Comprehensive organization with logical section progression
- Excellent quantitative data (e.g., “increasing H/W from 0.9 to 1.5 reduced PET by up to 8°C in summer”)
- Sophisticated integration of multiple factors affecting thermal comfort
- Detailed regional case studies across different climate types
- Strong evidence-based design recommendations with specific thresholds (e.g., “H/W ≥ 1.5 in hot climates”)
- Clear citations linked to specific claims
Limitations:
- Technical density may reduce accessibility for non-specialists
- Some specialized terminology (e.g., “3DGI”) introduced without full explanation
- Could benefit from more discussion of implementation challenges
Grok - Rating: 7/10
Strengths:
- Accessible presentation with clear “Key Points” summary upfront
- Balanced overview of major factors influencing street canyon comfort
- Effective use of tables to summarize findings on aspect ratios and materials
- Inclusion of interesting “Unexpected Detail” on air quality connections
- Good organization with clear section headings
- Plain language explanations of technical concepts
Limitations:
- Less quantitative data compared to Perplexity’s analysis
- Limited discussion of thermal comfort indices (PET, UTCI)
- Fewer specific design recommendations with measurable thresholds
- Weaker explanation of underlying physical mechanisms
- Less comprehensive discussion of seasonal variations
- Citations presented without clear connection to specific claims
ChatGPT - Rating: 8.5/10
Strengths:
- Extremely comprehensive coverage of all relevant aspects
- Excellent explanations of fundamental principles with appropriate context
- Thorough discussion of thermal comfort indices (PMV, PET, UTCI)
- Strong pedagogical approach that builds understanding progressively
- Balanced coverage of research methodologies and knowledge gaps
- Detailed analysis of vegetation benefits with multiple cooling mechanisms
- Well-articulated limitations and nuances (e.g., albedo paradox)
Limitations:
- Considerably longer than necessary, with some redundancy
- Citation format makes source attribution difficult to track
- Some paragraphs could be more concise without losing information
- Occasionally less precise with quantitative data compared to Perplexity
- Less structured design recommendations than Perplexity
Overall Comparative Assessment:
- Depth of analysis: Perplexity = ChatGPT > Grok. Both Perplexity and ChatGPT provided exceptionally thorough analyses, while Grok offered a more general overview.
- Factual accuracy: All three appeared accurate, with Perplexity providing the most precise quantitative data.
- Source diversity: Perplexity > ChatGPT > Grok. Perplexity showed the most diverse range of research sources.
- Logical reasoning: Perplexity > ChatGPT > Grok. Perplexity’s reasoning flow most effectively connected different aspects of the topic.
- Knowledge gaps acknowledgment: ChatGPT > Perplexity > Grok. ChatGPT included a dedicated section on research gaps and limitations.
- Contextualization: ChatGPT > Perplexity > Grok. ChatGPT excelled at placing concepts in broader context and explaining their significance.
- Bias identification: ChatGPT > Perplexity > Grok. ChatGPT most effectively addressed conflicting findings and limitations of research.
- Synthesis capability: Perplexity > ChatGPT > Grok. Perplexity most elegantly synthesized multiple complex factors.
- Nuance recognition: ChatGPT > Perplexity > Grok. ChatGPT provided the most nuanced discussion of trade-offs and context-specific factors.
- Practical utility: Perplexity > ChatGPT > Grok. Perplexity provided the most specific, actionable design recommendations.
Each model demonstrated different strengths: Perplexity excelled in academic rigor and precise recommendations, ChatGPT provided outstanding explanations and contextual understanding, while Grok offered accessibility and clarity for non-experts. The ideal approach might combine Perplexity’s precision, ChatGPT’s comprehensive explanations, and Grok’s accessibility.
Analysis of Three Deep Research Functions on AI in Architectural Design
Perplexity (PERP) - Rating: 8.5/10
Strengths:
- Exceptional organization with clear academic structure and logical flow
- Sophisticated coverage of AI applications in architecture with distinct categorization
- Strong integration of case studies with specific metrics (e.g., “reducing HVAC loads by up to 30%”)
- Excellent discussion of ethical considerations and professional implications
- Well-articulated future directions section with thoughtful insights
- Precise language that balances technical accuracy with accessibility
- Strong citation linking throughout the document
Limitations:
- Some sections (like urban planning) could benefit from more detailed examples
- Limited exploration of challenges or potential drawbacks of AI implementation
- Could have provided more comparison between different AI tools’ capabilities
Grok - Rating: 7/10
Strengths:
- Very accessible presentation with clear “Key Points” summary upfront
- Effective organization with distinct application categories
- Good use of specific examples and case studies (e.g., Shanghai Tower, Wembley Park)
- Inclusion of a useful tools table with descriptions and URLs
- Clear statistics on tool adoption (e.g., “ARCHITEChTURES used in 140+ countries by 15,000+ users”)
- Balanced coverage that acknowledges AI complements rather than replaces architects
Limitations:
- Less analytical depth compared to Perplexity and ChatGPT
- Limited explanation of underlying AI technologies and mechanisms
- Fewer specific metrics on performance improvements
- Less exploration of ethical considerations and future implications
- Citation format makes it difficult to trace specific claims to sources
ChatGPT - Rating: 9/10
Strengths:
- Comprehensive coverage with exceptional detail across all architectural applications
- Excellent explanations of how AI technologies function in each context
- Outstanding integration of real-world examples and case studies
- Strong section on emerging trends with thoughtful analysis of future directions
- Clear articulation of both benefits and limitations of AI in architecture
- Well-structured content that builds logically from design tools to implementation
- Balanced perspective on AI as augmentation rather than replacement for architects
Limitations:
- Significantly longer than necessary with some repetitive information
- Citation format makes it difficult to trace specific claims to sources
- Some sections (like building performance analysis) could be more concise
- Occasional overreliance on examples from the same sources
Overall Comparative Assessment:
- Depth of analysis: ChatGPT > Perplexity > Grok. ChatGPT provided the most thorough explanation of AI applications and their implications.
- Factual accuracy: All three appeared accurate, with varying levels of detail.
- Source diversity: ChatGPT > Perplexity > Grok. ChatGPT demonstrated the broadest range of examples and case studies.
- Logical reasoning: Perplexity > ChatGPT > Grok. Perplexity’s analysis showed particularly clear logical progression.
- Knowledge gaps acknowledgment: ChatGPT > Perplexity > Grok. ChatGPT was most transparent about limitations of current AI applications.
- Contextualization: ChatGPT > Perplexity > Grok. ChatGPT excelled at placing AI tools within the broader architectural workflow.
- Bias identification: Perplexity > ChatGPT > Grok. Perplexity most clearly addressed algorithmic bias concerns.
- Synthesis capability: ChatGPT > Perplexity > Grok. ChatGPT demonstrated superior integration of multiple AI applications into a cohesive narrative.
- Nuance recognition: ChatGPT > Perplexity > Grok. ChatGPT provided the most nuanced discussion of AI’s role alongside human architects.
- Practical utility: Grok > ChatGPT > Perplexity. Grok’s tools table and practical examples offered the most immediately actionable information.
Each model demonstrated distinct strengths: ChatGPT provided exceptional comprehensive coverage and detailed explanations, Perplexity offered strong academic structure with precise analysis, and Grok delivered accessible practical information. The ideal approach would combine ChatGPT’s comprehensive detail, Perplexity’s logical organization, and Grok’s accessible presentation with practical examples.
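Out of curiosity, we can also tally Claude's per-criterion rankings from the three assessments above and count how often each tool came first. A minimal sketch (the lists are my own transcription of the orderings; the one tie is counted for both tools, and criteria Claude did not explicitly rank, such as factual accuracy, are omitted):

```python
from collections import Counter

# First-place tool per ranked criterion, transcribed from Claude's three
# comparative assessments above. The Perplexity = ChatGPT tie on depth
# (street canyons) is counted once for each; unranked criteria omitted.
first_places = {
    "intermittent fasting": [
        "Perplexity", "Perplexity", "Perplexity",  # depth, sources, reasoning
        "ChatGPT", "ChatGPT",                      # knowledge gaps, context
        "Perplexity", "Perplexity",                # bias, synthesis
    ],
    "street canyons": [
        "Perplexity", "ChatGPT",                   # depth (tie)
        "Perplexity", "Perplexity",                # sources, reasoning
        "ChatGPT", "ChatGPT", "ChatGPT",           # gaps, context, bias
        "Perplexity", "ChatGPT", "Perplexity",     # synthesis, nuance, utility
    ],
    "AI in architecture": [
        "ChatGPT", "ChatGPT", "Perplexity",        # depth, sources, reasoning
        "ChatGPT", "ChatGPT", "Perplexity",        # gaps, context, bias
        "ChatGPT", "ChatGPT", "Grok",              # synthesis, nuance, utility
    ],
}

tally = Counter(w for winners in first_places.values() for w in winners)
print(tally.most_common())
```

Interestingly, ChatGPT and Perplexity come out almost even on first-place finishes (13 vs 12), while Grok tops only a single criterion, which is consistent with Claude's lower overall ratings for it.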
What do I think about the results?
So far, I have only used AI to review AI, so we should also look at the results ourselves. Overall, I think all three tools produce good results, but there are still differences among them, and Claude spotted these differences too.
- Perplexity – It tends to extract more numbers when performing deep research, which is especially useful for people working in academia: with more figures, I get a more concrete picture of what different studies found.
- ChatGPT – Its responses are usually longer than those of the other two, and Claude noted that there can be redundancies. However, ChatGPT usually gives me more information than I originally expect, including more background. It is perfect when I want a high-level overview of a topic, which is essential the first time I try to understand it.
- Grok – I would say Grok’s deep research responses sit between ChatGPT’s and Perplexity’s. Besides, Grok usually responds with more practical examples. I love its presentation, especially the tables it provides, which make the results more organized.
Final Thoughts
It’s clear there’s no single champion after this comparison. Each tool demonstrated distinctive strengths:
| Tool | Strength |
|---|---|
| Perplexity | excels with quantitative data and academic precision |
| ChatGPT | provides comprehensive context and background information |
| Grok | balances depth with practical examples and superior presentation |
While Claude consistently rated Grok lowest in the comparison, I found its organization and practicality particularly valuable. The ideal approach may be using these tools in combination—Perplexity for specific data points, ChatGPT for broad understanding, and Grok for practical applications.
For researchers and knowledge workers in 2025, having access to all three provides the most complete research toolkit.
Please share this article if you like it!
