Project Apollo | A Brief Overview

Objective:

Project Apollo aims to understand how national scientific enterprises influence global power dynamics by mapping how countries' influence on research topics evolves across academic fields. It does so using Natural Language Processing (NLP), topic models, network analysis, and other statistical and machine learning techniques. The project emphasizes quantifying international influence, topic prevalence, thematic shifts, and temporal trends in research output. It integrates dynamic models, clustering algorithms, and centrality measures to derive actionable insights into research dynamics.

Key Components:

  • Dynamic Topic Modeling (DTM):

    Purpose: Capture the evolution of research topics over time using a sliding-window approach over five-year intervals (1990–1994 to 2016–2020).
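The five-year windows can be enumerated with a short helper. This is a minimal sketch, not the project's code: the function name `sliding_windows` is hypothetical, and the one-year step is an assumption, since the overview gives only the first and last windows.

```python
def sliding_windows(first_year, last_year, width=5, step=1):
    """Return inclusive (start, end) year pairs of the given width.

    step=1 is an assumption; the overview only states that the windows
    are five years wide and run from 1990-1994 to 2016-2020.
    """
    windows = []
    start = first_year
    # Keep a window only if its last year still falls inside the range.
    while start + width - 1 <= last_year:
        windows.append((start, start + width - 1))
        start += step
    return windows


windows = sliding_windows(1990, 2020)
print(windows[0], windows[-1], len(windows))  # (1990, 1994) (2016, 2020) 27
```

With a one-year step, adjacent windows share four years of data, which is what makes comparing a topic against its immediate predecessor meaningful.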

    Methodology:

    BERTopic for Topic Extraction:

    • Utilized BERTopic, a transformer-based topic modeling framework, to extract topics dynamically.

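    • BERTopic uses HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) by default to cluster document embeddings. HDBSCAN finds topic groups of varying density and leaves outlier documents unassigned instead of forcing them into a topic, which helps on our large and heterogeneous dataset.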

    Term Alignment Across Time Slices:

    • Aligned terms between adjacent time intervals to maintain consistency in topic analysis and enable cross-temporal comparisons.

    Cosine Similarity for Topic Evolution:

    • Calculated cosine similarity between topic-term distributions to measure "self-similarity" (consistency of a topic over time) and inter-country influence metrics (how countries shape or are shaped by these topics).
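The alignment and similarity steps above can be sketched as follows. This is an illustration under stated assumptions: topics are represented as term-to-weight dictionaries, alignment is done by projecting both topics onto the union of their vocabularies (the overview does not say how terms were aligned), and the example weights are made up.

```python
import math


def align(dist_a, dist_b):
    """Project two term->weight dicts onto their shared, sorted vocabulary.

    Terms missing from one time slice get weight 0.0 (an assumption).
    """
    vocab = sorted(set(dist_a) | set(dist_b))
    return ([dist_a.get(t, 0.0) for t in vocab],
            [dist_b.get(t, 0.0) for t in vocab])


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty topic is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


# Hypothetical term distributions for one topic in two adjacent windows.
topic_1990_1994 = {"neural": 0.5, "network": 0.3, "perceptron": 0.2}
topic_1991_1995 = {"neural": 0.4, "network": 0.4, "learning": 0.2}

u, v = align(topic_1990_1994, topic_1991_1995)
self_similarity = cosine_similarity(u, v)
print(round(self_similarity, 3))
```

The same function serves both uses named above: comparing a topic with its own earlier version gives self-similarity, while comparing a country's distribution with another country's or with the field's gives an inter-country measure.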

    Outputs:

    • Term Distributions: Generated for each time slice, providing insights into the prominence and evolution of topics.

    • Topic Trends: Highlighted how certain topics emerge, evolve, or fade over time, offering a detailed understanding of research dynamics.

  • Structural Topic Models (STMs):

    • Purpose: Examine the thematic structure of topics and their statistical significance across different time periods.

    • Outputs:

      • Topic-specific quality and quantity analysis, incorporating adjusted citation proportions.

      • Entropy measures that classify countries as "generalists" or "specialists" in terms of their research output, computed with respect to the country covariates' beta coefficients and citations received.

      • Entropy measures that classify topics as "popular" or "niche," reflecting topic diversity.
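One way to turn an entropy measure into a generalist/specialist label is sketched below. It assumes each country is described by non-negative per-topic weights (for example, shares derived from covariate effects or citations); the function names, the normalization by log(n), and the 0.5 threshold are hypothetical choices, not taken from the project.

```python
import math


def shannon_entropy(weights):
    """Shannon entropy (natural log) of non-negative weights, after normalizing."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs)


def normalized_entropy(weights):
    """Entropy scaled to [0, 1] by dividing by log(number of topics)."""
    n = len(weights)
    if n < 2:
        return 0.0
    return shannon_entropy(weights) / math.log(n)


def classify_country(topic_weights, threshold=0.5):
    """Label a country by how evenly its output spreads over topics.

    The threshold is a hypothetical cut-off chosen for illustration.
    """
    if normalized_entropy(topic_weights) >= threshold:
        return "generalist"
    return "specialist"


print(classify_country([0.25, 0.25, 0.25, 0.25]))  # generalist
print(classify_country([0.97, 0.01, 0.01, 0.01]))  # specialist
```

Applying the same functions to a topic's weights across countries would give the "popular" versus "niche" split, with high entropy meaning many countries contribute to the topic.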

  • Influence Metrics:

    • Cosine Similarity: Used to compare topic distributions across time and evaluate inter-country influence on specific topics.

    • Influence Ratios: Metrics that measure the extent to which a country's research topics align with past thematic trends. If a country's research aligns more closely with its own past, it indicates that the country is influential in shaping the topic over time. Conversely, if the research aligns more with the field's overall trends (as captured by the Dynamic Topic Model (DTM)), it suggests that the country is influenced by the evolving topic and plays a less significant role in driving its direction.

    • Herfindahl-Hirschman Index (HHI): Applied to measure concentration and diversity in topic prevalence and weighted influence.
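The influence ratio and the HHI can be illustrated with two small functions. Both are sketches under assumptions: the overview does not give the exact ratio formula, so `influence_ratio` simply divides a country's similarity to its own past by its similarity to the field's past, and the input values are made up.

```python
def influence_ratio(self_similarity, field_similarity):
    """Ratio of similarity-to-own-past over similarity-to-field-past.

    Assumed form, for illustration only. A value above 1 suggests the
    country follows its own earlier work more than the field's trend
    (influential); a value below 1 suggests it is being influenced.
    """
    if field_similarity <= 0:
        raise ValueError("field_similarity must be positive")
    return self_similarity / field_similarity


def hhi(values):
    """Herfindahl-Hirschman Index: sum of squared shares.

    Ranges from 1/n (perfectly even) to 1.0 (fully concentrated).
    """
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return sum((v / total) ** 2 for v in values)


# Hypothetical similarity scores for one country on one topic.
ratio = influence_ratio(self_similarity=0.9, field_similarity=0.6)
print(round(ratio, 2))     # 1.5

# Hypothetical topic prevalence across four countries.
print(hhi([1, 1, 1, 1]))   # 0.25
print(hhi([10, 0, 0, 0]))  # 1.0
```

The similarity inputs would come from cosine comparisons of topic distributions, and the HHI can be applied either to raw topic prevalence or to prevalence weighted by influence.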

  • Data and Tools:

    • Research abstracts and their metadata were extracted from CSV files of country-level research output.

    • Node2Vec embeddings were generated to visualize research collaborations and similarity across institutions, coupled with clustering analyses for structural insights.

  • Network Analysis:

    • Centrality Measures: Degree, closeness, betweenness, eigenvector centrality, and PageRank were used to assess the significance of nodes (countries) within the citation network.

    • Connectivity: Examined the graph's weak and strong connectedness, clustering coefficients, and degree distribution.
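Two of the listed centrality measures are simple enough to show in plain Python. The sketch below assumes a directed citation network where an edge (A, B) means country A cites country B; the country labels and edges are placeholders. In practice a graph library such as NetworkX provides these measures along with the connectivity checks.

```python
def in_degree_centrality(nodes, edges):
    """Fraction of other nodes that cite each node."""
    n = len(nodes)
    counts = {node: 0 for node in nodes}
    for _src, dst in edges:
        counts[dst] += 1
    return {node: c / (n - 1) for node, c in counts.items()}


def pagerank(nodes, edges, damping=0.85, iterations=100, tol=1e-10):
    """PageRank by power iteration on an unweighted directed edge list."""
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly.
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        base = (1 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for u in nodes:
            if out_links[u]:
                share = damping * rank[u] / len(out_links[u])
                for v in out_links[u]:
                    new_rank[v] += share
        delta = sum(abs(new_rank[x] - rank[x]) for x in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Placeholder countries: B, C and D cite A; A cites B.
nodes = ["A", "B", "C", "D"]
edges = [("B", "A"), ("C", "A"), ("D", "A"), ("A", "B")]

degree = in_degree_centrality(nodes, edges)
ranks = pagerank(nodes, edges)
print(max(ranks, key=ranks.get))  # A
```

Here A is cited by every other node, so it tops both measures, while B ranks second under PageRank because its only citation comes from the highly ranked A.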

Applications:

  • National Defense:

    • Provides the U.S. Army Cyber Institute with insights into the vulnerabilities and dependencies created by global scientific networks.

    • Equips policymakers with strategies to strengthen alliances and counter foreign influence.

  • Geopolitical Strategy:

    • Enables informed decision-making to anticipate the impacts of emerging scientific trends and align national policies accordingly.

  • Policy Development:

    • Facilitates the creation of frameworks to sustain U.S. leadership in science and technology while mitigating the risks posed by external scientific interventions.

Team and Collaboration:

Led by Dr. Charles J. Gomez at the University of Arizona, the project brings together experts in Data Science, Computational Social Science and Cybersecurity. Collaboration with the U.S. Army Cyber Institute ensures the practical applicability of research findings to defense and policy strategies.

Project Apollo stands as a cornerstone initiative to decode the complex interplay of science, policy, and geopolitics, providing a robust framework to navigate the evolving global scientific landscape.

I have created a GitHub repository that gathers the projects and resources related to this work, which were originally stored on our HPC xdisk storage. It includes code samples, documentation, and tools that may be helpful. Please feel free to explore the repository, and don't hesitate to reach out if you have any questions or need further information: https://github.com/Harshvardhan-Singh1/Project-Apollo

Analyzing Country Influence in Research Fields Using DTM (with BERTopic) and Labeled LDA