dcarpintero/athena

🦉 Athena - Research Companion

Athena is an AI-assist prototype powered by Cohere AI and Embed v3 to facilitate scientific research. Its key differentiating features include:

  • Advanced Semantic Search: Uses state-of-the-art embeddings to retrieve articles by meaning rather than by exact keyword match, which suits the nuanced phrasing of scientific queries.
  • Human-AI Collaboration: Eases the review of research literature by highlighting key topics and augmenting human understanding.
  • Admin Support: Assists with tasks such as categorizing research articles, drafting e-mails, and generating tweets.
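
To make the semantic search idea in the first bullet concrete, here is a minimal sketch of ranking documents by cosine similarity between embedding vectors. It is an illustration under assumptions, not the repository's code. The three-dimensional vectors, the titles, and the function names are invented for the example. In Athena the vectors would come from Embed v3, and the search itself would presumably run inside Weaviate.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, docs, top_k=2):
    """Return the top_k (title, score) pairs, most similar first.

    `docs` maps a title to its embedding vector.
    """
    scored = [(title, cosine_similarity(query_vec, vec)) for title, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real embeddings (made up for illustration).
docs = {
    "Attention mechanisms in transformers": [0.9, 0.1, 0.0],
    "Protein folding with deep learning": [0.1, 0.9, 0.2],
    "Self-attention for long sequences": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # hypothetical embedding of "how does attention work?"
results = rank_by_similarity(query, docs)
for title, score in results:
    print(f"{score:.3f}  {title}")
```

Both attention-related titles rank above the protein-folding one because their vectors point in nearly the same direction as the query vector, even though the query text shares few words with them.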

📚 Overview

Data Pipeline

As part of this project, we have created two datasets of 50,000 arXiv articles related to AI and NLP using Cohere Embed v3.

Steps:

  1. Retrieve article metadata from arXiv. See ./data_pipeline/retrieve_arxiv.py
  2. Embed each article's title and abstract using Embed v3. See ./data_pipeline/embed_arxiv.py
  3. Store article metadata and embeddings in Weaviate. See ./data_pipeline/index_arxiv.py
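
The first two steps above can be sketched roughly as follows. This is a standard-library illustration, not the contents of the pipeline scripts. The function names, the `cs.CL` category, the dictionary keys, and the batch size are assumptions. The embedding call is injected as `embed_fn` so the sketch runs without an API key. In the real pipeline that role would be played by a call to Embed v3. Step 3 is omitted because it depends on the Weaviate client.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"  # public arXiv API endpoint


def build_query_url(category, start=0, max_results=100):
    """Step 1: URL for one page of article metadata in an arXiv category."""
    params = {"search_query": f"cat:{category}", "start": start, "max_results": max_results}
    return f"{ARXIV_API}?{urlencode(params)}"


def to_embedding_text(article):
    """Step 2 input: join title and abstract into one string to embed."""
    return f"{article['title'].strip()}\n{article['abstract'].strip()}"


def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def embed_articles(articles, embed_fn, batch_size=64):
    """Step 2: attach a vector to each article.

    `embed_fn` is any callable mapping a list of strings to a list of vectors
    (hypothetical stand-in for the Embed v3 call).
    """
    out = []
    for batch in batched(articles, batch_size):
        vectors = embed_fn([to_embedding_text(a) for a in batch])
        out.extend({**a, "vector": v} for a, v in zip(batch, vectors))
    return out


# Fake embedder for demonstration: one number per text (its length).
def fake_embed(texts):
    return [[float(len(t))] for t in texts]


articles = [
    {"title": " A ", "abstract": "B"},
    {"title": "Title two", "abstract": "Abstract two"},
    {"title": "Title three", "abstract": "Abstract three"},
]
embedded = embed_articles(articles, fake_embed, batch_size=2)
print(build_query_url("cs.CL"))
print(embedded[0])
```

Batching keeps each embedding request small, and passing the embedder in as a parameter makes the pipeline logic testable offline.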

Prompt Templates, Output Formatting, and Validation

Some of our tasks, such as enriching abstracts with Wikipedia links, crafting a glossary, composing e-mails, and drafting tweets, rely on a set of prompt templates with output formatting and validation.

These prompts are then composed into a LangChain chain.
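
The repository's own snippets are not reproduced here. As a rough stand-in, the sketch below shows the general pattern of prompt template, model call, and validated output in plain Python rather than LangChain. The prompt wording, the JSON shape, and all function names are hypothetical, and `llm` is any callable that maps a prompt string to a reply string.

```python
import json
from string import Template

# Hypothetical prompt; the repository's actual templates are not shown here.
GLOSSARY_PROMPT = Template(
    "Extract up to $max_terms key terms from the abstract below.\n"
    'Respond with JSON only: {"terms": [{"term": "...", "definition": "..."}]}\n\n'
    "Abstract: $abstract"
)


def parse_glossary(raw):
    """Parse and validate the model's reply; raise ValueError if malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    terms = data.get("terms") if isinstance(data, dict) else None
    if not isinstance(terms, list):
        raise ValueError("reply must contain a 'terms' list")
    for entry in terms:
        if not isinstance(entry, dict) or not {"term", "definition"} <= entry.keys():
            raise ValueError(f"malformed glossary entry: {entry!r}")
    return terms


def glossary_chain(abstract, llm, max_terms=5):
    """Prompt -> model -> validated output."""
    prompt = GLOSSARY_PROMPT.substitute(abstract=abstract, max_terms=max_terms)
    return parse_glossary(llm(prompt))


# Canned reply standing in for a real model call.
def fake_llm(prompt):
    return '{"terms": [{"term": "embedding", "definition": "a dense vector representation"}]}'


glossary = glossary_chain("We study sentence embeddings.", fake_llm)
print(glossary)
```

Validating the reply before using it means a malformed model response surfaces as a clear error instead of propagating into the UI.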

Weaviate Schema

See ArxivArticle Class.
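
For orientation, a Weaviate class definition for such articles might look like the sketch below. Only the class name `ArxivArticle` comes from this README. The property names, their data types, and the `"vectorizer": "none"` setting are assumptions, chosen because the pipeline computes the embeddings itself before indexing.

```python
# Hypothetical class definition; property names are illustrative assumptions.
ARXIV_ARTICLE_CLASS = {
    "class": "ArxivArticle",
    "description": "arXiv article metadata with a precomputed embedding",
    "vectorizer": "none",  # assumption: vectors are supplied by the pipeline
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "abstract", "dataType": ["text"]},
        {"name": "authors", "dataType": ["text[]"]},
        {"name": "url", "dataType": ["text"]},
        {"name": "published", "dataType": ["date"]},
    ],
}


def property_names(class_def):
    """List the property names declared in a class definition."""
    return [prop["name"] for prop in class_def["properties"]]


print(property_names(ARXIV_ARTICLE_CLASS))
```

With the v3 Weaviate Python client, a dictionary of this shape would typically be registered via `client.schema.create_class(...)`. The actual schema used by Athena may differ.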

Cohere Engine

The coral.py class provides an abstraction layer over Cohere endpoints.

Streamlit App

See app.py

🚀 Quickstart

  1. Clone the repository:
git clone git@github.com:dcarpintero/athena.git
  2. Create and activate a virtual environment:
Windows:

py -m venv .venv
.venv\scripts\activate

macOS/Linux:

python3 -m venv .venv
source .venv/bin/activate
  3. Install dependencies:
pip install -r requirements.txt
  4. Run the data pipeline (optional; the scripts are in ./data_pipeline):
python retrieve_arxiv.py
python embed_arxiv.py
python index_arxiv.py
  5. Launch the web application:
streamlit run ./app.py

🔗 References
