Complexity Analysis

Time and space complexity of each algorithm in CiteGraph

VNumber of papers (82,937)

ENumber of citation edges (148,116)

KAverage citations per paper

Build Adjacency List

Parse all citation edges and populate the graph

O(E)

Each edge is visited exactly once. Two Maps are built: citedBy (in-edges) and cites (out-edges). Nodes are also initialised in O(V).

Top Cited (Max-Heap)

Rank all papers by citation count using a max-heap

O(V log V)

All V papers are pushed into a MaxPriorityQueue (binary heap). Each enqueue is O(log V). Dequeuing the top-K papers is O(K log V), dominated by the build phase.

Search by Title / ID

Linear scan across all papers for substring / exact match

O(V)

Iterates over the papers Map once. ID lookup short-circuits on first match. Title search collects all matches up to a limit of 50 and sorts by citation count.

Paper Cluster (Jaccard)

Find papers with overlapping citation sets

O(V · K)

For each of the V papers, the intersection with the target paper's citation set is computed in O(K) where K is the average citation list length. Jaccard similarity = |A∩B| / |A∪B|.

Data Structures · Space Complexity

O(V + E)

Adjacency List (Map)

Two Maps — citedBy and cites — store in- and out-edges per node.

O(V)

MaxPriorityQueue

Binary max-heap holding all papers keyed by citation count.

O(K)

Set (for Jaccard)

Temporary Set built from the target paper's citation list per query.

Overall Space

Graph + heap + working sets

O(V + E)