STOCK TITAN

GSI Technology Reports 3-Second Time-to-First-Token for Edge Multimodal LLM Inference on Gemini-II

Rhea-AI Impact: High
Rhea-AI Sentiment: Neutral

GSI Technology (NASDAQ: GSIT) reported preliminary benchmarks for its Gemini-II compute-in-memory processor, showing a 3-second time-to-first-token (TTFT) for multimodal 12B models on the edge while consuming ~30 watts at the AI sub-system. Independent tests cited competing platforms with ~12s TTFT at 30W (Qualcomm) and 3s at >100W (NVIDIA), positioning Gemini-II as a lower-power, low-latency option for power- and thermally-constrained edge applications such as drones and smart-city systems. Results are preliminary and intended to support ongoing evaluation, not to guarantee future commercial outcomes.


Positive

  • 3-second TTFT for multimodal 12B model at the edge
  • AI sub-system power of approximately 30 watts during benchmark
  • Performance comparable to competitors with substantially lower power usage

Negative

  • Results are preliminary and not a guarantee of commercial outcomes
  • Third-party comparisons cite similar TTFT only at much higher power (NVIDIA >100W)

News Market Reaction

News Effect: -3.95%
Alerts: 27
Peak Tracked: +12.5%
Trough Tracked: -7.1%
Valuation Impact: -$12M
Market Cap: $297M
Rel. Volume: 1.1x

On the day this news was published, GSIT declined 3.95%, a moderate negative market reaction. During that session, Argus tracked a peak move of +12.5% and a trough of -7.1% from the starting point of tracking. Our momentum scanner triggered 27 alerts that day, indicating elevated trading interest and price volatility. The decline removed approximately $12M from the company's valuation, bringing the market cap to $297M at that time.
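The relationship between the percentage move, the market cap, and the valuation impact quoted above can be checked with simple arithmetic. The sketch below is only illustrative: the function names are hypothetical, and it assumes the valuation impact is the change in market cap implied by the day's percentage move with the share count held constant.

```python
def implied_prior_cap(cap_after_musd: float, pct_change: float) -> float:
    """Market cap (in $M) before a move of pct_change percent.

    Assumption: the share count is constant, so market cap moves
    by the same percentage as the share price.
    """
    return cap_after_musd / (1.0 + pct_change / 100.0)


def valuation_impact(cap_after_musd: float, pct_change: float) -> float:
    """Change in market cap (in $M) implied by the percentage move."""
    return cap_after_musd - implied_prior_cap(cap_after_musd, pct_change)


# Figures quoted on this page: -3.95% move, $297M market cap afterwards.
impact = valuation_impact(297.0, -3.95)
print(f"Implied valuation impact: {impact:.1f}M USD")
```

Under these assumptions the implied change is a little over $12M, which is in line with the rounded "-$12M Valuation Impact" figure.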

Data tracked by StockTitan Argus on the day of publication.

Key Figures

  • Time-to-first-token: 3 seconds (Gemma-3 12B multimodal model on Gemini-II at the edge)
  • AI subsystem power: 30 watts (power draw for 3-second TTFT on Gemini-II, including chip)
  • Model size: 12B parameters (Gemma-3 vision-language model used in benchmark)
  • Snapdragon X Elite TTFT: 12 seconds (third-party TTFT at 30W on competitive embedded platform)
  • Jetson Thor TTFT: 3 seconds (third-party TTFT with over 100W power on NVIDIA Jetson Thor)
  • Jetson Thor power: over 100W (power for 3-second TTFT on competitive embedded platform)

Market Reality Check

Last Close: $7.21
Volume: 992,105, below the 20-day average of 1,760,410 (relative 0.56x; low volume).
Technical: Price $7.85 trades above 200-day MA at $5.05, showing strength before this news.

Peers on Argus

GSIT was up 4.25% while close peers MX, QUIK, ICG, and GCTS showed negative or flat moves, indicating this edge-LLM benchmark update was stock-specific rather than a broad semiconductor move.

Common Catalyst Only one peer (GCTS) had same-day AI/5G-related news, so no clear sector-wide catalyst emerged.

Historical Context

5 past events · Latest: Jan 15 (Neutral)

Date | Event | Sentiment | Move | Catalyst
Jan 15 | Earnings date notice | Neutral | +4.0% | Announcement of timing for fiscal Q3 2026 results and conference call.
Jan 14 | AI POC announcement | Positive | -3.5% | Government-funded Gemini-II proof-of-concept for autonomous security system.
Dec 18 | Conference participation | Neutral | +6.4% | Participation in the 28th Annual Needham Growth Conference with investor meetings.
Nov 06 | Edge strategy update | Positive | -6.6% | Outlined edge-first Gemini-II strategy targeting drone and defense markets.
Nov 06 | Edge strategy details | Positive | -6.6% | Detailed claims on Gemini-II performance, energy savings, and market projections.

Pattern Detected

GSIT has seen negative reactions to strategic AI/edge announcements but positive moves around events and logistics updates, suggesting mixed sentiment on commercialization versus visibility.

Recent Company History

Over the past few months, GSIT has highlighted its Gemini-II edge AI strategy and financing. On Nov 6, 2025, it detailed an edge-first strategy for the drone market, tied to a $50 million equity raise, but the stock fell around related announcements. A government-funded edge AI proof-of-concept on Jan 14, 2026 also saw a negative reaction. In contrast, scheduling earnings on Jan 15, 2026 and participating in the Needham conference on Dec 18, 2025 coincided with positive moves, underscoring mixed reactions to technical versus visibility news.

Market Pulse Summary

Analysis

This announcement underscores Gemini-II’s edge AI focus, reporting 3-second time-to-first-token at about 30 watts for a 12B vision-language model, versus 12 seconds at 30W and 3 seconds at over 100W on competitive platforms. Recent history includes strategy updates on Nov 6, 2025 and a government-backed proof-of-concept on Jan 14, 2026, alongside a $50 million registered direct offering. Investors may watch for repeatable customer wins, revenue traction in edge “physical AI” markets, and how future filings or insider activity evolve around such technical milestones.

Key Terms

time-to-first-token technical
"These results demonstrated 3-second time-to-first-token (“TTFT”) performance for multimodal"
Time-to-first-token is a performance metric that measures the delay between sending a request to an AI language model and receiving its very first piece of output. Investors care because it reflects how quickly a product or service powered by the model responds — like the lag between pressing a doorbell and hearing someone answer — which affects user experience, system capacity, operational cost and competitiveness in products that rely on fast, interactive AI.
multimodal technical
"performance for multimodal large language models operating at the edge with video and text"
Multimodal describes an approach, product, or system that uses two or more different types of inputs, methods, or channels — for example combining text, images and audio in a technology product, or blending drugs, devices and therapy in medical care. For investors, multimodal solutions can broaden market reach and competitive differentiation but also add development cost, operational complexity and regulatory hurdles; think of it like a hybrid car that offers more capabilities but requires more parts and oversight.
compute-in-memory technical
"Gemini-II Compute-in-Memory processor. These results demonstrated 3-second"
Compute-in-memory is a chip design approach that performs calculations directly inside the memory where data sits instead of constantly shuttling data back and forth between separate processor and memory units. For investors, it matters because this can greatly speed up certain workloads and reduce power use—think of doing math on sticky notes at your desk instead of walking across the room to a whiteboard—potentially improving the performance, cost and battery life of devices that run artificial intelligence or data-heavy applications.
associative processing unit technical
"the inventor of the Associative Processing Unit (APU), a paradigm shift in artificial"
A specialized chip or software module designed to find and link related pieces of data quickly by content rather than by fixed addresses, like a librarian who locates books by topic instead of shelf numbers. For investors, it matters because these units can speed up tasks such as pattern matching, search, and certain types of artificial intelligence while using less power, potentially improving product performance, lowering operating costs, and opening new market opportunities for companies that build or use them.
vision-language model technical
"Using the Gemma-3 12B vision-language model on GSI’s production Gemini-II processor"
A vision-language model is an artificial intelligence that links pictures and words, able to describe images, answer questions about visual content, or generate captions from photos—like a person who can both look at a scene and explain it in plain language. Investors care because these models enable new products and automation (reducing costs, creating services, or changing user experiences), so companies that build or adopt them can gain revenue opportunities or face competitive disruption.
embedded edge processor technical
"12B model running on an embedded edge processor. Independent third-party testing"
An embedded edge processor is a small, energy-efficient computer chip built into a device — like a sensor, camera, or industrial machine — that processes data locally instead of sending it to a remote server. For investors, this matters because local processing can lower operating costs, improve speed and privacy, and enable smart features that create competitive advantages and new revenue opportunities in markets such as IoT, automotive, and industrial automation (think of it as a mini-brain that lets devices act quickly on their own).
latency technical
"designed to reduce data movement, which is a primary contributor to latency and power"
Latency is the time delay between when a request or piece of data enters a system and when the system responds or finishes processing it. In conventional chip architectures, moving data between memory and processor is a primary contributor to that delay. For investors, lower latency matters because it determines how quickly an AI-powered product can react, like the pause between asking a question and hearing the start of an answer.
change in control regulatory
"vests 100% on August 15, 2026, and will fully vest immediately prior to, and contingent upon, a Change in Control"
A "change in control" occurs when the ownership or management of a company shifts significantly, such as through a merger, acquisition, or sale of a large part of its assets. This change can impact how the company is run and may influence its future direction. For investors, it matters because it can affect the company's stability, strategy, and value, often signaling potential changes in investment risk or opportunity.
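As a rough illustration of the time-to-first-token term defined in this list, the sketch below times how long a streaming response takes to produce its first token. It is not GSI's benchmark methodology, which the release does not describe; `measure_ttft` and `fake_stream` are hypothetical names, and the stand-in stream simply sleeps where a real model would process the prompt.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_ttft(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, str]:
    """Return (seconds until the first token, the first token).

    Assumes the stream is lazy, so the work needed to produce the
    first token happens when the first item is requested.
    """
    start = clock()
    try:
        first = next(iter(stream))
    except StopIteration:
        raise ValueError("stream produced no tokens") from None
    return clock() - start, first


def fake_stream(prefill_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a model's streaming output."""
    time.sleep(prefill_s)  # placeholder for prompt (video + text) processing
    yield "Hello"
    yield " world"


ttft_s, token = measure_ttft(fake_stream(0.05))
print(f"first token {token!r} after {ttft_s:.3f} s")
```

The measurement stops at the first token on purpose: time spent generating the rest of the response is a separate metric.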

AI-generated analysis. Not financial advice.

Benchmark Results Demonstrate Fast Multimodal Edge Inference with Up to ~300% Better Performance per Watt versus Competitive Solutions

SUNNYVALE, Calif., Jan. 29, 2026 (GLOBE NEWSWIRE) -- GSI Technology, Inc. (Nasdaq: GSIT), the inventor of the Associative Processing Unit (APU), a paradigm shift in artificial intelligence (AI) and high-performance compute processing, providing true compute-in-memory technology, today announced preliminary benchmark results for the Gemini-II Compute-in-Memory processor. These results demonstrated 3-second time-to-first-token (“TTFT”) performance for multimodal large language models operating at the edge with video and text inputs.

Using the Gemma-3 12B vision-language model on GSI’s production Gemini-II processor, GSI achieved the 3-second TTFT while consuming approximately 30 watts at the AI sub-system, including the chip. To GSI’s knowledge, this 3-second TTFT at approximately 30 watts at the AI sub-system is the lowest publicly reported result for a multimodal 12B model running on an embedded edge processor.

Independent third-party testing of the same workload on competitive embedded platforms reported TTFT measurements of roughly 12 seconds on Qualcomm Snapdragon X Elite with 30W power, and 3 seconds on NVIDIA Jetson Thor with over 100W power. With performance on par with or superior to competitive platforms at lower power usage levels, GSI concludes that Gemini-II offers a favorable responsiveness and power-efficiency profile for power- and thermally-constrained edge environments.
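One way to relate these figures to the headline's "up to ~300% better performance per watt" is to compare energy consumed until the first token (power multiplied by TTFT). The release does not state how that percentage was derived, so the sketch below rests on assumptions: performance is taken as 1/TTFT, the quoted power is treated as constant over the interval, and Jetson Thor is entered at 100 W even though the release says "over 100W", which makes its result a lower bound. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    platform: str
    ttft_s: float   # time-to-first-token, seconds
    power_w: float  # power during the benchmark, watts

    @property
    def energy_j(self) -> float:
        """Energy to first token in joules, assuming constant power."""
        return self.ttft_s * self.power_w


def perf_per_watt_gain_pct(candidate: Result, baseline: Result) -> float:
    """Percent gain in (1/TTFT)/W of candidate over baseline.

    With performance defined as 1/TTFT, the perf-per-watt ratio
    equals the inverse ratio of energy to first token.
    """
    return (baseline.energy_j / candidate.energy_j - 1.0) * 100.0


GEMINI_II = Result("Gemini-II", 3.0, 30.0)
SNAPDRAGON = Result("Snapdragon X Elite", 12.0, 30.0)
JETSON_THOR = Result("Jetson Thor", 3.0, 100.0)  # assumed lower bound for ">100W"

for rival in (SNAPDRAGON, JETSON_THOR):
    gain = perf_per_watt_gain_pct(GEMINI_II, rival)
    print(f"{rival.platform}: {rival.energy_j:.0f} J vs "
          f"{GEMINI_II.energy_j:.0f} J, gain {gain:.0f}%")
```

Under these assumptions the Snapdragon comparison works out to 300% (360 J versus 90 J), consistent with the "up to ~300%" wording, while the Jetson Thor comparison gives at least about 233%.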

“These benchmark results highlight what compute-in-memory can enable for physical AI,” said Lee-Lean Shu, President and Chief Executive Officer of GSI Technology. “Edge deployments require fast response under tight power and thermal limits. A 3-second TTFT means the system can generate an initial response every three seconds, which is generally fast enough to be useful in video-based applications without missing meaningful events. Gemini-II’s ability to deliver low-latency multimodal inference at low power supports a broader range of real-time applications, from autonomous systems to intelligent machines operating outside the data center.”

GSI believes this performance profile is well-suited to “physical AI” markets, including drones, smart city, and other edge systems where workloads are episodic and constrained by battery life, thermal design, and form factor. Faster TTFT at lower chip power can enable more responsive systems, longer duty cycles, and lower total system cost.

Edge physical AI represents a growing segment of AI compute as workloads shift from cloud-assisted models to local inference to improve latency, reliability and operational efficiency. GSI’s proprietary compute-in-memory architecture is designed to reduce data movement, which is a primary contributor to latency and power consumption in conventional architectures.

GSI’s engineering team continues to work on further optimizing Gemini-II’s responsiveness while collaborating with customers and partners, including G2 Tech, on system integration and proof-of-concept activity. Benchmark results are intended to support ongoing evaluation and do not guarantee future commercial outcomes.

ABOUT GSI TECHNOLOGY
GSI Technology is at the forefront of the AI revolution with our groundbreaking APU technology, designed for unparalleled efficiency in billion-item database searches and high-performance computing. GSI’s innovations, Gemini-I® and Gemini-II®, offer scalable, low-power, high-capacity computing solutions that redefine edge computing capabilities. GSI Technology is headquartered in Sunnyvale, California, and has sales offices in the Americas, Europe, and Asia. For more information, please visit www.gsitechnology.com.

Forward-Looking Statements

The statements contained in this press release that are not purely historical are forward-looking statements within the meaning of Section 21E of the Securities Exchange Act of 1934, as amended, including statements regarding GSI Technology’s expectations, beliefs, intentions, strategies, products, market opportunities and prospective customer engagements. All forward-looking statements included in this press release are based upon information available to GSI Technology as of the date hereof, and GSI Technology assumes no obligation to update any such forward-looking statements. Forward-looking statements involve a variety of risks and uncertainties, which could cause actual results to differ materially from those expected or implied.

GSI Technology’s participation in a proof-of-concept is exploratory in nature and may not result in any commercial contract, extended engagement, or recurring revenue. There can be no assurance that the scope, performance, or findings of any proof-of-concept will meet customer expectations or commercial requirements, or that such activities will lead to further business opportunities, order volume, or deploy-at-scale implementations. Additional risks and uncertainties that could cause actual results to differ materially from those expected or implied include, among others: the preliminary and limited nature of benchmark results; differences in workloads, configurations, measurement boundaries, and methodologies that can materially affect TTFT and power measurements; variability in model architectures, versions and toolchains that may impact performance; the pace and extent of adoption of “physical AI” at the edge and the impact of safety, privacy, and security requirements; supply-chain constraints affecting semiconductors, components, or manufacturing partners; GSI Technology’s historical dependence on sales to a limited number of customers and fluctuations in the mix of customers and products in any period; global public health crises that reduce economic activity; the rapidly evolving markets for its products and uncertainty regarding the development of these markets; the need to develop and introduce new products to offset the historical decline in the average unit selling price of its products; intensive competition; the continued availability of government funding opportunities; delays or unanticipated costs that may be encountered in the development of new products based on its in-place associative computing technology and the establishment of new markets and customer and partner relationships for the sale of such products; and delays or unexpected challenges related to the establishment of customer relationships and orders for its radiation-hardened and tolerant SRAM products. Many of these risks are currently amplified by and will continue to be amplified by, or in the future may be amplified by, economic and geopolitical conditions, such as changing interest rates, worldwide inflationary pressures, policy unpredictability, the imposition of tariffs, export controls and other trade barriers, military conflicts, particularly in relation to Taiwan, and a challenging global economic environment. These risks are discussed in more detail in GSI Technology’s most recently-filed Annual Report on Form 10-K, its Quarterly Reports on Form 10-Q and its other reports filed from time to time with the SEC. You are urged to review carefully and consider GSI Technology’s various disclosures in this press release and in its reports publicly disclosed or filed with the SEC that attempt to advise you of the risks and factors that may affect its business.

Source: GSI Technology, Inc.

Contacts:
Investor Relations
Hayden IR
Kim Rogers
541-904-5075
Kim@HaydenIR.com

Media Relations
Finn Partners for GSI Technology
Ricca Silverio
415-348-2724
gsi@finnpartners.com

Company
GSI Technology, Inc.
Douglas M. Schirle
Chief Financial Officer
408-331-9802


FAQ

What does the 3-second TTFT claim by GSI (GSIT) mean for edge multimodal LLMs?

It means the system can generate an initial token in three seconds, enabling timely responses for video-based tasks. According to the company, the benchmark used the Gemma-3 12B model on Gemini-II running at about 30W at the AI sub-system, indicating lower power for similar responsiveness.

How does Gemini-II power efficiency compare to Qualcomm Snapdragon X Elite and NVIDIA Jetson Thor for GSIT benchmarks?

Gemini-II reported a 3-second TTFT at about 30W at the AI sub-system. According to the company, third-party tests showed Qualcomm Snapdragon X Elite at ~12s TTFT at 30W and NVIDIA Jetson Thor at 3s TTFT but at over 100W, so Gemini-II matched or beat both on responsiveness at equal or lower power.

Which model and workload were used for GSI's Gemini-II benchmark announced Jan 29, 2026?

GSI used the Gemma-3 12B vision-language multimodal model for the benchmark. According to the company, the test included video and text inputs and measured a 3-second time-to-first-token on a production Gemini-II processor.

Is the Gemini-II 3-second TTFT result a commercial performance guarantee for GSIT customers?

No, the company described the benchmark as preliminary and for evaluation purposes, not a guarantee of commercial outcomes. According to the company, engineering and integration work with partners continues to optimize responsiveness and system integration.

What edge applications does GSI say Gemini-II is suited for after the Jan 29, 2026 benchmark?

GSI cites drones, smart-city systems, and other physical AI use cases constrained by battery and thermal limits. According to the company, lower TTFT at reduced chip power supports longer duty cycles and more responsive local inference on constrained devices.