Welcome to Arm Community

Changes to Arm Community: Our latest updates
We have redesigned the Arm Community blog for clarity and comfort—simpler layouts, responsive design, and instant sharing.

Access Neural Super Sampling
Start experimenting with Neural Super Sampling for mobile graphics today

Meet the Arm Virtual FAE: Your 24/7 expert in IP Explorer
The Arm Virtual FAE, our first AI-powered assistant, is now available in IP Explorer — ready to help you evaluate and compare Arm IP, anytime you need it.
Latest blog posts

Today | Reading time 12 minutes
Vulkan subpasses: the good, the bad, and the ugly

November 5, 2025 | Reading time 4 minutes
AI-defined and software-defined vehicles: The future of automotive compute

November 4, 2025 | Reading time 12 minutes
Part4: Arm SME2 Introduction

November 4, 2025 | Reading time 1 minute
Meet the Arm Virtual FAE: Your 24/7 expert in IP Explorer
Latest community activity
Today
November 5, 2025
AI-defined and software-defined vehicles: The future of automotive compute
Arm and NXP redefine vehicle compute with the S32K5 family, combining performance, scalability, and safety for next-generation automotive.
double loop with CPU vs GPU
Hi,
I have a technical question about loops. Let's take an example.
int A[3000][4];
int B[3000][4];
int C[3000][4];
Using the CPU is very simple: I compare all of A with all of B.
for (int x = 0; x < 3000; x++) {
    for (int y = 0; y < 3000; y++) {
look...
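
The snippet above is cut off, so the following is only a minimal sketch of what an all-pairs CPU comparison could look like. It assumes that "compare" means testing whether a 4-int row of A equals a 4-int row of B; the helper name `count_matching_pairs` and the `COLS` macro are hypothetical and do not come from the original post.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COLS 4  /* row width, taken from the array declarations above */

/* Hypothetical helper: counts the (x, y) pairs for which row x of a
 * equals row y of b. Row equality is an assumed meaning of "compare". */
size_t count_matching_pairs(int a[][COLS], int b[][COLS], size_t n)
{
    assert(a != NULL && b != NULL);
    size_t matches = 0;
    for (size_t x = 0; x < n; x++) {
        for (size_t y = 0; y < n; y++) {
            /* sizeof a[x] is the size of one row: COLS ints. */
            if (memcmp(a[x], b[y], sizeof a[x]) == 0) {
                matches++;
            }
        }
    }
    return matches;
}
```

Called as `count_matching_pairs(A, B, 3000)`, the inner test runs 3000 * 3000 = 9,000,000 times, which is the X^2 cost discussed in this thread.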

After a lot of testing, I would say that for this kind of X^2 problem it depends on the size of X and, on mobile, on the moment when CPU frequency scaling starts to slow the CPU down.
For X under 1000 to 1500 the CPU performs a lot better. Once scaling starts, at around X = 2000, the two look equal, and above 2000 the GPU performs better.
The problem is CPU frequency scaling, so I will try to use only the GPU for both small and large values of X.
Maybe the trick on mobile is to avoid massive CPU work; it looks like it does not like it too much, but now I know why. And if I run the loop as a function of the amount of data to process, it is 1 kernel for 256 data items to check.
The question was not so stupid ;))

Hi,
I tried it for 3 days, and my conclusion is that a GPU does not work like a CPU. I knew that, but I tried anyway.
So, comparing A[3000] with B[3000] can be done on the GPU, but it is complicated, and the output buffer must hold 3000*3000 entries in case every A matches every B. The work is also done in random order, so there is no sequential processing. The GPU will always be faster if the amount of data is huge; it is really made for massive matrix calculations.
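
To illustrate why the output needs one slot per pair, here is a rough sketch in plain C that emulates the GPU pattern on the host. It is not OpenCL or Vulkan code: `compare_item` stands in for one work-item, `compare_all` stands in for a launch over n * n work-items, and both names (like the row-equality test and the one-byte flag per pair) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COLS 4  /* row width, as in the arrays from the first post */

/* Emulates one work-item: gid selects exactly one (x, y) pair and the
 * result is written to out[gid], so items can run in any order. */
void compare_item(size_t gid, size_t n, int a[][COLS], int b[][COLS],
                  unsigned char *out)
{
    size_t x = gid / n;  /* row of a handled by this item */
    size_t y = gid % n;  /* row of b handled by this item */
    out[gid] = (memcmp(a[x], b[y], sizeof a[x]) == 0);
}

/* Host-side stand-in for launching n * n work-items. A real GPU would
 * run them in parallel; the loop order here does not matter. */
void compare_all(size_t n, int a[][COLS], int b[][COLS], unsigned char *out)
{
    assert(out != NULL);
    for (size_t gid = 0; gid < n * n; gid++) {
        compare_item(gid, n, a, b, out);
    }
}
```

With n = 3000 the `out` buffer holds 9,000,000 one-byte flags, which is the 3000*3000 worst case mentioned above. Turning those flags into a compact, ordered list of matches would be a separate step, on either the CPU or the GPU.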
But with the CPU you can use an index file and sequential work, so there are more possibilities for a double loop, like:
for (int X = 0; X < end; X++) {
    for (int Y = X; Y < end; Y++) {
    }
} // about 1/2 * X^2 iterations if the data is ordered
This is not possible on the GPU, because a global X and Y index cannot be shared between all threads of all groups. It does not work; I tried it last week (see the post about debugging on Khronos).
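
As a quick check of the "1/2 * X^2" estimate for the triangular loop, the sketch below counts its iterations on the CPU. The function name `triangular_iterations` is made up for this example. Because Y starts at X, the diagonal is included, so the exact count is end * (end + 1) / 2, which is close to half of end^2 for large values.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the iterations of the triangular double loop (Y starts at X). */
size_t triangular_iterations(size_t end)
{
    size_t count = 0;
    for (size_t x = 0; x < end; x++) {
        for (size_t y = x; y < end; y++) {
            count++;
        }
    }
    /* Closed form for this loop shape, diagonal included. */
    assert(count == end * (end + 1) / 2);
    return count;
}
```

For end = 3000 this gives 4,501,500 iterations instead of the 9,000,000 of the full double loop.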
So, the question was not so stupid, but the GPU world is very different from the CPU world. Both have their advantages and disadvantages. The GPU is for calculation and massive, randomly ordered matrix work. The CPU is for logic work in sequential or indexed order.
The real problem is frequency scaling on the CPU. So it would be a very good idea to produce a mobile device for gaming and AI purposes with a good cooling system to avoid scaling. This would be a step towards laptops and desktops.
Frequency scaling is the real bottleneck on mobile. We have CPUs that run very fast, but we can only use, let's say, 25% of their potential.
Let's wait for the Nvidia N1X and see what we can do with it.
GPU versus CPU is not a question of speed; it is a question of what you need done and how you plan to do it.
The problem I am trying to solve is associating vectors with each other. Loops are good on the CPU. I will try to find out whether I can do this on the GPU; I need to find another way. But I will always need to do some work on the CPU, because GPU work runs in random order and the output is not indexed, since a global index does not work between work groups when they run in parallel.
PS: I may be wrong on some points, so do not hesitate to let me know.

Installing Arm_Compiler_5.06u7 on Windows 11
Hello,
I have been unsuccessful at manually installing Arm_Compiler_5.06u7 into the Keil_5\Arm folder. I am running Windows 11. I have done this successfully on numerous occasions in the past. However, after the installation ARMCC is not in the ARM folder...

Hi Ronan,
Thank you, I really appreciate the reply. I need to check and see which license we actually have. I've installed Keil_5 a couple of times in the past and don't remember running into this issue. I'll try the workaround and see if that works. Thank you again!
Dennis.

Hi again Ronan,
The license management tab on uVision is showing that I have MDK-Lite (Evaluation version), and PK51 Prof. Developers Kit (evaluation version) installed. The ‘Add LIC’ tab is greyed out, so I assume that there is no need to activate a license. However, I am still getting the same compilation error as pasted up above. Do you have any further suggestions? Thank you.
Dennis
