Welcome to Arm Community

96,923 members 43,294 conversations 2,859 blogs

Forums

Browse our support forums for solutions to your questions, and get help from Arm experts.

Blogs

Read the latest Arm Community blogs and explore the latest in what Arm engineers are writing about.

Events

Keep up to date on the latest developer-focused events, workshops and discussions.

Changes to Arm Community: Our latest updates

We have redesigned the Arm Community blog for clarity and comfort—simpler layouts, responsive design, and instant sharing.


Access Neural Super Sampling

Start experimenting with Neural Super Sampling for mobile graphics today

Meet the Arm Virtual FAE: Your 24/7 expert in IP Explorer

The Arm Virtual FAE, our first AI-powered assistant, is now available in IP Explorer — ready to help you evaluate and compare Arm IP, anytime you need it.

Latest blog posts

Today


Reading time 12 minutes

Vulkan subpasses: the good, the bad, and the ugly

Peter Harris

November 5, 2025


Reading time 4 minutes

AI-defined and software-defined vehicles: The future of automotive compute

Prakash Mohapatra

November 4, 2025


Reading time 12 minutes

Part4: Arm SME2 Introduction

Zenon (Zhilong) Xiu

November 4, 2025


Reading time 1 minute

Meet the Arm Virtual FAE: Your 24/7 expert in IP Explorer

Matt Rowley

Latest community activity

Today



November 5, 2025



hterrolle asked a question 6 days ago

double loop with CPU vs GPU

Hi,

I have a technical question about loops. Let's take an example.

int A[3000][4];

int B[3000][4];

int C[3000][4];

Using the CPU is very simple: I compare every A with every B.

for (int x = 0; x < 3000; x++) {

    for (int y = 0; y < 3000; y++) {

          look...



hterrolle in reply to hterrolle, 5 days ago

After a lot of testing, I would say that for this kind of X^2 problem it depends on the size of X and, on mobile, on the point at which CPU frequency scaling starts to slow the CPU down.

For X under 1000 to 1500 the CPU performs a lot better. Once scaling starts, at around X = 2000, the two look equal, but above 2000 the GPU performs better.

The problem is CPU frequency scaling, so I will try to use only the GPU for both small and large values of X.

Maybe the trick on mobile is to avoid massive CPU work. It looks like the CPU does not like it much, and now I know why. And if I run the loop as a function of the amount of data to process, it is 1 kernel per 256 data items to check.

The question was not so stupid ;))


hterrolle in reply to hterrolle, 1 day ago

Hi,

I tried it for 3 days, and my conclusion is that a GPU does not work like a CPU. I knew that, but I tried anyway.

So, comparing A[3000] with B[3000] can be done on the GPU, but it is complicated, and the output buffer must hold 3000*3000 entries in case every A matches every B. The work is also done in random order, so nothing is sequential. The GPU will always be faster if the amount of data is huge. It is really made for massive matrix calculations.
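
To make the 3000*3000 output size concrete, here is a minimal C sketch of the all-pairs comparison written so that every (x, y) result goes to its own output slot. The match rule (all four ints equal) and the names rows_match and match_all_pairs are assumptions for illustration; the thread does not say what the real comparison is.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COLS 4

/* Assumed match rule (not from the post): rows match when all four ints are equal. */
static int rows_match(const int *a, const int *b) {
    return memcmp(a, b, COLS * sizeof(int)) == 0;
}

/* Compare every row of a with every row of b.
 * out must hold n * n bytes; out[x * n + y] is set to 1 when a[x] matches b[y].
 * Returns the total number of matching pairs. */
size_t match_all_pairs(int a[][COLS], int b[][COLS], size_t n, unsigned char *out) {
    assert(a != NULL && b != NULL && out != NULL);
    size_t matches = 0;
    for (size_t x = 0; x < n; x++) {
        for (size_t y = 0; y < n; y++) {
            unsigned char m = (unsigned char)rows_match(a[x], b[y]);
            out[x * n + y] = m;   /* one slot per (x, y) pair */
            matches += m;
        }
    }
    return matches;
}
```

For n = 3000 the out buffer needs 9,000,000 entries. Because each slot depends only on its own x and y, the order in which the pairs are evaluated does not change the result.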

But with the CPU you can use an index file and sequential work, so there are more possibilities for a double loop, such as:

for (int X = 0; X < end; X++) {

    for (int Y = X; Y < end; Y++) {

    }

}   // 1/2 * X^2 iterations if the data is ordered

This loop structure is not possible on the GPU, because the global indexes X and Y cannot be shared between all threads of all work groups. It does not work. I tried it last week (see my post about debugging on Khronos).
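
The "1/2 * X^2" figure for the CPU loop above can be checked with a short C sketch that counts the iterations of the triangular loop. The function name count_triangular_pairs is hypothetical; the loop bounds are the ones from the post.

```c
#include <assert.h>
#include <stddef.h>

/* Count the iterations of the triangular double loop:
 * every pair (x, y) with x <= y < end is visited exactly once. */
size_t count_triangular_pairs(size_t end) {
    size_t count = 0;
    for (size_t x = 0; x < end; x++) {
        for (size_t y = x; y < end; y++) {
            count++;
        }
    }
    /* Closed form: end * (end + 1) / 2, roughly half of end * end. */
    assert(count == end * (end + 1) / 2);
    return count;
}
```

With end = 3000 this gives 4,501,500 iterations, compared with 9,000,000 for the full double loop.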

So the question was not so stupid, but the GPU world is very different from the CPU world. Both have their advantages and disadvantages. The GPU is for calculation and massive matrix work in random order, and the CPU is for logic work in sequential or indexed order.

The real problem is frequency scaling on the CPU. So it would be a very good idea to produce a mobile device for gaming and AI with a good cooling system to avoid scaling. This would be a step towards laptops and desktops.

Frequency scaling is the real bottleneck on mobile. We have CPUs that run very fast, but we can only use, let's say, 25% of their capability.

Let's wait for the NVIDIA N1X and see what we can do with it.

GPU speed vs CPU speed is not a problem of speed. It is just a question of what you need done and how you plan to do it.

The problem I am trying to solve is associating vectors with each other. Loops are good on the CPU. I will try to find out whether I can do this on the GPU; I need to find another way. But I will always need to do some work on the CPU, because GPU work runs in random order and its output is not indexed: a global index does not work across work groups because they run in parallel.

PS: I may be wrong on some points, so do not hesitate to let me know.



Dennis Quick asked a question 3 days ago

Installing Arm_Compiler_5.06u7 on Windows 11

Hello,

I have been unsuccessful at trying to manually install Arm_Compiler_5.06u7 to Keil_5\Arm folder. I am running Windows 11. I have done this successfully on numerous occasions in the past. However, after the installation ARMCC is not in the ARM folder...



Dennis Quick in reply to Ronan Synnott, 1 day ago

Hi Ronan,

Thank you, I really appreciate the reply. I need to check and see which license we actually have. I've installed Keil_5 a couple of times in the past and don't remember running into this issue. I'll try the workaround and see if that works. Thank you again!   

Dennis. 


Dennis Quick in reply to Dennis Quick, 1 day ago

Hi again Ronan,

The license management tab on uVision is showing that I have MDK-Lite (Evaluation version), and PK51 Prof. Developers Kit (evaluation version) installed. The ‘Add LIC’ tab is greyed out, so I assume that there is no need to activate a license. However, I am still getting the same compilation error as pasted up above. Do you have any further suggestions? Thank you.

Dennis


