UM_Bristol

Back to HadCM3_technical_notes

Paul Valdes - Craig Butts (Head of School, Chemistry) got money (from EPSRC?) for a BC/BP replacement. Cloud-based but hosted locally. Dynamically partitioned. But disk space is an issue.

David Manley - £1.2 million for an x86 replacement (EPSRC money). Key people are Annela Seddon (APVC) and Stacey Downton (leading on the IT side). Speak to Stacey about the BC4 replacement. BRICS - Bristol Research…. Also Simon Hanna, who is the Academic Lead.

Simon Hanna - “Digital Labs”. Virtual machine. One partition will be “BC5”. Has a teaching component. Sadaf’s priority. Questions for Simon:

Paul Valdes, Gethin Williams, Sadaf and Simon - Options for HPC

Organisation

Reporting line: Keith Woolley > Matt Shard > Steph Downton Woolley > Steve Edge… > Simon Spate > Duncan Baldwin > David Gardner.

Keith Woolley and Sadaf lead IT Services and BriCS respectively; previously it was Sadaf and Chapman, and Gethin is currently covering an extra role since Chapman left. ACRC will likely be split between BriCS and IT Services. Digital Labs will sit in BriCS (with the Isambards). NextGen-AI will sit in IT Services. RDSF will sit in IT Services.

Gethin

UK-wide, many groups are struggling with the UM install due to the move from CentOS 7 to Rocky 8. 1 day MOAP, 1 day Paul, 3 days ACRC. Sanne - Polly - Stacey. Currently Thursdays until 2025. 50 days of MOAP. Options for funding: extend MOAP, ask Neil Abel, Past2Future, RSE. Dan to speak to Stacey regarding RSE.

Pilot Digital Labs

Has 10 nodes = 1,280 cores (cf. BC4 = 15,000). 200 TB of solid-state storage.

Isambard 3 application

  1. How many CPU hours are you requesting? They ask how many CPU-hours we will need for the project. Looking at the spec, I am assuming that we will use one (half) node, so 72 cores, and the testing over 3 months will probably amount to a few hundred years of simulation (let’s say 1,000 years), if things go well and we get to the point of evaluating the model. This might be 20 days of solid integration (unless the model is very slow). So, 72 * 20 * 24 = 34,560, so ask for 50,000 CPU-hours? (A back-of-envelope version of this calculation is sketched after this list.)

  2. How many terabytes (TB) of disk space in total do you expect to need? For disk-space, I guess 200 GB will be enough, just for testing?

  3. What is your proposed project name? HadCM3-Isambard3

  4. Please give a brief description of your project (about 250-500 words). We have been running a climate model on BC4 for about 15 years. We have had great success with this, and it has been the primary tool of our research group, which averages 20 or more funded postdocs and PhD students. Given the impending end of BC4, it is imperative that we make the transition to either Digital Labs, NextGen/self-service-cloud, or Isambard 3/AI as soon as possible, to ensure continuity of service and research across multiple UKRI/EU/industry funded projects. However, this is non-trivial. The (Fortran) code is exceptionally complex and large, and is surrounded by wrappers: around 1 million lines of code in total. As such, we need considerable time to port the code. In this project, we would like to continue the work, carried out by Gethin Williams, to port the code to Isambard. This involves compiling, running, evaluating, and setting up the modelling infrastructure. We anticipate using just one node (actually half a node, so 72 cores, if I have understood the Isambard 3 spec correctly), plus the head node for compiling and testing wrapper scripts etc. I am guessing 20 days of continuous run time in total over the 3-month period, but this is a best(largest)-case scenario, and assumes that we get to the evaluation phase of the project (benchmarking results against previous simulations), which is very much a “stretch goal”.
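For reference, a back-of-envelope sketch of the CPU-hour and disk estimates in items 1 and 2 above. The inputs (72 cores, 20 days of solid integration, 200 GB of disk) are the assumptions stated there; the 50,000-hour request simply rounds the raw figure up for headroom.

  # Back-of-envelope Isambard 3 resource request (figures from items 1 and 2 above).
  cores = 72            # half a node on Isambard 3, per item 1
  days_running = 20     # assumed days of solid integration over the 3 months
  hours_per_day = 24

  cpu_hours = cores * days_running * hours_per_day
  print(f"Raw estimate: {cpu_hours:,} CPU-hours")                  # 34,560

  requested = 50_000    # rounded up for reruns and slower-than-expected throughput
  print(f"Headroom at {requested:,} CPU-hours: {requested / cpu_hours:.2f}x")

  disk_gb = 200         # testing only, per item 2
  print(f"Disk: {disk_gb} GB (about {disk_gb / 1024:.2f} TB)")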

Stacey Downton

See Simon Hanna document.

Sadaf

Met with Sadaf Alam, Gethin Williams and Richard Gilham. Sadaf said that NVIDIA will not really be interested in our code, as it is too old and bespoke, so there will be no support for installing it on Isambard.

Ian Bond

Thank you for your time on Tuesday - it was much appreciated.

In terms of the HPC part of our discussion, see below for my understanding of what we discussed - please let me know if any of this is incorrect.  If you are happy then I will disseminate this more widely in Geography.

Simon, Sadaf, India

As you may be aware, the University has given the green light for investing in a new HPC service at the university.

Once in place, the new environment will gradually replace first BlueCrystal 4 and then BluePebble. Launched in 2017 and 2020 respectively, our current systems are aging rapidly - and no longer meet energy-efficiency standards.

Contemporary x86 HPC CPUs and planned upgrade programme   

The University’s new HPC service will offer significant benefits to users compared with BlueCrystal 4 and BluePebble, including faster processors and a planned upgrade programme allowing the environment to grow with demand and remain up-to-date.  

The CPU-based system will be focused on the x86 architecture but will adapt to users' requirements as technologies change. It is also a sustainable solution, both in terms of regular hardware investments to maintain and grow the system, and in terms of the huge energy and financial savings involved in switching away from BlueCrystal 4 and BluePebble.

Once in place, the new HPC system will sit alongside the University allocations of Isambard 3 and Isambard-AI, to offer a comprehensive range of HPC solutions to staff and students.

HPC Service Continuity

The University’s Digital Research Infrastructure Board and IT Services appreciate that any programme of changes to your HPC service will be a cause for concern.

Over a thousand research projects and teaching courses throughout the University depend on high performance computing.

We are committed to continuing to provide a university wide HPC service that meets your research and teaching needs throughout the transition to a new HPC service.

Timelines

We are currently developing the timelines for the transition to the new HPC system, and we will adjust these where needed to ensure there are no interruptions to the continuity of your HPC services.   

Transition to a modern x86 HPC facility

Once the new HPC system is built and ready for use, we will gradually transition users from BlueCrystal 4 and BluePebble onto the new HPC facility. This will be an incremental process to ensure the continuity of your HPC services at the University.

Keeping you informed

We appreciate that the continuity of HPC services is vital for your research and teaching, and we will send you monthly updates on the HPC transition project and timelines as these are developed and refined.

You can also get in touch with your school representative on the HPC and RDSF Executive Committee who may be able to answer your questions.  The committee Chair is Simon Hanna, School of Physics.

We are setting up a direct way for you to communicate with the HPC Transition Team, so you can ask questions or raise concerns. In our next update, we will let you know how you can get in touch with the team directly.  

Simon Hanna, Chair of the High Performance Computing Executive

Sadaf Alam, Director of Advanced Computing - Strategy and Academia

India Davison, Senior Project Manager for the HPC Transition Project

HPC User Forum, 24/2/2025

Sanne Terry, Simon Hanna.
There was an HPC user group/meeting - one person per School.
Volunteers to chair User Group meetings.
Phase 1 pilot system. Aim for Phase 2 in July.
First project meeting this afternoon - 2x sys admin, Sadaf, Gethin, Simon (representing users).
Will be 0.5 to 0.7 of BC4.

HPC User Forum, March 2025

Chris Woodgate, Sanne Terry.
Now called BC5.
BriCS = Isambard 3 + Isambard-AI.
BC5 = x86, CPU > GPU, hosted as part of Digital Labs.
April = procurement. June+August = building. August = migrating users.
https://uob.sharepoint.com/sites/hpc - HPC transition project.
Firm commitment to continuity of service.
“Costing your research” document.

DRI User Forum, August 2025

Helen Jones - Project Manager, BC5. Physics tank room - racked and cabled. Early user testing in September. 304 32-core nodes, 4 GB per core, 1.2 PB of storage.
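A quick sanity check of the headline BC5 capacity implied by those figures (pure arithmetic on the numbers above; the ~15,000-core BC4 figure is the one quoted in the Pilot Digital Labs note):

  # Aggregate BC5 capacity implied by the figures above (arithmetic only).
  nodes = 304
  cores_per_node = 32
  gb_per_core = 4

  total_cores = nodes * cores_per_node              # 9,728 cores
  ram_per_node_gb = cores_per_node * gb_per_core    # 128 GB per node
  total_ram_tb = total_cores * gb_per_core / 1024   # ~38 TB of RAM in total

  print(f"{total_cores:,} cores, {ram_per_node_gb} GB per node, {total_ram_tb:.0f} TB RAM")
  print(f"Roughly {total_cores / 15_000:.0%} of BC4's ~15,000 cores")

Encouragingly, that ratio (about 65%) matches the “0.5 to 0.7 of BC4” noted at the February user forum.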

Simon Hanna. Allocation process: full utilisation, equitable, compliant, accountable, easy to use, and flexible.
Projects = self-contained research (or teaching). Apply for X node-hours. Extensions allowed.
How to account for grants? Grants will get higher priority on the queue. Maybe this will be expanded if there is a lot of grant income.
Some BC5 expansion will replace BP.
ACRC = BC4, BP, RDSF. BriCS = BC5, Isambards.

Gethin 11/9/2025

Isambard 3 completed 300 years.
Data (pd files) on /work/um/tfcub/data_300_i3_pd/.

Gethin 26/11/2025

Downloads on BC4 used sshfs.
BriCS control IB3.
IB3's ARM chips don't work with ifc (the Intel Fortran compiler); must use GNU.
IB3 and BC5 use “clifton”, which puts a certificate on your ssh key. The certificate is valid for 12 hours, and signing in involves user interaction.
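A minimal sketch for checking how long the current clifton certificate remains valid, assuming it lands as a standard OpenSSH certificate file alongside the key (the path below is illustrative; point it at whatever *-cert.pub file clifton actually creates):

  # Print the validity window of an OpenSSH certificate (e.g. the one clifton issues).
  # The certificate path is an assumption - use the *-cert.pub file clifton creates.
  import subprocess
  from pathlib import Path

  cert = Path.home() / ".ssh" / "id_ed25519-cert.pub"   # illustrative path

  out = subprocess.run(
      ["ssh-keygen", "-L", "-f", str(cert)],            # -L prints certificate contents
      capture_output=True, text=True, check=True,
  ).stdout

  for line in out.splitlines():
      if "Valid:" in line:
          print(line.strip())   # "Valid: from ... to ..." - should span about 12 hours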

Need to push for more disk space on BC5/IB3.

BC4 used the Intel compiler with Slurm, which needs PMI2 (Process Management Interface). BC5 says it supports PMI2, but we get an error.
Alternative: BC5 may work with GNU.
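A small diagnostic sketch for the PMI2 question, assuming Slurm's srun is on the path on BC5: it just asks srun which MPI plugin types it advertises and checks whether pmi2 is among them.

  # Ask Slurm which MPI plugin types it supports, and check for pmi2.
  import subprocess

  out = subprocess.run(
      ["srun", "--mpi=list"],        # lists supported MPI types, e.g. pmi2, pmix, none
      capture_output=True, text=True,
  )
  listing = out.stdout + out.stderr  # srun tends to print this listing to stderr

  print(listing.strip())
  if "pmi2" in listing:
      print("pmi2 is advertised - the error is probably elsewhere (compiler/MPI build?)")
  else:
      print("pmi2 not advertised - try GNU plus the MPI stack BC5 provides")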

Geth will re-run the IB3 benchmark and put the results on silurian. For now, he will work on BC5.

Geth is currently 1 day per week on LFRic; this comes to an end soon. Geth will contact his line manager to see if he is OK with doing 2 days per week, TONIC-funded.

Gethin 19/2/2026

Mods for gnu fortran
Run job and process with new mods
clustersubmit -r bc5/ib3
develop manual download hack
ftp script auto-stop after x hours

Need to email Geth about getting a BC5 account so I can test things.