C# ZipArchive: high memory usage when archiving a large volume of files

LAI TOCA
Dec 23, 2022


We have a file API that routinely schedules a job to archive the files under a specific path (drive) as a data backup. Starting December 11, our monitoring dashboard showed the application using more than 3GB of memory.

After digging into our source code, we suspected a problem with how we used ZipArchive. First, we confirmed that we were correctly disposing of our managed resources (streams and so on). Second, we found that ZipArchive can process an archive in different modes (ZipArchiveMode.Read, Create, and Update). So we consulted the hottest AI platform: ChatGPT.
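Disposal matters for ZipArchive in a specific way: the zip's central directory is only written when the archive is disposed. The sketch below is our own illustration, not code from the original job. It assumes .NET 6 or later (C# top-level statements), and the helper names BuildZip and IsReadableZip are hypothetical. It builds a one-entry zip in memory with and without calling Dispose(), then checks whether the result can be opened again.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: builds a one-entry zip in memory. With disposeArchive == false
// the bytes are captured while the ZipArchive is still open, mimicking a missed Dispose().
byte[] BuildZip(bool disposeArchive)
{
    var backing = new MemoryStream();
    var archive = new ZipArchive(backing, ZipArchiveMode.Create, leaveOpen: true);
    using (var writer = new StreamWriter(archive.CreateEntry("note.txt").Open()))
    {
        writer.Write("hello");
    }
    if (disposeArchive)
    {
        archive.Dispose(); // writes the central directory to the end of `backing`
    }
    return backing.ToArray();
}

// Hypothetical helper: true if the bytes open as a zip containing exactly one entry.
bool IsReadableZip(byte[] bytes)
{
    try
    {
        using var archive = new ZipArchive(new MemoryStream(bytes), ZipArchiveMode.Read);
        return archive.Entries.Count == 1;
    }
    catch (InvalidDataException)
    {
        // No end-of-central-directory record: the archive was never finalized.
        return false;
    }
}

Console.WriteLine($"Disposed:     {IsReadableZip(BuildZip(disposeArchive: true))}");
Console.WriteLine($"Not disposed: {IsReadableZip(BuildZip(disposeArchive: false))}");
```

We would expect only the disposed archive to open. That is why it is worth double-checking that both the FileStream and the ZipArchive sit inside using declarations (or using blocks).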

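According to the AI's answer, Update mode loads the entire content of the file(s) into memory, and that was the main reason for the high memory usage when zipping a huge volume of files. This is consistent with the ZipArchive documentation: in Update mode, the content of the entire archive is held in memory, and no data is written to the underlying file or stream until the archive is disposed.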
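You can observe this behavior without a multi-GB test run. The following sketch is our own and assumes .NET 6 or later. BytesFlushedBeforeDispose is a hypothetical helper. It writes a 1 MB entry into an in-memory archive and reports how many bytes have reached the underlying stream before the archive is disposed.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: writes one 1 MB entry and returns how many bytes have reached
// the backing stream while the ZipArchive is still open (before Dispose runs).
long BytesFlushedBeforeDispose(ZipArchiveMode mode)
{
    using var backing = new MemoryStream();
    using var archive = new ZipArchive(backing, mode, leaveOpen: true);
    using (var entryStream = archive.CreateEntry("payload.bin").Open())
    {
        var buffer = new byte[1024 * 1024];
        new Random(42).NextBytes(buffer); // random, mostly incompressible data
        entryStream.Write(buffer, 0, buffer.Length);
    }
    // Evaluated before the using declarations dispose `archive`, then `backing`.
    return backing.Length;
}

Console.WriteLine($"Update: {BytesFlushedBeforeDispose(ZipArchiveMode.Update)} bytes written before Dispose");
Console.WriteLine($"Create: {BytesFlushedBeforeDispose(ZipArchiveMode.Create)} bytes written before Dispose");
```

Based on the documented behavior, Update mode should report 0 bytes because the entry is still buffered inside the ZipArchive. Create mode should already have streamed the entry out. With several hundred MB of source files, that in-memory buffer is the likely driver of the working-set growth.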

Below is the original code (a LINQPad script, where .Dump() prints a value to the results pane). With it, we could reproduce the memory usage we had seen since December 11:

// Formats a byte count as a human-readable KB/MB/GB string.
string SizeConverter(long bytes)
{
    var fileSize = new decimal(bytes);
    var kilobyte = new decimal(1024);
    var megabyte = new decimal(1024 * 1024);
    var gigabyte = new decimal(1024 * 1024 * 1024);

    switch (fileSize)
    {
        case var _ when fileSize < kilobyte:
            return "Less than 1KB";
        case var _ when fileSize < megabyte:
            return $"{Math.Round(fileSize / kilobyte, 0, MidpointRounding.AwayFromZero):##,###.##}KB";
        case var _ when fileSize < gigabyte:
            return $"{Math.Round(fileSize / megabyte, 2, MidpointRounding.AwayFromZero):##,###.##}MB";
        case var _ when fileSize >= gigabyte:
            return $"{Math.Round(fileSize / gigabyte, 2, MidpointRounding.AwayFromZero):##,###.##}GB";
        default:
            return "n/a";
    }
}

Process currentProcess = System.Diagnostics.Process.GetCurrentProcess();
"------Start Archive------------".Dump();
currentProcess.Refresh();
SizeConverter(currentProcess.WorkingSet64).Dump();

// Run the archive job on a background thread so the main thread can keep sampling memory.
var T1 = new Thread(() =>
{
    var root = @"D:\Temp\Arch\";
    var archiveDate = $"archive-{DateTime.Now:yyyyMMdd_HHmmss}.zip";
    using var zipToOpen = new FileStream(root + archiveDate, FileMode.OpenOrCreate);
    // Update mode: every entry is buffered in memory until the archive is disposed.
    using var archive = new ZipArchive(zipToOpen, ZipArchiveMode.Update, false);

    var files = new List<string>()
    {
        @"C:\Files\A_232MB.zip",
        @"C:\Files\B_6MB.exe",
        @"C:\Files\C_98MB.exe",
        @"C:\Files\D_3MB.pdf",
        @"C:\Files\F_82MB.exe"
    };

    foreach (var path in files)
    {
        // Use the bare file name as the entry name; passing the absolute path
        // would put "C:\Files\..." inside the zip entry name.
        archive.CreateEntryFromFile(path, Path.GetFileName(path));
    }
});

T1.Start();


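// Sample the working set every 5 seconds, during and after the archive job.
// The loop never ends; stop the script manually.
while (true)
{
    Thread.Sleep(5000);

    $"------After Archive------------{DateTime.Now}".Dump();
    currentProcess.Refresh();
    SizeConverter(currentProcess.WorkingSet64).Dump();
}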

The results showed that Update mode drove memory usage up sharply during archiving, and it stayed high even after the whole process had completed. We then changed the mode to ZipArchiveMode.Create and ran the same test again. The screenshot below shows that the same scenario no longer consumes a lot of memory.
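Here is a minimal sketch of the Create-mode variant. It is not the exact code from our job. It assumes .NET 6 or later, and the ArchiveFiles helper, the temp-folder demo files, and CompressionLevel.Optimal are illustrative choices. FileMode.Create fits here because each run writes a new, timestamped archive.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper: streams each source file into a brand-new archive.
// In ZipArchiveMode.Create each entry is compressed and written straight to the
// FileStream, instead of being buffered in memory until Dispose as in Update mode.
void ArchiveFiles(string zipPath, IEnumerable<string> sourceFiles)
{
    using var zipStream = new FileStream(zipPath, FileMode.Create, FileAccess.Write);
    using var archive = new ZipArchive(zipStream, ZipArchiveMode.Create, leaveOpen: false);
    foreach (var path in sourceFiles)
    {
        // Bare file name as entry name, so no drive letter or folder ends up in the zip.
        archive.CreateEntryFromFile(path, Path.GetFileName(path), CompressionLevel.Optimal);
    }
}

// Demo with throwaway files in a temp folder (stand-ins for the C:\Files\ inputs).
var workDir = Path.Combine(Path.GetTempPath(), "zip-demo-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(workDir);
var inputs = new List<string>();
for (var i = 0; i < 3; i++)
{
    var inputPath = Path.Combine(workDir, $"file{i}.bin");
    File.WriteAllBytes(inputPath, new byte[1024 * (i + 1)]);
    inputs.Add(inputPath);
}

var archivePath = Path.Combine(workDir, $"archive-{DateTime.Now:yyyyMMdd_HHmmss}.zip");
ArchiveFiles(archivePath, inputs);

// Reopen read-only and list the entry names that were written.
using (var check = ZipFile.OpenRead(archivePath))
{
    Console.WriteLine(string.Join(", ", check.Entries.Select(e => e.FullName)));
}
```

One trade-off is that Create mode can only add entries to a new archive; it cannot read or modify an existing one. A job that must append to an existing zip would still need Update mode and its memory cost.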

Whew, to be honest, ChatGPT really surprised me. I know AI won't always give us 100% correct answers, but we can judge the context and build our own solutions based on our experience 😉.
